BUG: SIA2 services from the registry are not searchable. #450

Closed
bsipocz opened this issue Jun 15, 2023 · 23 comments

Comments

@bsipocz
Member

bsipocz commented Jun 15, 2023

It was confirmed that this misbehaviour shows up in several HEASARC SIA2 services, so it's not an IRSA-specific problem.

import pyvo

from astropy.coordinates import SkyCoord
coords = SkyCoord('150.01d 2.2d', frame='icrs')

image_services = pyvo.regsearch(servicetype='sia2')
irsa_seip = [s for s in image_services if 'irsa' in s.ivoid and 'seip' in s.ivoid][0]

seip_results = irsa_seip.search((coords, 0.01))
---------------------------------------------------------------------------
E10                                       Traceback (most recent call last)
File ~/munka/devel/pyvo/pyvo/registry/regtap.py:747, in RegistryResource.search(self, *args, **keys)
    746 try:
--> 747     return self.service.search(*args, **keys)
    748 except ValueError:
    749     # I blindly assume the ValueError comes out of get_interface.
    750     # But then that's likely enough.

File ~/munka/devel/pyvo/pyvo/registry/regtap.py:715, in RegistryResource.service(self)
    714     return self._service
--> 715 self._service = self.get_service(None, True)
    716 return self._service

File ~/munka/devel/pyvo/pyvo/registry/regtap.py:701, in RegistryResource.get_service(self, service_type, lax)
    657 """
    658 return an appropriate DALService subclass for this resource that
    659 can be used to search the resource using service_type.
   (...)
    698     opening a web browser on the access URL.
    699 """
    700 return self.get_interface(service_type, lax, std_only=True
--> 701                           ).to_service()

File ~/munka/devel/pyvo/pyvo/registry/regtap.py:344, in Interface.to_service(self)
    341     raise ValueError("PyVO has no support for interfaces with"
    342                      f" standard id {self.standard_id}.")
--> 344 return service_class(self.access_url)

File ~/munka/devel/pyvo/pyvo/dal/sia2.py:182, in SIA2Service.__init__(self, baseurl, session)
    181 self.query_ep = None  # service query end point
--> 182 for cap in self.capabilities:
    183     # assumes that the access URL is the same regardless of the
    184     # authentication method except BasicAA which is not supported
    185     # in pyvo. So pick any access url as long as it's not
    186     if cap.standardid.lower() == SIA2_STANDARD_ID.lower():

File ~/munka/devel/astropy/astropy/utils/decorators.py:837, in lazyproperty.__get__(self, obj, owner)
    836 if val is _NotFound:
--> 837     val = self.fget(obj)
    838     obj_dict[self._key] = val

File ~/munka/devel/pyvo/pyvo/dal/vosi.py:96, in CapabilityMixin.capabilities(self)
     94 @lazyproperty
     95 def capabilities(self):
---> 96     return vosi.parse_capabilities(self._capabilities().read)

File ~/munka/devel/pyvo/pyvo/io/vosi/endpoint.py:137, in parse_capabilities(source, pedantic, filename, _debug_python_based_parser)
    132 with iterparser.get_xml_iterator(
    133     source,
    134     _debug_python_based_parser=_debug_python_based_parser
    135 ) as iterator:
    136     return CapabilitiesFile(
--> 137         config=config, pos=(1, 1)).parse(iterator, config)

File ~/munka/devel/pyvo/pyvo/io/vosi/endpoint.py:346, in CapabilitiesFile.parse(self, iterator, config)
    345     else:
--> 346         vo_raise(E10, config=config, pos=pos)
    348 super().parse(iterator, config)

File ~/munka/devel/astropy/astropy/io/votable/exceptions.py:124, in vo_raise(exception_class, args, config, pos)
    123     config = {}
--> 124 raise exception_class(args, config, pos)

E10: None:3:15: E10: File does not appear to be a VOSICapabilities file

During handling of the above exception, another exception occurred:

DALServiceError                           Traceback (most recent call last)
Cell In [1], line 9
      6 image_services = pyvo.regsearch(servicetype='sia2')
      7 irsa_seip = [s for s in image_services if 'irsa' in s.ivoid and 'seip' in s.ivoid][0]
----> 9 seip_results = irsa_seip.search(coords)

File ~/munka/devel/pyvo/pyvo/registry/regtap.py:751, in RegistryResource.search(self, *args, **keys)
    747     return self.service.search(*args, **keys)
    748 except ValueError:
    749     # I blindly assume the ValueError comes out of get_interface.
    750     # But then that's likely enough.
--> 751     raise dalq.DALServiceError(
    752         f"Resource {self.ivoid} is not a searchable service")

DALServiceError: Resource ivo://irsa.ipac/spitzer/images/seip is not a searchable service

In case it helps:

In [4]: irsa_seip
Out[4]: ('ivo://irsa.ipac/spitzer/images/seip', 'vs:catalogservice', 'SEIP', 'Spitzer Enhanced Imaging Products', 'research', 'The Spitzer Science Center and IRSA have released a set of Enhanced Imaging Products (SEIP) from the Spitzer Heritage Archive. These include Super Mosaics (combining data from multiple programs where appropriate) and a Source List of photometry for compact sources. The primary requirement on the Source List is very high reliability -- with areal coverage, completeness, and limiting depth being secondary considerations. The SEIP include data from the four channels of IRAC (3.6, 4.5, 5.8, 8 microns) and the 24 micron channel of MIPS. The full set of products for the Spitzer cryogenic mission includes around 42 million sources.', 'https://irsa.ipac.caltech.edu/data/SPITZER/Enhanced/SEIP', 'SSC and IRSA', 'survey#archive', '', '', nan, 'infrared', ['https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&'], ['ivo://ivoa.net/std/sia2'], ['vs:paramhttp'], ['std'])
@msdemlei
Contributor

msdemlei commented Jun 16, 2023 via email

@bsipocz
Member Author

bsipocz commented Jun 16, 2023

> (I mention in passing that because for that particular service, IRSA
> does not register their VOSI capabilities, it still would have
> failed, but that then is really IRSA's fault).

@zoghbi-a sees the same issue for multiple HEASARC services, so if multiple archives are failing on this, it may be a shortcoming of the requirements. I don't know; I haven't dived into the exact details, but I was surprised that things don't work (==> we need to make sure that at least one service of each type is tested in pyvo; knowing that something doesn't work is better than surprises, I suppose).

@zoghbi-a
Contributor

I can confirm that this is present in many services. Here is some example code:

import pyvo

ivoid = 'ivo://astron.nl/__system__/siap2/sitewide'
srv = pyvo.regsearch(ivoid=ivoid)[0]
srv.get_service('sia2')  # fails with the same error as above

These are examples of services producing the same error:

'ivo://astron.nl/__system__/siap2/sitewide'
'ivo://au.csiro/casda/sia2'
'ivo://nasa.heasarc/suzamaster'
...

For the IRSA case, if I supply the capabilities URL by hand (https://irsa.ipac.caltech.edu/SIA/capabilities?COLLECTION=spitzer_seip&) instead of the one pyvo constructs from the access URL (https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&/capabilities), the issue disappears.
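The difference between those two URLs can be reproduced with the standard library alone. This is a minimal sketch, assuming the capabilities endpoint is a path sibling of the access URL and that the query string should be carried over unchanged (as in the working IRSA URL above). Both helper names are hypothetical and not part of pyvo:

```python
from urllib.parse import urlsplit, urlunsplit


def naive_capabilities_url(access_url):
    """Hypothetical helper: blindly append '/capabilities' to the URL string.

    If the access URL carries a query string, the suffix ends up inside it.
    """
    return access_url.rstrip("/") + "/capabilities"


def sibling_capabilities_url(access_url):
    """Hypothetical helper: append '/capabilities' to the path component only,
    leaving the query string (and fragment) where they belong."""
    parts = urlsplit(access_url)
    path = parts.path.rstrip("/") + "/capabilities"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


access_url = "https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&"
print(naive_capabilities_url(access_url))
# -> https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&/capabilities
print(sibling_capabilities_url(access_url))
# -> https://irsa.ipac.caltech.edu/SIA/capabilities?COLLECTION=spitzer_seip&
```

Whether a given service honours the query parameters on its capabilities endpoint is service-specific; the sketch only fixes where the path segment goes.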

@bsipocz
Member Author

bsipocz commented Jun 16, 2023

Do we know whether this is an issue for any other tools out there?

If the capabilities are in fact not required for a service to be registered (the very widespread nature of this issue suggests this, too), then it looks to me that (a2) could be a way ahead, or else pyvo could insert /capabilities into the URL as Abdu suggests above.
I don't think raising a user-facing warning at this point is useful. I'm sure archives would be willing to register the capabilities if that were strictly required. If it's not a requirement, on the other hand, pyvo should just work ™️

@msdemlei
Contributor

msdemlei commented Jun 19, 2023 via email

@msdemlei
Contributor

msdemlei commented Jun 19, 2023 via email

@bsipocz
Member Author

bsipocz commented Jun 20, 2023

cc @andamian

@andamian
Contributor

Sorry for the late reply.

It looks to me as if irsa_seip.search() is trying to instantiate SIA2Service with an access URL instead of a base URL. If that's the case, we could add an extra argument to the constructor that skips parsing the capabilities when it is present. Or, if we teach our users to get the access URLs directly from the registry, we can ignore the capabilities endpoint completely and deprecate baseurl in the constructor in favour of access_url. This model works fine for SIA2 (at least for now) but could be ambiguous for services with multiple endpoints.
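A minimal sketch of the first option, written without pyvo so it runs on its own: a hypothetical resolve_query_endpoint function either trusts the given URL as the query endpoint (capabilities skipped) or picks the SIA2 accessURL out of a VOSI capabilities document. It assumes the SIA 2.0 standardID is ivo://ivoa.net/std/SIA#query-2.0, compared case-insensitively as in the traceback above; the function name, its parameters and the sample XML are illustrative only:

```python
import xml.etree.ElementTree as ET

# Assumed SIA 2.0 query standardID (compared case-insensitively below).
SIA2_STANDARD_ID = "ivo://ivoa.net/std/SIA#query-2.0"


def _local(tag):
    """Strip an XML namespace: '{ns}capability' -> 'capability'."""
    return tag.rsplit("}", 1)[-1]


def resolve_query_endpoint(url, capabilities_xml=None):
    """Hypothetical helper: return the URL to send SIA2 queries to.

    If capabilities_xml is None (capabilities parsing skipped), the given URL
    is assumed to already be the query endpoint, i.e. a registered access URL.
    Otherwise the first accessURL of a capability whose standardID matches
    SIA2 is returned.
    """
    if capabilities_xml is None:
        return url
    root = ET.fromstring(capabilities_xml)
    for cap in root.iter():
        if _local(cap.tag) != "capability":
            continue
        if cap.get("standardID", "").lower() != SIA2_STANDARD_ID.lower():
            continue
        for elem in cap.iter():
            if _local(elem.tag) == "accessURL" and elem.text:
                return elem.text.strip()
    raise ValueError("no SIA2 capability with an accessURL found")


SAMPLE_CAPABILITIES = """\
<vosi:capabilities xmlns:vosi="http://www.ivoa.net/xml/VOSICapabilities/v1.0">
  <capability standardID="ivo://ivoa.net/std/SIA#query-2.0">
    <interface>
      <accessURL use="base">https://example.org/sia/query</accessURL>
    </interface>
  </capability>
</vosi:capabilities>
"""

print(resolve_query_endpoint("https://example.org/sia/query"))
print(resolve_query_endpoint("https://example.org/sia", SAMPLE_CAPABILITIES))
```

With a single SIA2 capability the choice is unambiguous; a service advertising several endpoints would need an explicit rule for picking one, which is the ambiguity mentioned above.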

As for authentication, there was a time when we at the CADC had to have different access points for Basic Authentication and the other auth methods. That's no longer the case since we've stopped supporting that, which means that direct access URLs should be more manageable now. As a side note, the entire pyvo.auth module is currently based on the capabilities endpoint, as it aims to map the supported authentication mechanisms to the appropriate access URLs. It's probably due for an update, although there is no agreed solution right now.
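To illustrate what deriving an auth-method-to-access-URL map from a capabilities document involves, here is a minimal standard-library sketch. It assumes the VOResource layout in which each interface carries an accessURL and zero or more securityMethod elements, and it treats an interface without securityMethod as anonymous. The function name, the "anonymous" label and the sample document are illustrative assumptions; this is not how pyvo.auth is actually implemented:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ANONYMOUS = "anonymous"  # assumed label for interfaces without securityMethod


def _local(tag):
    """Strip an XML namespace: '{ns}interface' -> 'interface'."""
    return tag.rsplit("}", 1)[-1]


def auth_access_urls(capabilities_xml, standard_id):
    """Hypothetical helper: map security method IDs to the access URLs of the
    capability with the given standardID."""
    mapping = defaultdict(list)
    root = ET.fromstring(capabilities_xml)
    for cap in root.iter():
        if _local(cap.tag) != "capability":
            continue
        if cap.get("standardID", "").lower() != standard_id.lower():
            continue
        for iface in cap:
            if _local(iface.tag) != "interface":
                continue
            urls = [e.text.strip() for e in iface
                    if _local(e.tag) == "accessURL" and e.text]
            # No securityMethod (or one without a standardID) -> anonymous.
            methods = [e.get("standardID", ANONYMOUS) for e in iface
                       if _local(e.tag) == "securityMethod"] or [ANONYMOUS]
            for method in methods:
                mapping[method].extend(urls)
    return dict(mapping)


SAMPLE = """\
<vosi:capabilities xmlns:vosi="http://www.ivoa.net/xml/VOSICapabilities/v1.0">
  <capability standardID="ivo://ivoa.net/std/SIA#query-2.0">
    <interface>
      <accessURL>https://example.org/sia/query</accessURL>
    </interface>
    <interface>
      <accessURL>https://example.org/sia/auth-query</accessURL>
      <securityMethod standardID="ivo://ivoa.net/sso#tls-with-certificate"/>
      <securityMethod standardID="ivo://ivoa.net/sso#cookie"/>
    </interface>
  </capability>
</vosi:capabilities>
"""

print(auth_access_urls(SAMPLE, "ivo://ivoa.net/std/SIA#query-2.0"))
```

When a service no longer needs separate access points per auth method, every entry in such a map points at the same URL, which is why direct access URLs become easier to handle.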

@msdemlei
Contributor

msdemlei commented Jun 27, 2023 via email

@bsipocz
Member Author

bsipocz commented Jul 5, 2023

@andamian - do you have time for the fix, or shall I do it? I would really like to have a new bugfix version out by the end of the month that fixes all these smallish SIA2-related bugs (basically all the bugs that we found except the 'image' regsearch, though if we have a solution or workaround for that too, that would be most wonderful).

@bsipocz bsipocz added this to the v1.4.2 milestone Aug 2, 2023
@bsipocz
Member Author

bsipocz commented Aug 2, 2023

OK, this is the only remaining blocker for 1.4.2, so I added the milestone.

@msdemlei
Contributor

msdemlei commented Aug 3, 2023 via email

@bsipocz
Member Author

bsipocz commented Aug 3, 2023

This bug shows up whenever one does a 'sia2' servicetype registry search, so it's more than an annoyance: it basically renders the registry unusable for SIA2.

As for the 'image' and 'spectrum' deprecation, that should never go into a bugfix release.
I see where the suggestion is coming from, but as far as I can see, most people think the aliases are needed: users are not expected to use ssa, sia, etc. when they search, and until there is a clearly good alternative, IMO we shouldn't deprecate them. Besides, we have a lot of educational material that uses the aliases, so that would need to be updated first.

@msdemlei
Contributor

msdemlei commented Aug 4, 2023 via email

@bsipocz
Member Author

bsipocz commented Aug 16, 2023

I got bogged down in the weeds trying to fix this, but at this point it's not worth keeping it as a blocker for the other fixes. So I'll go ahead and release 1.4.2 and bump this to the next bugfix milestone.

@bsipocz bsipocz modified the milestones: v1.4.2, v1.4.3 Aug 16, 2023
@msdemlei
Contributor

I ran into this again, and I feel we should really fix it; it's an embarrassment if SIA2 discovery doesn't work. And since what's registered are access URLs rather than TAP-like "base URLs", we'll need a constructor that accepts these.

@andamian, do you have a plan for that? Or should I come up with a PR myself?

@bsipocz
Member Author

bsipocz commented Oct 20, 2023

I second that this should be fixed sooner rather than later, as the IRSA SIA service is currently not usable at all from the registry.

I tried to come up with a PR myself, but it would have required nuking a lot of the current code, and I didn't want to do that without input from Adrian.

@andamian
Contributor

I'm sorry, I've totally dropped the ball on this. Unfortunately, I'm swamped with other work at the moment and don't have time to do this PR right now - maybe by next week.

@msdemlei's solution, VanillaSIA2Service, should work, but my preference is to make the SIA2Service class more resilient to that kind of URL. If the spec requires the query and capabilities endpoints to be siblings, it should be easy to look for capabilities either in the current path or in its parent to figure out the context.
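The sibling lookup suggested here can be sketched as generating candidate capabilities URLs: one under the given path (the base-URL case) and one under its parent (the access-URL case). This is a minimal sketch assuming the sibling layout holds and that the query string is carried over unchanged; candidate_capabilities_urls is a hypothetical name, and the network fetch is left out:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def candidate_capabilities_urls(url):
    """Hypothetical helper: capabilities URLs to try, in order, for a URL that
    may be either a base URL or an access (query) URL.

    Assumes query and capabilities endpoints are siblings, so capabilities sit
    either under the given path or under its parent. Duplicates are dropped.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    candidates = []
    for base in (path, posixpath.dirname(path)):
        cap_path = base.rstrip("/") + "/capabilities"
        candidate = urlunsplit((parts.scheme, parts.netloc, cap_path, parts.query, ""))
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


for url in ("https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/v2query",
            "https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&"):
    print(candidate_capabilities_urls(url))
```

A client would then fetch each candidate in turn and keep the first response that parses as a VOSI capabilities document; only if none does would it fall back to treating the given URL as the query endpoint.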

@msdemlei
Contributor

msdemlei commented Oct 26, 2023 via email

@andamian
Contributor

andamian commented Oct 26, 2023

I've had a quick look at the code, and I'm trying to determine exactly what the issue is here. The SIA2Service class documents that it can take either a base URL or an access point URL. Both SIA2Service('https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/v2query') and SIA2Service('https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia') work for me. In fact, I can pass any sibling endpoint (such as the SIA query endpoint) and it still works. It doesn't work with v2query?, but that could be fixed if we consider access URLs ending in ? to be valid.

@msdemlei, if I understand correctly, what you are proposing is a class that ignores capabilities. But the capabilities endpoint is required in SIA2, so providing code that circumvents our own standards doesn't seem right to me. If you think that capabilities have no use in SIA2 (as is the case, I think, for DataLink), then the correct way to address this is to propose a change to the standard. My opinion is that this code should follow the standards.

The capabilities endpoint is actually used at the CADC. We have one SIA service, and its capabilities advertise the access URLs for V1 and V2 of the standard. We still advertise the supported auth methods there, since there's currently no viable alternative.

This is my rather limited view from the service perspective. I'm aware that I lack any expertise with the registry and don't know how things look from there. I hope to fill in some gaps at the registry sessions at the upcoming Interop.

@msdemlei
Contributor

msdemlei commented Oct 26, 2023 via email

@andamian
Contributor

I could be persuaded to make parsing of capabilities optional in the SIA2Service class, i.e., if the provided URL is a sibling of capabilities, just assume it is the correct query endpoint. But making a separate class for that adds unnecessary confusion if it is part of the public API: users would need to know which class to instantiate when the only difference is a small implementation detail.

At the moment, the auth-integrated workflow could check the user credentials in the session against those associated with the query endpoint in the capabilities, and potentially provide better feedback than a simple brute-force 403 error. The big problem for me here is that credentials are often obtained to work only for specific authorities/domains. How can the client find that out and prevent credentials from leaking to other domains? This is not such a big problem with certificates, but it is with bearer tokens, which essentially act as short-term usernames/passwords.
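One common client-side guard against that kind of leak is to attach a bearer token only when the request's host matches a domain the token is known to be valid for. This is a minimal sketch assuming the client already keeps such an allow-list per token and only sends tokens over HTTPS; both are hypothetical policy choices, nothing in pyvo or the IVOA standards defines them, and the open question above (how the client learns the allowed domains) is simply taken as input here:

```python
from urllib.parse import urlsplit


def host_allowed(url, allowed_domains):
    """Return True if url uses https and its host is one of allowed_domains
    or a subdomain of one. Hypothetical policy, for illustration only."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    host = parts.hostname.lower()
    for domain in allowed_domains:
        domain = domain.lower().lstrip(".")
        # Exact match or a true subdomain ('.' boundary), so that e.g.
        # 'evil-nrc-cnrc.gc.ca' does not match 'nrc-cnrc.gc.ca'.
        if host == domain or host.endswith("." + domain):
            return True
    return False


def auth_headers(url, token, allowed_domains):
    """Build request headers, adding the bearer token only for allowed hosts."""
    headers = {}
    if host_allowed(url, allowed_domains):
        headers["Authorization"] = f"Bearer {token}"
    return headers


allowed = ["nrc-cnrc.gc.ca"]
print(auth_headers("https://ws.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/sia/v2query", "TOKEN", allowed))
print(auth_headers("https://example.org/sia/query", "TOKEN", allowed))
```

The check deliberately fails closed: an unknown host simply gets no credentials, and the request then surfaces as the usual 401/403 rather than leaking the token.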

@msdemlei
Contributor

msdemlei commented Oct 30, 2023 via email

@bsipocz bsipocz modified the milestones: v1.4.3, v1.5 Dec 15, 2023