-
-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: SIA2 services from the registry are not searchable. #450
Comments
So... while I give you that I probably should not unconditionally
catch ValueError-s n RegistryResource.search (but please note that
I'd like to get rid of that method anyway; we should teach people to
use get_service explicitly, as even now most records already have
multiple capabilities), this is problem is unrelated to registry
operations (phewy!).
On Thu, Jun 15, 2023 at 11:11:17PM +0000, Brigitta Sipőcz wrote:
import pyvo
from astropy.coordinates import SkyCoord
coords = SkyCoord('150.01d 2.2d', frame='icrs')
image_services = pyvo.regsearch(servicetype='sia2')
irsa_seip = [s for s in image_services if 'irsa' in s.ivoid and 'seip' in s.ivoid][0]
seip_results = irsa_seip.search(coords)
[...]
E10: None:3:15: E10: File does not appear to be a VOSICapabilities file
A shorter reproducer is:
import pyvo
svc = pyvo.dal.SIA2Service(
"https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&")
At that point we go down a rabbit hole (does someone reading this
feel responsible for the SIA2 code? This would be the moment for the
people who discarded by objections to SIA2 to come forward...).
SIA2Service, because of the way we had for a while planned to do
authentication, iterates through the service's VOSI capabilities *on
construction* (sia2.py#182 ff). It does not catch exceptions when
doing that, although all that only become relevant with auth. When
there's something wrong with the capabilities, the whole thing blows
up.
For robustness with the overwhelming majority of SIA2 services
(that don't require auth) I'd say there should be some exception
catching (and probably downgrading to warnings) going on here. SIA2
itself works just perfectly well without ever looking at VOSI
capabilities.
Regrettably, the rabbit hole goes deeper. Consider EndpointMixin in
pyvo.dal.vosi, which has a _get_endpoint method trying the following
URIs when trying to find VOSI endpoints:
'{baseurl}/{endpoint}'.format(baseurl=self.baseurl,
endpoint=endpoint),
url_sibling(self.baseurl, endpoint)
I'm impressed it's that diligent, I have to say (kudos to Adrian).
Anyway, in this case, where we have access_url-embedded URL
parameters, that works out to
'https://irsa.ipac.caltech.edu/SIA?COLLECTION=spitzer_seip&/capabilities',
'https://irsa.ipac.caltech.edu/capabilities?COLLECTION=spitzer_seip&'
The first clearly is absurd. The second URI *could*, return the
capabilities requested, but at least right now it's a 404, and I
frankly would object to site-global capabilities somehow influenced
by URL parameters (though I don't think we're forbidding that
anyhwere).
The capabilities actually *are* at
https://irsa.ipac.caltech.edu/SIA/capabilities?COLLECTION=spitzer_seip&
which we might add as another guess, but... I have a bad feeling
about that. There aren't any clear rules *right now* for how to find
capabilities from an access URL outside of TAP, and
$ python -c "import this" | grep temptation
In the face of ambiguity, refuse the temptation to guess.
So... here's what I think we should do:
(a1) downgrade the failure on bad capabilities to a warning, or, IMHO
better,
(a2) remove the capabilities thing entriely from the constructor of
SIA2Service if CADC (who I think are behind that experiment) agrees
that auth negotiation won't happen like that any more.
(b) It would be wonderful if IRSA could stop relying on on built-in
parameters in access URLs. They are trouble, although admittedly not
outright illegal. And it's really nice *if* we can guess capability
URIs by saying "try the capabilities sibling of the access URL" and
it works.
(c) The "right" way to figure out capability URIs for us would be to
run a RegTAP query along the lines of
select access_url from rr.capability natural join rr.interface
where standard_id='...vosi#availability'
and ivoid=<whatever>
(I mention in passing that because for that particular service, IRSA
does not register their VOSI capabilities, it still would have
failed, but that then is really IRSA's fault).
We *might* put that RegTAP query into the VOSI mixin to use when all
else fails, but at least it shouldn't be called by default, perhaps
only when all other guesses have failed (and given that nothing but
TAP really makes good use of VOSI caps, I wouldn't even bother with
that).
For one, I'm not so wild about having one RegTAP query per pyVO
service constructed worldwide. But perhaps more importantly, it would
make an ivoid (and hence a registry entry) effectively mandatory for
our service objects. While I'm a big fan of nudging people to
register, I'd say that'd be going to far. Our service objects should
keep on being constructable with just an access URL, no?
|
@zoghbi-a sees the same issue for multiple heasarc services, so if multiple archives faulting on this, this maybe a shortcoming of the requirements. I don't know, haven't dived into the exact details but was surprised that things don't work (==> we need to make sure one of each service type is being tested in pyvo, knowing that something doesn't work is better than surprises I suppose) |
I can confirm that this is present in many services. This an example code: ivoid = 'ivo://astron.nl/__system__/siap2/sitewide'
srv = pyvo.regsearch(ivoid=ivoid)[0]
srv.get_service('sia2') These are examples of services producing the same error
For the irsa case, if I put the capabilities url by hand ( |
Do we know whether this is an issue for any other tools out there? If the capabilities are in fact not required for a service to be registered (a very widespread nature of this issue suggests this, too), then it look to me that (a2) could be a way ahead or within pyvo to insert it into the URL as Adbu suggests above. |
On Fri, Jun 16, 2023 at 12:05:04PM -0700, Brigitta Sipőcz wrote:
Do we know whether this is an issue for any other tools out there?
Probably not. I think pyvo is afflicted mainly because it was a test
bed for CADC's auth experiments.
If the capabilities are in fact not required for a service to be
registered (a very widespread nature of this issue suggests this,
Ah, it's one of these silly pieces of standardese where people
thought something should be required but then actually didn't need
it, which is why in the end it doesn't even work. In this case,
there is a formal requirement to have capabilities but no recipe just
where it should be.
Given that:
too), then it look to me that (a2) could be a way ahead or within
pyvo to insert it into the URL as Adbu suggests above.
Adrian, do we have green light for removing the capabilities
experiment from SIA2 again? Or if you still need it, can we perhaps
hide it from the general public by some @prototype decorator?
|
On Fri, Jun 16, 2023 at 11:54:47AM -0700, Abdu Zoghbi wrote:
I can confirm that this is present in many services. This an example code:
ivoid = 'ivo://astron.nl/__system__/siap2/sitewide'
srv = pyvo.regsearch(ivoid=ivoid)[0]
srv.get_service('sia2')
Uh. That's an interesting one. For that, pyvo actually *could* find
the right capability URI if it tried the
url_sibling(self.baseurl, endpoint)
from line 28 of pyvo.dal.vosi first. As things are, it tries
'{baseurl}/{endpoint}'
first, which in this case (and according to the old requirement of
ending GET-table URIs with a question mark or an ampersand depending)
works out to
<https://vo.astron.nl/__system__/siap2/sitewide/siap2.xml?/capabilities>.
That's a SIA2 query, albeit a somewhat odd one. Since SIA2 I think
does not say anything about rejecting unknown parameters, the
underlying software (ahem) follows ancient (and bad) VO practice of
just ignoring them. So, what comes back is a VOTable with a lot of
matches, and parsing that as a capability document fails.
I would seem to me that regardless what we do with SIA2Service we
need to re-think the capability URL guessing here (and perhaps talk
to GWS and DAL to get a clear statement out of them how they think we
should be doing this). Blindly attaching "/capabilities" to an access
URL that will very often end in a question mark or even an ampersand
certainly is not a good idea. Also, by DALI rules I'd strongly suggest
we should try the sibling first.
|
cc @andamian |
Sorry for the late reply. It looks to me that As for the authentication, there was a time when we, at the CADC, had to have a different access points for Basic Authentication and the rest of auth methods. It's not the case anymore since we've stopped supporting that which means that direct access URLs should be more manageable now. As a side note, the entire |
On Mon, Jun 26, 2023 at 11:59:54AM -0700, Adrian wrote:
It looks to me that `irsa_seip.search()` is trying to instantiate
the `SIA2Service` with an access URL instead of a base URL. If
that's the case, we could add an additional argument to the
constructor to skip parsing the capabilities when that argument is
present. Or, if we teach our users to get the access urls directly
from the registries then we can ignore the `capabilities` endpoint
The access URL does come directly from the registry here -- and
clearly everyone registers "the" {query} endpoint's URL as access URL
(I frankly didn't even pause to think about it, but that may be
professional blindness -- it's simply the say all the other
S-services work).
completely and deprecate `baseulr` in the constructor in favour of
`access_url`. This model works fine for SIA2 (at least for now) but
could be ambiguous for services with multiple endpoints.
Ah. I frankly hadn't noticed that SIA2Service wants to be
constructed with a base URL, either (see above on professional
blindness)...
Again, all the other XService-s are constructed with what's in the
Registry as access_url (which in TAP happens to be a, well, base
URL). It wodld hence be lovely if I didn't have to special-case
SIA2, and so I'd really like a VanillaSIA2Service (say) class that
does take an access URL in the constructor for registry use.
Since Sect 2. of SIA2 says:
All {query} resources must be siblings of the VOSI-capabilities
resource; this limitation enables a client with just the URL for an
SIA {query} resource (e.g. from a Datalink service descriptor) to
find the VOSI-capabilities...
VanillaSIA2Service *could* just strip a segment of the access URL and
then up-call to the current SIA2Service, but due to the vagaries of
endpoint guessing I'd rather not do that.
Instead, I think the ideal implementation would be to have
VanillaSIA2Service to accept the access_url and not touch
capabilities, wheres SIA2Service keeps its current interface but
becomes basically a factory that parses the capabilities to find
the/a access URL and then constructs VanillaSIA2Service.
be more manageable now. As a side note, the entire `pyvo.auth`
module is right now based on the `capabilities` endpoint as it aims
at mapping the supported authentication mechanisms to the
appropriate access URLs. It's probably due for an update although
there no agreed solution right now.
Yeah -- I think this whole thing is another indication that it's nice
if we can do without mandatory parsing of capabilities... Aw, the
cursed auth thing...
|
@andamian - do you have time on the fix for this or shall I? I would really like to have a new bugfix version out by the end of the month that fixes all these smallish SIA2 related bugs (basically all the bugs that we found except the 'image' regsearch, though if we have a solution/workaround in for that, too, that would be the most wonderful) |
OK, so this is the only blocker remaining for 1.4.2, so I added the milestone. |
On Wed, Aug 02, 2023 at 03:57:50PM -0700, Brigitta Sipőcz wrote:
OK, so this is the only blocker remaining for 1.4.2, so I added the milestone.
Slightly related to this SIA2 business is also
#449 -- and I think we should
really tell people that the shouldn't use image and spectrum as
servicetypes before there's more code using that.
Since it won't break anything: Can't it go into 1.4.2, too?
|
This bug is when one does a As for the for the |
On Thu, Aug 03, 2023 at 04:54:46PM -0700, Brigitta Sipőcz wrote:
As for the for the `'image'` and `'spectrum'` deprecation, that
should never go in a bugfix release.
I see where the suggestion is coming from, but as I see most people
think they are needed, users are not expected to use `ssa`, `sia`,
etc when they search, and until there is a clearly good alternative
IMO we shouldn't deprecate. Besides, we have a lot of edu material
that use the aliases so first those needs to be updated, too.
The problem is: they don't do what people (can reasonably) expect
them to do, and I don't see a (reasonable, let alone
backwards-compatible) way to make them do what people can expect
them to do. They are just misfeatures introduced before we realised
what the introduction of obscore actually means -- and before we had
experience with a major version step.
Would you be less disinclined to merge the PR if we didn't raise the
Warnings but just deprecated in prose? You see, it'd be great if we
at least did *something* to not accumulate more code out there that
uses these misfeatures. And sure, it'd be great if we could update
educational material that's out there to not show them any more – can
you point me at some so I can start bugging authors?
(and apologies for hijacking this issue; I suppose we should continue
this discussion over at #449)
|
I got bogged down in the weeds of trying to fix this, but at this point it's not worth using is as a blocker for the other fixes. So, I'll go ahead and release 1.4.2, and bump this into the next bugfix milestone. |
I ran into this again, and I feel we should really fix it; it's an embarrassment if SIA2 discovery doesn't work. And since what's registered is access_url-s rather than TAP-like "base URLs", we'll need a constructor accepting these. @andamian, do you have a plan for that? Or should I come up with a PR myself? |
I second that this should be fixed sooner rather than later as in fact the IRSA SIA is not at all usable from the registry atm. I tried to come up with a PR myself, but it would have required to nuke a lot of the current code and I didn't want to do that without input from Adrian. |
I'm sorry, I've totally dropped the ball on this. Unfortunately, I'm swamped with other work at the moment and don't have time to do this PR right now - maybe by next week. @msdemlei solution |
On Wed, Oct 25, 2023 at 03:34:12PM -0700, Adrian wrote:
that URL. If the spec requires query and capabilities end points to
be siblings, it could be easy to search for `capabilities` either
in the current path or the parent to figure out the context.
Mhh, I'm still staunchly in favour of not attempting to parse
capabilities unless we really need them; it's just one extra thing that
can break. On top, since the capabilities endpoint generally has no
function for SIA2, it will break often (Markus' Interoperability Rule
20: "If it's not used by clients, it's broken").
The way our current SIA2Service works, that's not possible because
the query endpoint's name is unpredictable. So, I really
think we should have a separate service object with all the actual
query functionality that, in analogy with the other S-services, is
constructed just with the access URL (consistency is always nice).
The current SIA2Service would then just be a factory function
fiddling out the access url from the capabilities and then
constructing the "plain" service object.
The NormalSIA2Service then wouldn't have to bother with capabilities
at all, which I think is rather gratifying.
Oh, and: does CADC plan on using capabilities for auth negotiation in
the future?
|
I've had a quick look at the code and I'm trying to determine exactly what the issue is here. The @msdemlei , if I understand correctly, what you are proposing is a class that ignores capabilities. But the The This is my rather limited view from the service perspective. I'm aware that I lack any expertise with the registry and don't know how things look from there. I hope to fill in some gaps at the registry sessions at the upcoming Interop. |
On Thu, Oct 26, 2023 at 10:29:01AM -0700, Adrian wrote:
@msdemlei , if I understand correctly, what you are proposing is a
class that ignores capabilities. But the `capabilities` end point
is required in SIA2 so to provide code that circumvents our own
standards doesn't seem right to me. If you think that
Well... It's just that in the normal workflow, SIA2 has no need to
look at capabilities, and so I'd prefer to leave it alone, if only in
the spirit of the golden rule of interoperability ("be strict in what
you produce but lenient in what you expect"; we're not a validator,
after all, and the SIA2 validators that exist don't look at
capabilities either (yet) for all I know, so we have to expect them
to be broken in general for SIA2). That we save an HTTP request if
we don't use the capabilities is a welcome side effect.
`capabilities` have no use in SIA2 (like it's the case I think for
`DataLink`) then the correct way to address this is to propose a
Well, that's another can of worms that nobody really wants to open;
see http://ivoa.net/documents/caproles/ -- and that's another reason
not to rely on capabilities without a strong reason.
change of the standard. My opinion is that this code should follow
the standards.
Sure; but "following a standard" doesn't mean we should require more
of services than we need to get *our* job done.
The `capabilities` end point is actually used at the CADC. We have
one `SIA` service and the `capabilities` advertise the access urls
for V1 and V2 of the standard. We still advertise the supported
auth methods there since there's currently no viable alternative.
Well, SIA2Service doesn't need the V1/V2 dispatch; with VO
integration, you make that choice at the discovery stage, and by the
time SIA*2*Service is created, that choice is over.
So... what's the workflow on auth integration? Can we perhaps only
look at the capabilities if people have passed in some credentials?
Or when the inquire whether auth is possible and if so, which?
|
I could be persuaded to make parsing of At the moment, the workflow with auth integration could check the user credentials in the |
On Thu, Oct 26, 2023 at 12:47:00PM -0700, Adrian wrote:
I could be persuaded to make parsing of `capabilities` optional in
`SIA2Service` class, i.e. if the provided URL is a sibling of
`capabilities`, just assume is the **correct** query end point but
In the scenario relevant here, it's usually what the data providers
gave in their registry record; if that's wrong, there is nothing we
can fix (and it would stand to reason that what you get from
capabilities -- essentially, an excerpt of the registry record -- is
then wrong, too).
making a separate class for that adds unnecessary confusion if this
is part of the public API. Users will need to know which class to
instantiate when the only difference is a small implementation
detail.
Yes, I give you two classes are not pretty, but then some magic
heuristically probing whether what is passed in is the parent of both
capabilities and {query} or whether it's the normal access URL IMHO
is worse. And it's not pretty either that SIA2Service works
differently from all the other S*Service classes (that are all
constructed with access_url).
So, I think there's no way around *some* sort of uglyness.
What about a classmethod from_access_url on SIA2Service with an
explanation that that's what you ought to use when you want to work
SIA2Service to work like the other S*Service classes? This could
then switch off capabilities parsing (at least until we have a case
that needs it or until there's an interoperable auth standard that
needs it).
Breaking the current behaviour and making the argument an access_url
by default is probably not an option, is it?
to works for specific authorities/domains. How can the client find
out that and prevent leaks of credentials to other domains? Not
such a big problem with certificates but with bearer tokens, which
essentially act as short term user/passwords.
I realise auth is behind the current design, but if we can separate
the API question from auth considerations as much as possible, it
certainly would help a lot to solve it. Auth has been an unsolved
problem for so many years, and I fear if we delay fixing SIA2Service
until we've figured out auth will keep SIA2Service in its broken (for
my -- registry-related -- purposes) state for a long, long time.
|
It was confirmed that this misbehaviour shows up in several heasarch SIA2 services so it's not an irsa related problem
If this is of any help
The text was updated successfully, but these errors were encountered: