Feature: add possibility to return full list of services #505
Codecov Report
Coverage diff, main vs. #505:
            main     #505     +/-
Coverage    80.28%   80.32%   +0.04%
Files       52       52
Lines       6152     6175     +23
Hits        4939     4960     +21
Misses      1213     1215     +2
I think we should aim to keep the current behaviour and make lax
opt-in, but I'll let the others chime in.
On Wed, Dec 06, 2023 at 08:37:00AM -0800, Manon Marchand wrote:
In the registry module, the lax mode of `get_service` and
`get_interface` was arbitrarily returning the first candidate of
the list. We ***@***.***) and I would like it if this could return
the full list instead. This would allow users to choose which
service to use. This PR implements this.
I think we should avoid having something return a sequence that
before returned a single item, and I think interfaces that sometimes
return a sequence and sometimes an item are quite likely evil
(remember cgi.FieldStorage?).
Given that, I'm not a fan of this PR.
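
To make the objection concrete, here is a minimal sketch in plain Python (not pyVO code; `pick`, `pick_one` and `pick_all` are hypothetical names and the URLs are made up) of why a return type that depends on a flag is awkward for callers, compared with two functions that each have a fixed return type:

```python
from typing import List, Union


def pick(candidates: List[str], all_items: bool = False) -> Union[str, List[str]]:
    """Flag-dependent return type: a str or a list of str (the pattern argued against)."""
    return list(candidates) if all_items else candidates[0]


def pick_one(candidates: List[str]) -> str:
    """Always returns a single item."""
    return candidates[0]


def pick_all(candidates: List[str]) -> List[str]:
    """Always returns a list."""
    return list(candidates)


urls = ["http://example.org/a", "http://example.org/b"]

# With the flag-dependent version, the same loop silently changes meaning:
# iterating a list yields URLs, iterating a str yields single characters.
over_list = [item for item in pick(urls, all_items=True)]
over_item = [item for item in pick(urls)]
```

Neither loop raises an error, which is what makes the mixed return type easy to misuse; the two fixed-type functions avoid the ambiguity.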
But let's take a step back: Where do you need that behaviour? At
the moment, it is my conviction that we should avoid multi-interface
capabilities. The original plan had been to use them when you have
endpoints for two or more versions of a standard. That, I claim, was
not a good plan: simultaneous interfaces for minor versions make no
sense, interfaces for different major versions should sit in
different capabilities (they have different standard ids).
So... what's your use case for multiple interfaces in one capability?
If so, can we perhaps derive a principled method to select the
"right" interface?
Hi,
Some catalogues contain several tables with positions (hence multiple cone searches). Each cone search is described by one interface and one capability: see the OAI record provided by CDS:
The issue remains that the pyvo registry returns a single cone search for these catalogues (e.g. B/corot):
I don't see any way to run a cone search on the two tables (the code above selects one). Maybe I am not using pyvo correctly?
Ah indeed! It's very much our fault, because VizieR entries in the registry are catalogs and not tables. So a single entry might have n conesearch capabilities associated with n tables from a single catalog. They are all valid; you just query different things with them.

Writing this made me change my mind. Let's not change the behavior for services that correctly declare tables one by one. Could we (pretty please) add "all_services" and "all_interfaces" keywords that would be false by default? This would be very useful for us and shouldn't impact the others. I updated the commit to implement this.

@gilleslandais: this does not fully solve our problem, since we'd still need to find a way to link services to tables... But as soon as we can access the full list, we can do something downstream. Maybe a wrapper in astroquery.vizier?
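Very true. The new implementation returns an item when `all_***` is False and a list when `all_***` is True. This could still be an issue... Another alternative could be two new methods, `get_all_services` and `get_all_interfaces`?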
On Thu, Dec 07, 2023 at 04:26:41AM -0800, Manon Marchand wrote:
Very true. The new implementation returns an item when `all_***` is
False and a list when `all_***` is True. This could still be an
issue... Another alternative could be two new methods
`get_all_services` and `get_all_interfaces`?
I'd prefer the extra methods, but I'd still say we need to first
define what we are trying to achieve. For get_all_interfaces I still
don't see a use case, and as long as that's the case, I'm against
having such a function.
On the other hand, the problem with the VizieR resources that have
several cone search capabilities that work over different tables is
something we need to solve. But again I'm not *terribly* enthused by
get_all_services; how would a user pick the service "they want"?
For folks reading this unsure what this is about, perhaps have a look
at how TOPCAT does this. As an example, look at the cone searches
for J/A+A/384/81 (randomly picked: "ESO Imaging Survey CDF-S point
sources") in its Cone Search window.
What TOPCAT does when you select that resource is show a list of the
access URLs it found, with a "Description" column, which it probably
takes from rr.capability.description. In our example, after a bit of
canned text, that description tells users what it is they'd get:
either QSOs or WDs or low mass stars.
I'd frankly say that based on what should be in the subject field of
the Registry record(s), these three should probably have been created
as three different VO Resources, but I give you it's probably too
late to change that now for several tens of thousands of VizieR
resources, not to mention that I'd not want to fix the resource
descriptions to produce something readable.
So, given the real-world constraints, the question is: how do we
expose something at least as good as what TOPCAT does in pyVO? And I
think there are three pieces to that question:
(a) how do people learn that there are multiple different
"sub-services"?
(b) how can they make an informed choice which one to query?
(c) how do they communicate that choice to pyVO?
As to (a), I think we should seriously consider making lax False by
default.
from pyvo import registry
res = registry.search(ivoid="ivo://cds.vizier/j/a+a/384/81")[0]
print(res.get_service("scs", lax=False))
correctly alerts users that "give me *the* SCS service" isn't a valid
request on this particular resource. As long as std_only is default
True in get_service (and thus we filter out WebBrowser interfaces
spuriously inserted into standard capabilities, something people
shouldn't do), I don't think we'd break a lot that isn't already
broken -- but we should certainly investigate that a bit more closely.
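
The difference between lax and strict selection can be sketched without touching the network. This is an illustrative stand-in only: `select_service` and `AmbiguousServiceError` are hypothetical names, not pyVO API, and the URLs are made up.

```python
from typing import List


class AmbiguousServiceError(Exception):
    """Hypothetical error for 'more than one candidate' (not a pyVO class)."""


def select_service(candidates: List[str], lax: bool = False) -> str:
    """Return the single candidate; with lax=True, fall back to the first one."""
    if not candidates:
        raise ValueError("no matching service")
    if len(candidates) > 1 and not lax:
        raise AmbiguousServiceError(
            f"{len(candidates)} matching services; pick one explicitly")
    return candidates[0]


# Three cone searches on one resource, as in the VizieR example (URLs made up).
cone_searches = [
    "http://example.org/scs/qso?",
    "http://example.org/scs/wd?",
    "http://example.org/scs/lms?",
]

try:
    select_service(cone_searches)
    outcome = "returned"
except AmbiguousServiceError:
    outcome = "raised"

arbitrary_pick = select_service(cone_searches, lax=True)
```

With strict selection as the default, the ambiguity surfaces as an error instead of an arbitrary first pick that the user never learns about.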
As to (b), I think we'd need a method
get_(something)_of_type(service_type) -> list[(something)].
The question is: What would "something" be? A plain XService as it
is now would I think not be a good choice, in particular because that
is lacking the all-important capability/description that's the only
thing telling people whether they'll get QSOs or WDs.
But: We could try to fix the XService classes to admit that
description when they are constructed from registry records and to
prominently display that in their repr if given; I actually think I'd
like that as a general feature. Let's call that option (b1).
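
Option (b1) might look roughly like the following sketch. `SketchService` is a stand-in written for illustration, not the pyVO class hierarchy, and the attribute name `capability_description` and the example.org URL are assumptions.

```python
from typing import Optional


class SketchService:
    """Stand-in for an XService-like class; not the pyVO implementation."""

    def __init__(self, baseurl: str,
                 capability_description: Optional[str] = None):
        self.baseurl = baseurl
        # Only known when the object is built from a registry record.
        self.capability_description = capability_description

    def __repr__(self) -> str:
        name = type(self).__name__
        if self.capability_description is None:
            return f"{name}(baseurl={self.baseurl!r})"
        return (f"{name}(baseurl={self.baseurl!r}, "
                f"capability_description={self.capability_description!r})")


from_registry = SketchService(
    "http://example.org/scs/lms?",
    "Cone search capability for table J/A+A/384/81/lms "
    "(Low Mass Stars candidates)",
)
from_url_only = SketchService("http://example.org/scs/lms?")
```

Printing a list of such objects would then show, per service, the text a user needs to tell the sub-services apart.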
[Future perspective: Perhaps we could even make that description
attribute a property (or have a get_description method?) and try to
pull it from VOSI capabilities if it wasn't passed in during
construction? ]
The alternative, b2, would be to return intermediate objects, perhaps
of class Capability, that only model the content of vr:Capability
plus possible extensions
(http://docs.g-vo.org/schemadoc/schemas/VOResource_xsd/complexTypes/Capability.html).
We happen to have such a thing for TAPRegExt, but I think there's
nothing for Capability itself or the other Capability extensions
(from SimpleDALRegExt). There is actually something to be said for
that (e.g., perhaps people would like to inspect extended SSAP
metadata from within pyvo without custom code:
http://docs.g-vo.org/schemadoc/schemas/SSA-v1_2_xsd/complexTypes/SimpleSpectralAccess.html).
But it feels as if that's a lot of work, and actually, most of the
extended metadata isn't readily available to a hypothetical
get_capabilities_of_type unless there was VOSI capabilities on the
resource [background: RegTAP relegates much of it to rr.res_detail,
and some of it isn't even mandatory in the mapping].
If we go for (b1), we have another advantage: (c) becomes trivial,
because people already have service objects that they can immediately
use.
So... thanks for reading up to here. Any thoughts on all of this?
Implementations, perhaps? I think this sounds a lot worse than it
actually is, at least if we postpone the
self-description-from-capabilities part for now.
Thanks for the detailed answer! I'd like to write an implementation of suggestions a/b1. It's a bit more than I expected to do in the beginning, so I'm switching to a draft state until ready.
Thanks for tackling this! Feel free to call me in at any time.
I think this addresses our points.
On Tue, Dec 12, 2023 at 01:06:27AM -0800, Manon Marchand wrote:
I think this addresses our points.
That's already looking very good, thanks.
Some DALServices children classes (SSA, SCS, SIA, SLA) already had
a way of retrieving a description from the result. I chose that the
description coming from the abstract DALService appears if it was
provided and otherwise keep the description of the output as it was
the case before.
But I'm not quite sure I'm too fond of this. Consider this code:
from pyvo import dal, registry
res = registry.search(ivoid="ivo://cds.vizier/j/a+a/384/81")[0]
svc1 = res.list_services("scs")[0]
print(svc1.description, svc1.baseurl)
print("*****")
svc2 = dal.SCSService(svc1.baseurl)
print(svc2.description, svc2.baseurl)
This, right now and for me, printed:
Cone search capability for table J/A+A/384/81/lms (Low Mass Stars candidates) http://vizier.cds.unistra.fr/viz-bin/conesearch/J/A+A/384/81/lms?
*****
VizieR Astronomical Server vizier.cds.unistra.fr
Date: 2023-12-12T14:47:35 [V7.32.6]
Explanations and Statistics of UCDs: See LINK below
In case of problem, please report to: ***@***.*** http://vizier.cds.unistra.fr/viz-bin/conesearch/J/A+A/384/81/lms?
…-- the two outputs are really different, which doesn't feel right
given that the service object is basically the same (svc1.query will
return the same stuff as svc2.query).
The background on this is long and long-winded, and I'm sure we'd not
do today what the original implementors did. In short: in the VO's
stone age, there were certain self-description "methods" built into
the protocols, which represent *somewhat* what today's VOSI endpoints
say -- but then quite differently, and of course in different ways
for every S-protocol.
XService._get_metadata tries to use these things.
*Perhaps* it might be a good idea to try and map what _get_metadata
does to the Registry and then have an option to pass all that in
through the constructor, gleaned from what's in the Registry. But,
really, that'd be a major project of questionable merit.
That's why I personally would much prefer if the capability
description were an extra piece of metadata ("cap_desc"); the service
objects constructed with an access URL only won't have that (I'd
contribute a method trying to get it from VOSI capabilities if folks
are interested), but at least the description attribute will be
consistent between the two cases.
Sorry -- this is all ugly legacy.
But I'd be ok to leave things like this if nobody else is concerned
about these differences (I'd hope people will mainly go through the
Registry one day anyway). But even then I think it would be better
to escape line feeds, or normalise them to blanks, when putting the description
into repr() -- try repr(svc2) in the example above; sure, it
doesn't *have* to be valid python, but it's definitely a plus for
repr if it is.
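
Both options for the repr can be sketched with the standard library alone. `one_line` is a hypothetical helper name, and the sample text is abridged from the output quoted above.

```python
def one_line(text: str) -> str:
    """Collapse any run of whitespace, including line feeds, into one blank."""
    return " ".join(text.split())


multi_line = (
    "VizieR Astronomical Server vizier.cds.unistra.fr\n"
    "    Date: 2023-12-12T14:47:35 [V7.32.6]\n"
)

normalised = one_line(multi_line)  # line feeds become single blanks
escaped = repr(multi_line)         # line feeds become the two characters \n
```

Normalising keeps the repr short and readable, while formatting with `!r` (or `repr`) keeps it valid Python; either avoids a repr that spans several lines.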
A second wish: In index.rst, I think it would be great if you'd
mention the main reason where this would happen, with something like:
The leading reason for multiple services of the same type in one
resource is that VizieR often has one cone search capability per
table when a resource has more than one table. In this case, you
can use list_services("scs") to inspect which services are
available.
I am frankly surprised that there are quite a few non-VizieR
resources that have multiple capabilities of the same type, too:
select ivoid, standard_id
from rr.capability
where ivoid not like 'ivo://cds.vizier%'
and standard_id is not null
group by ivoid, standard_id
having count(*)>1
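
What this query computes can be tried offline with a small sketch. It uses an in-memory SQLite database as a stand-in for a RegTAP service; the table is cut down to three columns and every row is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named "rr" so the table can be
# addressed as rr.capability, as in the query above.
conn.execute("ATTACH DATABASE ':memory:' AS rr")
conn.execute(
    "CREATE TABLE rr.capability "
    "(ivoid TEXT, cap_index INTEGER, standard_id TEXT)")

# Made-up rows, not real registry content.
rows = [
    ("ivo://cds.vizier/x/y", 1, "ivo://ivoa.net/std/conesearch"),
    ("ivo://cds.vizier/x/y", 2, "ivo://ivoa.net/std/conesearch"),
    ("ivo://example/a", 1, "ivo://ivoa.net/std/sia"),
    ("ivo://example/a", 2, "ivo://ivoa.net/std/sia"),
    ("ivo://example/a", 3, None),
    ("ivo://example/b", 1, "ivo://ivoa.net/std/tap"),
]
conn.executemany("INSERT INTO rr.capability VALUES (?, ?, ?)", rows)

duplicates = conn.execute("""
    SELECT ivoid, standard_id
    FROM rr.capability
    WHERE ivoid NOT LIKE 'ivo://cds.vizier%'
      AND standard_id IS NOT NULL
    GROUP BY ivoid, standard_id
    HAVING COUNT(*) > 1
""").fetchall()
```

Only the non-VizieR resource with two capabilities sharing a standard id survives the filter; against a real registry the same statement would be sent to a RegTAP endpoint rather than SQLite.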
In particular the HiPS one on ivo://cefca/j-plus/j-plus-dr2 makes me
wonder whether we ought to have a smoother interface on this anyway
in the long run.
Which brings me to a third wish: Suppose someone writes a jupyter
notebook and for some reason doesn't want to hard-code access URLs.
So, they'll say:
result = registry.search(something or other)
resource = result["ivo://cds.vizier/j/a+a/384/81"]
-- that way, they get a guaranteed part of the Registry response (as
long as the resource is still there); more likely, they'll use the
resource's short name, which is less certain but still reasonable.
But then:
svc = resource.list_services("scs")[4]
Ouch. This would choose a different service, as the order of the
services is unpredictable at the moment.
One way to... solve this would be to sort the result of list_services
by, say, the "cap_desc". Of course, if the cap_desc-s are all
None or identical, that won't help, but then everything is lost and
there's nothing we can do anyway.
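
A sketch of that sorting idea, under the assumption that each service carries an optional `cap_desc`; `Svc` is a hypothetical stand-in type and the entries are invented:

```python
from typing import List, NamedTuple, Optional


class Svc(NamedTuple):
    baseurl: str
    cap_desc: Optional[str]


def sorted_by_description(services: List[Svc]) -> List[Svc]:
    # Missing descriptions are treated as empty strings and sort first.
    # sorted() is stable, so services with equal keys keep their incoming
    # (possibly arbitrary) order: exactly the case where sorting cannot help.
    return sorted(services, key=lambda svc: svc.cap_desc or "")


one_order = [
    Svc("http://example.org/scs/wd?", "White dwarfs"),
    Svc("http://example.org/scs/lms?", "Low mass stars"),
    Svc("http://example.org/scs/qso?", "QSOs"),
]
other_order = list(reversed(one_order))

first = sorted_by_description(one_order)
second = sorted_by_description(other_order)
```

As long as the descriptions differ, an index into the sorted list refers to the same service regardless of the order in which the registry returned the capabilities.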
I *think* I'd slightly prefer a method
resource.choose_service(servicetype, description_fragment),
where description_fragment should be a part of the description and
the method would raise an exception if there's not exactly one
matching capability.
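
A minimal sketch of such a method, written as a free function over stand-in records. `Capability` here is a hypothetical tuple, not a pyVO class; the first description is taken from the output quoted earlier, the other two descriptions and all URLs are made up.

```python
from typing import List, NamedTuple


class Capability(NamedTuple):
    service_type: str
    description: str
    access_url: str


def choose_service(capabilities: List[Capability], service_type: str,
                   description_fragment: str) -> Capability:
    """Return the one capability whose description contains the fragment."""
    matches = [cap for cap in capabilities
               if cap.service_type == service_type
               and description_fragment in cap.description]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one {service_type!r} capability containing "
            f"{description_fragment!r}, found {len(matches)}")
    return matches[0]


caps = [
    Capability("scs", "Cone search capability for table J/A+A/384/81/lms "
               "(Low Mass Stars candidates)", "http://example.org/scs/lms?"),
    Capability("scs", "Cone search on made-up table A (QSO candidates)",
               "http://example.org/scs/qso?"),
    Capability("scs", "Cone search on made-up table B (WD candidates)",
               "http://example.org/scs/wd?"),
]

chosen = choose_service(caps, "scs", "Low Mass Stars")
```

A fragment such as "candidates" matches all three entries and therefore raises, which forces notebook authors to pick a fragment that stays unambiguous.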
Or perhaps there's even a non-crazy solution to this I'm not seeing
at the moment?
I'm truly sorry all this is so complicated; I'm totally ok, too, with
merging now, and I'll turn my wishes into bugs that I'll hopefully
even write PRs for. The PR, as it is, certainly improves the current
situation, so in my view it's worth merging as-is, too.
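No no, you're not the only one bothered. I was feeling uneasy about the two behaviors too, but I only thought about renaming the other [...]
For the list of services, the order should become deterministic with an additional `ORDER BY cap_index` in the long list of order bys?
What about integrating the `choose_service` method into `list_services` with a keyword argument? Something like `list_services('scs', description_fragment='your_string')` [...]
Sure for the index.rst, I'll update it!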
On Tue, Dec 12, 2023 at 09:23:55AM -0800, Manon Marchand wrote:
For the list of services, the order should become deterministic
with an additional `ORDER BY cap_index` in the long list of order
bys?
I'd not veto such a change, but I'm not a huge fan, either; you see,
cap_index is basically an implementation detail (it could be a UUID,
for instance) and hence could change on every re-ingest of a
resource, which again might change the service order over time.
So, frankly, I like the locality of just sorting list_services'
return value, and sorting by description seems relatively
user-understandable to me (and might actually help people who have a
rough idea what they're looking for when visually inspecting a
result).
What about integrating the `choose_service` method into
`list_services` with a keyword argument?
Something like
`list_services('scs', description_fragment='your_string')`
Looks reasonable to me in principle. I've been racking my brain for
a better argument name than description_fragment that somehow conveys
in one or two words that it's "stuff I want to see in the service's
description" but didn't come up with anything.
The other reason I'm doubting is that I think passing
description_fragment should change the method's behaviour to only
return when there's exactly one match, whereas list_services can
legally return a list of any length.
So... what if we added the description_fragment to get_service and
in its docstring then said something like:
Some resources have multiple capabilities ("services") of the same
type (see list_services for a discussion). get_service will
usually raise an XYException in that case; passing a
description_fragment that uniquely identifies the capability to
query will make get_service predictably return the desired
capability. Use list_services to find such a unique description
fragment.
or so? You see, get_service already has the signature I'd want (error
out on 0 or >1 results, else directly return a service object rather
than a sequence).
that there is only one element). I'm just afraid about all the
different methods to get services right now and would like if we
Absolutely, yes. Let's keep the APIs as lean as we can given the
adversities of the VO :-).
About the mapping between capabilities and datasets: Could we envisage an update in a next VOResource version to include the mapping?
On Wed, Dec 13, 2023 at 08:48:43AM -0800, gilleslandais wrote:
Could we envisage an update in a next VOResource version to include
the mapping? For instance with a new element in the schema that
lists the datasets which are served by the capability?
The question is how you would specify the "dataset" part of that
relation...
We've discussed essentially this problem several years ago, I think
with Sébastien. At least in the VizieR case, the "dataset" would
basically be a table in the tableset. Somehow linking that and the
capability querying it might indeed improve the discovery process
beyond a full text search in the capability description.
We could invent something for doing that in a VODataService update.
Full disclosure: I'm not crazy about that, because that's the first
feature in VOResource+extension doing cross-branch links, and these
have a way of being painful. Still, if they help a lot, I'd not
stand in the way.
But that's the point: *does* it help? Me, I'm not sure whether that
would enable plausible and understandable UIs beyond what TOPCAT does
in its SCS window. Can you imagine one?
The longer I think about it, the more I'm convinced that having a
single-table protocol like SCS on top of a multi-table resource
necessarily is a kludge, and hence finding a clean, logical, and
pretty solution may very well be a fool's errand. Perhaps we simply
need to nudge folks towards TAP (or perhaps some future multi-table
parametric protocol) for the more complex resources? And meanwhile
limp along, emulating what has been working reasonably well in TOPCAT
as best we can?
Hi! When is 1.5 going out? Any chance I can still squeeze this in? I think we're close to agreeing on everything, and I can submit something around tomorrow.
(note that this is totally unrelated to this PR but I had to rebase and add |
I have no more nits to pick at this point, and this PR is definitely fixing something that needs fixing at least to the level TOPCAT has. So, let's merge (and then perhaps see at a later point whether we can become better than TOPCAT :-).
I love the sentiment here, we can always aspire :)
Unfortunately, this PR introduces a lot of unintended API changes; however, the fixes are trivial.
(I haven't tried to use this or run any more tests, as I hoped to get in the review before someone hits merge)
Actually, ignore most of my review above (as #507 can go in before this one). I would suggest we bump this to v1.6, as it wasn't really on the radar for v1.5 anyway, and then we can try to make that release cycle a short one.
@ManonMarchand - #507 has gone in, so if you're still around before the holidays, you can go ahead with a rebase and a final commit.
this should prevent users from overlooking that a resource might have several services of the same type
this lists the services available for this RegistryResource. They can then be accessed like `record.list_services()[0].search(***)`
this will return the service for which the keyword is a substring of capability_description
Pfiou, not my easiest rebase 😅 PS: wishing nice holidays to you both, see you in January!
All my review comments have been addressed after the rebase, so this is good to go from the API perspective.
Given Marcus' approval above and the fixups against |
Hello PyVO,
EDIT following conversation
`list_services`, which returns a list of services
What it looks like