Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iter_datalinks should be smarter about datalink products #543

Closed
msdemlei opened this issue Apr 29, 2024 · 2 comments
Closed

iter_datalinks should be smarter about datalink products #543

msdemlei opened this issue Apr 29, 2024 · 2 comments

Comments

@msdemlei
Copy link
Contributor

msdemlei commented Apr 29, 2024

pyvo/dal/adhoc.py (we ought to have a plan to improve that naming, btw) in iter_datalinks has the following code:

  elif row.access_format == DATALINK_MIME_TYPE:
       yield DatalinkResults.from_result_url(row.getdataurl())

This is for the case of data products that in obscore (or similar protocols) are distributed as datalink documents. There is two types of trouble with this:

(1) it does not work in some interesting scenarios because it is trying attribute access, and the attribute is only defined in "typed" records (obscore, siap2) rather than generic TAP results.

(2) it will confuse error messages whenever iter_datalinks is called on something pyvo cannot find datalinks in.

Consider this code:

import pyvo

svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
results = svc.run_sync(
 "select top 5 access_url, 'application/x-votable+xml' as access_format"
 " from dasch.plates")
for dl in results.iter_datalinks():
    print(dl.id)

If you look, dasch.plates access_urls really point at datalinks, so this
is a situation the access_format code was designed to handle. But if
you run the script, you will see:

Traceback (most recent call last):
  File "/home/msdemlei/zw.py", line 7, in <module>
    for dl in results.iter_datalinks():
  File "/home/msdemlei/gavo/src/pyvo/pyvo/dal/adhoc.py", line 222, in iter_datalinks
    elif row.access_format == DATALINK_MIME_TYPE:
         ^^^^^^^^^^^^^^^^^
AttributeError: 'TAPRecord' object has no attribute 'access_format'

– and that is exactly the message you will also get on tables that have
nothing to do with datalink at all, which seems rather misleading to me.

So... I'd suggest the following:

(a) only access access_format when the record actually has that
attribute

(b) perhaps support utype- or name-based discovery of the access_format, too (name-based this would make my example work but is deeply in the realm of ad-hoc hacks; utype-based would be a lot cleaner and perhaps good enough. Data providers would have to add constant-valued access_format columns then, but that's not very painful).

(c) produce an error message "No datalinks discernible in this table" or something to that effect when we cannot locate an access format column (and neither find datalinks through any of the more reputable means of attaching them).

Opinions? Should I go ahead with that?

@aragilar
Copy link
Contributor

aragilar commented Jul 1, 2024

We (Data Central) use the standard obscore names and utypes for SSA responses, are there any other values that make sense other than those?

@msdemlei
Copy link
Contributor Author

msdemlei commented Jul 1, 2024 via email

msdemlei added a commit to msdemlei/pyvo that referenced this issue Sep 17, 2024
To do that, we split the original iter_datalinks in the two cases; one
where the datalinks come from a table via a datalink meta RESOURCE, and the
other where we believe there's datalink products in the table.

In the second case (the one that was broken before), we try the name access_format as attribute and column name, and then a utype and two UCDs
(generic and SIA1) each.  We could add SSAP UCDs, but I'll only do that
if someone is asking for it.
@msdemlei msdemlei mentioned this issue Sep 17, 2024
msdemlei added a commit to msdemlei/pyvo that referenced this issue Sep 17, 2024
To do that, we split the original iter_datalinks in the two cases; one
where the datalinks come from a table via a datalink meta RESOURCE, and the
other where we believe there's datalink products in the table.

In the second case (the one that was broken before), we try the name access_format as attribute and column name, and then a utype and two UCDs
(generic and SIA1) each.  We could add SSAP UCDs, but I'll only do that
if someone is asking for it.
msdemlei added a commit to msdemlei/pyvo that referenced this issue Sep 17, 2024
To do that, we split the original iter_datalinks in the two cases; one
where the datalinks come from a table via a datalink meta RESOURCE, and the
other where we believe there's datalink products in the table.

In the second case (the one that was broken before), we try the name access_format as attribute and column name, and then a utype and two UCDs
(generic and SIA1) each.  We could add SSAP UCDs, but I'll only do that
if someone is asking for it.
msdemlei added a commit to msdemlei/pyvo that referenced this issue Sep 17, 2024
To do that, we split the original iter_datalinks in the two cases; one
where the datalinks come from a table via a datalink meta RESOURCE, and the
other where we believe there's datalink products in the table.

In the second case (the one that was broken before), we try the name access_format as attribute and column name, and then a utype and two UCDs
(generic and SIA1) each.  We could add SSAP UCDs, but I'll only do that
if someone is asking for it.
msdemlei added a commit to msdemlei/pyvo that referenced this issue Sep 17, 2024
To do that, we split the original iter_datalinks in the two cases; one
where the datalinks come from a table via a datalink meta RESOURCE, and the
other where we believe there's datalink products in the table.

In the second case (the one that was broken before), we try the name access_format as attribute and column name, and then a utype and two UCDs
(generic and SIA1) each.  We could add SSAP UCDs, but I'll only do that
if someone is asking for it.
msdemlei added a commit to msdemlei/pyvo that referenced this issue Sep 25, 2024
To do that, we split the original iter_datalinks in the two cases; one
where the datalinks come from a table via a datalink meta RESOURCE, and the
other where we believe there's datalink products in the table.

In the second case (the one that was broken before), we try the name access_format as attribute and column name, and then a utype and two UCDs
(generic and SIA1) each.  We could add SSAP UCDs, but I'll only do that
if someone is asking for it.
andamian added a commit that referenced this issue Sep 26, 2024
@bsipocz bsipocz closed this as completed Sep 26, 2024
bsipocz pushed a commit that referenced this issue Oct 15, 2024
bsipocz pushed a commit that referenced this issue Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants