Maxrec in regsearch #375
This addresses bug astropy#373.
The tables in resource.get_tables() come in random column order (because they are fed from the results of relational database queries). That's arguably not ideal, but it is what we can do given what the registry has in terms of metadata, and it actually doesn't hurt as long as people are aware of it. This commit changes an example in the documentation to become reproducible. The example gets a bit uglier this way, but it's probably still on the good side of making examples runnable rather than didactic (where the two conflict, which they do in this case).
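One way to make such an example reproducible is to iterate over the table names in sorted order. This is only a minimal sketch: a plain dict stands in for whatever resource.get_tables() returns, the table names are made up, and stable_table_names is a hypothetical helper, not part of pyvo.

```python
# Stand-in for the mapping returned by resource.get_tables(); the table
# names are invented and deliberately listed in an unsorted order.
tables = {
    "demo.obs_b": object(),
    "demo.catalog": object(),
    "demo.obs_a": object(),
}


def stable_table_names(tables):
    """Return the table names in alphabetical order (hypothetical helper)."""
    # Iterating a mapping yields its keys; sorted() fixes their order.
    return sorted(tables)


print(stable_table_names(tables))
```

Sorting costs a little readability in the example, which is the trade-off described above.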
Codecov Report
@@ Coverage Diff @@
## main #375 +/- ##
==========================================
- Coverage 78.56% 78.55% -0.02%
==========================================
Files 47 47
Lines 5562 5562
==========================================
- Hits 4370 4369 -1
- Misses 1192 1193 +1
@@ -128,6 +131,12 @@ def search(*constraints:rtcons.Constraint, includeaux=False, **kwargs):
This may result in duplicate capabilities being returned,
especially if the servicetype is not specified.

maxrec : int
Overrides the RegTAP server's default limit on the number of rows to
return. You may need to use this if you want to retrieve more
This is a bit confusing. The DALI standard states that "The service may also enforce a limit on the value of MAXREC that is smaller than the value in the request", so I don't think one can use this to override the server's MAXREC unless the requested value is smaller than the server's limit. So in general, a requested MAXREC serves to limit the number of rows, not to expand it.
On Wed, Oct 12, 2022 at 09:11:49AM -0700, Adrian wrote:
> @@ -128,6 +131,12 @@ def search(*constraints:rtcons.Constraint, includeaux=False, **kwargs):
> This is a bit confusing. The DALI standard states that `The service
> may also enforce a limit on the value of MAXREC that is smaller
> than the value in the request.` so I don't think one can use this
> to override server's `MAXREC`.
So... what technically happens is:
* Most servers will by default only return a rather limited (though
not by registry standards, which is small data for astronomy) number
of rows (the "default MAXREC")
* If you pass in a MAXREC yourself, the server will instead use that
as the limit of the number of returned rows
* There usually is a limit as to how far you can raise MAXREC (that
is probably not relevant here); setting a higher MAXREC is not
an error, but the server will then use its hard limit.
In my docstring, I'm trying to convey as much of this as I hope is
necessary for RegTAP operations; if I'm unsuccessful in this: do you
have suggestions on how to improve the wording?
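The three cases in the list above can be sketched as a small function. This is an illustration of the behaviour described in this thread, not code from pyvo or from any server; effective_row_limit and its parameters are hypothetical names, and the numbers in the usage lines are made-up examples.

```python
from typing import Optional


def effective_row_limit(requested: Optional[int],
                        default_limit: int,
                        hard_limit: int) -> int:
    """Return the row limit a server would apply (hypothetical model).

    requested     -- MAXREC passed by the client, or None if omitted
    default_limit -- the server's "default MAXREC"
    hard_limit    -- the largest MAXREC the server is willing to honour
    """
    if requested is None:
        # No MAXREC in the request: the server's default applies.
        return default_limit
    # A MAXREC above the hard limit is not an error; it is capped.
    return min(requested, hard_limit)


# Illustrative numbers only (assumed, not taken from a real service).
print(effective_row_limit(None, 20_000, 1_000_000))
print(effective_row_limit(50_000, 20_000, 1_000_000))
print(effective_row_limit(10_000_000, 20_000, 1_000_000))
```

In this model a client can raise the limit above the default but never above the hard limit, which is the point the review asks the docstring to make explicit.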
pyvo/registry/regtap.py
Overrides the RegTAP server's default limit on the number of rows to
return. You may need to use this if you want to retrieve more
than a few thousand matches. Note that truncated search results
are not reproducible.
- Overrides the RegTAP server's default limit on the number of rows to
- return. You may need to use this if you want to retrieve more
- than a few thousand matches. Note that truncated search results
- are not reproducible.
+ Overrides the RegTAP server's default limit on the number of rows to
+ return. You may need to use this if you want to retrieve more
+ than a few thousand matches.
+ The server may also have a hard limit that ``maxrec`` cannot override.
+ Please note that truncated search results are not reproducible.
My reading of the spec is that the server has the last word and might override the requested MAXREC, as I highlighted in my comment. The rest looks OK, so I'll also approve it to fix the doctest.
Approving so we can go ahead and merge this. Please either add my suggestion or a similar sentence that clears up the hard limit override scenario (I don't feel strongly about how you phrase it, but agree with the review that it's not coming through from the original text).
On Sun, Oct 16, 2022 at 01:25:01PM -0700, Brigitta Sipőcz wrote:
```suggestion
Overrides the RegTAP server's default limit on the number of rows to
return. You may need to use this if you want to retrieve more
than a few thousand matches.
The server may also have a hard limit that ``maxrec`` cannot override.
Please note that truncated search results are not reproducible.
```
Fair enough (by the way, I'm almost always grateful for commits into
my PRs...).
I've added the sentence here -- but we should probably think about a
common place where maxrec behaviour is discussed at some point. It
works the same way in TAP, SIAv2, Datalink and future DALI-compliant
protocols (in Datalink, there's a bit of a twist). There already is
some discussion in the TAP section,
https://pyvo.readthedocs.io/en/latest/dal/index.html#table-access-protocol;
perhaps that should be extracted into a top-level section "On maxrec
in various protocols" that we can then link to for all the various
maxrec docstrings?
OK, I'm going ahead with the merge and rebasing the other PRs. Thanks @msdemlei! As for explaining the maxrec situation in the narrative docs, it all sounds good, but I strongly think that it should be repeated in each and every docstring, too.
I would say this is fixing a bug (#389), so if we do a bugfix 1.4.1, it can be shipped with it.
registry.regsearch now accepts an optional maxrec argument
This addresses bug #373. It may change results if people actually retrieved humungous amounts (by registry standards) of records (more than 20000 with the default server). I'd say: Whoever does this kind of thing deserves a heads-up that they should be doing it differently. Alternatively, they can pass in maxrec=10000000 manually.