Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add API to download likelihood patchset archives #1046

Merged
merged 49 commits into from
Oct 10, 2020

Conversation

matthewfeickert
Copy link
Member

@matthewfeickert matthewfeickert commented Aug 22, 2020

Description

Resolves #1045

$ pyhf contrib download --help
Usage: pyhf contrib download [OPTIONS] [ARCHIVE_URL] [OUTPUT_DIRECTORY]

  Download the patchset archive from the remote URL and extract it in a
  directory at the path given.

  Example:

  .. code-block:: shell

      $ pyhf contrib download --verbose
      https://www.hepdata.net/record/resource/1408476?view=true 1Lbb-
      likelihoods

          1Lbb-likelihoods/patchset.json
          1Lbb-likelihoods/README.md
          1Lbb-likelihoods/BkgOnly.json

  Raises:     :class:`~pyhf.exceptions.InvalidArchiveHost`: if the provided
  archive host name is not known to be valid

Options:
  -v, --verbose   Enables verbose mode
  -f, --force     Force download from non-approved host
  -c, --compress  Keep the archive in a compressed tar.gz form
  -h, --help      Show this message and exit.

CLI API Example:

$ pyhf contrib download https://www.hepdata.net/record/resource/1408476?view=true 1Lbb-likelihoods
$ ls 1Lbb-likelihoods/
BkgOnly.json  patchset.json  README.md
$ rm -rf 1Lbb-likelihoods && pyhf contrib download -v https://www.hepdata.net/record/resource/1408476?view=true 1Lbb-likelihoods
1Lbb-likelihoods/patchset.json
1Lbb-likelihoods/README.md
1Lbb-likelihoods/BkgOnly.json
$ pyhf contrib download --compress https://www.hepdata.net/record/resource/1408476?view=true 1Lbb-likelihoods.tar.gz
$ file 1Lbb-likelihoods.tar.gz 
1Lbb-likelihoods.tar.gz: gzip compressed data, from Unix, original size modulo 2^32 6177792

ReadTheDocs build:

Checklist Before Requesting Reviewer

  • Tests are passing
  • "WIP" removed from the title of the pull request
  • Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • Summarize commit messages into a comprehensive review of the PR
* Add download command to Python contrib API
* Add download command to pyhf CLI API and add contrib group
* Add pyhf custom exception if an untrusted archive host is given
* Add tests of Python and CLI API versions of contrib download
   - Thanks to Anthony Sottile (@asottile) for advice: https://youtu.be/PXu3KCMT3l4
* Add requests as dependency of contrib extra
* Add Python contrib API to docs

@matthewfeickert matthewfeickert added feat/enhancement New feature or request CLI Affects the CLI API labels Aug 22, 2020
@matthewfeickert matthewfeickert self-assigned this Aug 22, 2020
@codecov
Copy link

codecov bot commented Aug 22, 2020

Codecov Report

Merging #1046 into master will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1046      +/-   ##
==========================================
+ Coverage   96.83%   96.87%   +0.04%     
==========================================
  Files          60       62       +2     
  Lines        3507     3556      +49     
  Branches      504      510       +6     
==========================================
+ Hits         3396     3445      +49     
  Misses         68       68              
  Partials       43       43              
Flag Coverage Δ
#unittests 96.87% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/pyhf/cli/__init__.py 100.00% <100.00%> (ø)
src/pyhf/cli/cli.py 100.00% <100.00%> (ø)
src/pyhf/cli/contrib.py 100.00% <100.00%> (ø)
src/pyhf/contrib/utils.py 100.00% <100.00%> (ø)
src/pyhf/exceptions/__init__.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75f2a87...b72e399. Read the comment docs.

@matthewfeickert matthewfeickert marked this pull request as ready for review August 22, 2020 08:29
@matthewfeickert
Copy link
Member Author

@kratsg Given your comment on Issue #1045 RE: implementing Issue #947 what are your thoughts on this PR?

It seems to me that having something like this PRs functionality of downloading a tarball and then extracting it into a directory is still in scope of pyhf utilities, but susyudio is another separate planned library(?) given that your example starts with import susyudio. Is that correct?

@matthewfeickert matthewfeickert force-pushed the feat/add-patchset-download-cli-api branch from cc53772 to 3b8fb74 Compare August 31, 2020 22:43
@matthewfeickert matthewfeickert force-pushed the feat/add-patchset-download-cli-api branch from 3b8fb74 to 8258466 Compare September 2, 2020 22:42
@matthewfeickert matthewfeickert force-pushed the feat/add-patchset-download-cli-api branch 2 times, most recently from 32e35be to f6f9ce8 Compare September 11, 2020 14:56
src/pyhf/cli/contrib.py Outdated Show resolved Hide resolved
src/pyhf/cli/contrib.py Outdated Show resolved Hide resolved
@matthewfeickert matthewfeickert force-pushed the feat/add-patchset-download-cli-api branch from 840b567 to 2317bab Compare September 13, 2020 07:42
Comment on lines 527 to 604
# hide
CACHE_MODULE, sys.modules[module_name] = sys.modules[module_name], None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only other thing to think about is if this should be made into a decorator.

Copy link
Contributor

@kratsg kratsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if pyhf contrib download was hidden away if it didn't import correctly?

additionally, there should be a non-script/CLI test to check the python API itself.

@matthewfeickert matthewfeickert marked this pull request as draft September 14, 2020 06:09
@matthewfeickert
Copy link
Member Author

additionally, there should be a non-script/CLI test to check the python API itself.

Hm. Maybe I misunderstand what you mean, but we don't have a public Python API for this, we only have the CLI API that's exposed.

@kratsg
Copy link
Contributor

kratsg commented Sep 15, 2020

Hm. Maybe I misunderstand what you mean, but we don't have a public Python API for this, we only have the CLI API that's exposed.

It would generally make sense that we don't provide a CLI for anything that does not have a python API (for consistency) no? Maybe we need the python API first.

@matthewfeickert
Copy link
Member Author

It would generally make sense that we don't provide a CLI for anything that does not have a python API (for consistency) no? Maybe we need the python API first.

I think I generally agree with that statement but if you're at the Python API do you really care about extracting files into directories rather than just reading them out from the in memory representation of the tarfile? I guess what you're thinking though is that if you wanted to script the download and unpacking of multiple likelihoods through the Python API you should be able to do that as well (correct?).

@kratsg
Copy link
Contributor

kratsg commented Sep 15, 2020

I think I generally agree with that statement but if you're at the Python API do you really care about extracting files into directories rather than just reading them out from the in memory representation of the tarfile? I guess what you're thinking though is that if you wanted to script the download and unpacking of multiple likelihoods through the Python API you should be able to do that as well (correct?).

Yes, or likely -- someone will want to parallelize and do a signal grid... But I'm realizing that this doesn't make sense given we only really ever have two files per grid anyway (bkgonly + patchset)...

@matthewfeickert matthewfeickert force-pushed the feat/add-patchset-download-cli-api branch from afc202c to 90f9e80 Compare September 17, 2020 05:50
@matthewfeickert matthewfeickert added the API Changes the public API label Sep 17, 2020
@matthewfeickert matthewfeickert force-pushed the feat/add-patchset-download-cli-api branch from fa735b0 to b72e399 Compare October 9, 2020 22:20
@lukasheinrich lukasheinrich merged commit 6ce4669 into master Oct 10, 2020
@lukasheinrich lukasheinrich deleted the feat/add-patchset-download-cli-api branch October 10, 2020 06:35
matthewfeickert added a commit that referenced this pull request Oct 13, 2020
* Remove contrib download defaults for arguments ARCHIVE_URL OUTPUT_DIRECTORY
* Amends PR #1046
kratsg pushed a commit that referenced this pull request Oct 16, 2020
* Migrate contrib download code from cli module to contrib module
   - Unify all of the contrib code in a single contrib module for easier factoring out and bookkeeping
* Use DOI URL for all contrib download docstring examples
* Amends PR #1046
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Changes the public API CLI Affects the CLI API docs Documentation related feat/enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CLI API to download likelihood patchset archives from HEPData
3 participants