Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Make contrib.utils.download robust to archive file types #1697

Merged
merged 32 commits into from
Nov 17, 2021

Conversation

matthewfeickert
Copy link
Member

@matthewfeickert matthewfeickert commented Nov 12, 2021

Description

Checklist Before Requesting Reviewer

  • Tests are passing
  • "WIP" removed from the title of the pull request
  • Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • Summarize commit messages into a comprehensive review of the PR
* Refactor the internals of contrib.utils.download to be more robust and
handle uncompressed tarfile targets as well as zipfile targets.
* Mock HEPData responses to avoid load on HEPData API and avoid testing
problems due to HEPData outages. This generally makes the test suite more
robust.
* Add requests-mock as a dependency to the 'test' extra. requests-mock makes
mocking requests calls easy and allows for a pytest fixture.
* Add tests for contrib.utils.download Python API to test outside of script runner

@matthewfeickert matthewfeickert added the refactor A code change that neither fixes a bug nor adds a feature label Nov 12, 2021
@matthewfeickert matthewfeickert self-assigned this Nov 12, 2021
@matthewfeickert
Copy link
Member Author

matthewfeickert commented Nov 13, 2021

I've put some temporary files up at https://feickert.web.cern.ch/feickert/pyhf/tests/ to test this out and at this point the following work:

$ pyhf contrib download --force --verbose https://feickert.web.cern.ch/feickert/pyhf/tests/1Lbb-likelihoods.tar.gz /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods/patchset.json
/tmp/1Lbb-likelihoods/BkgOnly.json
/tmp/1Lbb-likelihoods/README.md
$ pyhf contrib download --force --verbose https://feickert.web.cern.ch/feickert/pyhf/tests/1Lbb-likelihoods.tar /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods/patchset.json
/tmp/1Lbb-likelihoods/BkgOnly.json
/tmp/1Lbb-likelihoods/README.md
$ pyhf contrib download --force --verbose https://feickert.web.cern.ch/feickert/pyhf/tests/1Lbb-likelihoods.zip /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods/patchset.json
/tmp/1Lbb-likelihoods/BkgOnly.json
/tmp/1Lbb-likelihoods/README.md

@codecov
Copy link

codecov bot commented Nov 13, 2021

Codecov Report

Merging #1697 (8098ac4) into master (e433109) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1697   +/-   ##
=======================================
  Coverage   98.10%   98.11%           
=======================================
  Files          64       64           
  Lines        4226     4246   +20     
  Branches      587      591    +4     
=======================================
+ Hits         4146     4166   +20     
  Misses         46       46           
  Partials       34       34           
Flag Coverage Δ
contrib 26.28% <100.00%> (+0.86%) ⬆️
doctest 61.02% <37.50%> (-0.18%) ⬇️
unittests 96.16% <50.00%> (-0.27%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/pyhf/contrib/utils.py 100.00% <100.00%> (ø)
src/pyhf/exceptions/__init__.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e433109...8098ac4. Read the comment docs.

@matthewfeickert matthewfeickert added the tests pytest label Nov 15, 2021
@matthewfeickert
Copy link
Member Author

matthewfeickert commented Nov 16, 2021

There's a Python 3.7 problem for pathlib that I'm not seeing in Python 3.9

                        child_path = [child for child in output_directory.iterdir()][0]
                        _tmp_path = output_directory.parent.joinpath(
                            Path(output_directory.name + "__tmp__")
                        )
>                       child_path.replace(_tmp_path).replace(output_directory)
E                       AttributeError: 'NoneType' object has no attribute 'replace'

I'll need to resolve this later, but I'm pretty sure this is because pathlib.Path.replace

Changed in version 3.8: Added return value, return the new Path instance.

so I think I can't chain these until we drop Python 3.7 support.

@matthewfeickert matthewfeickert force-pushed the refactor/contrib-download-internals branch from 3818e0b to 719bb24 Compare November 16, 2021 17:30
@matthewfeickert matthewfeickert changed the title refactor: Add support for zipfile to contrib.utils.download refactor: Make contrib.utils.download robust to archive file types Nov 16, 2021
@matthewfeickert matthewfeickert marked this pull request as ready for review November 16, 2021 19:48
@matthewfeickert
Copy link
Member Author

@GraemeWatt your review is welcome here as well. :)

@matthewfeickert
Copy link
Member Author

matthewfeickert commented Nov 16, 2021

Can confirm with regards to Issue #1519 that the following now works 👍

$ curl -sL https://www.hepdata.net/record/resource/1981552?view=true -o /tmp/SUSY-2018-14.zip
$ python -m zipfile --list /tmp/SUSY-2018-14.zip 
File Name                                             Modified             Size
likelihood/                                    2021-02-15 08:21:20            0
__MACOSX/._likelihood                          2021-02-15 08:21:20          312
likelihood/SRmm_left_bkgonly.json              2020-11-02 16:29:16         2023
__MACOSX/likelihood/._SRmm_left_bkgonly.json   2020-11-02 16:29:16          312
likelihood/Stauleft_patchset.json              2021-02-15 08:21:14       569930
likelihood/SRmm_patchset.json                  2021-02-15 08:20:34       476160
likelihood/SRmm_bkgonly.json                   2020-11-02 16:29:14         2023
__MACOSX/likelihood/._SRmm_bkgonly.json        2020-11-02 16:29:14          312
likelihood/SRee_bkgonly.json                   2020-11-02 16:29:08         2023
__MACOSX/likelihood/._SRee_bkgonly.json        2020-11-02 16:29:08          368
likelihood/SRmm_right_patchset.json            2021-02-15 08:20:54       469313
likelihood/Stau_patchset.json                  2021-02-15 08:21:04       568408
likelihood/README.md                           2020-12-17 11:02:54         1911
__MACOSX/likelihood/._README.md                2020-12-17 11:02:54          368
likelihood/SRee_patchset.json                  2021-02-15 08:20:24       392287
likelihood/SRee_left_bkgonly.json              2020-11-02 16:29:10         2023
__MACOSX/likelihood/._SRee_left_bkgonly.json   2020-11-02 16:29:10          312
likelihood/SRee_right_patchset.json            2021-02-15 08:20:14       395526
likelihood/Comb_patchset.json                  2021-02-15 08:19:46       563860
__MACOSX/likelihood/._Comb_patchset.json       2021-02-15 08:19:46          276
likelihood/Stauleft_bkgonly.json               2020-11-02 16:29:22         3516
__MACOSX/likelihood/._Stauleft_bkgonly.json    2020-11-02 16:29:22          312
likelihood/SRee_right_bkgonly.json             2020-11-02 16:29:12         2023
__MACOSX/likelihood/._SRee_right_bkgonly.json  2020-11-02 16:29:12          312
likelihood/Stau_bkgonly.json                   2020-11-02 16:29:20         3516
__MACOSX/likelihood/._Stau_bkgonly.json        2020-11-02 16:29:20          312
likelihood/SRmm_left_patchset.json             2021-02-15 08:20:44       478671
likelihood/SRmm_right_bkgonly.json             2020-11-02 16:29:18         2023
__MACOSX/likelihood/._SRmm_right_bkgonly.json  2020-11-02 16:29:18          312
likelihood/SRee_left_patchset.json             2021-02-15 08:20:04       396176
likelihood/Comb_bkgonly.json                   2020-11-02 16:29:06         3516
__MACOSX/likelihood/._Comb_bkgonly.json        2020-11-02 16:29:06          368
$ pyhf contrib download --verbose https://www.hepdata.net/record/resource/1981552?view=true /tmp/ins1831504
/tmp/ins1831504/SRmm_right_patchset.json
/tmp/ins1831504/Stauleft_patchset.json
/tmp/ins1831504/Comb_bkgonly.json
/tmp/ins1831504/SRmm_left_bkgonly.json
/tmp/ins1831504/SRee_right_patchset.json
/tmp/ins1831504/SRmm_bkgonly.json
/tmp/ins1831504/SRmm_left_patchset.json
/tmp/ins1831504/Stau_patchset.json
/tmp/ins1831504/Stauleft_bkgonly.json
/tmp/ins1831504/SRee_left_bkgonly.json
/tmp/ins1831504/SRee_patchset.json
/tmp/ins1831504/SRmm_right_bkgonly.json
/tmp/ins1831504/SRee_bkgonly.json
/tmp/ins1831504/SRee_right_bkgonly.json
/tmp/ins1831504/README.md
/tmp/ins1831504/Comb_patchset.json
/tmp/ins1831504/SRee_left_patchset.json
/tmp/ins1831504/Stau_bkgonly.json
/tmp/ins1831504/SRmm_patchset.json
$ curl -sL https://www.hepdata.net/record/resource/2568426?view=true -o ins-1827025.tar.gz
$ python -m tarfile --list ins-1827025.tar.gz 
Likelihoods/ 
Likelihoods/BDT-GGd1_bkgonly.json 
Likelihoods/SR2j-1600_bkgonly.json 
Likelihoods/SR6j-3400_bkgonly.json 
Likelihoods/MB-SSd_bkgonly.json 
Likelihoods/MB-GGd_bkgonly.json 
Likelihoods/MB-C_bkgonly.json 
Likelihoods/MB-SSd_patchset.json 
Likelihoods/SR4j-1000_bkgonly.json 
Likelihoods/SR6j-2200_bkgonly.json 
Likelihoods/BDT-GGo2_bkgonly.json 
Likelihoods/BDT-GGo4_bkgonly.json 
Likelihoods/BDT-GGo3_bkgonly.json 
Likelihoods/SR6j-1000_bkgonly.json 
Likelihoods/SR2j-2800_bkgonly.json 
Likelihoods/MB-C_patchset.json 
Likelihoods/SR4j-2200_bkgonly.json 
Likelihoods/BDT-GGo1_bkgonly.json 
Likelihoods/SR5j-1600_bkgonly.json 
Likelihoods/README.md 
Likelihoods/MB-GGd_patchset.json 
Likelihoods/SR4j-3400_bkgonly.json 
Likelihoods/BDT-GGd2_bkgonly.json 
Likelihoods/BDT-GGd4_bkgonly.json 
Likelihoods/BDT-GGd3_bkgonly.json 
Likelihoods/SR2j-2200_bkgonly.json
$ pyhf contrib download --verbose https://www.hepdata.net/record/resource/2568426?view=true /tmp/1827025
/tmp/1827025/Likelihoods
$ ls -lhtra /tmp/1827025/Likelihoods/
total 63M
-rw-r--r-- 1 feickert feickert 154K Jul 14 01:23 BDT-GGd1_bkgonly.json
-rw-r--r-- 1 feickert feickert 2.4M Jul 14 01:23 MB-C_bkgonly.json
-rw-r--r-- 1 feickert feickert 174K Jul 14 01:23 BDT-GGo4_bkgonly.json
-rw-r--r-- 1 feickert feickert 177K Jul 14 01:23 BDT-GGo3_bkgonly.json
-rw-r--r-- 1 feickert feickert 168K Jul 14 01:23 BDT-GGo2_bkgonly.json
-rw-r--r-- 1 feickert feickert 151K Jul 14 01:23 BDT-GGo1_bkgonly.json
-rw-r--r-- 1 feickert feickert 185K Jul 14 01:23 BDT-GGd4_bkgonly.json
-rw-r--r-- 1 feickert feickert 177K Jul 14 01:23 BDT-GGd3_bkgonly.json
-rw-r--r-- 1 feickert feickert 168K Jul 14 01:23 BDT-GGd2_bkgonly.json
-rw-r--r-- 1 feickert feickert 1.7M Jul 14 01:23 MB-GGd_bkgonly.json
-rw-r--r-- 1 feickert feickert  17M Jul 14 01:23 MB-C_patchset.json
-rw-r--r-- 1 feickert feickert  16M Jul 14 01:23 MB-GGd_patchset.json
-rw-r--r-- 1 feickert feickert 2.7M Jul 14 01:24 MB-SSd_bkgonly.json
-rw-r--r-- 1 feickert feickert 199K Jul 14 01:24 SR2j-1600_bkgonly.json
-rw-r--r-- 1 feickert feickert  21M Jul 14 01:24 MB-SSd_patchset.json
-rw-r--r-- 1 feickert feickert 197K Jul 14 01:24 SR5j-1600_bkgonly.json
-rw-r--r-- 1 feickert feickert 158K Jul 14 01:24 SR4j-3400_bkgonly.json
-rw-r--r-- 1 feickert feickert 182K Jul 14 01:24 SR4j-2200_bkgonly.json
-rw-r--r-- 1 feickert feickert 196K Jul 14 01:24 SR4j-1000_bkgonly.json
-rw-r--r-- 1 feickert feickert 175K Jul 14 01:24 SR2j-2800_bkgonly.json
-rw-r--r-- 1 feickert feickert 198K Jul 14 01:24 SR2j-2200_bkgonly.json
-rw-r--r-- 1 feickert feickert 117K Jul 14 01:24 SR6j-3400_bkgonly.json
-rw-r--r-- 1 feickert feickert 148K Jul 14 01:24 SR6j-2200_bkgonly.json
-rw-r--r-- 1 feickert feickert 161K Jul 14 01:24 SR6j-1000_bkgonly.json
-rw-r--r-- 1 feickert feickert 2.0K Jul 14 01:25 README.md
drwxr-xr-x 2 feickert feickert 4.0K Jul 14 01:25 .
drwxrwxr-x 3 feickert feickert 4.0K Nov 16 14:26 ..

@matthewfeickert
Copy link
Member Author

I'm going to merge this now, but if people have additional comments please either add them here after the fact or make a new Issue and I'll address them.

@matthewfeickert matthewfeickert merged commit d4b2d9c into master Nov 17, 2021
@matthewfeickert matthewfeickert deleted the refactor/contrib-download-internals branch November 17, 2021 15:21
@matthewfeickert matthewfeickert added the contrib Targeting pyhf.contrib and not the core of pyhf label Nov 17, 2021
matthewfeickert added a commit that referenced this pull request Nov 17, 2021
…1704)

* Add zip headers to the accepted content headers for contrib.utils.download
* Amends PR #1697

Co-authored-by: Graeme Watt <[email protected]>
@matthewfeickert matthewfeickert added the dependencies Pull requests that update a dependency file label Nov 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contrib Targeting pyhf.contrib and not the core of pyhf dependencies Pull requests that update a dependency file refactor A code change that neither fixes a bug nor adds a feature tests pytest
Projects
None yet
4 participants