-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Make contrib.utils.download robust to archive file types #1697
Conversation
I've put some temporary files up at https://feickert.web.cern.ch/feickert/pyhf/tests/ to test this out and at this point the following work: $ pyhf contrib download --force --verbose https://feickert.web.cern.ch/feickert/pyhf/tests/1Lbb-likelihoods.tar.gz /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods/patchset.json
/tmp/1Lbb-likelihoods/BkgOnly.json
/tmp/1Lbb-likelihoods/README.md
$ pyhf contrib download --force --verbose https://feickert.web.cern.ch/feickert/pyhf/tests/1Lbb-likelihoods.tar /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods/patchset.json
/tmp/1Lbb-likelihoods/BkgOnly.json
/tmp/1Lbb-likelihoods/README.md
$ pyhf contrib download --force --verbose https://feickert.web.cern.ch/feickert/pyhf/tests/1Lbb-likelihoods.zip /tmp/1Lbb-likelihoods
/tmp/1Lbb-likelihoods/patchset.json
/tmp/1Lbb-likelihoods/BkgOnly.json
/tmp/1Lbb-likelihoods/README.md |
Codecov Report
@@ Coverage Diff @@
## master #1697 +/- ##
=======================================
Coverage 98.10% 98.11%
=======================================
Files 64 64
Lines 4226 4246 +20
Branches 587 591 +4
=======================================
+ Hits 4146 4166 +20
Misses 46 46
Partials 34 34
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There's a Python 3.7 problem for child_path = [child for child in output_directory.iterdir()][0]
_tmp_path = output_directory.parent.joinpath(
Path(output_directory.name + "__tmp__")
)
> child_path.replace(_tmp_path).replace(output_directory)
E AttributeError: 'NoneType' object has no attribute 'replace' I'll need to resolve this later, but I'm pretty sure this is because
so I think I can't chain these until we drop Python 3.7 support. |
Need to work out how to extract the files from the directory
… 3.9 tarfile.is_tarfile requires a serialized object prior to Python 3.9, so there is no way to use a BytesIO object in Python 3.7 and Python 3.8 and avoid pre-mature serialization. To get around this, just use a try except block to see if the file is valid as a tarfile or not.
3818e0b
to
719bb24
Compare
@GraemeWatt your review is welcome here as well. :) |
Can confirm with regards to Issue #1519 that the following now works 👍 $ curl -sL https://www.hepdata.net/record/resource/1981552?view=true -o /tmp/SUSY-2018-14.zip
$ python -m zipfile --list /tmp/SUSY-2018-14.zip
File Name Modified Size
likelihood/ 2021-02-15 08:21:20 0
__MACOSX/._likelihood 2021-02-15 08:21:20 312
likelihood/SRmm_left_bkgonly.json 2020-11-02 16:29:16 2023
__MACOSX/likelihood/._SRmm_left_bkgonly.json 2020-11-02 16:29:16 312
likelihood/Stauleft_patchset.json 2021-02-15 08:21:14 569930
likelihood/SRmm_patchset.json 2021-02-15 08:20:34 476160
likelihood/SRmm_bkgonly.json 2020-11-02 16:29:14 2023
__MACOSX/likelihood/._SRmm_bkgonly.json 2020-11-02 16:29:14 312
likelihood/SRee_bkgonly.json 2020-11-02 16:29:08 2023
__MACOSX/likelihood/._SRee_bkgonly.json 2020-11-02 16:29:08 368
likelihood/SRmm_right_patchset.json 2021-02-15 08:20:54 469313
likelihood/Stau_patchset.json 2021-02-15 08:21:04 568408
likelihood/README.md 2020-12-17 11:02:54 1911
__MACOSX/likelihood/._README.md 2020-12-17 11:02:54 368
likelihood/SRee_patchset.json 2021-02-15 08:20:24 392287
likelihood/SRee_left_bkgonly.json 2020-11-02 16:29:10 2023
__MACOSX/likelihood/._SRee_left_bkgonly.json 2020-11-02 16:29:10 312
likelihood/SRee_right_patchset.json 2021-02-15 08:20:14 395526
likelihood/Comb_patchset.json 2021-02-15 08:19:46 563860
__MACOSX/likelihood/._Comb_patchset.json 2021-02-15 08:19:46 276
likelihood/Stauleft_bkgonly.json 2020-11-02 16:29:22 3516
__MACOSX/likelihood/._Stauleft_bkgonly.json 2020-11-02 16:29:22 312
likelihood/SRee_right_bkgonly.json 2020-11-02 16:29:12 2023
__MACOSX/likelihood/._SRee_right_bkgonly.json 2020-11-02 16:29:12 312
likelihood/Stau_bkgonly.json 2020-11-02 16:29:20 3516
__MACOSX/likelihood/._Stau_bkgonly.json 2020-11-02 16:29:20 312
likelihood/SRmm_left_patchset.json 2021-02-15 08:20:44 478671
likelihood/SRmm_right_bkgonly.json 2020-11-02 16:29:18 2023
__MACOSX/likelihood/._SRmm_right_bkgonly.json 2020-11-02 16:29:18 312
likelihood/SRee_left_patchset.json 2021-02-15 08:20:04 396176
likelihood/Comb_bkgonly.json 2020-11-02 16:29:06 3516
__MACOSX/likelihood/._Comb_bkgonly.json 2020-11-02 16:29:06 368
$ pyhf contrib download --verbose https://www.hepdata.net/record/resource/1981552?view=true /tmp/ins1831504
/tmp/ins1831504/SRmm_right_patchset.json
/tmp/ins1831504/Stauleft_patchset.json
/tmp/ins1831504/Comb_bkgonly.json
/tmp/ins1831504/SRmm_left_bkgonly.json
/tmp/ins1831504/SRee_right_patchset.json
/tmp/ins1831504/SRmm_bkgonly.json
/tmp/ins1831504/SRmm_left_patchset.json
/tmp/ins1831504/Stau_patchset.json
/tmp/ins1831504/Stauleft_bkgonly.json
/tmp/ins1831504/SRee_left_bkgonly.json
/tmp/ins1831504/SRee_patchset.json
/tmp/ins1831504/SRmm_right_bkgonly.json
/tmp/ins1831504/SRee_bkgonly.json
/tmp/ins1831504/SRee_right_bkgonly.json
/tmp/ins1831504/README.md
/tmp/ins1831504/Comb_patchset.json
/tmp/ins1831504/SRee_left_patchset.json
/tmp/ins1831504/Stau_bkgonly.json
/tmp/ins1831504/SRmm_patchset.json
$ curl -sL https://www.hepdata.net/record/resource/2568426?view=true -o ins-1827025.tar.gz
$ python -m tarfile --list ins-1827025.tar.gz
Likelihoods/
Likelihoods/BDT-GGd1_bkgonly.json
Likelihoods/SR2j-1600_bkgonly.json
Likelihoods/SR6j-3400_bkgonly.json
Likelihoods/MB-SSd_bkgonly.json
Likelihoods/MB-GGd_bkgonly.json
Likelihoods/MB-C_bkgonly.json
Likelihoods/MB-SSd_patchset.json
Likelihoods/SR4j-1000_bkgonly.json
Likelihoods/SR6j-2200_bkgonly.json
Likelihoods/BDT-GGo2_bkgonly.json
Likelihoods/BDT-GGo4_bkgonly.json
Likelihoods/BDT-GGo3_bkgonly.json
Likelihoods/SR6j-1000_bkgonly.json
Likelihoods/SR2j-2800_bkgonly.json
Likelihoods/MB-C_patchset.json
Likelihoods/SR4j-2200_bkgonly.json
Likelihoods/BDT-GGo1_bkgonly.json
Likelihoods/SR5j-1600_bkgonly.json
Likelihoods/README.md
Likelihoods/MB-GGd_patchset.json
Likelihoods/SR4j-3400_bkgonly.json
Likelihoods/BDT-GGd2_bkgonly.json
Likelihoods/BDT-GGd4_bkgonly.json
Likelihoods/BDT-GGd3_bkgonly.json
Likelihoods/SR2j-2200_bkgonly.json
$ pyhf contrib download --verbose https://www.hepdata.net/record/resource/2568426?view=true /tmp/1827025
/tmp/1827025/Likelihoods
$ ls -lhtra /tmp/1827025/Likelihoods/
total 63M
-rw-r--r-- 1 feickert feickert 154K Jul 14 01:23 BDT-GGd1_bkgonly.json
-rw-r--r-- 1 feickert feickert 2.4M Jul 14 01:23 MB-C_bkgonly.json
-rw-r--r-- 1 feickert feickert 174K Jul 14 01:23 BDT-GGo4_bkgonly.json
-rw-r--r-- 1 feickert feickert 177K Jul 14 01:23 BDT-GGo3_bkgonly.json
-rw-r--r-- 1 feickert feickert 168K Jul 14 01:23 BDT-GGo2_bkgonly.json
-rw-r--r-- 1 feickert feickert 151K Jul 14 01:23 BDT-GGo1_bkgonly.json
-rw-r--r-- 1 feickert feickert 185K Jul 14 01:23 BDT-GGd4_bkgonly.json
-rw-r--r-- 1 feickert feickert 177K Jul 14 01:23 BDT-GGd3_bkgonly.json
-rw-r--r-- 1 feickert feickert 168K Jul 14 01:23 BDT-GGd2_bkgonly.json
-rw-r--r-- 1 feickert feickert 1.7M Jul 14 01:23 MB-GGd_bkgonly.json
-rw-r--r-- 1 feickert feickert 17M Jul 14 01:23 MB-C_patchset.json
-rw-r--r-- 1 feickert feickert 16M Jul 14 01:23 MB-GGd_patchset.json
-rw-r--r-- 1 feickert feickert 2.7M Jul 14 01:24 MB-SSd_bkgonly.json
-rw-r--r-- 1 feickert feickert 199K Jul 14 01:24 SR2j-1600_bkgonly.json
-rw-r--r-- 1 feickert feickert 21M Jul 14 01:24 MB-SSd_patchset.json
-rw-r--r-- 1 feickert feickert 197K Jul 14 01:24 SR5j-1600_bkgonly.json
-rw-r--r-- 1 feickert feickert 158K Jul 14 01:24 SR4j-3400_bkgonly.json
-rw-r--r-- 1 feickert feickert 182K Jul 14 01:24 SR4j-2200_bkgonly.json
-rw-r--r-- 1 feickert feickert 196K Jul 14 01:24 SR4j-1000_bkgonly.json
-rw-r--r-- 1 feickert feickert 175K Jul 14 01:24 SR2j-2800_bkgonly.json
-rw-r--r-- 1 feickert feickert 198K Jul 14 01:24 SR2j-2200_bkgonly.json
-rw-r--r-- 1 feickert feickert 117K Jul 14 01:24 SR6j-3400_bkgonly.json
-rw-r--r-- 1 feickert feickert 148K Jul 14 01:24 SR6j-2200_bkgonly.json
-rw-r--r-- 1 feickert feickert 161K Jul 14 01:24 SR6j-1000_bkgonly.json
-rw-r--r-- 1 feickert feickert 2.0K Jul 14 01:25 README.md
drwxr-xr-x 2 feickert feickert 4.0K Jul 14 01:25 .
drwxrwxr-x 3 feickert feickert 4.0K Nov 16 14:26 ..
|
I'm going to merge this now, but if people have additional comments please either add them here after the fact or make a new Issue and I'll address them. |
…1704) * Add zip headers to the accepted content headers for contrib.utils.download * Amends PR #1697 Co-authored-by: Graeme Watt <[email protected]>
Description
contrib.utils.download
internals #1672pyhf contrib download
be able to handle multiple compression types #1519Checklist Before Requesting Reviewer
Before Merging
For the PR Assignees: