panda-shortname documentation/clarification #10

kratsg · 2018-10-31T18:42:59Z

I'm unclear on what panda-shortname is supposed to do, judging by what it currently does:

./panda-shortname mc16_13TeV:mc16_13TeV.375933.MGPy8EG_A14N_GG_bbn1_1100_5000_1.deriv.DAOD_SUSY10.e6353_e5984_a875_r9364_r9315_p3404
mc16_13TeV.375933.MGPyGG_bbn1_1.d.SUSY10.e6353_e5984_a875_r9364_r9315_p3404

It seems to strip away some pieces, including the scope (which the documentation states), but I'm not sure the other pieces that got stripped away should have been.


dguest · 2018-11-01T07:27:05Z

Hi

The general answer to any question about panda-shortname is that it's something I thought would be useful but didn't really maintain (I generally just use sed for most scripts). I'm tempted to remove it if we're going to put this code into a release or something, but before yesterday no one so much as asked about it.

That being said, I think it could also be quite a useful command with some work. My thinking is that any name-shortening script should follow a few rules:

- Don't change the ordering of the "."-separated fields.
  - This also means don't remove them. For final-stage private production it's fine to drop fields entirely, but as soon as you do that you lose the ability to parse names easily.
- The result should be mostly human readable if possible. So the mcYEAR_Energy part is useful, the first part of the job option file name (MG* in your example) is useful, and the format is useful. But all of these can also be shortened a bit.
- Also, the tune, which is usually right after the first _ in the JO name, is almost always useless for all but a few people. This third field is in general stupidly long, I suspect just because it's created first (by whoever requests the MC) and people are about as good at estimating how long file names will become as they are at estimating how long projects will take.
- The DSID has to stay, because we don't have a better unique identifier.
- The tags have to stay, again because we need a unique identifier. But we should be using 4-tag datasets these days, so this is already getting shorter.
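
The mechanical parts of these rules (same number of fields, DSID kept, tags kept) could be checked with something like the sketch below. It is not part of panda-shortname; the function names are hypothetical, and it assumes the layout seen in the example above, where the DSID is the second "."-separated field and the tags are the last one.

```python
def split_fields(name):
    """Drop an optional 'scope:' prefix and split the rest on '.'."""
    return name.split(":", 1)[-1].split(".")


def rule_violations(original, short):
    """Return the rules that `short` breaks relative to `original`.

    Hypothetical helper: assumes the DSID is field 2 and the tags are
    the last field, as in the dataset name quoted in this issue.
    """
    orig, new = split_fields(original), split_fields(short)
    if len(orig) != len(new):
        # Field positions are no longer comparable, so stop here.
        return ["field count changed"]
    problems = []
    if len(orig) > 1 and orig[1] != new[1]:
        problems.append("DSID changed")
    if orig[-1] != new[-1]:
        problems.append("tags changed")
    return problems
```

Applied to the input and output quoted at the top of this issue, the check returns an empty list: the current output keeps all six fields, the DSID and the tags. Readability of the remaining fields is a judgment call that a check like this does not cover.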

dguest · 2018-11-01T07:43:46Z

But yeah, I think the third field is the tricky one, because unless PMG really cracks down on the naming of JO files this is going to be all over the place for the foreseeable future.

What usually works is splitting by _ and taking the first part, discarding the second, and keeping everything after. Apparently I'm not doing exactly that here; it looks like I'm instead removing every all-caps or numbers string between the _. Anyway, suggestions as to how to do that smarter are welcome.
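
As a minimal sketch of the "take the first part, discard the second, keep everything after" idea, applied only to the third "."-separated field: this is an illustration, not the current panda-shortname code, and the function names are hypothetical. It assumes the tune is always the second "_"-separated piece of that field, which the comment above says holds only "usually".

```python
def shorten_physics_field(field):
    """Drop the second '_'-separated piece (assumed to be the tune)."""
    parts = field.split("_")
    if len(parts) < 2:
        return field  # nothing to discard
    return "_".join(parts[:1] + parts[2:])


def shorten_name(name):
    """Strip an optional 'scope:' prefix and shorten the third field.

    All other fields are left untouched, so the field count and
    ordering are preserved.
    """
    fields = name.split(":", 1)[-1].split(".")
    if len(fields) > 2:
        fields[2] = shorten_physics_field(fields[2])
    return ".".join(fields)


name = ("mc16_13TeV:mc16_13TeV.375933.MGPy8EG_A14N_GG_bbn1_1100_5000_1"
        ".deriv.DAOD_SUSY10.e6353_e5984_a875_r9364_r9315_p3404")
print(shorten_name(name))
# mc16_13TeV.375933.MGPy8EG_GG_bbn1_1100_5000_1.deriv.DAOD_SUSY10.e6353_e5984_a875_r9364_r9315_p3404
```

This keeps the mass points (1100_5000) that the current output drops, at the cost of a longer name. It does not shorten the "deriv" or "DAOD_" pieces.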

dguest added the help wanted label Nov 1, 2018
