panda-shortname documentation/clarification #10

Open
kratsg opened this issue Oct 31, 2018 · 2 comments

Comments

@kratsg
Contributor

kratsg commented Oct 31, 2018

I'm unclear on what panda-shortname is supposed to do, judging by what it currently outputs:

./panda-shortname mc16_13TeV:mc16_13TeV.375933.MGPy8EG_A14N_GG_bbn1_1100_5000_1.deriv.DAOD_SUSY10.e6353_e5984_a875_r9364_r9315_p3404
mc16_13TeV.375933.MGPyGG_bbn1_1.d.SUSY10.e6353_e5984_a875_r9364_r9315_p3404

It seems to strip away some pieces, including the scope (which the documentation states), but I'm not sure the other pieces that got stripped away should have been.
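To make the difference concrete, here is a small sketch that drops the scope and compares the two names field by field. The helper name `compare_fields` is made up for illustration and is not part of panda-shortname; it only assumes that fields are separated by `.` and that the scope is everything before the `:`.

```python
full = ("mc16_13TeV:mc16_13TeV.375933.MGPy8EG_A14N_GG_bbn1_1100_5000_1"
        ".deriv.DAOD_SUSY10.e6353_e5984_a875_r9364_r9315_p3404")
short = ("mc16_13TeV.375933.MGPyGG_bbn1_1"
         ".d.SUSY10.e6353_e5984_a875_r9364_r9315_p3404")


def compare_fields(full_name, short_name):
    """Return (original, shortened) pairs for every field that changed.

    Hypothetical helper: assumes the scope is the part before ':' and
    that both names have their '.'-separated fields in the same order.
    """
    # rpartition leaves the name untouched if there is no scope
    did = full_name.rpartition(":")[2]
    pairs = zip(did.split("."), short_name.split("."))
    return [(a, b) for a, b in pairs if a != b]


for original, shortened in compare_fields(full, short):
    print(f"{original} -> {shortened}")
```

For the example above this reports three changed fields: the third field, `deriv`, and `DAOD_SUSY10`.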

@dguest
Owner

dguest commented Nov 1, 2018

Hi

The general answer for any questions about panda-shortname is that it's something I thought would be useful but didn't really maintain (I generally just use sed for most scripts). I'm tempted to remove it if we're going to put this code into a release or something, but before yesterday no one so much as asked about it.

That being said, I think it could also be quite a useful command with some work. My thinking is that any name-shortening script should follow a few rules:

  • Don't change the ordering of . separated fields
    • this also means don't remove them. For final-stage private production it's fine to drop fields entirely, but as soon as you do that you lose the ability to parse names easily.
  • The result should be mostly human readable if possible. So the mcYEAR_Energy part is useful, the first part of the job option file name (MG* in your example) is useful, and the format is useful. But all of these can also be shortened a bit.
  • Also the tune, which is usually right after the first _ in the JO name, is almost always useless for all but a few people. This third field is in general stupidly long, I suspect just because it's created first (by whoever requests the MC) and people are about as good at estimating how long file names will become as they are at estimating how long projects will take.
  • the DSID has to stay, because we don't have a better unique identifier.
  • the tags have to stay, again because we need a unique identifier. But we should be using 4-tag datasets these days, so this is already getting shorter.
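The hard constraints in that list (same fields in the same order, DSID kept, tags kept) can be checked mechanically. The following is only a sketch: `rule_violations` is a hypothetical helper, and it assumes the DSID is the second `.`-separated field and the tags are the last one, as in the example name in this issue.

```python
def rule_violations(original, short):
    """List which of the hard rules a shortened name breaks.

    Hypothetical checker, not part of panda-shortname. Assumed layout:
    DSID in the second '.'-separated field, tags in the last field.
    """
    did = original.rpartition(":")[2]  # ignore the scope, if any
    orig_fields = did.split(".")
    short_fields = short.split(".")
    if len(orig_fields) != len(short_fields):
        # Without matching fields the remaining checks are meaningless
        return ["field count changed"]
    problems = []
    if len(orig_fields) > 1 and orig_fields[1] != short_fields[1]:
        problems.append("DSID changed")
    if orig_fields[-1] != short_fields[-1]:
        problems.append("tags changed")
    return problems


original = ("mc16_13TeV:mc16_13TeV.375933.MGPy8EG_A14N_GG_bbn1_1100_5000_1"
            ".deriv.DAOD_SUSY10.e6353_e5984_a875_r9364_r9315_p3404")
short = ("mc16_13TeV.375933.MGPyGG_bbn1_1"
         ".d.SUSY10.e6353_e5984_a875_r9364_r9315_p3404")
print(rule_violations(original, short))
```

By this check the current output is fine on the hard rules; the open question is only how the individual fields get shortened.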

@dguest
Owner

dguest commented Nov 1, 2018

But yeah, I think the third field is the tricky one, because unless PMG really cracks down on the naming of JO files this is going to be all over the place for the foreseeable future.

What usually works is splitting by _ and taking the first part, discarding the second, and keeping everything after. Apparently I'm not doing exactly that here; it looks like I'm instead removing every all-caps or numeric string between the _ characters. Anyway, suggestions as to how to do that smarter are welcome.
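As a starting point, the "keep the first part, drop the second, keep the rest" idea could look something like this. It is a sketch of that description, not what panda-shortname currently does, and the function names are made up. It assumes the JO name is the third `.`-separated field and that the tune is its second `_`-separated part.

```python
def shorten_third_field(field):
    """Drop the second '_'-separated part (assumed to be the tune)."""
    parts = field.split("_")
    # parts[2:] is empty for one or two parts, so short names still work
    return "_".join(parts[:1] + parts[2:])


def shorten_dataset_name(name):
    """Strip the scope and shorten only the third field (hypothetical)."""
    did = name.rpartition(":")[2]
    fields = did.split(".")
    if len(fields) > 2:
        fields[2] = shorten_third_field(fields[2])
    # Field count and order are unchanged, so the name stays parseable
    return ".".join(fields)


name = ("mc16_13TeV:mc16_13TeV.375933.MGPy8EG_A14N_GG_bbn1_1100_5000_1"
        ".deriv.DAOD_SUSY10.e6353_e5984_a875_r9364_r9315_p3404")
print(shorten_dataset_name(name))
```

On the example from this issue, the third field becomes `MGPy8EG_GG_bbn1_1100_5000_1`, so the mass points survive and only the tune is dropped. It will still misbehave for any JO name where the tune is not in the second position.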
