Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Application of POIs using formulas #850

Open
rafaellopesdesa opened this issue May 2, 2020 · 21 comments
Open

Application of POIs using formulas #850

rafaellopesdesa opened this issue May 2, 2020 · 21 comments
Labels
question Further information is requested user request Request coming form a pyhf user

Comments

@rafaellopesdesa
Copy link

Hi,

It seems that, right now, the modifiers are limited to addition of multiplication, with the POIs as normfactors being defined by simple modification POI*pdf. That's ok for most models, but some models require more complex likelihoods. The case that comes to my mind (need?) is EFT analyses which have an interference term and a squared term and, therefore, would need something like sqrt(POI)inter + POIsquare.

I don't know exactly what is the best way to implement that. I think that a simple way would be if one could define multiple POIs and if one could define the value of a modifier as a function of other modifiers. Even simple functions like polynomials would already encompass most of the case while keeping all operations very fast and efficient.

Let me know if there is anything I can do to help.

Thanks for considering.

@kratsg kratsg added the question Further information is requested label May 2, 2020
@lukasheinrich
Copy link
Contributor

lukasheinrich commented May 9, 2020

hi @rafaellopesdesa

apologies for the slow response.

the way to implement this is to add a new modifier. We could call it a formulafactor, with the rule being that it can only reuse existing parameters and does not introduce new ones

{
"name": "some_name",
"type": "formulafactor", 
"data": {
  "expression" : "sqrt(x)*y", dependencies: [{"x": "mu", "y": mu2}]
}
},
{
"name": "another_name",
"type": "formulafactor", 
"data": {
  "expression" : "x**2", dependencies: [{"x": "some_name"}]
}
}

I think in the constrained language of numexpr https://github.com/pydata/numexpr and the with this constraint that no new parameters are introduced this would be quite doable.

cc @alexander-held who has also raised this issue

@matthewfeickert matthewfeickert added the user request Request coming form a pyhf user label May 9, 2020
@rafaellopesdesa
Copy link
Author

Hi @lukasheinrich ,

Yes, that would be great for more complex likelihood models.

I will give it a try this weekend and, if I manage to implement your idea, will come back here to report.

Thanks again!

@lukasheinrich
Copy link
Contributor

I would suggest you orient yourself along the lines of the normfactor code

https://scikit-hep.org/pyhf/_modules/pyhf/modifiers/normfactor.html#normfactor

the output shape of a modifier is

(n_modifiers, n_global_samples, n_alphas, n_global_bin)

here's the meaning of the output shape

  • different "formulas" to be applied
  • the affect of that formula for each sample (if the sample doesn't declare/is unaffected by the formulafactor, it should just be multiplied by 1)
  • the batch size of parameters.. this is probably just 1
  • the global bin number (across all channels)

so if you need to compute the affect of 3 formulas and in total have 10 sample (which might or might not be affected by the formula(s) and want to evaluate the formula for 5 parameter settings and the total number of bins is 20 you would have

an output tensor of

(3,10,5,20)

@lukasheinrich
Copy link
Contributor

lukasheinrich commented May 27, 2020

Hi, I will likely take a stab at it since I need this for my analysis. (unless someone is already working on it?)

@alexander-held
Copy link
Member

Hi, just to add an example of what other libraries do: uproot4 can interpret cuts / selection requirements as python (https://github.com/scikit-hep/uproot4/blob/master/uproot4/language/python.py).

I imagine that in practical applications, a very limited amount of operators will be enough for most fits: addition, subtraction, multiplication, division. Taking a power is convenient, but I guess non-integer powers are rarely needed, so multiplication is fine. Square roots are convenient too, but could be worked around by redefining normfactor->sqrt(normfactor) and then taking the power instead.

@rafaellopesdesa
Copy link
Author

Ok, I am coming back here because I am starting a new project that could use this and this time I really want to give pyhf a try.

I would say that powers of the normfactor would already solve 99% of the cases I have in mind. The main application I have in my is cases with interference (like theta^2 * S + theta * I + B) [think EFT, for instance]

Maybe we could just have an option in normfactor that says the power which which the normfactor acts on the sample.

Would that be a simple way to implement it?

@balunas
Copy link

balunas commented Aug 6, 2021

To bring this issue up again, I see that @lukasheinrich added some "custom modifier" feature but I can't locate any documentation on how to use this to define my own modifier. Is this something users can currently interface with? We'd like to use something like this in ATLAS di-Higgs analyses, where our POI can be a coupling which has rather complicated interference effects and can't be described by any existing modifier. To give a concrete example, we want to implement something like

yield = B + f(k)*S1 + g(k)*S2 + h(k)*S3

where k is our POI, f, g, and h are functions we define, and S1, S2, S3 are different "basis" signal distributions. Is something like this already supported in some way, or is this part of the open items remaining on this issue?

@mswiatlo
Copy link

Hi @lukasheinrich , @alexander-held ,

I wanted to bring this up again as it's starting to become a limiting factor in our analysis, and it would be really nice to avoid having to convert everything back to Roofit. Is there any documentation/testing/etc. that still needs to be done here? I also have a student who could potentially contribute to this (development, testing, etc.) if that's the limiting factor. Please let us know-- it would be really great to get this last feature that would enable a pure pyhf fit structure for our analysis!

Best wishes,
Max

@lukasheinrich
Copy link
Contributor

lukasheinrich commented Sep 21, 2021

hi @mswiatlo - let's prioritize this then.

is this the same type of formula that @balunas needs? and if so what do f(k),g(k),h(k) look like?

Importantly - is each bin modification computed only as a function of the parameters and the sample yields in that bin or are cross-bin relations important?

@mswiatlo
Copy link

Yes, we're working on the same analysis :)

@balunas
Copy link

balunas commented Sep 22, 2021

Hi @lukasheinrich -

The ideal scenario would be if we could implement f(k) etc. ourselves in python, but in practice we're dealing with polynomials here at the moment. That is, f(k) = a + bk + ck^2 + ... There's also a k-dependent overall normalization (which gets looked up from the theory calculation). In principle that can be handled with a separate normfactor which we fix ourselves, but it would be even better if that could be included in a user-defined f(k).

There are no cross-bin relations that come in, the only dependence is on the POI.

Looking further ahead, we'd also love to use this sort of functionality with multiple POIs. I know general support for that is still in the works, but the dream would be to have something like

yield = B + f(k1, k2, k3)*S1 + g(k1, k2, k3)*S2 + ...

@lukasheinrich
Copy link
Contributor

lukasheinrich commented Sep 22, 2021

I think if the general structure is a multiplicative modifier per-sample that only depends on the parameters

sample_yield = base_sample_yield * f(POIs, NPs) is definitely workable. The summing over samples would then happen as in the other modifiers

this would not cover yet e.g. custom functions that depend on the bin location or other bins e.g. f(mass) for a mass bump or so, but seems like a good, quite general middle ground?

@alexander-held
Copy link
Member

alexander-held commented Sep 22, 2021

yield = B + f(k1, k2, k3)*S1 + g(k1, k2, k3)*S2 + ...

As long as f and g are not too complicated (I assume everything that can be put in a TFormula may work), I guess this is exactly of the form that ROOT allows via AddPreprocessFunction. k1, k2 etc. are going to be normalization factors not acting on any sample, but they are free parameters in the fit. f, g are expressions that depend on those normalization factors, but are not a free parameter in the fit themselves (but they are attached to the samples they should scale). For example f could be something like

k1 + k2*3.5 + k3*k2*7.0

without any problem (in the existing ROOT implementation). I don't think that any sort of look-up works in that ROOT system from within that formula but could very well be wrong here. A look-up could also be approximated by some fit and then implemented via an appropriate polynomial or similar. I am not sure whether ROOT allows these expressions to depend on (constrained) nuisance parameters, I have only used them with normalization factors.

In case it may be helpful, here's an (ATLAS-internal) documentation page describing the use of this in TRExFiter: https://trexfitter-docs.web.cern.ch/trexfitter-docs/model_building/expression/.

@lukasheinrich
Copy link
Contributor

thanks @alexander-held - so I guess that's compatible wiith my previous reply? can you confirm?

@alexander-held
Copy link
Member

alexander-held commented Sep 22, 2021

@lukasheinrich I think so, yes. The ROOT version may be slightly less general as you have f(POIs, NPs) and I am not sure that you can do a dependency on NPs in ROOT (nor have I seen a use case for that yet). This should work for almost all cases I have come across so far.

@lukasheinrich
Copy link
Contributor

(not sure why github is not auto-linking this thread here: #1627)

@matthewfeickert
Copy link
Member

(not sure why github is not auto-linking this thread here: #1627)

It's pretty annoying, but GitHub won't show a visual cross-references between Issues and Discussions. No idea why. 😞

@lukasheinrich
Copy link
Contributor

hi all,

we're preparing our 0.7.0 release and with this we are making some progress on this front

check out the code here

image

https://gist.github.com/lukasheinrich/a7f23b71cf048727ca9012f5f6ea940a

@alexander-held has been testing this and an initial look make it seem like this would cover quite a bit of ground. It'd be interesting to get your opinion @balunas @rafaellopesdesa

@kratsg
Copy link
Contributor

kratsg commented Sep 8, 2022

This is migrated into a draft PR #1991. See tests/test_experimental.py for an example usage (basically what is currently in the cell above right now).

@amartone21
Copy link

amartone21 commented Aug 7, 2023

Hi,

I'm a master's student and for my thesis I'm working on the HWW off-shell analysis. In my case, I'd need to have modifiers which are function of the POI, something like mu - sqrt(mu). Can this be done via the above implementation?

@alexander-held
Copy link
Member

Yes this works, an example of that is also discussed in this issue: scikit-hep/cabinetry#382.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested user request Request coming form a pyhf user
Projects
None yet
Development

No branches or pull requests

8 participants