Feature request: Allow coef and p-value extraction by variable name #491

Open
eirikbrandsaas opened this issue Aug 2, 2022 · 5 comments

@eirikbrandsaas

Hi,

It would be great (and safer?) if one could extract coefficients by variable name:

df = DataFrame(y = rand(3), x = rand(3))
out = reg(df, @formula(y ~ x))
out.coef[findfirst(isequal("x"), out.coefnames)]  # hard to do
out.coef["x"]  # would be great
out.coef[:x]  # would be great

Or really anything like that.

See e.g., https://discourse.julialang.org/t/how-to-obtain-the-pvalues-of-the-coefficients-in-glm-jl/9531/4

@ararslan
Member

ararslan commented Aug 2, 2022

I guess in addition to directly delegating coef to the model object in https://github.com/JuliaStats/StatsModels.jl/blob/61de82aa23fb562697fe0f750f6f83ca7be79506/src/statsmodel.jl#L128 we could define e.g.

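# coefnames returns strings, so convert the term first; otherwise a Symbol such as :x never matches
coef(model, term) = coef(model)[findfirst(==(string(term)), coefnames(model))]
model = fit(Whatever, @formula(y ~ 1 + x), data)
coef(model, :x)  # coefficient for `x`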

That could only sensibly support table-based models though, since those are the only ones for which you know the coefficient names (e.g. this wouldn't work for models fit with an explicit design matrix rather than a formula).
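For reference, a minimal sketch of name-based lookup that works today with the exported `coef`, `coefnames`, and `coeftable` functions, assuming GLM.jl and DataFrames.jl are installed. The helper names `coef_index`, `coef_by_name`, and `pvalue_by_name` are hypothetical and not part of GLM.jl or StatsModels.jl; the p-value lookup assumes the `CoefTable` returned by `coeftable` has a p-value column (`pvalcol`), as it does for linear models.

```julia
using DataFrames, GLM

# Hypothetical helper: position of a coefficient, accepting a String or a Symbol.
function coef_index(model, term)
    name = string(term)
    i = findfirst(==(name), coefnames(model))
    # Fail with a readable message instead of indexing with `nothing`.
    i === nothing && throw(ArgumentError("no coefficient named \"$name\"; available: $(coefnames(model))"))
    return i
end

# Hypothetical helper: coefficient estimate by name.
coef_by_name(model, term) = coef(model)[coef_index(model, term)]

# Hypothetical helper: p-value by name, read from the coefficient table.
function pvalue_by_name(model, term)
    ct = coeftable(model)
    return ct.cols[ct.pvalcol][coef_index(model, term)]
end

df = DataFrame(y = rand(10), x = rand(10))
model = lm(@formula(y ~ 1 + x), df)

coef_by_name(model, :x)             # estimate for `x`
coef_by_name(model, "(Intercept)")  # estimate for the intercept
pvalue_by_name(model, :x)           # p-value for `x`
```

As noted above, this relies on the model knowing its coefficient names, so it only applies to models fit from a formula and a table.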

@xgdgsc

xgdgsc commented Jun 14, 2023

This should be added or at least documented.

@andreasnoack
Member

I think we should consider whether we can do something here before releasing 2.0. The raw vectors without any context aren't that helpful.

@ararslan
Member

> The raw vectors without any context aren't that helpful

How do you mean? Like coef(model) returning a Vector? If it didn't, I'd be concerned about possible performance regressions for downstream linear algebra computations that use the result of coef.

I guess it could be convenient for coef(::TableRegressionModel{<:GeneralizedLinearModel}) to return e.g. a 1-dimensional AxisArray and coef(::GeneralizedLinearModel) to return a Vector?

@andreasnoack
Member

> If it didn't, I'd be concerned about possible performance regressions for downstream linear algebra computations that use the result of coef.

Costless abstractions and all that. Hopefully we can have both. I just think it's really error-prone not to have some kind of label associating an estimate in a vector with a parameter name or an effect name. That being said, I'm not sure what the right implementation would look like.
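One possible shape for such a labelled result, sketched in plain Julia with no package dependencies. The `NamedCoefs` type is hypothetical and is not what any package implements; it only illustrates a vector that carries its names while still behaving as an `AbstractVector`. Whether a wrapper like this is truly costless is not settled by the sketch: generic array code works on it, but optimized linear algebra routines may not dispatch on a custom vector type the way they do on a plain `Vector`, which is why `parent` is provided to unwrap it.

```julia
# Hypothetical wrapper type, for illustration only; not part of any package.
struct NamedCoefs{T} <: AbstractVector{T}
    values::Vector{T}
    names::Vector{String}
    function NamedCoefs(values::Vector{T}, names::Vector{String}) where {T}
        length(values) == length(names) ||
            throw(DimensionMismatch("need one name per coefficient"))
        return new{T}(values, names)
    end
end

# Minimal AbstractVector interface, so the wrapper works with generic array code.
Base.size(c::NamedCoefs) = size(c.values)
Base.IndexStyle(::Type{<:NamedCoefs}) = IndexLinear()
Base.getindex(c::NamedCoefs, i::Int) = c.values[i]

# Lookup by name, accepting either a String or a Symbol.
function Base.getindex(c::NamedCoefs, name::Union{Symbol,AbstractString})
    i = findfirst(==(string(name)), c.names)
    i === nothing && throw(KeyError(name))
    return c.values[i]
end

# Unwrap to the plain Vector for performance-sensitive linear algebra.
Base.parent(c::NamedCoefs) = c.values

beta = NamedCoefs([0.5, 2.0], ["(Intercept)", "x"])
beta[:x]              # 2.0
beta["(Intercept)"]   # 0.5
beta[2]               # 2.0, positional indexing still works
sum(beta)             # 2.5, generic reductions work on the wrapper
```

A formula-based model could then return such a wrapper from `coef`, while a model fit with an explicit design matrix keeps returning a plain `Vector`, which is close to the split suggested earlier in the thread.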
