
Gamma models convergence checks and run time #94

Open
hikea opened this issue Aug 7, 2017 · 23 comments
@hikea

hikea commented Aug 7, 2017

I am trying to fit a Gamma model to reaction-time data (there is a recent paper that argues against transforming RTs and for using GLMMs with Gamma or Inverse Gaussian distributions --> http://journal.frontiersin.org/article/10.3389/fpsyg.2015.01171/full )

I have a few questions about running GLMMs in Julia, but before asking them I will go over a reproducible example using a known dataset (my own dataset is large and takes too long to run). I am new to GLMMs and Julia, so I might have done something wrong; let me know if that's the case.

Using Baayen's data from the languageR package in R, I ran 4 models, 3 of which do not make sense for this data. There are a number of variables, but I will focus on native language (English vs. other) and word class (animal vs. plant). Also, I've converted the RTs back to raw RTs, as in the dataset they are log-transformed.

using DataArrays, DataFrames , MixedModels, GLM, RDatasets, RData
#Baayen's data from the languageR package in R, saved as .csv from R
lexdec=readtable("lexdec.csv", makefactors=true)

lexdec_1 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class*NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
   ERROR: MethodError: no method matching getindex(::DataFrames.DataFrame, ::Expr)

lexdec_2 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))

lexdec_3 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))

lexdec_4 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))

lexdec_1 did not run; I suppose interaction terms are not supported in the random effects. lexdec_2, 3, and 4 ran without a problem. However, lexdec_2 and 3 do not make sense for this data, as random slopes for native language are inappropriate here (each person has only one language background, so there is no data to estimate how that person would behave if they had the other language background). Only lexdec_4 is a valid model here, yet all models except lexdec_1 ran just fine. I also ran the models using lmm in Julia, with identical results on convergence.
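
As an aside (my own sketch with toy data, not part of MixedModels): whether a by-subject slope is even estimable can be checked mechanically by asking whether the covariate varies within subject.

```julia
# Toy check (hypothetical helper): a by-group random slope for a covariate
# only makes sense when the covariate varies within group.
function varies_within(group, x)
    seen = Dict{eltype(group), Set{eltype(x)}}()
    for (g, v) in zip(group, x)
        push!(get!(seen, g, Set{eltype(x)}()), v)
    end
    any(length(s) > 1 for s in values(seen))
end

subj = ["A1", "A1", "A2", "A2"]
lang = ["English", "English", "Other", "Other"]  # between-subject: constant within subject
word = ["owl", "mole", "owl", "mole"]            # within-subject: varies within subject

varies_within(subj, lang)  # false -> no by-subject slope for NativeLanguage
varies_within(subj, word)  # true
```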

For comparison I ran the Gamma models in R using lme4 and glmmTMB. glmer from lme4 successfully ran all 4 models (including the interaction model). glmmTMB gave convergence warnings on lexdec_1, 2, and 4, but none on lexdec_3.

I observed different results when running the lmm models in R with lme4 and glmmTMB. lmer from lme4 gave warnings on lexdec_1 and 3, while glmmTMB gave warnings on lexdec_1, 2, and 3 (was glmmTMB able to catch the inappropriate models?).

My questions:

  1. Are there any convergence checks? I can run any model with any parameters without getting convergence warnings.
  2. Are interactions of random effects supported?
  3. What can I do to decrease runtime? My dataset has ~20k observations, with 2 within-subject factors with 3 levels each and one between-subject factor with 2 levels.
  4. Are there any plans to implement support for Inverse Gaussian? I ran Inverse Gaussian models using lme4 and glmmTMB with identical results, so I guess there is not much of a difference for RT data.

I am running Arch Linux 4.12.3-1-ARCH with openblas-lapack installed from the AUR.

julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b* (2017-06-19 13:05 UTC)
Platform Info:
 OS: Linux (x86_64-pc-linux-gnu)
 CPU: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
 WORD_SIZE: 64
 BLAS: libopenblas (HASWELL)
 LAPACK: liblapack
 LIBM: libm
 LLVM: libLLVM-4.0.0 (ORCJIT, haswell)

 julia> Pkg.status()
  - MixedModels 0.10.0
  - GLM 0.7.0
@dmbates
Collaborator

dmbates commented Aug 8, 2017

Thank you for opening what is indeed an interesting issue. I can see that the results from lme4 and from this package do not coincide. What I don't know is which one, if either, is correct.

It would probably help if @bbolker could take a look at this too.

Just to make sure that we are on the same page, when I copy the lexdec data from the languageR package, I don't have an rt_raw column.

julia> using RCall

julia> lexdec = rcopy(R"languageR::lexdec");

julia> dump(lexdec)
DataFrames.DataFrame  1659 observations of 28 variables
  Subject: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["A1", "A1", "A1", "A1"]
  RT: DataArrays.DataArray{Float64,1}(1659) [6.34036, 6.3081, 6.34914, 6.18621]
  Trial: DataArrays.DataArray{Int32,1}(1659) Int32[23, 27, 29, 30]
  Sex: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["F", "F", "F", "F"]
  NativeLanguage: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["English", "English", "English", "English"]
  Correct: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["correct", "correct", "correct", "correct"]
  PrevType: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["word", "nonword", "nonword", "word"]
  PrevCorrect: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["correct", "correct", "correct", "correct"]
  Word: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["owl", "mole", "cherry", "pear"]
  Frequency: DataArrays.DataArray{Float64,1}(1659) [4.85981, 4.60517, 4.99721, 4.72739]
  FamilySize: DataArrays.DataArray{Float64,1}(1659) [1.38629, 1.09861, 0.693147, 0.0]
  SynsetCount: DataArrays.DataArray{Float64,1}(1659) [0.693147, 1.94591, 1.60944, 1.09861]
  Length: DataArrays.DataArray{Int32,1}(1659) Int32[3, 4, 6, 4]
  Class: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["animal", "animal", "plant", "plant"]
  FreqSingular: DataArrays.DataArray{Int32,1}(1659) Int32[54, 69, 83, 44]
  FreqPlural: DataArrays.DataArray{Int32,1}(1659) Int32[74, 30, 49, 68]
  DerivEntropy: DataArrays.DataArray{Float64,1}(1659) [0.7912, 0.6968, 0.4754, 0.0]
  Complex: DataArrays.PooledDataArray{String,UInt8,1}(1659) String["simplex", "simplex", "simplex", "simplex"]
  rInfl: DataArrays.DataArray{Float64,1}(1659) [-0.310155, 0.814508, 0.518794, -0.427444]
  meanRT: DataArrays.DataArray{Float64,1}(1659) [6.3582, 6.415, 6.3426, 6.3353]
  SubjFreq: DataArrays.DataArray{Float64,1}(1659) [3.12, 2.4, 3.88, 4.52]
  meanSize: DataArrays.DataArray{Float64,1}(1659) [3.4758, 2.9999, 1.6278, 1.9908]
  meanWeight: DataArrays.DataArray{Float64,1}(1659) [3.1806, 2.6112, 1.2081, 1.6114]
  BNCw: DataArrays.DataArray{Float64,1}(1659) [12.0571, 5.73881, 5.71652, 2.05037]
  BNCc: DataArrays.DataArray{Float64,1}(1659) [0.0, 4.06225, 3.2498, 1.46241]
  BNCd: DataArrays.DataArray{Float64,1}(1659) [6.1756, 2.85028, 12.5887, 7.36322]
  BNCcRatio: DataArrays.DataArray{Float64,1}(1659) [0.0, 0.707856, 0.568493, 0.713242]
  BNCdRatio: DataArrays.DataArray{Float64,1}(1659) [0.512198, 0.496667, 2.20217, 3.59117]

julia> R"""packageVersion("languageR")"""
RCall.RObject{RCall.VecSxp}
[1] ‘1.4.1’

I am going to work on the assumption that the RT column here is what you have as rt_raw. If I try to fit a GLM I get similar results in R

julia> R"""summary(glm(RT ~ 1 + Class * NativeLanguage, family=Gamma(link="identity"), data=languageR::lexdec))"""
RCall.RObject{RCall.VecSxp}

Call:
glm(formula = RT ~ 1 + Class * NativeLanguage, family = Gamma(link = "identity"),
    data = languageR::lexdec)

Deviance Residuals:
      Min         1Q     Median         3Q        Max
-0.091631  -0.023924  -0.005263   0.017681   0.165656

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
(Intercept)                     6.313149   0.009814 643.297   <2e-16 ***
Classplant                      0.011648   0.014759   0.789   0.4301
NativeLanguageOther             0.174059   0.015228  11.430   <2e-16 ***
Classplant:NativeLanguageOther -0.041166   0.022855  -1.801   0.0719 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Gamma family taken to be 0.001275883)

    Null deviance: 2.3196  on 1658  degrees of freedom
Residual deviance: 2.0737  on 1655  degrees of freedom
AIC: -222.26

Number of Fisher Scoring iterations: 3

and using the GLM package in Julia

julia> using DataFrames, GLM, MixedModels

julia> glm(@formula(RT ~ 1 + Class * NativeLanguage), lexdec, Gamma(), GLM.IdentityLink())
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.IdentityLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: RT ~ 1 + Class + NativeLanguage + Class & NativeLanguage

Coefficients:
                                        Estimate  Std.Error  z value Pr(>|z|)
(Intercept)                              6.31315 0.00981374  643.297   <1e-99
Class: plant                           0.0116479  0.0147591 0.789201   0.4300
NativeLanguage: Other                   0.174059  0.0152283    11.43   <1e-29
Class: plant & NativeLanguage: Other  -0.0411659  0.0228546 -1.80121   0.0717

julia> deviance(ans)   # I should change the show method for GeneralizedLinearModel to include this
2.0737368935267377

So far, so good. I'm not quite sure why an identity link would be used here. The canonical link for the Gamma() distribution is InverseLink(), which doesn't really change much except the values of the coefficients, which are now on the inverse scale.

julia> glm(@formula(RT ~ 1 + Class * NativeLanguage), lexdec, Gamma())
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: RT ~ 1 + Class + NativeLanguage + Class & NativeLanguage

Coefficients:
                                          Estimate   Std.Error   z value Pr(>|z|)
(Intercept)                                 0.1584 0.000246231   643.298   <1e-99
Class: plant                          -0.000291713 0.000369552 -0.789368   0.4299
NativeLanguage: Other                  -0.00425004  0.00037039  -11.4745   <1e-29
Class: plant & NativeLanguage: Other   0.000996327 0.000557006   1.78872   0.0737


julia> deviance(ans)
2.0737368935267413

The deviance is essentially the same as before, which just indicates that the fitted values are over a very small range, so the identity and inverse links give essentially the same predictions.
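
A quick numerical illustration of that point (my own sketch, not taken from the model fits): over a narrow range of fitted means, 1/μ is almost affine in μ, so inverse- and identity-link fits make nearly the same predictions.

```julia
# Sketch with hypothetical values: the fitted means here span roughly
# 6.0 to 6.6 on the log-RT scale.  Over so narrow a range the inverse
# link eta = 1/mu is nearly affine in mu.
mu  = range(6.0, 6.6; length = 101)
eta = 1 ./ mu                          # inverse-link linear predictor
# chord (affine) approximation through the endpoints
b = (eta[end] - eta[1]) / (mu[end] - mu[1])
a = eta[1] - b * mu[1]
maxdev = maximum(abs.(eta .- (a .+ b .* mu)))  # worst-case departure from affine
```

With eta itself around 0.16, maxdev comes out under 1e-3, i.e. the inverse link is affine to within a fraction of a percent over this range.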

In fact, we can simplify this model by eliminating the Class factor and its interaction with NativeLanguage.

julia> gm3 = glm(@formula(RT ~ 1 + NativeLanguage), lexdec, Gamma())
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: RT ~ 1 + NativeLanguage

Coefficients:
                          Estimate   Std.Error  z value Pr(>|z|)
(Intercept)                0.15827 0.000183705  861.547   <1e-99
NativeLanguage: Other  -0.00380929 0.000276774 -13.7632   <1e-42


julia> deviance(gm3)
2.078180142969401

If we add Word as a fixed-effects term there isn't much of a difference in the deviance.

julia> gm4 = glm(@formula(RT ~ 1 + NativeLanguage + Word), lexdec, Gamma())
DataFrames.DataFrameRegressionModel{GLM.GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Distributions.Gamma{Float64},GLM.InverseLink},GLM.DensePredChol{Float64,Base.LinAlg.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

Formula: RT ~ 1 + NativeLanguage + Word

Coefficients:
                           Estimate   Std.Error    z value Pr(>|z|)
(Intercept)                0.157104  0.00115407     136.13   <1e-99
NativeLanguage: Other   -0.00380861 0.000262157    -14.528   <1e-47
Word: ant                0.00269966  0.00163808    1.64807   0.0993
Word: apple              0.00467417  0.00164851    2.83539   0.0046
Word: apricot            0.00169193  0.00163277    1.03623   0.3001
Word: asparagus         -0.00116379  0.00161784  -0.719348   0.4719
Word: avocado           0.000723696   0.0016277   0.444614   0.6566
Word: banana             0.00283276  0.00163878    1.72858   0.0839
Word: bat                0.00308949  0.00164013    1.88368   0.0596
Word: beaver            0.000316319  0.00162556   0.194591   0.8457
Word: bee                0.00333687  0.00164144    2.03289   0.0421
Word: beetroot          -0.00144623  0.00161637  -0.894739   0.3709
Word: blackberry        0.000899666  0.00162862   0.552411   0.5807
Word: blueberry         0.000345391  0.00162571   0.212455   0.8318
Word: broccoli          0.000591975  0.00162701   0.363843   0.7160
Word: bunny             0.000577392  0.00162693   0.354897   0.7227
Word: butterfly         0.000649564  0.00162731   0.399165   0.6898
Word: camel              0.00114343  0.00162989   0.701534   0.4830
Word: carrot             0.00203069  0.00163455    1.24235   0.2141
Word: cat                0.00410297  0.00164549    2.49346   0.0127
Word: cherry             0.00221536  0.00163553    1.35452   0.1756
Word: chicken            0.00369529  0.00164333    2.24866   0.0245
Word: clove              0.00012808  0.00162458  0.0788388   0.9372
Word: crocodile        -0.000749636  0.00161999   -0.46274   0.6436
Word: cucumber         -0.000153309  0.00162311 -0.0944541   0.9247
Word: dog                 0.0033922  0.00164173    2.06623   0.0388
Word: dolphin            0.00155248  0.00163204   0.951249   0.3415
Word: donkey             0.00273681  0.00163827    1.67054   0.0948
Word: eagle              0.00155643  0.00163206   0.953659   0.3403
Word: eggplant           0.00128519  0.00163064   0.788151   0.4306
Word: elephant          0.000913973  0.00162869    0.56117   0.5747
Word: fox                0.00247122  0.00163687    1.50972   0.1311
Word: frog                0.0024406  0.00163671    1.49116   0.1359
Word: gherkin            -0.0029535  0.00160855   -1.83612   0.0663
Word: goat                0.0039133  0.00164448    2.37965   0.0173
Word: goose              0.00231002  0.00163602    1.41197   0.1580
Word: grape               0.0031909  0.00164067    1.94488   0.0518
Word: gull             -0.000653618   0.0016205  -0.403344   0.6867
Word: hedgehog          -0.00321096  0.00160722   -1.99784   0.0457
Word: horse              0.00236936  0.00163634    1.44797   0.1476
Word: kiwi               0.00284684  0.00163885    1.73709   0.0824
Word: leek             -0.000567066  0.00162095  -0.349836   0.7265
Word: lemon              0.00406159  0.00164527    2.46865   0.0136
Word: lettuce            0.00140547  0.00163127   0.861579   0.3889
Word: lion               0.00324603  0.00164096    1.97813   0.0479
Word: magpie            -0.00283215  0.00160918      -1.76   0.0784
Word: melon               0.0015061   0.0016318   0.922966   0.3560
Word: mole              0.000435963  0.00162619   0.268088   0.7886
Word: monkey              0.0030919  0.00164015    1.88514   0.0594
Word: moose              0.00161353  0.00163236   0.988461   0.3229
Word: mouse              0.00278345  0.00163852    1.69876   0.0894
Word: mushroom           0.00236613  0.00163632    1.44601   0.1482
Word: mustard           0.000370361  0.00162585   0.227796   0.8198
Word: olive              0.00340746  0.00164181    2.07543   0.0379
Word: orange             0.00366038  0.00164315    2.22767   0.0259
Word: owl                0.00182726  0.00163349    1.11863   0.2633
Word: paprika           -0.00172592  0.00161492   -1.06873   0.2852
Word: peanut              0.0032834  0.00164116    2.00066   0.0454
Word: pear               0.00239641  0.00163648    1.46437   0.1431
Word: pig                 0.0043727  0.00164692    2.65508   0.0079
Word: pineapple          0.00074701  0.00162782   0.458902   0.6463
Word: potato             0.00178238  0.00163325    1.09131   0.2751
Word: radish           -0.000350782  0.00162208  -0.216255   0.8288
Word: reindeer          -0.00230666   0.0016119   -1.43102   0.1524
Word: shark              0.00282996  0.00163876    1.72689   0.0842
Word: sheep              0.00331967  0.00164135    2.02252   0.0431
Word: snake              0.00229796  0.00163596    1.40465   0.1601
Word: spider             0.00227782  0.00163586    1.39243   0.1638
Word: squid             0.000583408  0.00162696   0.358587   0.7199
Word: squirrel          -0.00137398  0.00161675   -0.84984   0.3954
Word: stork             -0.00308618  0.00160787   -1.91942   0.0549
Word: strawberry         0.00209358  0.00163489    1.28056   0.2003
Word: swan               0.00133668  0.00163091   0.819593   0.4124
Word: tomato             0.00222563  0.00163558    1.36076   0.1736
Word: tortoise          -0.00308983  0.00160785   -1.92172   0.0546
Word: vulture           -0.00507203  0.00159763   -3.17472   0.0015
Word: walnut            -7.66396e-5  0.00162351 -0.0472061   0.9623
Word: wasp              -0.00229149  0.00161198   -1.42154   0.1552
Word: whale              0.00191123  0.00163393    1.16972   0.2421
Word: woodpecker       -0.000608575  0.00162073  -0.375494   0.7073


julia> deviance(gm4)
1.785921568969029

so it is not really surprising that a GLMM with just a random intercept for Word ends up with an estimated standard deviation of zero.

julia> gmm1 = fit!(glmm(@formula(RT ~ 1 + NativeLanguage + (1|Word)), lexdec, Gamma()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: RT ~ 1 + NativeLanguage + (1 | Word)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.InverseLink()

  Deviance (Laplace approximation): 2.0782

Variance components:
          Column   Variance Std.Dev.
 Word (Intercept)         0        0

 Number of obs: 1659; levels of grouping factors: 79

Fixed-effects parameters:
                          Estimate  Std.Error   z value P(>|z|)
(Intercept)                0.15827 0.00514038   30.7896  <1e-99
NativeLanguage: Other  -0.00380929 0.00774463 -0.491862  0.6228

This should correspond exactly to the fixed-effects GLM but it doesn't. The standard errors of the coefficients are very different, which I think has to do with estimation of the dispersion parameter.

However, glmer produces very different results.

julia> R"summary(glmer(RT ~ 1 + NativeLanguage + (1|Word), data=languageR::lexdec, family=Gamma(), nAGQ=1))"
WARNING: RCall.jl: Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.00106101 (tol = 0.001, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
RCall.RObject{RCall.VecSxp}
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: Gamma  ( inverse )
Formula: RT ~ 1 + NativeLanguage + (1 | Word)
   Data: languageR::lexdec

     AIC      BIC   logLik deviance df.resid
  -355.6   -333.9    181.8   -363.6     1655

Scaled residuals:
    Min      1Q  Median      3Q     Max
-2.7398 -0.6618 -0.1312  0.5145  4.7924

Random effects:
 Groups   Name        Variance  Std.Dev.
 Word     (Intercept) 2.143e-06 0.001464
 Residual             1.162e-03 0.034082
Number of obs: 1659, groups:  Word, 79

Fixed effects:
                      Estimate Std. Error t value Pr(>|z|)
(Intercept)          0.1583137  0.0003481   454.9   <2e-16 ***
NativeLanguageOther -0.0038086  0.0002571   -14.8   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr)
NtvLnggOthr -0.325
convergence code: 0
Model failed to converge with max|grad| = 0.00106101 (tol = 0.001, component 1)
Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?

I'm not sure what to make of this. Do you have any ideas, Ben?

@bbolker

bbolker commented Aug 8, 2017

I will try to dig into this tomorrow. I agree that an identity link seems weird (and potentially could make things harder, numerically); I would generally prefer the log link for Gamma as being the most numerically stable, but the authors of the linked paper do make some theoretical and practical arguments for an identity link.

I might try the allFit extension to see how it behaves across optimizers.

The inverse Gaussian family probably wouldn't be very hard to implement.

Doug, you said

I am going to work on the assumption that the RT column here is what you have as rt_raw

but the OP said

I've converted RTs to raw RTs

and the ?languageR::lexdec says

‘RT’ a numeric vector for logarithmically transformed reaction times.

so you might be working with the wrong response variable ...

@dmbates
Collaborator

dmbates commented Aug 8, 2017

Thanks for the quick response, Ben.

I have a vague recollection of discussion related to the dispersion parameter for the Gamma family and how its value influences the evaluation of the log-likelihood or deviance. Does this ring a bell at all?

@hikea
Author

hikea commented Aug 8, 2017

Thank you very much for a quick reply!

Regarding rt_raw, I did convert the lexdec RTs back to milliseconds, so all my examples are on raw RTs. The entire question discussed in the paper has to do with running the analysis on untransformed RTs instead of log-transformed ones or a log link. However, since RTs are not normally distributed, the authors recommend using Gamma or Inverse Gaussian with an identity link (or inverse RTs, i.e. -1000/RT). This all has a theoretical basis in assumptions about mental chronometry and the additive nature of different effects on the response times measured in psycholinguistic experiments.
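
For concreteness (my own aside, not from the paper): the inverse-RT transform mentioned above is just a sign-flipped reciprocal, which preserves the ordering of response times.

```julia
# -1000/RT maps RT in milliseconds onto a negative reciprocal scale;
# the transform is monotone increasing, so shorter RTs stay smaller.
inv_rt(rt_ms) = -1000 / rt_ms

inv_rt(500.0)   # -2.0
inv_rt(1000.0)  # -1.0
```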

While running these comparisons in R and Julia, I've noticed that lme4, glmmTMB, and MixedModels in Julia give different results not only with glmm, but also with lmm. For instance,

 fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec))

will converge just fine in Julia and in R with lme4, but not with glmmTMB. The latter gives a convergence error and NA for AIC, BIC, logLik, and deviance. However, the fixed-effect estimates and variance components seem to be the same.

@bbolker

bbolker commented Aug 8, 2017

FWIW glmmTMB refuses to report a log-likelihood (or associated statistics) if the Hessian is not positive definite. You can still retrieve the negative log-likelihood with object$fit$objective ...

@dmbates
Collaborator

dmbates commented Aug 8, 2017

@hikea You write "Regarding raw_rt, I did convert lexdec RTs back to milliseconds, so all my examples are on raw RTs." I'm not sure what that means. Can you be more specific, perhaps giving a formula for rt_raw in terms of the RT column in the data set?

One thing about the RT response is that its range is comparatively small,

julia> extrema(lexdec[:RT])
(5.828946, 7.587311)
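
For reference (my own back-of-the-envelope, since RT is on the log scale): exponentiating those extrema puts the raw range at roughly 340 ms to 1973 ms.

```julia
# The extrema above are on the log scale; back on milliseconds:
exp(5.828946)   # ≈ 340 ms
exp(7.587311)   # ≈ 1973 ms
```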

which means that there will not be much difference between the inverse link and the identity link. Also, if you look at a density plot for one combination of levels of the fixed-effects factors, e.g. Class .== "animal" and NativeLanguage .== "English", the density of the RT values doesn't look much like a Gamma to me.

Perhaps it would be better to consider another example.

If I fit the model you indicate to the RT response I get a singular fit.

julia> lmm1 = fit!(lmm(@formula(RT ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
 Formula: RT ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
   logLik   -2 logLik     AIC        BIC
  460.29075 -920.58151 -896.58151 -831.61386

Variance components:
                   Column           Variance     Std.Dev.    Corr.
 Word     (Intercept)            0.00589016370 0.076747402
 Subject  (Intercept)            0.00791110551 0.088944396
          Class: plant           0.00009214977 0.009599467  1.00
          NativeLanguage: Other  0.00503482335 0.070956489  1.00  1.00
 Residual                        0.02970852476 0.172361610
 Number of obs: 1659; levels of grouping factors: 79, 21

  Fixed-effects parameters:
                                        Estimate Std.Error  z value P(>|z|)
(Intercept)                              6.31315 0.0291443  216.617  <1e-99
Class: plant                           0.0116479 0.0209007 0.557299  0.5773
NativeLanguage: Other                   0.174059 0.0602617  2.88839  0.0039
Class: plant & NativeLanguage: Other  -0.0411659 0.0177272 -2.32219  0.0202
julia> getθ(lmm1)
7-element Array{Float64,1}:
 0.44527
 0.516034
 0.0556938
 0.411672
 0.0
 7.63226e-5
 0.0

julia> getΛ(lmm1)[2]
3×3 Array{Float64,2}:
 0.516034   0.0         0.0
 0.0556938  0.0         0.0
 0.411672   7.63226e-5  0.0

The 3 by 3 covariance matrix for the Subject random-effects term is of rank 1.

@bbolker As this model converges on the boundary the Hessian is not required to be positive definite. Do you know if glmmTMB takes into account that a constrained optimization problem can converge to a minimum without the Hessian being positive definite?

@hikea
Author

hikea commented Aug 8, 2017

I did the conversion like so in R (I don't yet know how to do it in Julia):

lexdec$rt_raw <- exp(lexdec$RT)

These are the density curves when I plot rt_raw, which is in milliseconds.

ggplot(lexdec, aes(x=rt_raw)) +  geom_density() + facet_grid (NativeLanguage ~ Class, scale='free_x')

[figure: density plots of rt_raw, faceted by NativeLanguage × Class]

If you plot the log-transformed RTs (lexdec$RT), you get a more "normal"-looking picture, of course.

One of my concerns above is that running with by-subject random slopes for NativeLanguage should lead to non-convergence or a warning of some kind. glmmTMB did give convergence warnings on these models, but only for fits with a Gaussian distribution, not Gamma. Julia, on the other hand, has never displayed a warning for any model I have run so far.

@dmbates
Collaborator

dmbates commented Aug 8, 2017

@hikea Thanks for the explanation. I went back and looked at the documentation for languageR::lexdec and found that the RT column is logarithmic. Using lexdec[:rt_raw] = exp.(lexdec[:RT]) I get a model fit with a similar structure

julia> lmm2 = fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
 Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
     logLik        -2 logLik          AIC             BIC
 -1.04862493×10⁴  2.09724985×10⁴  2.09964985×10⁴  2.10614662×10⁴

Variance components:
                   Column          Variance     Std.Dev.    Corr.
 Word     (Intercept)             2842.995443  53.3197472
 Subject  (Intercept)             2502.046695  50.0204628
          Class: plant              28.963076   5.3817354  1.00
          NativeLanguage: Other   4201.453564  64.8186205  1.00  1.00
 Residual                        16133.661777 127.0183521
 Number of obs: 1659; levels of grouping factors: 79, 21

  Fixed-effects parameters:
                                      Estimate Std.Error  z value P(>|z|)
(Intercept)                            563.536   17.4262  32.3384  <1e-99
Class: plant                           6.61877   14.7386 0.449079  0.6534
NativeLanguage: Other                    119.1   41.7748  2.85101  0.0044
Class: plant & NativeLanguage: Other  -28.9027   12.9058 -2.23951  0.0251

julia> getΛ(lmm2)[2]
3×3 Array{Float64,2}:
 0.393805    0.0         0.0
 0.0423697   0.0         0.0
 0.510309   -2.42915e-5  0.0

But it doesn't make sense to include a random effect for NativeLanguage with respect to Subject because NativeLanguage doesn't vary within subject. Fitting the simpler model

julia> lmm3 = fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
 Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class) | Subject) + (1 | Word)
     logLik        -2 logLik          AIC             BIC
 -1.04893234×10⁴  2.09786467×10⁴  2.09966467×10⁴  2.10453724×10⁴

Variance components:
              Column      Variance    Std.Dev.    Corr.
 Word     (Intercept)    2846.267870  53.350425
 Subject  (Intercept)    7091.953472  84.213737
          Class: plant     30.075374   5.484102  1.00
 Residual               16133.230828 127.016656
 Number of obs: 1659; levels of grouping factors: 79, 21

  Fixed-effects parameters:
                                      Estimate Std.Error  z value P(>|z|)
(Intercept)                            563.536   26.1962  21.5122  <1e-99
Class: plant                           6.61877   14.7473 0.448812  0.6536
NativeLanguage: Other                    119.1   38.0826  3.12742  0.0018
Class: plant & NativeLanguage: Other  -28.9027   12.9141 -2.23808  0.0252

julia> getΛ(lmm3)[2]
2×2 Array{Float64,2}:
 0.663013   0.0
 0.0431762  0.0

still produces a singular fit. Notice that the log-likelihood has changed very little between lmm2 and lmm3.

Because of the singularity I would go further and reduce the model to

julia> lmm4 = fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
 Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject) + (1 | Word)
     logLik        -2 logLik          AIC             BIC
 -1.04896988×10⁴  2.09793976×10⁴  2.09933976×10⁴  2.10312954×10⁴

Variance components:
              Column    Variance   Std.Dev.
 Word     (Intercept)   2845.8845  53.346832
 Subject  (Intercept)   7506.9142  86.642450
 Residual              16141.2087 127.048056
 Number of obs: 1659; levels of grouping factors: 79, 21

  Fixed-effects parameters:
                                      Estimate Std.Error  z value P(>|z|)
(Intercept)                            563.536   26.8482  20.9897  <1e-97
Class: plant                           6.61877   14.6626 0.451405  0.6517
NativeLanguage: Other                    119.1   39.1281  3.04386  0.0023
Class: plant & NativeLanguage: Other  -28.9027   12.6888 -2.27782  0.0227

As a GLMM with the inverse link it is

julia> glmm1 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word)), lexdec, Gamma()), fast=true)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject) + (1 | Word)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.InverseLink()

  Deviance (Laplace approximation): 90.5431

Variance components:
             Column        Variance        Std.Dev.
 Word    (Intercept)  0.000000000000000 0.00000000000
 Subject (Intercept)  0.000000015758068 0.00012553114

 Number of obs: 1659; levels of grouping factors: 79, 21

Fixed-effects parameters:
                                          Estimate   Std.Error   z value P(>|z|)
(Intercept)                              0.0017758  8.52957e-5   20.8194  <1e-95
Class: plant                           -2.05845e-5 0.000115231 -0.178637  0.8582
NativeLanguage: Other                 -0.000304747 0.000120173   -2.5359  0.0112
Class: plant & NativeLanguage: Other    6.98092e-5 0.000161009  0.433574  0.6646

which, because the Word variance estimate is zero, must be reduced to

julia> glmm2 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, Gamma()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.InverseLink()

  Deviance (Laplace approximation): 90.5248

Variance components:
             Column        Variance        Std.Dev.
 Subject (Intercept)  0.000000015872202 0.00012598493

 Number of obs: 1659; levels of grouping factors: 21

Fixed-effects parameters:
                                          Estimate   Std.Error   z value P(>|z|)
(Intercept)                             0.00178218  8.55298e-5   20.8369  <1e-95
Class: plant                           -2.05235e-5 0.000115529 -0.177647  0.8590
NativeLanguage: Other                 -0.000304179 0.000120528  -2.52372  0.0116
Class: plant & NativeLanguage: Other    6.95288e-5 0.000161447  0.430659  0.6667


julia> glmm2.LMM.optsum
Initial parameter vector: [0.0017758, -2.05845e-5, -0.000304747, 6.98092e-5, 0.000125531]
Initial objective value:  90.54308478875748

Optimizer (from NLopt):   LN_BOBYQA
Lower bounds:             [-Inf, -Inf, -Inf, -Inf, 0.0]
ftol_rel:                 1.0e-12
ftol_abs:                 1.0e-8
xtol_rel:                 0.0
xtol_abs:                 [1.0e-10]
initial_step:             [2.84319e-5, 3.84104e-5, 4.00576e-5, 5.36696e-5, 3.13828e-5]
maxfeval:                 -1

Function evaluations:     77
Final parameter vector:   [0.00178218, -2.05235e-5, -0.000304179, 6.95288e-5, 0.000125985]
Final objective value:    90.52478628806914
Return code:              FTOL_REACHED

If you really want the identity link it would look like

julia> glmm3 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Word) + (1|Subject)), lexdec, Gamma(), GLM.IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Word) + (1 | Subject)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.IdentityLink()

  Deviance (Laplace approximation): 91.9169

Variance components:
             Column    Variance  Std.Dev.
 Word    (Intercept)     0.0000  0.000000
 Subject (Intercept)  1492.1437 38.628276

 Number of obs: 1659; levels of grouping factors: 79, 21

Fixed-effects parameters:
                                      Estimate Std.Error   z value P(>|z|)
(Intercept)                            561.603   26.8466   20.9189  <1e-96
Class: plant                           6.47181    36.919  0.175298  0.8608
NativeLanguage: Other                  114.858   45.1489   2.54399  0.0110
Class: plant & NativeLanguage: Other   -29.445   62.1656 -0.473654  0.6357

julia> glmm4 = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, Gamma(), GLM.IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.IdentityLink()

  Deviance (Laplace approximation): 91.9053

Variance components:
             Column    Variance Std.Dev.
 Subject (Intercept)  1495.1625 38.66733

 Number of obs: 1659; levels of grouping factors: 21

Fixed-effects parameters:
                                      Estimate Std.Error   z value P(>|z|)
(Intercept)                            563.544   26.9076   20.9437  <1e-96
Class: plant                           6.45074   37.0112  0.174292  0.8616
NativeLanguage: Other                   114.57   45.2377   2.53262  0.0113
Class: plant & NativeLanguage: Other  -29.3727   62.3007 -0.471467  0.6373

or, eliminating the non-significant fixed-effects terms involving Class,

julia> glmm5 = fit!(glmm(@formula(rt_raw ~ 1 + NativeLanguage + (1|Subject)), lexdec, Gamma(), GLM.IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + NativeLanguage + (1 | Subject)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.IdentityLink()

  Deviance (Laplace approximation): 92.1422

Variance components:
             Column    Variance  Std.Dev.
 Subject (Intercept)  1482.8712 38.508066

 Number of obs: 1659; levels of grouping factors: 21

Fixed-effects parameters:
                       Estimate Std.Error z value P(>|z|)
(Intercept)             566.399   21.4695 26.3815  <1e-99
NativeLanguage: Other    101.59    35.368 2.87238  0.0041

@dmbates (Collaborator) commented Aug 8, 2017

Note that the deviance as reported here with the inverse link is slightly better (i.e. smaller) than the corresponding fits with the identity link. I say as reported here because I am not entirely sure that the calculation of the deviance for a Gamma GLMM is correct. I am also unsure about the standard errors on the fixed-effects coefficients in this case.
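For reference, the GLM unit deviance for the Gamma family is d(y, μ) = 2((y − μ)/μ − log(y/μ)), and the model deviance sums these over observations. The sketch below is just that textbook formula in Python, not the MixedModels code; the Laplace-approximated GLMM deviance adds terms involving the random effects and a log-determinant, which is where the uncertainty above lies.

```python
import math

def gamma_unit_deviance(y, mu):
    """Textbook Gamma-family unit deviance: 2*((y - mu)/mu - log(y/mu))."""
    return 2.0 * ((y - mu) / mu - math.log(y / mu))

# The GLM deviance is the sum of unit deviances over observations; each
# term is nonnegative and zero only when the fitted value equals the
# observation.  The numbers below are hypothetical raw RTs and means.
ys  = [510.0, 655.0, 480.0]
mus = [530.0, 640.0, 500.0]
dev = sum(gamma_unit_deviance(y, m) for y, m in zip(ys, mus))
```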

@dmbates (Collaborator) commented Aug 8, 2017

@hikea Regarding the failure to produce a warning for models including random effects for NativeLanguage by Subject, the fitted model is obviously singular and I presume that the user is going to check for singularity. Trying to guess in what way a user may specify a nonsensical model and check if they have done so becomes a game of Whac-A-Mole and I don't think it is a good use of my time. Pull requests would be welcome, of course.

@hikea (Author) commented Aug 8, 2017

Thank you!
It looks like I get a proper fit for the Gamma model with IdentityLink when I don't use fast=true:

julia> lexdec_glmm_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject) + (1 | Word)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.IdentityLink()

  Deviance (Laplace approximation): 91.9053

Variance components:
             Column        Variance        Std.Dev.   
 Word    (Intercept)     0.000016739485  0.0040913915
 Subject (Intercept)  1496.106669109329 38.6795381191

 Number of obs: 1659; levels of grouping factors: 79, 21

Fixed-effects parameters:
                                      Estimate Std.Error   z value P(>|z|)
(Intercept)                            563.546    26.909   20.9426  <1e-96
Class: plant                           6.45169   37.0112  0.174317  0.8616
NativeLanguage: Other                  114.567   45.2397   2.53245  0.0113
Class: plant & NativeLanguage: Other  -29.3727   62.3006 -0.471468  0.6373


julia> getΛ(lexdec_glmm_gamma)[2]
1×1 Array{Float64,2}:
 38.6795

@dmbates (Collaborator) commented Aug 8, 2017

@hikea Regrettably that fit fails in the development version of MixedModels. I need to patch up some starting estimates

julia> lexdec_glmm_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word)), lexdec, Gamma(), GLM.IdentityLink()))
ERROR: ArgumentError: invalid NLopt arguments
Stacktrace:
 [1] chk(::Int32) at /home/bates/.julia/v0.6/NLopt/src/NLopt.jl:259
 [2] chkn at /home/bates/.julia/v0.6/NLopt/src/NLopt.jl:277 [inlined]
 [3] initial_step!(::NLopt.Opt, ::Array{Float64,1}) at /home/bates/.julia/v0.6/NLopt/src/NLopt.jl:365
 [4] NLopt.Opt(::MixedModels.OptSummary{Float64}) at /home/bates/.julia/v0.6/MixedModels/src/types.jl:94
 [5] #fit!#68(::Bool, ::Bool, ::Function, ::MixedModels.GeneralizedLinearMixedModel{Float64}) at /home/bates/.julia/v0.6/MixedModels/src/PIRLS.jl:213
 [6] fit!(::MixedModels.GeneralizedLinearMixedModel{Float64}) at /home/bates/.julia/v0.6/MixedModels/src/PIRLS.jl:193

What happens is that the fast = true optimization is always performed first to get starting estimates for the more general fit. I will need to modify the starting estimates for the more general fit when one of the constrained θ parameters converges to zero in the fast = true case. See #95
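The zero-step-size failure mode can be sketched outside of NLopt: if the per-parameter initial step is derived by scaling the starting values and one of the constrained θ values is exactly 0.0, the derived step is also 0.0, which NLopt rejects. A hypothetical clamping fix (illustrative names and scale factors, not the actual patch):

```python
# Derive positive initial steps from a starting vector, clamping away
# from zero so that a parameter that converged to exactly 0.0 in the
# fast fit does not produce a zero step for the general fit.
def initial_steps(start, scale=0.25, floor=1e-4):
    return [max(abs(x) * scale, floor) for x in start]
```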

In the fit you show, it looks as if the nonzero standard deviation for the Word random effect is essentially round-off error.

@hikea (Author) commented Aug 8, 2017

Not a problem. I've noticed some weird behaviour when using fast=true. Is it similar to calc.derivs = FALSE in lme4? With calc.derivs = FALSE, I can fit any model quickly and without warnings.
The standard deviation for the Word random effect is indeed close to zero, but the fit is non-singular.

@dmbates (Collaborator) commented Aug 8, 2017

The fast=true version of the fit uses Penalized Iteratively Reweighted Least Squares (PIRLS) to determine the conditional mode of the random effects and the conditional estimate of the fixed-effects parameter, given a value of θ. It is similar to using nAGQ=0L in a glmer fit.

Generally the difference between the fast=true fit and the optimization of all the parameters using PIRLS only to determine the conditional modes of the random effects is very small.
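The inner step of PIRLS is, at heart, a ridge-penalized weighted least squares solve for the spherical random effects u, where the identity penalty comes from the standard-normal prior on u. A minimal numpy sketch of one such step, with illustrative names and a scalar stand-in for the covariance factor rather than the MixedModels.jl internals:

```python
import numpy as np

def penalized_wls_step(Zt, W, wrkresp, lam):
    """One penalized WLS solve: (Λ'Z'WZΛ + I) u = Λ'Z'W * working response.

    Zt is the q-by-n transposed random-effects model matrix, W the n-by-n
    diagonal matrix of GLM working weights, wrkresp the working response,
    and lam a scalar stand-in for the covariance factor Λ (a matrix in
    general).  PIRLS iterates this solve to the conditional modes of u.
    """
    ZtW = lam * (Zt @ W)
    A = ZtW @ Zt.T * lam + np.eye(Zt.shape[0])   # Λ'Z'WZΛ + I
    return np.linalg.solve(A, ZtW @ wrkresp)
```

With Zt = W = I and lam = 1 this reduces to plain ridge shrinkage of the working response toward zero.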

When you say that you have noticed some weird behaviour, do you have examples you can share?

@hikea (Author) commented Aug 8, 2017

Unfortunately, I don't have anything to share at the moment, but I will when I encounter something again.

Thank you for all the explanations. I hope this post is useful not only to me but also to everyone who is new to GLMM and Julia.

Could you please address my questions 2 and 4?
2. Are interactions of random effects supported?
4. Are there any plans to implement support for Inverse Gaussian? (@bbolker mentioned that this is a possibility)

@hikea (Author) commented Aug 8, 2017

To address my third question: using my data (20k observations, 3x3x2 design), I dropped the by-word random slope (the maximal model for my data requires one) and used fast=true, which decreased the run time from 304 seconds to 117 seconds. With the by-word slope retained and fast=true, I get 406 seconds.

I have openblas-lapack installed, but I only allow 2 threads for openblas via export OPENBLAS_NUM_THREADS=2 as I am using a laptop and want to prevent overheating issues.

@bbolker commented Aug 8, 2017

I'm falling behind a bit.

  • with regard to interactions of random effects in Julia (Q2): I don't know how the formulas are evaluated, but at the very worst you can construct your own interaction term and add it to the data frame. The R analogue would be
lexdat$CNLint <- with(lexdat, interaction(Class,NativeLanguage))

then fit the model with a (1+CNLint|grpvar) term in it.

  • with regard to an inverse-Gaussian family (Q4): you (or someone) would need to go to the Julia GLM package and implement all the bits of the inverse-Gaussian family (variance function etc.) that are needed. Shouldn't be too hard to port from R's inverse.gaussian family ... you could post an issue there if you were hoping that someone else would do it ...
  • with regard to singular fits, variations among platforms:
    • I agree with Doug that trying to detect bad model specifications is often whac-a-mole-ish, although lme4's pre-convergence checks (checking numbers of observations vs. rank of Z, number of levels, number of random effects) are reasonably sensible. It might not be impossible to come up with some automated ways to test for the required minimal replication levels, but it's not high on my list ...
    • glmmTMB parameterizes variance-covariance matrices by decomposing them into a correlation matrix (further parameterized by lower-triangle elements, see here, whose parameters are (I think) naturally unconstrained) and log-standard deviations. It should be easy to detect the singularity that occurs when a log-sd gets sufficiently negative, but I'm not sure whether there are any other conditions/limits on the parameters (it's probably necessary, but perhaps not sufficient, for some set of the theta values to get large?)
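For the 2×2 case, the decomposition described above can be sketched as follows. This is a hypothetical helper, not glmmTMB code; t ↦ t/√(1+t²) is, as I understand it, the unconstrained-to-correlation map TMB uses for unstructured correlation matrices. As a log-sd goes to −∞ the corresponding variance goes to 0 and the matrix becomes singular, which is one way to flag singularity on this scale.

```python
import math

# Build a 2x2 covariance matrix from two log-standard-deviations and one
# unconstrained correlation parameter t, mapped into (-1, 1).
def cov2x2(log_sd1, log_sd2, t):
    s1, s2 = math.exp(log_sd1), math.exp(log_sd2)
    rho = t / math.sqrt(1.0 + t * t)     # always strictly inside (-1, 1)
    return [[s1 * s1, rho * s1 * s2],
            [rho * s1 * s2, s2 * s2]]
```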

@hikea (Author) commented Aug 9, 2017

@bbolker thank you! This is very helpful. I've never done any implementations, but I'll take a look.

@hikea (Author) commented Aug 9, 2017

Looking a bit more at this (not necessarily in the right direction), I've noticed that correlations of random effects do not match.

julia> lexdec_model_rt_raw = fit!(lmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec))
Linear mixed model fit by maximum likelihood
 Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
     logLik        -2 logLik          AIC             BIC       
 -1.04862493×10⁴   2.09724986×10⁴   2.09964986×10⁴   2.10614662×10⁴

Variance components:
                   Column          Variance   Std.Dev.    Corr.
 Word     (Intercept)             2842.95699  53.319387
 Subject  (Intercept)             2502.20502  50.022045
          Class: plant              28.94764   5.380301  1.00
          NativeLanguage: Other   4200.34712  64.810085  1.00  1.00
 Residual                        16133.67862 127.018418
 Number of obs: 1659; levels of grouping factors: 79, 21

  Fixed-effects parameters:
                                      Estimate Std.Error  z value P(>|z|)
(Intercept)                            563.536   17.4266  32.3377  <1e-99
Class: plant                           6.61877   14.7385 0.449082  0.6534
NativeLanguage: Other                    119.1   41.7728  2.85115  0.0044
Class: plant & NativeLanguage: Other  -28.9027   12.9057 -2.23952  0.0251



julia> lexdec_glmm_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.IdentityLink()

  Deviance (Laplace approximation): 88.5160

Variance components:
                  Column             Variance        Std.Dev.     Corr.
 Word    (Intercept)               0.000013282934  0.0036445759
 Subject (Intercept)               0.000000000000  0.0000000000
         Class: plant              2.154574504719  1.4678468942   NaN
         NativeLanguage: Other  6919.833771948175 83.1855382380   NaN  0.99

 Number of obs: 1659; levels of grouping factors: 79, 21

Fixed-effects parameters:
                                      Estimate Std.Error   z value P(>|z|)
(Intercept)                             563.54   24.5249   22.9783  <1e-99
Class: plant                           6.61341   37.0896  0.178309  0.8585
NativeLanguage: Other                   114.65   50.0573   2.29038  0.0220
Class: plant & NativeLanguage: Other  -30.0731    61.753 -0.486989  0.6263


> summary( lmer(rt_raw ~ Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word), data=lexdec, REML=F) )
Linear mixed model fit by maximum likelihood  
t-tests use  Satterthwaite approximations to degrees of freedom ['lmerMod']
Formula: rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage |      Subject) + (1 | Word)
   Data: lexdec

     AIC      BIC   logLik deviance df.resid 
 20996.5  21061.5 -10486.2  20972.5     1647 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.1722 -0.5469 -0.1480  0.3082  8.1841 

Random effects:
 Groups   Name                Variance Std.Dev. Corr     
 Word     (Intercept)          2842.96  53.319           
 Subject  (Intercept)          2502.17  50.022           
          Classplant             28.96   5.381  1.00     
          NativeLanguageOther  4201.33  64.818  1.00 1.00
 Residual                     16133.66 127.018           
Number of obs: 1659, groups:  Word, 79; Subject, 21

Fixed effects:
                               Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)                     563.536     17.427  19.500  32.338   <2e-16 ***
Classplant                        6.619     14.738 103.630   0.449   0.6543    
NativeLanguageOther             119.100     41.775  11.900   2.851   0.0147 *  
Classplant:NativeLanguageOther  -28.903     12.906 259.890  -2.240   0.0260 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) Clsspl NtvLnO
Classplant  -0.283              
NtvLnggOthr -0.328  0.013       
Clsspln:NLO  0.036 -0.375  0.037


> summary( glmer(rt_raw ~ Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word), data=lexdec, family=Gamma(link="identity")) )
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: Gamma  ( identity )
Formula: rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage |      Subject) + (1 | Word)
   Data: lexdec

     AIC      BIC   logLik deviance df.resid 
 20343.6  20408.6 -10159.8  20319.6     1647 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7847 -0.5918 -0.1717  0.3281  8.2704 

Random effects:
 Groups   Name                Variance  Std.Dev. Corr       
 Word     (Intercept)         1.529e+03 39.1034             
 Subject  (Intercept)         8.950e+02 29.9174             
          Classplant          1.861e+01  4.3135   0.45      
          NativeLanguageOther 2.176e+03 46.6512  -0.15  0.81
 Residual                     4.127e-02  0.2032             
Number of obs: 1659, groups:  Word, 79; Subject, 21

Fixed effects:
                               Estimate Std. Error t value Pr(>|z|)    
(Intercept)                     580.836     11.858   48.98  < 2e-16 ***
Classplant                        6.813     10.282    0.66  0.50759    
NativeLanguageOther             124.780     10.898   11.45  < 2e-16 ***
Classplant:NativeLanguageOther  -24.912      7.757   -3.21  0.00132 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) Clsspl NtvLnO
Classplant   0.041              
NtvLnggOthr  0.015 -0.037       
Clsspln:NLO -0.059 -0.140  0.099


> summary( glmmTMB(rt_raw ~ Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word), data=lexdec) )
 Family: gaussian  ( identity )
Formula:          rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage |      Subject) + (1 | Word)
Data: lexdec

     AIC      BIC   logLik deviance df.resid 
      NA       NA       NA       NA     1647 

Random effects:

Conditional model:
 Groups   Name                Variance  Std.Dev.  Corr       
 Word     (Intercept)         2.846e+03 5.335e+01            
 Subject  (Intercept)         7.507e+03 8.664e+01            
          Classplant          5.503e-54 2.346e-27 -1.00      
          NativeLanguageOther 1.545e-17 3.931e-09 -0.97  0.97
 Residual                     1.614e+04 1.270e+02            
Number of obs: 1659, groups:  Word, 79; Subject, 21

Dispersion estimate for gaussian family (sigma^2): 1.61e+04 

Conditional model:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                     563.536     26.848  20.990  < 2e-16 ***
Classplant                        6.619     14.663   0.451  0.65169    
NativeLanguageOther             119.101     39.128   3.044  0.00234 ** 
Classplant:NativeLanguageOther  -28.903     12.689  -2.278  0.02274 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning messages:
1: In nlminb(start = par, objective = fn, gradient = gr) :
  NA/NaN function evaluation
2: In nlminb(start = par, objective = fn, gradient = gr) :
  NA/NaN function evaluation
3: In glmmTMB(rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage |  :
  Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

> summary( glmmTMB(rt_raw ~ Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word), data=lexdec, family=list(family="Gamma",link="identity")) )
 Family: Gamma  ( identity )
Formula:          rt_raw ~ Class * NativeLanguage + (1 + Class + NativeLanguage |      Subject) + (1 | Word)
Data: lexdec

     AIC      BIC   logLik deviance df.resid 
      NA       NA       NA       NA     1647 

Random effects:

Conditional model:
 Groups  Name                Variance   Std.Dev.  Corr       
 Word    (Intercept)          2.123e+03 4.608e+01            
 Subject (Intercept)          7.615e+04 2.760e+02            
         Classplant          6.378e-119 7.986e-60 -1.00      
         NativeLanguageOther  4.414e+03 6.644e+01  1.00 -0.99
Number of obs: 1659, groups:  Word, 79; Subject, 21

Dispersion estimate for Gamma family (sigma^2): 0.0327 

Conditional model:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      266.34         NA      NA       NA    
Classplant                        31.71      12.63   2.511    0.012 *  
NativeLanguageOther              125.93         NA      NA       NA    
Classplant:NativeLanguageOther   -56.56      10.97  -5.154 2.55e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
There were 47 warnings (use warnings() to see them)

All correlations look pretty high, but there are discrepancies both in direction and magnitude.

@dmbates (Collaborator) commented Aug 9, 2017

@hikea You have almost convinced me that I should print warning messages about convergence to singular covariance matrices. All of the GLMM fits here converge to singular covariance matrices for the Subject factor.

julia> using MixedModels, RCall

julia> lexdec = rcopy(R"languageR::lexdec");

julia> lexdec[:rt_raw] = exp.(lexdec[:RT]);
julia> lexdec_glmm_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1+Class+NativeLanguage|Subject) + (1|Word)), lexdec, Gamma(), IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + ((1 + Class + NativeLanguage) | Subject) + (1 | Word)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.IdentityLink()

  Deviance (Laplace approximation): 88.5302

Variance components:
                  Column             Variance        Std.Dev.     Corr.
 Word    (Intercept)               0.000008148119  0.0028544911
 Subject (Intercept)               0.000000000000  0.0000000000
         Class: plant              2.141102817949  1.4632507707   NaN
         NativeLanguage: Other  6917.959907320534 83.1742743120   NaN  0.99

 Number of obs: 1659; levels of grouping factors: 79, 21

Fixed-effects parameters:
                                      Estimate Std.Error   z value P(>|z|)
(Intercept)                            563.536   24.5248   22.9783  <1e-99
Class: plant                           6.61613   37.0894  0.178383  0.8584
NativeLanguage: Other                  110.297   49.9933   2.20624  0.0274
Class: plant & NativeLanguage: Other  -30.3897   61.6327 -0.493078  0.6220


julia> getΛ(lexdec_glmm_gamma)[2]
3×3 Array{Float64,2}:
  0.0       0.0        0.0
  1.32808   0.614256   0.0
 74.3338   34.6921    13.7442
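The singularity is visible in that factor: a lower-triangular covariance factor is singular exactly when one of its diagonal elements is (numerically) zero, as in the first row above. A sketch of such a check in plain Python (not the MixedModels.jl API):

```python
# Test a lower-triangular covariance factor, given as a list of rows,
# for singularity by looking for a (numerically) zero diagonal element.
def is_singular(lambda_factor, tol=1e-8):
    return any(abs(lambda_factor[i][i]) < tol
               for i in range(len(lambda_factor)))
```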

Also, it is nonsensical to include random effects for NativeLanguage by Subject because NativeLanguage is a property of Subject.

You can't expect consistency in fits to nonsense models.
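A cheap way to catch this particular kind of nonsense before fitting is to check whether the covariate ever varies within a level of the grouping factor: NativeLanguage never varies within Subject, so a by-Subject slope for it is confounded with the subject intercept. A hypothetical helper in plain Python:

```python
# Return True if the covariate takes more than one value within any
# level of the grouping factor; a covariate that never varies within a
# group cannot support an identifiable by-group random slope.
def varies_within(group, covariate):
    seen = {}
    for g, c in zip(group, covariate):
        seen.setdefault(g, set()).add(c)
    return any(len(vals) > 1 for vals in seen.values())

# NativeLanguage is constant within each Subject:
subj = ["A1", "A1", "A2", "A2"]
lang = ["English", "English", "Other", "Other"]
# varies_within(subj, lang) -> False
```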

@hikea (Author) commented Aug 9, 2017

I am glad that I am being of help here as well (although I thought it would be the opposite).
I know that these are nonsensical models with the wrong parameters, but I thought I would get consistent results nevertheless.
Thank you for implementing Inverse Gaussian in GLM --> JuliaStats/GLM.jl#193

@hikea (Author) commented Aug 16, 2017

Thank you @dmbates. I've just noticed that the Inverse Gaussian implementation was merged, and it works in MixedModels (after using Pkg.checkout("GLM") and Pkg.checkout("MixedModels")).

However, I get the same error you were getting earlier:

julia> lexdec_inv_gauss = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, InverseGaussian(), IdentityLink()))
ERROR: ArgumentError: invalid NLopt arguments

but not with Gamma

julia> lexdec_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, Gamma(), IdentityLink()))
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject)
  Distribution: Distributions.Gamma{Float64}
  Link: GLM.IdentityLink()

  Deviance (Laplace approximation): 91.9053

Variance components:
             Column    Variance Std.Dev. 
 Subject (Intercept)  1495.1625 38.66733

 Number of obs: 1659; levels of grouping factors: 21

Fixed-effects parameters:
                                      Estimate Std.Error   z value P(>|z|)
(Intercept)                            563.544   26.9076   20.9437  <1e-96
Class: plant                           6.45074   37.0112  0.174292  0.8616
NativeLanguage: Other                   114.57   45.2377   2.53262  0.0113
Class: plant & NativeLanguage: Other  -29.3727   62.3007 -0.471467  0.6373

However, Gamma gives the same error if I include random effects for words, (1|Word), which have 0 variance in Julia (see below for the R output):

julia> lexdec_gamma = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)+(1|Word)), lexdec, Gamma(), IdentityLink()))
ERROR: ArgumentError: invalid NLopt arguments

Using fast=true helps with this issue:

julia> lexdec_inv_gauss = fit!(glmm(@formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject)), lexdec, InverseGaussian(), IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by minimizing the Laplace approximation to the deviance
  Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject)
  Distribution: Distributions.InverseGaussian{Float64}
  Link: GLM.IdentityLink()

  Deviance (Laplace approximation): 0.1438

Variance components:
             Column   Variance Std.Dev. 
 Subject (Intercept)         0        0

 Number of obs: 1659; levels of grouping factors: 21

Fixed-effects parameters:
                                      Estimate Std.Error    z value P(>|z|)
(Intercept)                            563.536   582.191   0.967957  0.3331
Class: plant                           6.61877   883.313 0.00749313  0.9940
NativeLanguage: Other                    119.1   1068.76   0.111438  0.9113
Class: plant & NativeLanguage: Other  -28.9027   1580.41 -0.0182882  0.9854

We can see that the model with InverseGaussian returns 0 variance for (1|Subject) as well. I guess that's why it didn't converge without fast=true, just like Gamma.

The fixed-effects estimates are close to the R output with nAGQ=0L, but the standard errors, random effects, and deviance are completely different. You can see that neither Word nor Subject has 0 variance in R (the same is true for the Gamma model), although the residual variance is near zero.

> lexdec_inv_gauss_ <- glmer(rt_raw ~ 1+ Class*NativeLanguage + (1|Subject) + (1|Word), data=lexdec, family=inverse.gaussian(link="identity"), nAGQ=0)
> summary(lexdec_inv_gauss_)
Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 0) ['glmerMod']
 Family: inverse.gaussian  ( identity )
Formula: rt_raw ~ 1 + Class * NativeLanguage + (1 | Subject) + (1 | Word)
   Data: lexdec

     AIC      BIC   logLik deviance df.resid 
 20185.1  20223.0 -10085.6  20171.1     1652 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7972 -0.5759 -0.1617  0.3632  8.2905 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 Word     (Intercept) 1.400e+03 37.412913
 Subject  (Intercept) 1.909e+03 43.692319
 Residual             6.799e-05  0.008245
Number of obs: 1659, groups:  Word, 79; Subject, 21

Fixed effects:
                               Estimate Std. Error t value Pr(>|z|)    
(Intercept)                     561.088     14.604   38.42  < 2e-16 ***
Classplant                        6.484     11.026    0.59   0.5565    
NativeLanguageOther             109.614     21.039    5.21 1.89e-07 ***
Classplant:NativeLanguageOther  -29.105     12.064   -2.41   0.0158 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) Clsspl NtvLnO
Classplant  -0.332              
NtvLnggOthr -0.590  0.092       
Clsspln:NLO  0.121 -0.370 -0.261

@palday (Member) commented Sep 27, 2019

Updating the last example for the current release: the fast fit still works, but the subsequent general optimization fails (see #95).

julia> using RCall, MixedModels, GLM

julia> lexdec = rcopy(R"languageR::lexdec");

julia> lexdec[:rt_raw] = exp.(lexdec[:RT]);

julia> f = @formula(rt_raw ~ 1 + Class*NativeLanguage + (1|Subject) + (1|Word));

julia> lexdec_glmm_gamma = fit!(GeneralizedLinearMixedModel(f, lexdec, Gamma(), IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  rt_raw ~ 1 + Class + NativeLanguage + Class & NativeLanguage + (1 | Subject) + (1 | Word)
  Distribution: Gamma{Float64}
  Link: IdentityLink()

  Deviance: 91.9169

Variance components:
             Column    Variance  Std.Dev. 
 Word    (Intercept)     0.0000  0.000000
 Subject (Intercept)  1492.1437 38.628276
 Number of obs: 1659; levels of grouping factors: 79, 21

Fixed-effects parameters:
──────────────────────────────────────────────────────────────────────────────
                                       Estimate  Std.Error    z value  P(>|z|)
──────────────────────────────────────────────────────────────────────────────
(Intercept)                           561.603      6.69334  83.9047     <1e-99
Class: plant                            6.47181    9.20457   0.703109   0.4820
NativeLanguage: Other                 114.858     11.2564   10.2038     <1e-23
Class: plant & NativeLanguage: Other  -29.445     15.499    -1.8998     0.0575
──────────────────────────────────────────────────────────────────────────────

julia> fit!(lexdec_glmm_gamma, fast=false)
ERROR: ArgumentError: invalid NLopt arguments: zero step size
. . .

julia> lexdec_glmm_invgauss = fit!(GeneralizedLinearMixedModel(f, lexdec, InverseGaussian(), IdentityLink()), fast=true)
Generalized Linear Mixed Model fit by maximum likelihood (nAGQ = 1)
  rt_raw ~ 1 + Class + NativeLanguage + Class & NativeLanguage + (1 | Subject) + (1 | Word)
  Distribution: InverseGaussian{Float64}
  Link: IdentityLink()

  Deviance: 0.1438

Variance components:
             Column   Variance Std.Dev. 
 Word    (Intercept)         0        0
 Subject (Intercept)         0        0
 Number of obs: 1659; levels of grouping factors: 79, 21

Fixed-effects parameters:
─────────────────────────────────────────────────────────────────────────────
                                       Estimate  Std.Error   z value  P(>|z|)
─────────────────────────────────────────────────────────────────────────────
(Intercept)                           563.536      6.03062  93.4458    <1e-99
Class: plant                            6.61877    9.14979   0.72338   0.4694
NativeLanguage: Other                 119.1       11.0707   10.7582    <1e-26
Class: plant & NativeLanguage: Other  -28.9027    16.3706   -1.76552   0.0775
─────────────────────────────────────────────────────────────────────────────

julia> fit!(lexdec_glmm_invgauss, fast=false)
ERROR: ArgumentError: invalid NLopt arguments: zero step size
. . .
