Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

does SE clustering actually work? #16

Open
thorek1 opened this issue Aug 26, 2021 · 3 comments
Open

does SE clustering actually work? #16

thorek1 opened this issue Aug 26, 2021 · 3 comments

Comments

@thorek1
Copy link

thorek1 commented Aug 26, 2021

Hi,

I was wondering whether standard error clustering actually works. In the code example you can see that clustering does not change the standard errors whereas I would expect them to move and fixest estimates them to change:

n <- 1000

set.seed(1)
dat <- data.frame(y = runif(n) > .5,
                  x1 = rnorm(n),
                  x2 = rnorm(n) + 2,
                  x2 = (rnorm(n) + 2) / 2,
                  x3 = sample(LETTERS,n, T),
                  x4 = sample(letters,n, T))


library(alpaca)

alpaca::feglm(y ~ x1 + x2 | x3, dat) |> summary()
#binomial - logit link
#
#y ~ x1 + x2 | x3
#
#Estimates:
#   Estimate Std. error z value Pr(> |z|)
#x1 -0.01404    0.06397  -0.219     0.826
#x2 -0.01322    0.06193  -0.213     0.831
#
#residual deviance= 1341.74,
#null deviance= 1384.69,
#n= 1000, l= [26]
#
#Number of Fisher Scoring Iterations: 4 
alpaca::feglm(y ~ x1 + x2 | x3 | x4, dat) |> summary()
#binomial - logit link
#
#y ~ x1 + x2 | x3 | x4
#
#Estimates:
#   Estimate Std. error z value Pr(> |z|)
#x1 -0.01404    0.06397  -0.219     0.826
#x2 -0.01322    0.06193  -0.213     0.831
#
#residual deviance= 1341.74,
#null deviance= 1384.69,
#n= 1000, l= [26]
#
#Number of Fisher Scoring Iterations: 4 

library(fixest)

fixest::feglm(y ~ x1 + x2 | x3, dat, family = 'binomial')
#GLM estimation, family = binomial, Dep. Var.: y
#Observations: 1,000 
#Fixed-effects: x3: 26
#Standard-errors: Clustered (x3) 
#    Estimate Std. Error   t value Pr(>|t|)) 
#x1 -0.014040   0.054451 -0.257846  0.796526 
#x2 -0.013218   0.055863 -0.236617  0.812954 
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Log-Likelihood:  -670.9   Adj. Pseudo R2: -0.007979
#           BIC: 1,535.2     Squared Cor.:  0.041794
fixest::feglm(y ~ x1 + x2 | x3, dat, family = 'binomial', cluster = 'x4')
#GLM estimation, family = binomial, Dep. Var.: y
#Observations: 1,000 
#Fixed-effects: x3: 26
#Standard-errors: Clustered (x4) 
#    Estimate Std. Error   t value Pr(>|t|)) 
#x1 -0.014040   0.058249 -0.241034  0.809529 
#x2 -0.013218   0.058738 -0.225035  0.821952 
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Log-Likelihood:  -670.9   Adj. Pseudo R2: -0.007979
#           BIC: 1,535.2     Squared Cor.:  0.041794
@amrei-stammann
Copy link
Owner

Hello,

summary() returns per default standard errors based on the inverse of the negative Hessian (not clustered).

If you want to compute clustered standard errors, you need to specify further arguments, see example below:

# clustered by x3
summary(feglm(y ~ x1 + x2 | x3, dat) , type = "clustered", cluster = ~ x3)

# clustered by x4
summary(feglm(y ~ x1 + x2 | x3 | x4, dat) , type = "clustered", cluster = ~ x4)

A further example is available in the vignette:
https://cran.r-project.org/web/packages/alpaca/vignettes/trade.html

Best wishes,
Amrei

@thorek1
Copy link
Author

thorek1 commented Sep 11, 2021

Thanks for your answer. I should have read the documentation more thoroughly.

In the second example you give, why would you need to specify the x4 in the formula? It seems redundant and does not change the functionality (not needed in your first example). I would have expected, from reading the help page of feglm that specifying it there would be enough to have clustered standard errors printed.

Using summary to print the clustered standard errors unfortunately makes a typical workflow of mine impossible:

results <- list()
alpaca::feglm(y ~ x1 + x2 | x3, dat) |> summary() -> results[[1]]
alpaca::feglm(y ~ x1 + x2 | x3 | x4, dat) |> summary() -> results[[2]]

library(texreg)
results |> texreg()
#Error in extract(l[[i]], ...) : 
#  Neither texreg nor broom supports models of class summary.feglm.


alpaca::feglm(y ~ x1 + x2 | x3, dat) -> results[[1]]
alpaca::feglm(y ~ x1 + x2 | x3 | x4, dat) -> results[[2]]

library(texreg)
results |> texreg()
# no clustered standard errors

Integration with other tools (such as texreg, stargazer,...) makes life a lot easer as is the case for fixest.

@pachadotdev
Copy link

here is a a modification that I made to simplify the clustering #17 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants