Not understanding tolerance with Cholesky decomposition #136

Closed

pdeffebach opened this issue Apr 29, 2021 · 3 comments
@pdeffebach

In GLM.jl, we are trying to diagnose a "false positive" rank deficiency in the Cholesky decomposition for OLS here.

I'm trying to understand why an lm call in Julia fails while a feols call in fixest succeeds. I'm comparing with fixest because R's lm uses a QR decomposition by default, whereas fixest appears to use Cholesky, like GLM.jl.

Consider the following MWE with fixest, using cpp_cholesky as defined here.

library(tidyverse)
library(fixest)
cpp_cholesky = utils::getFromNamespace("cpp_cholesky", "fixest")
df = read_csv("https://github.com/JuliaStats/GLM.jl/files/6384056/Car-Training.csv")
m = feols(Price ~ Year + Mileage, df)
X = model.matrix(m)
XtX = t(X) %*% X 
cpp_cholesky(XtX, tol=80.0) # full rank
cpp_cholesky(XtX, tol=81.0) # not full rank

Any idea why tol=81 fails? It's odd that the tolerance at which cpp_cholesky concludes XtX is rank-deficient is so high, given that the default tolerance in cpp_cholesky is 1e-10.

I would like to compare this algorithm for computing the Cholesky decomposition with the LAPACK one that Julia is calling, but I don't think I'm interpreting the tolerance correctly.
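For reference, base R's pivoted Cholesky (LAPACK's dpstrf) also reports a numerical rank for the same XtX; I'm not sure it's the exact routine Julia ends up calling, but it gives a pivoted baseline to compare against:

# Pivoted Cholesky from base R (LAPACK dpstrf); reports a numerical rank.
# Rough comparison point only, not necessarily the routine GLM.jl/Julia calls.
R_piv = chol(XtX, pivot = TRUE)
attr(R_piv, "rank")   # numerical rank as determined with pivoting
attr(R_piv, "pivot")  # column ordering chosen by the pivoting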

Any insight is helpful. Thank you.

@lrberge
Owner

lrberge commented Apr 30, 2021

Hi, the algorithm I implemented is a bit peculiar. For speed, I don't apply any pivoting: given a threshold (here 1e-10), variables are removed on the fly as the algorithm goes. This means that the order of the variables matters:

# Let's reorder the columns
X1 = X
X2 = X[, c(2, 1, 3)]
X3 = X[, 3:1]

# => leads to different thresholds
cpp_cholesky(crossprod(X1))$min_norm
#> [1] 80.42991
cpp_cholesky(crossprod(X2))$min_norm
#> [1] 1.991382e-05
cpp_cholesky(crossprod(X3))$min_norm
#> [1] 1.366587e-05

The difference can be large, as shown in your example. My algorithm is really non-standard, although it works rather well.
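To make the behaviour concrete, here is a rough pure-R sketch of the idea of a non-pivoted Cholesky with on-the-fly column dropping. The real implementation is in C++ and differs in its details, so take the chol_drop name and the exact tolerance test below as illustrative only:

# Rough sketch of a non-pivoted Cholesky that drops a column as soon as its
# remaining diagonal mass falls below `tol` (illustration only, not fixest's code).
chol_drop = function(A, tol = 1e-10) {
  n = ncol(A)
  L = matrix(0, n, n)
  dropped = logical(n)
  for (j in seq_len(n)) {
    prev = seq_len(j - 1)
    # remaining diagonal mass of column j after projecting out earlier kept columns
    d = A[j, j] - sum(L[j, prev]^2)
    if (d < tol) {
      dropped[j] = TRUE   # flag column j as collinear and skip it
      next                # no pivoting: the column order is never revisited
    }
    L[j, j] = sqrt(d)
    if (j < n) {
      for (i in (j + 1):n) {
        L[i, j] = (A[i, j] - sum(L[i, prev] * L[j, prev])) / L[j, j]
      }
    }
  }
  list(L = L, dropped = which(dropped))
}

# With the OP's XtX: a tolerance above the smallest remaining diagonal
# (~80.43 under this column order) flags that column as collinear.
chol_drop(XtX, tol = 81)$dropped

Because there is no pivoting, the remaining diagonal value computed for each column depends on which columns came before it, which is why reordering the columns changes min_norm in the example above.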

From my viewpoint, it's up to the user to ensure that their model is not ill-conditioned. Removing variables on the fly is just a convenience, and I don't consider it good practice. By construction, different algorithms will pick different variables, since there is no "truth" in that regard.

Hope it clarifies a bit how it works!

@lrberge
Owner

lrberge commented May 3, 2021

OK, I'm closing. Let me know if you need more information.

@lrberge lrberge closed this as completed May 3, 2021
@pdeffebach
Author

Thanks for the response, and apologies for the slow reply.

It's good to know this wasn't a bug! I will do more to understand this algorithm after finals and see how it compares to LAPACK's.

From my viewpoint, it's up to the user to ensure that their model is not ill-conditioned.

Yes, this is the tension. Hopefully we can find a solution in GLM.jl that maintains flexibility while also being numerically accurate.

Thank you!
