Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to spatially predict(type="se")$se and the output has many null values and what look like spatial offsets? #668

Closed
KWB4484 opened this issue Apr 6, 2023 · 6 comments

Comments

@KWB4484
Copy link

KWB4484 commented Apr 6, 2023

In short, I've trained regression models for 20 ecological units across the US by mapping this function to nested training datasets for each unit. Example:

train%>%
  mutate(fit = future_map(data, ~ ranger(y ~ x1+x2...,num.trees=500, data = .x, keep.inbag=TRUE,importance='impurity',min.node.size=10), .options = furrr_options(seed = 44)))%>%
  {.->>EcoModels}  

I then create a raster stack of covariates, I join ecological unit boundaries to the "EcoModels" tibble and essentially map a function where for each row the raster stack gets crop/masked using the ecological unit boundary, the model is applied to the raster stack and the output is written to disk.

 for(i in 1:nrow(EcoModels)){
  filename<-paste("./Out/UnitTL01_",EcoModels$UnitName[[i]], ".tiff",sep="")
  ras<-raster::crop(ras_stack,extent(EcoModels$unit_boundary[[i]]))
  ras<-raster::mask(ras,EcoModels$unit_boundary[[i]])
  out<-predict(ras, EcoModels$fit[[i]], type='se', progress='window', fun = function(model, ...) predict(model, ...)$se)
  writeRaster(out, filename=filename, datatype='FLT4S', overwrite=TRUE)
  removeTmpFiles(h=0.001)
}

This worked well for response predictions, but my computer ran out of memory when trying to get standard errors. Luckily, I was just given the opportunity to use a computer with a ridiculous amount of RAM and I got it to run....but the output had lots of null values and spatial chunks that look like some offset was applied.

I checked the raster stack and it appears fine. Is there something that I'm doing wrong or is this normal because of xyz? Thanks for any ideas you can provide for troubleshooting this.

Standard Errors
image

Response predictions
Layout3

Units
image

Example SE for one unit
image

@rvalavi
Copy link
Contributor

rvalavi commented Apr 18, 2023

I had the same issue @KWB4484! I solved it by converting the raster to data.frame. But the bigger problem was that my raster is so big (more than 111 million cells) so I had to tile it and do this in parallel. While doing that realized the "se" calculating varies between the tiles!!!!

@rvalavi
Copy link
Contributor

rvalavi commented Apr 18, 2023

It seems the length of the new data affects se calculation.

@KWB4484
Copy link
Author

KWB4484 commented Apr 18, 2023

Thanks for the suggestion @rvalavi! I'll give it a try. I'm assuming you can retain xy and convert it back to raster after calculating the standard errors? Fingers crossed I have enough memory.

@rvalavi
Copy link
Contributor

rvalavi commented Apr 19, 2023

My pleasure, @KWB4484!
Yes, you need xy to convert back to raster. But the main issue is that predicting se with a model on two datasets with different lengths produces different results (even if one is just a subset of the other and you have set a random seed)! I need to read its paper and check the code in more detail to see what is the issue. It might be something in the proposed method but ideally, it shouldn't be like that!

The model fitting with ranger is fantastic with very little RAM requirement, but I wish the prediction would have been the same story.

@mnwright
Copy link
Member

See also #136. Also consider quantile regression or the forestError package?

@KWB4484
Copy link
Author

KWB4484 commented May 1, 2023

@rvalavi Well I tried going the data.frame route but I'm in the same boat -- too big --

"Error in .local(x, y, ...) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 102"

and given your experience with tiling I'll just quit now. Thanks for saving me time!

Thanks @mnwright, I'll try the forestError package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants