gc() race in data.table with R-devel #2882
Merged
Closes #2866
Closes #2767
The approach of this PR is to ensure that ALTREP vectors are not allowed as columns in a data.table. We like and benefit from ALTREP very much in R code such as `[.data.table`, where sequence vectors are used a lot. But as columns in a data.table, ALTREPs are not so appropriate. Internals like `:=` assign by reference are rewritten on the basis of columns already being materialized (expanded).

`setDT()` now expands ALTREP columns. The reproducible examples used `setDT()` to create the test data.table because `data.table()` already expanded ALTREP columns (by happy accident) in `CcopyNamedInList`. That function now also checks for ALTREP (as well as MAYBE_REFERENCED, as before), just to be safe.

Luke said that ALTREPs may in future be more than just sequence vectors; e.g. distributed arrays that cannot be expanded. But in that case, data.table will need code changes anyway to deal with such arrays. If and when that happens, the expansion will fail on such ALTREPs, which is reasonable, graceful behaviour; much better than a subsequent gc race at least.
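As a rough illustration of what "expanding" a column means, here is a minimal sketch in plain C. It does not use the R API: `ToyCol`, `expand_col` and `expand_all` are hypothetical names invented for this example, and the compact `start, start+1, ...` form only models an ALTREP compact sequence. It is not the code in this PR.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy column, NOT an R SEXP. It is either a compact sequence
   start, start+1, ... (modelling an ALTREP compact sequence) or a
   materialized array of n ints. */
typedef struct {
    bool   compact;  /* true while only (start, n) is stored */
    int    start;
    size_t n;
    int   *data;     /* NULL while compact */
} ToyCol;

/* Materialize one column in place. Returns false if allocation fails,
   in which case the column is left unchanged. */
static bool expand_col(ToyCol *col) {
    if (!col->compact) return true;           /* already a regular vector */
    int *p = malloc(col->n * sizeof *p);
    if (p == NULL && col->n > 0) return false;
    for (size_t i = 0; i < col->n; i++) p[i] = col->start + (int)i;
    col->data = p;
    col->compact = false;
    return true;
}

/* Expand every column up front, so later code can assume plain arrays. */
static bool expand_all(ToyCol *cols, int ncol) {
    for (int j = 0; j < ncol; j++)
        if (!expand_col(&cols[j])) return false;
    return true;
}
```

The point of the design is that the expansion happens once, at the boundary where the table is created, and a column that cannot be expanded produces a clean failure there rather than a problem later.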
The above is the long-term approach, with no plans to change it; i.e. data.table is unlikely to ever support ALTREP columns.
What is short term, though, is that this PR adds checks before every parallel region that no SEXP used inside the region is an ALTREP (failing if one is). In R-devel, the only problem is with `INTEGER()`, `REAL()` etc. on ALTREP vectors; those functions are currently still thread-safe on regular vectors. This is a short-term solution in the interests of getting an update to CRAN, which is intermittently in error state on R-devel due to the gc race. In future we will still take all API use out of parallel regions, as Luke suggested. That involves a new approach around the parallel regions which will take time to work through.
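The shape of such a check can be sketched as follows. This is a self-contained toy in plain C, not the R API and not the code in this PR: `ToyVec`, `first_altrep` and `col_sums` are hypothetical names, and the `is_altrep` flag stands in for whatever R reports for a real vector. The OpenMP pragma is guarded so the sketch also compiles without OpenMP.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an R vector, NOT a real SEXP. */
typedef struct {
    bool    is_altrep;  /* models the ALTREP status of the vector */
    double *data;       /* models the data pointer of a regular vector */
    size_t  n;
} ToyVec;

/* Return the index of the first ALTREP column, or -1 if there is none. */
static int first_altrep(const ToyVec *cols, int ncol) {
    for (int j = 0; j < ncol; j++)
        if (cols[j].is_altrep) return j;
    return -1;
}

/* Sum each column, one column per loop iteration. The ALTREP check runs
   serially BEFORE the parallel region; if any column is ALTREP the
   function refuses to proceed and returns false. */
static bool col_sums(const ToyVec *cols, int ncol, double *out) {
    if (first_altrep(cols, ncol) >= 0) return false;
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (int j = 0; j < ncol; j++) {
        double s = 0.0;
        for (size_t i = 0; i < cols[j].n; i++) s += cols[j].data[i];
        out[j] = s;  /* each iteration writes a distinct element */
    }
    return true;
}
```

Failing before the region starts keeps the error path on the main thread, which is where an R-level error would have to be raised anyway.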
- `setDT()` expands ALTREPs (`data.table()` already did)
- Add checks before all parallel regions that no ALTREPs are present.
All files with parallel regions:
The following were moved to follow-up PR #2899: between.c, freadR.c, fwrite.c, fsort.c, reorder.c