
regression in big grouping #4818

Closed
jangorecki opened this issue Nov 22, 2020 · 5 comments · Fixed by #4297
Labels
benchmark GForce issues relating to optimized grouping calculations (GForce) regression
jangorecki (Member) commented Nov 22, 2020

While running the grouping benchmark on a 2e9 rows dataset (96GB csv) using the recent stable data.table 1.13.2, I am getting the following exception:

> system.time( DT[, sum(v1), keyby=id1] )
Error in gforce(thisEnv, jsub, o__, f__, len__, irows) :
  Internal error: Failed to allocate counts or TMP when assigning g in gforce
Calls: system.time -> [ -> [.data.table -> gforce

The message comes from this check in src/gsumm.c:

if (!counts || !TMP ) error(_("Internal error: Failed to allocate counts or TMP when assigning g in gforce"));

It is the same machine as the one used in 2014: 32 cores and 244GB memory.


I ran data.table 1.9.2 as well, to ensure that the version which previously worked fine for this data size continues to work on a recent R version.

> system.time( DT[, sum(v1), keyby=id1] )
   user  system elapsed
 58.113  17.098  75.219
> system.time( DT[, sum(v1), keyby=id1] )
   user  system elapsed
 59.185  15.303  74.496
> system.time( DT[, sum(v1), keyby="id1,id2"] )
   user  system elapsed
180.160  19.953 200.137
> system.time( DT[, sum(v1), keyby="id1,id2"] )
   user  system elapsed
204.208  39.651 243.889
> system.time( DT[, list(sum(v1),mean(v3)), keyby=id3] )
    user   system  elapsed
1037.451   51.269 1088.853
> system.time( DT[, list(sum(v1),mean(v3)), keyby=id3] )
    user   system  elapsed
1023.068   29.556 1052.753
> system.time( DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] )
   user  system elapsed
 73.123  18.026  91.160
> system.time( DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] )
   user  system elapsed
 70.523   8.951  79.483
> system.time( DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] )
   user  system elapsed
489.294  36.192 525.548
> system.time( DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] )
   user  system elapsed 
488.316  28.808 517.188 

Timings are slower than they were in the past, but AFAIK this matches what we observed in other issues: newer versions of R introduced overhead that data.table later addressed in newer releases. So if users upgrade R, they should also upgrade data.table.

jangorecki (Member Author) commented Nov 22, 2020

I tweaked the error message a little:

diff --git a/src/gsumm.c b/src/gsumm.c
index 372ae594..d1e006d4 100644
--- a/src/gsumm.c
+++ b/src/gsumm.c
@@ -112,8 +112,9 @@ SEXP gforce(SEXP env, SEXP jsub, SEXP o, SEXP f, SEXP l, SEXP irowsArg) {
     int highSize = ((nrow-1)>>shift) + 1;
     //Rprintf(_("When assigning grp[o] = g, highSize=%d  nb=%d  shift=%d  nBatch=%d\n"), highSize, nb, shift, nBatch);
     int *counts = calloc(nBatch*highSize, sizeof(int));  // TODO: cache-line align and make highSize a multiple of 64
+    if (!counts) error(_("Internal error: Failed to allocate counts when assigning g in gforce"));
     int *TMP   = malloc(nrow*2*sizeof(int));
-    if (!counts || !TMP ) error(_("Internal error: Failed to allocate counts or TMP when assigning g in gforce"));
+    if (!TMP ) error(_("Internal error: Failed to allocate TMP when assigning g in gforce"));
     #pragma omp parallel for num_threads(getDTthreads(nBatch, false))   // schedule(dynamic,1)
     for (int b=0; b<nBatch; b++) {
       const int howMany = b==nBatch-1 ? lastBatchSize : batchSize;

and we now know that it is the allocation of TMP that is failing:

int *TMP = malloc(nrow*2*sizeof(int));

which in the case of 2e9 rows tries to allocate 1.6e10 bytes, around 15GB.

/usr/bin/time -v of the R process shows it used 195GB peak memory:

        Command being timed: "R --vanilla"
        User time (seconds): 10688.91
        System time (seconds): 1511.89
        Percent of CPU this job got: 325%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:02:28
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 204565100
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 18
        Minor (reclaiming a frame) page faults: 336922011
        Voluntary context switches: 15427
        Involuntary context switches: 31926
        Swaps: 0
        File system inputs: 3384
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

The machine (another one) has 64 cores and 256GB of memory. Looking at free -h we see buff/cache takes 45GB, so only 200GB of memory is "free", but according to https://www.linuxatemyram.com/ it is the "available" memory that we should care about, and that looks fine:

              total        used        free      shared  buff/cache   available
Mem:           246G        708M        200G        1.1M         45G        244G
Swap:            0B          0B          0B
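To make the free-vs-available distinction concrete, here is a small illustrative snippet (not part of the benchmark; the sample string is the output quoted above) that pulls the "available" column out of free's output:

```python
# Illustration only: "available" (the last column of the Mem: row of
# `free -h`) estimates memory usable by new processes, including
# reclaimable buff/cache, whereas "free" excludes it.
def parse_free_available(free_output: str) -> str:
    for line in free_output.splitlines():
        if line.startswith("Mem:"):
            return line.split()[-1]  # last column is "available"
    raise ValueError("no Mem: line found")

sample = """\
              total        used        free      shared  buff/cache   available
Mem:           246G        708M        200G        1.1M         45G        244G
Swap:            0B          0B          0B"""

print(parse_free_available(sample))  # -> 244G
```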

jangorecki (Member Author)

I tried scaling up memory to a 488GB machine:

> require(data.table)                        
Loading required package: data.table         
> N=2e9; K=100                                          
> set.seed(1)                          
> DT <- data.table(                       
+   id1 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
+   id2 = sample(sprintf("id%03d",1:K), N, TRUE),      # large groups (char)
+   id3 = sample(sprintf("id%010d",1:(N/K)), N, TRUE), # small groups (char)
+   id4 = sample(K, N, TRUE),                          # large groups (int)
+   id5 = sample(K, N, TRUE),                          # large groups (int)
+   id6 = sample(N/K, N, TRUE),                        # small groups (int)
+   v1 =  sample(5, N, TRUE),                          # int in range [1,5]
+   v2 =  sample(5, N, TRUE),                          # int in range [1,5]
+   v3 =  sample(round(runif(100,max=100),4), N, TRUE) # numeric e.g. 23.5749
+ )
> cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
GB = 98.344
> system.time( DT[, sum(v1), keyby=id1] )
Error in gforce(thisEnv, jsub, o__, f__, len__, irows) :
  Internal error: Failed to allocate counts or TMP when assigning g in gforce
Calls: system.time -> [ -> [.data.table -> gforce
Timing stopped at: 48.78 35.34 4.747
Execution halted
Command exited with non-zero status 1
        Command being timed: "R -f grp.R"
        User time (seconds): 2459.06
        System time (seconds): 160.98
        Percent of CPU this job got: 103%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 42:20.94
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 204562564
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 82
        Minor (reclaiming a frame) page faults: 65458346
        Voluntary context switches: 748
        Involuntary context switches: 4269
        Swaps: 0
        File system inputs: 37712
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1

R 4.0.3; peak memory was again 195GB despite much more memory being available, which suggests the failure is not genuine memory exhaustion:

              total        used        free      shared  buff/cache   available
Mem:           480G        745M        479G        1.1M        204M        477G
Swap:            0B          0B          0B

ColeMiller1 (Contributor) commented Dec 4, 2020

I thought #4295 had been closed. Does #4297 fix this?

jangorecki (Member Author)

Thanks, yes, it seems to be a duplicate of #4295. Not closing yet, to make sure the fix for the other issue resolves this one as well.

@jangorecki jangorecki linked a pull request Dec 4, 2020 that will close this issue
@jangorecki jangorecki added the GForce issues relating to optimized grouping calculations (GForce) label Dec 4, 2020
@jangorecki jangorecki added this to the 1.13.5 milestone Dec 9, 2020
jangorecki (Member Author)

I re-ran latest master to ensure the issue is fixed. Below are timings using the default 50% of cores, and 100% as well. Compared to data.table 1.9.2 (Feb 2014) running on R 4.0.3, timings on recent master are 2 to 10 times smaller.

Using 16 cores (50%)

> cat("GB =", round(sum(gc()[,2])/1024, 3), "\n")
GB = 98.345
> system.time( DT[, sum(v1), keyby=id1] )
   user  system elapsed
229.022  63.936  28.119
> system.time( DT[, sum(v1), keyby=id1] )
   user  system elapsed
221.368 110.280  33.301
> system.time( DT[, sum(v1), keyby="id1,id2"] )
   user  system elapsed
188.338  51.989  25.734
> system.time( DT[, sum(v1), keyby="id1,id2"] )
   user  system elapsed
191.528  57.748  26.474
> system.time( DT[, list(sum(v1),mean(v3)), keyby=id3] )
    user   system  elapsed
1026.957   99.615  135.896
> system.time( DT[, list(sum(v1),mean(v3)), keyby=id3] )
    user   system  elapsed
1027.012  124.504  143.182
> system.time( DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] )
   user  system elapsed
347.582  60.448  46.586
> system.time( DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] )
   user  system elapsed
295.451 120.263  46.806
> system.time( DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] )
   user  system elapsed
598.434 153.367  85.222
> system.time( DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] )
   user  system elapsed
580.999  87.354  69.366

Using 32 cores (100%)

> system.time( DT[, sum(v1), keyby=id1] )
   user  system elapsed         
236.456 151.958  24.138
> system.time( DT[, sum(v1), keyby=id1] )
   user  system elapsed
290.847  89.319  20.829
> system.time( DT[, sum(v1), keyby="id1,id2"] )
   user  system elapsed
346.741 102.513  26.578
> system.time( DT[, sum(v1), keyby="id1,id2"] )
   user  system elapsed
338.296  88.455  24.679
> system.time( DT[, list(sum(v1),mean(v3)), keyby=id3] )
    user   system  elapsed
1910.226  267.227  138.072
> system.time( DT[, list(sum(v1),mean(v3)), keyby=id3] )
    user   system  elapsed
1932.545  137.534  123.109
> system.time( DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] )
   user  system elapsed
430.512 104.719  35.912
> system.time( DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] )
   user  system elapsed
435.340  95.504  42.699
> system.time( DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] )
    user   system  elapsed
1088.153  137.154   62.782
> system.time( DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] )
    user   system  elapsed
1059.562  254.364   74.738
