-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gshift as gforce optimized shift #5205
Conversation
@@ -1162,3 +1167,83 @@ SEXP gprod(SEXP x, SEXP narmArg) { | |||
return(ans); | |||
} | |||
|
|||
SEXP gshift(SEXP x, SEXP nArg, SEXP fillArg, SEXP typeArg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we prefer to re-use the code in shift.c here (to the extent possible)? otherwise it'll be a pain to always update code in both places going forward.
It might require some refactoring/modularizing of the original shift.c.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try to recycle as much as possible but there are some different things to consider about both implementations.
Basic shift | gshift |
---|---|
works on data.tables/vectors/lists | operates on vectors |
main workhouse is memmove |
used memory may not be blocked |
no subset on i | concurrent subsetting on irows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe basic shift could be redirected to call gshift then? It would have to call gforce prep function first, perhaps in a dummy way to trick gshift into working. Then it would be common code and all existing tests of shift would cover the new gforce gshift too.
Perhaps not now. Happy to merge this PR as is, subject to integer64. A task could be created to redirect shift to gshift in future.
use match.call to deactivate gforce for calls where fill expr
Codecov Report
@@ Coverage Diff @@
## master #5205 +/- ##
=======================================
Coverage 99.50% 99.50%
=======================================
Files 77 77
Lines 14536 14590 +54
=======================================
+ Hits 14464 14518 +54
Misses 72 72
Continue to review full report at Codecov.
|
Benchmarkslibrary(data.table)
library(microbenchmark)
basic_shift = shift
bench = function(DT) print(microbenchmark(
DT[, shift(x, 1, type="lag"), y],
DT[, basic_shift(x, 1, type="lag"), y],
DT[, c(NA, head(x,-1)), y],
times = 10L, unit = "s"), signif=4) N = 1e6
DT = data.table(x = sample(N), y = sample(1e2,N,TRUE))
bench(DT)
# Unit: seconds
# expr min lq mean median uq max neval
# DT[, shift(x, 1, type = "lag"), y] 0.01342 0.0146 0.02848 0.02031 0.05030 0.05487 10
# DT[, basic_shift(x, 1, type = "lag"), y] 0.01129 0.0116 0.01416 0.01383 0.01560 0.01855 10
# DT[, c(NA, head(x, -1)), y] 0.01205 0.0169 0.01766 0.01830 0.01901 0.02111 10
DT = data.table(x = sample(N), y = sample(1e3,N,TRUE))
# Unit: seconds
# expr min lq mean median uq max neval
# DT[, shift(x, 1, type = "lag"), y] 0.01092 0.01152 0.01242 0.01207 0.01339 0.01448 10
# DT[, basic_shift(x, 1, type = "lag"), y] 0.02676 0.02710 0.02873 0.02796 0.03022 0.03277 10
# DT[, c(NA, head(x, -1)), y] 0.02064 0.02085 0.02186 0.02115 0.02273 0.02466 10
DT = data.table(x = sample(N), y = sample(1e4,N,TRUE))
bench(DT)
# Unit: seconds
# expr min lq mean median uq max neval
# DT[, shift(x, 1, type = "lag"), y] 0.01328 0.01653 0.0183 0.01722 0.02066 0.02501 10
# DT[, basic_shift(x, 1, type = "lag"), y] 0.22480 0.23000 0.2617 0.23990 0.28300 0.37830 10
# DT[, c(NA, head(x, -1)), y] 0.10440 0.11080 0.1234 0.11710 0.13030 0.15360 10
DT = data.table(x = sample(N), y = sample(1e5,N,TRUE))
bench(DT)
# Unit: seconds
# expr min lq mean median uq max neval
# DT[, shift(x, 1, type = "lag"), y] 0.02344 0.02761 0.03975 0.03406 0.04527 0.0726 10
# DT[, basic_shift(x, 1, type = "lag"), y] 1.99000 2.13600 2.17000 2.17700 2.18500 2.3810 10
# DT[, c(NA, head(x, -1)), y] 0.89570 0.92040 0.93580 0.94220 0.95250 0.9613 10 N = 1e7
DT = data.table(x = sample(N), y = sample(1e2,N,TRUE))
bench(DT)
# Unit: seconds
# expr min lq mean median uq max neval
# DT[, shift(x, 1, type = "lag"), y] 0.2044 0.244 0.2705 0.2527 0.2710 0.4261 10
# DT[, basic_shift(x, 1, type = "lag"), y] 0.1231 0.140 0.1793 0.1447 0.2030 0.3918 10
# DT[, c(NA, head(x, -1)), y] 0.1618 0.172 0.2093 0.1931 0.2416 0.2932 10
DT = data.table(x = sample(N), y = sample(1e3,N,TRUE))
# Unit: seconds
# expr min lq mean median uq max neval
# DT[, shift(x, 1, type = "lag"), y] 0.1984 0.2206 0.2307 0.2243 0.2464 0.2722 10
# DT[, basic_shift(x, 1, type = "lag"), y] 0.1278 0.1481 0.1746 0.1684 0.1943 0.2404 10
# DT[, c(NA, head(x, -1)), y] 0.1460 0.1605 0.1850 0.1727 0.2154 0.2414 10
DT = data.table(x = sample(N), y = sample(1e4,N,TRUE))
bench(DT)
# Unit: seconds
# expr min lq mean median uq max neval
# DT[, shift(x, 1, type = "lag"), y] 0.2686 0.2795 0.2972 0.2984 0.3148 0.3322 10
# DT[, basic_shift(x, 1, type = "lag"), y] 0.4850 0.4913 0.5132 0.5072 0.5424 0.5528 10
# DT[, c(NA, head(x, -1)), y] 0.3696 0.3830 0.4146 0.4233 0.4440 0.4521 10
DT = data.table(x = sample(N), y = sample(1e5,N,TRUE))
bench(DT)
# Unit: seconds
# expr min lq mean median uq max neval
# DT[, shift(x, 1, type = "lag"), y] 0.3696 0.3902 0.4027 0.3939 0.4288 0.4543 10
# DT[, basic_shift(x, 1, type = "lag"), y] 2.5130 2.5400 2.5810 2.5630 2.6230 2.7160 10
# DT[, c(NA, head(x, -1)), y] 1.2890 1.3280 1.3650 1.3700 1.4100 1.4230 10 |
… call (unusual) is distracting; even fill (a call or not) wasn't mentioned in the issue. The main thing is that shift is optimized which is awesome
switch(TYPEOF(x)) { | ||
case LGLSXP: { int *ansd=LOGICAL(tmp); SHIFT(int, LOGICAL, ansd[ansi++]=val); } break; | ||
case INTSXP: { int *ansd=INTEGER(tmp); SHIFT(int, INTEGER, ansd[ansi++]=val); } break; | ||
case REALSXP: { double *ansd=REAL(tmp); SHIFT(double, REAL, ansd[ansi++]=val); } break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just about to merge (looks awesome!) and then just saw integer64
missed here (and test_bit64
added to tests). Bug fix 45 starts shift(xInt64, fill=0)
so now that shift
is optimized I wonder if the shift
problems of int64 would return. Maybe existing tests of shift int64 don't test it by group then.
Can't think of anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added integer64
support. Interestingly we don't do anything special about integer64
anymore in shift.c
or gshift
except using coerceAs
for coercing and copyMostAttrib
for copying class integer64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. The coerceAs
is for fill
but fill
isn't currently tested by the new gforce test 2224. A loop of differently typed fill
needs adding to the test then, and I suspect a specific case here for integer64 will then be needed to pass that? Can I leave that to you, and the C99 warning, hence WIP again.
Added now both, but either one of them should do. |
src/gsumm.c
Outdated
SET_VECTOR_ELT(ans, g, tmp=allocVector(TYPEOF(x), nx)); | ||
#define SHIFT(CTYPE, RTYPE, ASSIGN) { \ | ||
const CTYPE *xd = (const CTYPE *)RTYPE(x); \ | ||
const CTYPE fill = (const CTYPE)RTYPE(thisfill)[0]; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Locally I'm seeing this warning (from gcc with -std=c99
I think but it might be due to gcc 10.3.0) :
gsumm.c: In function ‘gshift’:
gsumm.c:1207:26: warning: ISO C forbids casting nonscalar to the same type [-Wpedantic]
1207 | const CTYPE fill = (const CTYPE)RTYPE(thisfill)[0]; \
| ^
gsumm.c:1241:59: note: in expansion of macro ‘SHIFT’
1241 | case CPLXSXP: { Rcomplex *ansd=COMPLEX(tmp); SHIFT(Rcomplex, COMPLEX, ansd[ansi++]=val); } break;
| ^~~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIC it should also work without the cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the casting of fill
between NA_REAL and NA_INTEGER64 though?
I extended the tests that we introduced in #5189 to also check the coercion between types for In my current understanding it works since under the hood library(bit64)
.Internal(inspect(lim.integer64()))
#> @55a458cea988 14 REALSXP g0c2 [OBJ,ATT] (len=2, tl=0) -4.94066e-324,nan
#> ATTRIB:
#> @55a457f3c630 02 LISTSXP g0c0 [REF(1)]
#> TAG: @55a455431140 01 SYMSXP g1c0 [MARK,REF(34387),LCK,gp=0x6000] "class" (has value)
#> @55a457c6f040 16 STRSXP g0c1 [REF(65535)] (len=1, tl=0)
#> @55a458186ff8 09 CHARSXP g1c2 [MARK,REF(1114),gp=0x61] [ASCII] [cached] "integer64" |
Thanks, I see now. Tweaked the comment there, and all good. |
I suspect this is causing hard to spot issue: #5547 |
Closes #1534
fill=x[.N]
Internal optimization
Currently
shift
is only GForce optimized when its fill argument is not a call, to avoid cases wherefill=x[1]
,fill=x[.N]
, etc. since those would need evaluation within their group.General Remark
The gforce optimization of
shift
would be the first function to always return vectors of the same length asby
. The first step in this direction was made in #5089 by allowingghead
to return multiple values per group. Before that every gforce function returned exactly 1 value per group.The whole
datatable.optimize
optimization code branch has grown pretty fat and might need to be rewritten. This is also related to #1414 #5032 #3815 #735 #523.