[.data.table needs to be more modular. #852
Comments
Can you elaborate on this?
My suggestion regarding this issue would be to isolate |
Any ideas on this? Currently ...
if (!missing(i)) {
...
if (is.call(isub) && isub[[1L]] == as.name("order") && getOption("datatable.optimize") >= 1) {...}
...
if (is.call(isub) && isub[[1L]] == quote(forder)) {...} else if (is.call(isub) && getOption("datatable.auto.index") &&
as.character(isub[[1L]]) %chin% c("==","%in%") &&
is.name(isub[[2L]]) &&
(isub2<-as.character(isub[[2L]])) %chin% names(x) &&
is.null(attr(x, '.data.table.locked'))) {...}
} else {...}
...
if (missing(j)) {...} else {
...
if (is.call(jsub) && jsub[[1L]]==as.name(":=")) {...}
...
}
...
Above code in a proposed structure:
# control flow conditions evaluation
do_i = !missing(i)
do_j = !missing(j)
optimize_i = do_i && getOption("datatable.optimize") >= 1 && is.call(isub) && isub[[1L]] == as.name("order")
x_unlocked = is.null(attr(x, '.data.table.locked'))
create_index = do_i && getOption("datatable.auto.index") && x_unlocked &&
  is.call(isub) && as.character(isub[[1L]]) %chin% c("==","%in%") &&
  is.name(isub[[2L]]) && as.character(isub[[2L]]) %chin% names(x)
update_by_ref = do_j && is.call(jsub) && jsub[[1L]]==as.name(":=")
# computation on dataset
if(do_i){
if(optimize_i) {...}
if(create_index) {...}
...
} else {...}
...
if (!do_j) {...} else {
...
if (update_by_ref) {...}
...
}
...
In above code calls kind of If you would consider switching to such a structure, I can prepare a PR after
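The proposal above, evaluating every control-flow condition once before any computation, can be sketched as a small standalone function. This is only an illustration under stated assumptions: `classify_call` and its arguments are hypothetical names, not data.table internals; the options are passed as plain arguments instead of being read with `getOption()`; and base R's `%in%` stands in for `%chin%` so that the sketch runs without data.table loaded.

```r
# Hypothetical helper (not part of data.table): compute all branch flags up front.
# `isub` and `jsub` are the unevaluated i and j expressions, as in the thread.
classify_call <- function(x, isub, jsub, optimize = 1L, auto_index = TRUE) {
  do_i <- !missing(isub)
  do_j <- !missing(jsub)
  i_is_call <- do_i && is.call(isub)
  x_unlocked <- is.null(attr(x, ".data.table.locked"))
  list(
    do_i = do_i,
    do_j = do_j,
    # i is a call to order() and optimization is switched on
    optimize_i = i_is_call && optimize >= 1L &&
      identical(isub[[1L]], as.name("order")),
    # i looks like `col == value` or `col %in% values` on an existing column
    create_index = i_is_call && auto_index && x_unlocked &&
      as.character(isub[[1L]])[1L] %in% c("==", "%in%") &&
      is.name(isub[[2L]]) &&
      as.character(isub[[2L]]) %in% names(x),
    # j is a call to `:=`
    update_by_ref = do_j && is.call(jsub) &&
      identical(jsub[[1L]], as.name(":="))
  )
}

x <- data.frame(a = 1:3, b = letters[1:3])
flags <- classify_call(x, quote(a == 2L), quote(`:=`(c, 1L)))
flags_order <- classify_call(x, quote(order(a)))
```

With the flags collected in one place, the body of the method only needs to branch on named booleans, and the classification itself can be unit-tested without running any subsetting code.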
@jangorecki thanks. Yes it should be more functional, agreed. I'll be working on this for v1.9.8. Have spent some time on this already. |
@arunsrinivasan you can push your dev version to a new branch and maybe put a TODO task list of features here, so the goals are defined. It could also bring some feedback from potential contributors.
Ideally, processing would be isolated into atomic functions, like
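As a rough illustration of what isolating processing into atomic functions could look like, here is a base R sketch. The function names `subset_rows`, `select_cols`, and `modular_subset` are hypothetical and chosen only for this example; they are not the functions named in the original comment, and they operate on a plain data.frame.

```r
# Hypothetical step functions (illustrative names, not data.table internals).
subset_rows <- function(x, rows) x[rows, , drop = FALSE]
select_cols <- function(x, cols) x[, cols, drop = FALSE]

# A thin driver that only sequences the steps; each step can be tested alone.
modular_subset <- function(x, rows = NULL, cols = NULL) {
  if (!is.null(rows)) x <- subset_rows(x, rows)
  if (!is.null(cols)) x <- select_cols(x, cols)
  x
}

df <- data.frame(a = 1:4, b = letters[1:4], c = 4:1)
res <- modular_subset(df, rows = df$a > 2L, cols = c("a", "c"))
```

The driver stays short because each step owns its own logic, which is the property the comment above asks for.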
@jangorecki I have been working on code to make the I saw you assigned yourself some related items. Should I hold off? |
I assigned to myself because I am working on it too. Will try to push today, so we can work together. |
I have created Help / feedback welcome. Most of my comments include
Single row subset - 1 row out of 1e5 rows
column select - 1e5 rows; 3 columns
looping and selecting single row:
## negating boolean index from parent.frame
library(data.table)
n = 1e7 ; grps = 1e2
set.seed(123L)
setDTthreads(1L)
dt = data.table(x = sample(grps, n, TRUE), y = runif(n), z = runif(n))
ind = sample(c(TRUE, FALSE), n, TRUE)
bench::mark(dt[!ind])
##1.12.9 master
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 dt[!ind] 133ms 191ms 5.82 192MB 11.6
##1.12.9 modular_dt2
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 dt[!ind] 115ms 138ms 7.16 154MB 12.5 |
@jangorecki, how does this actually get implemented? I was trying to do a smaller PR for the FWIW, recompiling without OpenMP (for all tests) provides the following timings:
library(data.table)
##setDTthreads(1L) ## doesn't matter because this isn't compiled.
DoSomething <- function(row) someCalculation <- row[["v1"]] + 1
allIterations <- data.table(v1 = runif(1e5), v2 = runif(1e5))
system.time(for (r in 1:nrow(allIterations)) DoSomething(allIterations[r, ]))
system.time(for (r in 1:nrow(allIterations)) DoSomething(allIterations[r, , with=c(i=FALSE)]))
So in summary, |
Just for fun, I calculated the cyclomatic complexity of
cyclocomp::cyclocomp(data.table:::`[.data.table`)
# [1] 1033
For some context,