I was working my way through the data.table vignette Secondary indices and auto indexing. To make it easier to compare original data and results, I replaced the non-minimal "flights" data with much smaller datasets throughout.
It worked fine until section 2f "Aggregation using by", where on is combined with keyby. There we find code to "Get the maximum departure delay for each month corresponding to origin = "JFK". Order the result by month":
flights["JFK", max(dep_delay), keyby = month, on = "origin"]
When I tried the equivalent code on two different data sets, two different issues appeared:
1. Label of the keyby and by variable identical for different levels
d1 <- data.table(x = rep(c("a", "b"), each = 4), y = 1:0, z = c(3, 6, 8, 5, 4, 1, 2, 7))
d1
# x y z
# 1: a 1 3
# 2: a 0 6
# 3: a 1 8
# 4: a 0 5
# 5: b 1 4 <~~ max z for x = b & y = 1
# 6: b 0 1
# 7: b 1 2
# 8: b 0 7 <~~ max z for x = b & y = 0
Translating the code from the vignette: get the maximum z for each y corresponding to x = "b". Order the result by y.
d1["b", max(z), keyby = y, on = "x"]
# y V1
# 1: 1 7 <~~ y should be 0 here
# 2: 1 4
The label of the keyby variable y is erroneously 1 also for the 0 level.
The labels are also wrong when by is used instead of keyby together with on:
d1["b", max(z), by = y, on = "x"]
# y V1
# 1: 1 4
# 2: 1 7 <~~ y should be 0 here
Just to verify that corresponding code without on works fine:
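The snippet for this check is not shown here. A minimal sketch of an equivalent call, assuming the subset is written as a logical filter in i (x == "b") instead of a join via on, might look like this. The names res_keyby and res_by are introduced only for illustration.

```r
library(data.table)

# Same data as d1 above; y is written out with rep() instead of relying on recycling.
d1 <- data.table(x = rep(c("a", "b"), each = 4),
                 y = rep(1:0, times = 4),
                 z = c(3, 6, 8, 5, 4, 1, 2, 7))

# Filter in i (no `on`), group and sort by y.
res_keyby <- d1[x == "b", max(z), keyby = y]
res_keyby
# y = 0 -> V1 = 7
# y = 1 -> V1 = 4

# Filter in i (no `on`), groups in order of first appearance.
res_by <- d1[x == "b", max(z), by = y]
res_by
# y = 1 -> V1 = 4
# y = 0 -> V1 = 7
```

Here each label of y sits next to the maximum z of its own group, which is the result the on version is expected to reproduce.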
2. (a) Label of the keyby (or by) variable and the resulting value do not match. (b) Result not sorted by the keyby variable.
Again, the equivalent desired outcome: get the maximum hp for each vs corresponding to am = 0. Order the result by vs.
First, look at just the data corresponding to am = 0, to make the desired result easier to spot.
When combining keyby and on, the result is not sorted and the labels don't match the values.
When combining by and on, the labels don't match the values.
Subsetting without on works fine.
So far I have not been able to discern any particular pattern in the data which generates these results (e.g. any particular order of on and/or keyby variables in the original and/or subset data).
Can you spot any mistakes in my code or is there something strange going on here?
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
data.table_1.9.8
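For the second case (maximum hp for each vs where am = 0), the filter-based form can serve as a reference against which to compare the on form. This is only a sketch under one loud assumption: the column names hp, vs and am match R's built-in mtcars, so the example uses that dataset, although the report itself does not name it. The names d2, ref and joined are likewise illustrative.

```r
library(data.table)

# ASSUMPTION: the second dataset is R's built-in mtcars (not named in the report).
d2 <- as.data.table(mtcars)

# Only the rows with am = 0, ordered so the per-vs maxima are easy to read off.
d2[am == 0, .(vs, hp)][order(vs, -hp)]

# Reference result: logical filter in i, no `on`; keyby sorts the result by vs.
ref <- d2[am == 0, max(hp), keyby = vs]
ref

# Form described in the report: join on am via `on`, then aggregate by vs.
# It is expected to agree with `ref`; the report says it does not on data.table 1.9.8.
joined <- d2[.(0), max(hp), keyby = vs, on = "am"]
joined
```

Comparing ref and joined side by side shows both symptoms at once, since a mismatch in either the vs labels or the row order is immediately visible.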