For a specific setup of two data.tables, an update join does not deliver the result I expect.
library(data.table)
# In the code below the join does not deliver the result I would expect
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# PLEASE NOTE: same result with slightly different syntax: DT1[DT2, lookup_result := i.lookup_result, on=c(colname="lookup")][]
#    colname  colname_with_suffix lookup_result
# 1:   test1                other            NA
# 2:   test2                 test            NA
# 3:   test2 includes test within            NA
# 4:   test3                other             3
# Expected result:
#    colname  colname_with_suffix lookup_result
# 1:   test1                other             1
# 2:   test2                 test             2
# 3:   test2 includes test within             2
# 4:   test3                other             3
For the following variations the join works as expected. The unexpected behaviour above seems to occur only if an index exists on a column whose name starts with the join column's name (here colname_with_suffix and colname) and both columns contain similar text.
# For all following alternatives the join delivers the correct result
# (a) Same data tables as above, but no index
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# (b) Index on DT1, but the indexed column holds completely different values than the join column
DT1 <- data.table(colname=c("test1","test2","test2","test3"), colname_with_suffix=c("other","other","other","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[colname_with_suffix == "not found", ] # automatically creates index on colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
# (c) Index on DT1, similar values in the indexed column, but the join column name is not a prefix of the indexed column name
DT1 <- data.table(colname=c("test1","test2","test2","test3"), x.colname_with_suffix=c("other","test","includes test within","other"))
DT2 <- data.table(lookup=c("test1","test2","test3"), lookup_result=c(1,2,3))
DT1[x.colname_with_suffix == "not found", ] # automatically creates index on x.colname_with_suffix
DT1[DT2, lookup_result := i.lookup_result, on=c("colname"="lookup")][]
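Since only the variants with an index on colname_with_suffix misbehave, one way to test that suspicion is to inspect DT1's secondary indices and drop them before joining. The sketch below is an assumed workaround, not a confirmed fix: it relies on data.table's indices() and setindex(DT1, NULL), and the names idx_before and idx_after are introduced here purely for illustration. Whether it sidesteps the problem on an affected version has not been verified.

```r
library(data.table)

DT1 <- data.table(colname = c("test1", "test2", "test2", "test3"),
                  colname_with_suffix = c("other", "test", "includes test within", "other"))
DT2 <- data.table(lookup = c("test1", "test2", "test3"), lookup_result = c(1, 2, 3))

DT1[colname_with_suffix == "not found", ]  # may auto-create an index on colname_with_suffix
idx_before <- indices(DT1)                 # names of secondary indices, or NULL if there are none

setindex(DT1, NULL)                        # drop all secondary indices on DT1
idx_after <- indices(DT1)                  # hypothetical name; should now be NULL

# Same update join as in the report, now run without any index on DT1
DT1[DT2, lookup_result := i.lookup_result, on = c("colname" = "lookup")][]
```

If the join is correct once the index is gone, that supports the index being the trigger. As an alternative under the same assumption, options(datatable.auto.index = FALSE) should prevent data.table from creating such indices automatically in the first place.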
SessionInfo:
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# locale:
# [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C LC_TIME=German_Germany.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] data.table_1.10.0
#
# loaded via a namespace (and not attached):
# [1] tools_3.3.2
Please note that the same behavior occurs with data.table 1.10.4 and R version 3.4.2, both on Windows and on Ubuntu Linux 14.04.
@MarkusBonsch I have just applied your commit to the most recent dev version of data.table and tested it with the two examples from the linked SO question.
This issue is based on my question on Stack Overflow.