Unexpected missing matches with non-equi join with grouping by .EACHI #4911
Comments
Reproduced on dev data.table in R 4.0.3 (Windows x86):
library(data.table)
X <- as.data.table(as.data.frame(structure(list(id = c(6456372L, 6456372L, 6456372L, 6456372L,
6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L,
6456372L, 6456372L, 6456372L), id_round = c(197801L, 199405L,
199501L, 197901L, 197905L, 198001L, 198005L, 198101L, 198105L,
198201L, 198205L, 198301L, 198305L, 198401L), field = c(NA, NA,
NA, "medicine", "medicine", "medicine", "medicine", "medicine",
"medicine", "medicine", "medicine", "medicine", "medicine", "medicine"
)), class = c("data.table", "data.frame"
), sorted = "id")))
Y <- as.data.table(as.data.frame(structure(list(id = c(6456372L, 6456345L, 6456356L), id_round = c(197705L,
197905L, 201705L), field = c("medicine", "teaching", "health"
), prio = c(6L, 1L, 10L)), class = c("data.table",
"data.frame"), sorted = c("id_round",
"id", "prio", "field"))))
X[Y, on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by = .EACHI]
#> id id_round field V1 V2
#> 1: 6456372 197705 medicine 197901 197705
#> 2: 6456345 197905 teaching NA 197905
#> 3: 6456356 201705 health NA 201705
temp <- X[Y, on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by = .EACHI]
temp[id == 6456372]
#> id id_round field V1 V2
#> 1: 6456372 197705 medicine 197901 197705
as.data.table(as.data.frame(temp))[id == 6456372]
#> id id_round field V1 V2
#> 1: 6456372 197705 medicine 197901 197705
Created on 2021-02-19 by the reprex package (v0.3.0)
Coercing X and Y to data.frame and then back to data.table fixes this, so this is almost certainly an issue introduced in
library(data.table)
X <- as.data.table(as.data.frame(structure(list(id = c(6456372L, 6456372L, 6456372L, 6456372L,
6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L,
6456372L, 6456372L, 6456372L), id_round = c(197801L, 199405L,
199501L, 197901L, 197905L, 198001L, 198005L, 198101L, 198105L,
198201L, 198205L, 198301L, 198305L, 198401L), field = c(NA, NA,
NA, "medicine", "medicine", "medicine", "medicine", "medicine",
"medicine", "medicine", "medicine", "medicine", "medicine", "medicine"
)), class = c("data.table", "data.frame"
), sorted = "id")))
Y <- as.data.table(as.data.frame(structure(list(id = c(6456372L, 6456345L, 6456356L), id_round = c(197705L,
197905L, 201705L), field = c("medicine", "teaching", "health"
), prio = c(6L, 1L, 10L)), class = c("data.table",
"data.frame"), sorted = c("id_round",
"id", "prio", "field"))))
X[Y, on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by = .EACHI]
#> id id_round field V1 V2
#> 1: 6456372 197705 medicine 197901 197705
#> 2: 6456345 197905 teaching NA 197905
#> 3: 6456356 201705 health NA 201705
temp <- X[Y, on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by = .EACHI]
temp[id == 6456372]
#> id id_round field V1 V2
#> 1: 6456372 197705 medicine 197901 197705
as.data.table(as.data.frame(temp))[id == 6456372]
#> id id_round field V1 V2
#> 1: 6456372 197705 medicine 197901 197705
Created on 2021-02-19 by the reprex package (v0.3.0)
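One plausible reason the round trip helps is that as.data.frame() discards the key (the "sorted" attribute), so the rebuilt tables are unkeyed. The following is a minimal sketch under that assumption, not a diagnosis of the bug itself. The names Y_keyed, Y_plain and Y_nokey are hypothetical, and the key order simply mirrors the "sorted" attribute shown in the reprex.

```r
library(data.table)

# Hypothetical toy copy of Y, keyed the same way as in the report
Y_keyed <- data.table(
  id       = c(6456372L, 6456345L, 6456356L),
  id_round = c(197705L, 197905L, 201705L),
  field    = c("medicine", "teaching", "health"),
  prio     = c(6L, 1L, 10L)
)
setkey(Y_keyed, id_round, id, prio, field)
key(Y_keyed)

# Round trip through data.frame: the key is not carried over
Y_plain <- as.data.table(as.data.frame(Y_keyed))
key(Y_plain)

# Dropping the key on a copy, without the round trip
Y_nokey <- copy(Y_keyed)
setkey(Y_nokey, NULL)
key(Y_nokey)
```

If the missing matches depend on the key, setkey(Y, NULL) on a copy may be a lighter workaround than converting both tables, though that is untested here.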
This is a duplicate of #4603. Let's keep it open and thanks for the report. The problem is that
I do believe I have something on my computer. Let me dust it off and submit something this weekend.
Well that explains why this seemed familiar.
Thanks!!
I am working with proprietary data, so creating a reproducible example was a bit tricky, but I think this works.
So everything seems to work fine, but these results are supposed to be merged back into the main data set Y, and this is where I run into trouble. It does not merge, and moreover I can no longer subset by id:
I would of course expect to find a match here. The strange thing is that it works if I drop by=.EACHI or if I drop the last key column "prio". Y is keyed by "prio", but that column is not included in the join. It seems to be related to how the id number compares to the other numbers, because if I change the number to 6456344 or anything lower I get the expected results.
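To check whether the result really depends on Y's key, one option is to run the same join against a keyed and an unkeyed copy of Y and compare the two. This is only a sketch: the data are retyped from the reprex in this thread, the helper run_join and the column names x_round and i_round are hypothetical, and the key order is taken from the "sorted" attribute in the reprex. Whether the two results differ will depend on the data.table version in use.

```r
library(data.table)

# Data retyped from the reprex in this thread
X <- data.table(
  id       = rep(6456372L, 14),
  id_round = c(197801L, 199405L, 199501L, 197901L, 197905L, 198001L, 198005L,
               198101L, 198105L, 198201L, 198205L, 198301L, 198305L, 198401L),
  field    = c(rep(NA_character_, 3), rep("medicine", 11))
)
setkey(X, id)

Y <- data.table(
  id       = c(6456372L, 6456345L, 6456356L),
  id_round = c(197705L, 197905L, 201705L),
  field    = c("medicine", "teaching", "health"),
  prio     = c(6L, 1L, 10L)
)
setkey(Y, id_round, id, prio, field)

# Hypothetical helper: the non-equi join from the report, with named outputs
run_join <- function(y) {
  X[y, on = .(id, id_round > id_round, field),
    .(x_round = x.id_round[1], i_round = i.id_round[1]),
    by = .EACHI]
}

# Same join against an unkeyed copy of Y
Y_nokey <- copy(Y)
setkey(Y_nokey, NULL)

res_keyed <- run_join(Y)
res_nokey <- run_join(Y_nokey)

# TRUE means the key made no difference on this installation
identical(res_keyed$x_round, res_nokey$x_round)
```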
Running latest dev: