in result of non-equi join, table is not actually correctly sorted even though it's keyed #4603
Comments
This bug is not caused by the non-equi join columns having the same names in both x and y. The following code (where the interval columns in x and y have different names) produces the same bug:
|
Thank you for the report; I can reproduce it. Note that the bug is slightly different from what was reported: the result's key is labeled incorrectly, since it should just be id:
library(data.table) #1.12.8
x = data.table(id = rep(1L, 5L),
start = 1:5,
end = 2:6)
setkey(x, id, start, end)
y = data.table(id = rep(1L, 5L),
start = c(15L, 13L, 14L, 12L, 11L),
end = c(26L, 20L, 23L, 24L, 22L),
v1 = 5:1)
setkey(y, id)
z = x[y,
on = .(id,
start <= end,
end >= start),
.(v1),
by = .EACHI]
key(z) ## should only be id!
#> [1] "id" "start" "end"
z ## not actually sorted!
#> id start end v1
#> <int> <int> <int> <int>
#> 1: 1 26 15 5
#> 2: 1 20 13 4
#> 3: 1 23 14 3
#> 4: 1 24 12 2
#> 5: 1 22 11 1
If instead we had set the key on y:
setkey(y, id, start, end)
z = x[y,
on = .(id,
start <= end,
end >= start),
.(v1),
by = .EACHI]
key(z)
## [1] "id" "start" "end"
identical(z$end,sort(z$end))
# [1] TRUE
z
## id start end v1
## <int> <int> <int> <int>
##1: 1 22 11 1
##2: 1 24 12 2
##3: 1 20 13 4
##4: 1 23 14 3
##5: 1 26 15 5
I will work on a fix. Finally, in the future it is sometimes helpful to see some output. I normally use |
If something is keyed on a variable but not sorted by that variable, I would call that "not correctly sorted", conditional on that key existing as an attribute. But yes, you're right: the result should just be keyed and sorted on the variables specified in the key of y. |
I guess the point I'm making is that if the returned table didn't have the expected key but was sorted according to that unexpected key, it wouldn't be as big a bug. Unless you're relying on the result of a join taking the key of y, that would be unlikely to cause issues, especially since it's reasonable to explicitly set the key of z before merging with it. The actual bug is a bigger deal, because explicitly setting the key to what you want won't necessarily undo it, as demonstrated in my code. |
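The distinction drawn here (a key attribute that claims an order the rows don't actually have) can be checked directly. Below is a minimal base-R sketch; `is_sorted_by()` is a hypothetical helper, not part of data.table. It relies on data.table recording the key in the `"sorted"` attribute (which is what `key()` reads), and it uses a stable radix `order()` with C-locale collation and NAs first, as an assumption about how closely it should mimic data.table's own ordering.

```r
# Hypothetical helper, not part of data.table: returns TRUE if the rows of `dt`
# really are in ascending order of `cols`. By default `cols` is the key that
# data.table records in the "sorted" attribute (the value key() returns).
is_sorted_by <- function(dt, cols = attr(dt, "sorted", exact = TRUE)) {
  if (length(cols) == 0L) return(TRUE)  # no key claimed, nothing to check
  # Stable radix sort: an already-sorted table yields exactly 1:nrow(dt),
  # even with ties. method = "radix" collates strings in the C locale and
  # na.last = FALSE puts NAs first (assumed to mirror data.table's ordering).
  o <- do.call(order, c(unname(as.list(dt)[cols]),
                        list(na.last = FALSE, method = "radix")))
  identical(o, seq_len(nrow(dt)))
}
```

Given the printed output of the first example, `is_sorted_by(z)` should return FALSE there (the rows are not in `start` order although the key claims `id, start, end`), while `is_sorted_by(z, "id")` is TRUE because every `id` is 1. If the key is wrong, dropping it first with `setkey(z, NULL)` and then re-keying forces a genuine re-sort rather than trusting the stale attribute.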
I'm also happy to take a look at this and submit a PR. I've been busy with other things the last few days, so I haven't tried to debug it yet. |
@myoung3 I wanted to point out to the data.table maintainers what the root issue is. The original example was unclear, as it appeared that the expected output was for it to be sorted by end_date:
stopifnot(identical(z$end_date, sort(z$end_date)))
I agree that the key should in fact be accurate. I am still working on a fix and should have something soon. |
Reproducible in both release and development data.table