Inconsistent behavior in keyed/unkeyed joins against duplicate columns #4891
Comments
In fact, it looks more like a bug when we make the second column have a non-matching type:
|
I think this might be the same as #4888? |
What is the expected behavior for joins with duplicated column names? I would expect an error indicating that there are duplicated names in the key.
Source of the issue: Lines 438 to 445 in 97c96b2
Proposed solution:
    if (anyDuplicated(key(x)))
      stop("There are duplicated names in the key of X of the X[Y] join. To fix, rename the names with setnames().")
    else if (haskey(i) && anyDuplicated(key(i)))
      stop("There are duplicated names in the key of Y of the X[Y] join. To fix, rename the names with setnames().")
A more global proposal would be to raise errors more often when data.tables are created with duplicate names. See also #3077. |
I think an error is the right way to go; probably we should error in
The fix looks good. My nit is that I would include which columns are duplicated in the error message, for user-friendliness. |
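The check proposed above, extended to name the offending columns as suggested, could look roughly like the sketch below. It uses only base R so it runs without data.table. `check_key_names` and its arguments are hypothetical names for illustration, not part of data.table, and the message wording is an assumption modeled on the proposed one.

```r
# Hypothetical helper (not part of data.table): fail if a key has repeated names.
# key_cols: character vector of key column names, e.g. the result of key(x).
# side:     label used in the message, "X" or "Y" of the X[Y] join.
check_key_names <- function(key_cols, side = "X") {
  # Names that occur more than once, each reported a single time.
  dups <- unique(key_cols[duplicated(key_cols)])
  if (length(dups) > 0L) {
    stop(sprintf(
      "There are duplicated names in the key of %s of the X[Y] join: %s. To fix, rename the columns with setnames().",
      side, paste(dups, collapse = ", ")
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage: a clean key passes silently; a key with a repeated name errors.
check_key_names(c("a", "b"))
msg <- tryCatch(
  check_key_names(c("a", "b", "a"), side = "Y"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Reporting the duplicated names directly saves the user from inspecting the key by hand before calling setnames().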
If we also include |
I don't see as much of a problem with duplicate names in the table in general as I do for the keys. For example, duplicate column names can be essential for formatting output tables; forcing users to create workarounds for this common case seems like an unnecessary burden to me. If there's really a compelling case to try to block duplicate column names, perhaps we could expose that through an option (e.g. |
The problem with For example
leads you back to the problem of duplicated keys although not directly setting them in the first place. |
I see... I was thinking of the general case of duplicate names in |
Following up on this thread: I recently noticed an issue where, after running
Might be the same as #4088, hence mentioning it here. This may very well be user error; I have not had time to generate an MWE. Thanks for your help. |
Observed while answering this SO Question: https://stackoverflow.com/a/66041678/3576984
Observe the difference when DT2 is keyed vs. not:
Is there some reason the first case should be intended behavior?
The verbose output suggests it starts doing the right thing, then gets tripped up later on: