Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

block duplicates in assignment #4634

Closed
wants to merge 2 commits into from
Closed

block duplicates in assignment #4634

wants to merge 2 commits into from

Conversation

MichaelChirico
Copy link
Member

Closes #3077

on hold until we decide the right behavior (see issue comments)

@MichaelChirico MichaelChirico changed the title WIP: block duplicates in assignment block duplicates in assignment Jul 23, 2020
@ColeMiller1
Copy link
Contributor

I agree we should prevent duplicate names. This PR seems to mainly detect that a data.table has duplicate names and then errors if one of those duplicated names is used in j. To correct, I assume you would use a setnames(...) call?

Fundamentally, it seems like if we more-or-less lock people out of their data.table in j once there are duplicated names, it would be better to instead catch upstream during creation. As is, I might be surprised with the error if I randomly stumbled on a messy data source:

library(data.table)
DT  <- data.table(x=1:2, y=3:4, x=5:6, x=7:8, y=9:10, z=11:12)
DT[, x] ##copied from one of the AppVeyor for test 1229.28
## Observed: Unable to disambiguate reference to duplicated columns in x: [x]

I also thought this PR would address users trying to assign duplicate names but I do not believe it does. This would still work (although it may be out-of-scope of the FR / PR):

data.table(x = 1)[ , .(y = 1, y = 2)]
##       y     y
##   <num> <num>
##1:     1     2

Another idea for if we are mainly erroring when users select a name that is already duplicated, it would be nice to have a setnames(..., repair = FALSE) option so that an error message can lead the user to a helpful tool.

## VIOLATES TESTED BEHAVIOR WRT DUPLICATES, SEE TESTS 1290 **
if (anyDuplicated(used_from_x <- dupintersect(names_x, av))) {
dupcols = unique(used_from_x[duplicated(used_from_x)])
stop("Unable to disambiguate reference to duplicated columns in x: ", brackify(dupcols))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

stop(gettextf("Unable to disambiguate to duplicated columns in %s: %s", deparse(substitute(x)), brackify(dupcols), domain = ...)

This will allow the original name of the data.table to print instead of x.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great in simple cases, but e.g. data.table(x = 1)[[ , c('y', 'y') := .(1, 2)] and other dynamic cases, or when a function the user calls is the one that creates the error, it may not be as helpful. There might be an argument for building something like guess_name(x) that checks for the "non-standard" cases and returns x, otherwise a user-friendly name, but that also comes with some maintenance burden...

Copy link
Contributor

@ColeMiller1 ColeMiller1 Jul 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. Here is a simple alternative that should both be maintainable but still allow for user-friendly messages.

fx = function(x) {
  print(if (is.name(x_sub <- substitute(x))) deparse(x_sub) else "x")
}

fx(iris)
#> [1] "iris"
fx(iris[])
#> [1] "x"

edit - and I do not see this as a separate function. I would just drop it into the gettextf(...) biti.

@MichaelChirico
Copy link
Member Author

to be clear I would not block duplicates in all cases -- e.g. data.table(x = 1)[ , .(y = 1, y = 2)], things like that can be used for output in tables (I have certainly used a cludge like this to make 2-row headers in e.g. LaTeX tables slightly easier)

@MichaelChirico MichaelChirico marked this pull request as draft December 14, 2023 11:17
@MichaelChirico
Copy link
Member Author

Closing as discussed in #3077

@MichaelChirico MichaelChirico deleted the assign_lhs_dup branch August 29, 2024 05:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

:= could warn/error when RHS is ambiguous
2 participants