:= could warn/error when RHS is ambiguous #3077
Comments
(Looking at the title...) Seems like a useful precaution in any |
Thinking again, maybe this can be solved up front by adding gateways in Not sure it makes sense to run this code repeatedly in every |
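Yeah, a warning upstream like you suggest would be more useful to me, though there are so many points at which data.tables are created, it may be too much of a headache that way, too. (Eg, besides the ones you mention, there's even rbindlist(list(list(x = 1, x = 2, x = 3)))) For me, := isn't special relative to other DT[...] contexts where I might make such a mistake: SC[, table(Nomination)]; SC[, .N, by=Nomination]. I don't feel strongly about it, though, and this problem hardly affects me (since I work with predictable data structures these days). |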
right, since it's such an edge case and fread/setDT already probably cover
95% of data.table creation (fundamental flaw of input data really), I'd be
fine letting users sort it out themselves in other cases...
the behavior's unexpected only to the extent that you don't understand the
table in the first place, which is relatively more likely at the
fread/setDT junctures as well
…On Thu, Sep 27, 2018, 12:37 PM Frank ***@***.***> wrote:
Yeah, a warning upstream like you suggest would be more useful to me,
though there are so many points at which data.tables are created, it may be
too much of a headache that way, too. (Eg, besides the ones you mention,
there's even rbindlist(list(list(x = 1, x = 2, x = 3))))
For me, := isn't special relative to other DT[...] contexts where I might
make such a mistake: SC[, table(Nomination)]; SC[, .N, by=Nomination]. I
don't feel strongly about it, though and this problem hardly affects me
(since I work with predictable data structures these days).
Have a fix for this issue like this:
But it breaks tests
Not sure how married we are to the behavior laid out here (a comment in tests hardly seems like documentation)... thoughts @jangorecki / @mattdowle ? |
As for me I would move towards not supporting duplicated column names wherever possible. The only use case I can think about is to produce table for reporting/formatting. |
Hmm with hindsight I'd rather not pursue this. Duplicate names can be a pain but they are a fact of life -- I suspect it would cause more pain than help to impose this on user workflows. E.g. I've recently dealt with this working with {dbplyr} -- I want to run a query like (distilled): SELECT A.*, B.*
FROM A JOIN B USING (x,y,z) But Another example: I received |
Was scraping the table here:
https://en.wikipedia.org/wiki/List_of_nominations_to_the_Supreme_Court_of_the_United_States
And didn't realize the output had duplicated column names:
nominated
will attempt to compute the RHS and will select the first Nomination column. This is easily avoided by being careful about check.names up front, but if it's easy to check/warn on the fly, we should.
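For illustration only, here is a minimal sketch of the kind of check being asked for. The helper `dup_names()` and the toy table are assumptions made up for this example. They are not part of data.table and not the fix proposed later in the thread.

```r
library(data.table)

# Hypothetical helper (not part of data.table): return the column names that
# occur more than once, warning if there are any.
dup_names <- function(DT) {
  nm <- names(DT)
  dups <- unique(nm[duplicated(nm)])
  if (length(dups) > 0L) {
    warning("duplicated column names: ", paste(dups, collapse = ", "))
  }
  dups
}

# Toy stand-in for the scraped table: two columns share the name "Nomination".
SC <- data.table(Nomination = c("a", "b"), Nomination = c("c", "d"))

# Run the check; the warning is suppressed here so the script runs quietly.
found <- suppressWarnings(dup_names(SC))

# One way to disambiguate before using `:=`: make the names unique on a copy.
fixed <- copy(SC)
setnames(fixed, make.unique(names(fixed)))
```

Calling `dup_names(SC)` before a `:=` whose RHS mentions `Nomination` would surface the ambiguity up front. Renaming with `make.unique()` is only one possible remedy.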