Looking again at #1596, one point against it is that some folks may be taking advantage of the default V1, V2, ... names, so auto-assigning different names would break their code.
One alternative, which would address my primary use case, would be for CJ not to need names during a join:
library(data.table)
DT = unique(data.table(datasets::CO2)[, .(Plant, Type, Treatment)])
setkey(DT, Plant, Type)
# good -- no names needed using list
DT[.("Mc1", "Mississippi"), .N, by=.EACHI]
# good -- no names needed using CJ and (implicitly) on = key
DT[CJ(Plant, Type, unique=TRUE), .N, by = .EACHI]
# bad -- breaks for explicit on = key
DT[CJ(Plant, Type, unique=TRUE), on=key(DT)]
# bad -- breaks for on=some-non-key
DT[CJ(Plant, Treatment, unique = TRUE), on=.(Plant, Treatment)]
So I guess I'm asking for an exception in [.data.table that ignores names in i when i=CJ(...), similar to how names are ignored with list inputs or when on= is implicitly the key.
Of course, my desired syntax would also work if FR #1596 went through.
@MichaelChirico Great! Just to clarify, auto-naming solves #1596 and the example given here, but not the broader case I meant:
library(data.table)
DT = unique(data.table(datasets::CO2)[, .(Plant, Type, Treatment)])
setkey(DT, Plant, Type)
f = function(myplants, mytypes) DT[CJ(myplants, mytypes), on=key(DT)]
f(c("Qn1", "Qc2"), "Quebec") # error: i is a table and so must have correct names
# or even more generally
g = function(..., d = DT) d[do.call(CJ, list(...)), on=key(d)]
g(c("Qn1", "Qc2"), "Quebec") # error: i is a table and so must have correct names
What I mean by "must have correct names" is that unnamed lists don't have the same requirements:
DT[unname(unclass(CJ(c("Qn1", "Qc2"), "Quebec"))), on=key(DT)] # works
So I'm hoping for special parsing of i that recognizes CJ as the top-level call and then "ignores" i's names, in the same way that lists in i get away without having correct names (when on= is an equi join referring only to x columns).
Not sure if that's reasonable, and it's certainly not a big deal if I'm writing functions as above (since I can just add unname + unclass or similar), so the request is mainly for easier interactive data exploration.
EDIT: Hm, another resolution would be to let CJ return an unnamed, unclassed list via some new argument (bringing it in line with, e.g., the behavior of shift), though that seems like overkill if it only serves to cover this use case.
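A minimal sketch of the "unname + unclass" workaround from the comment above, packaged so it can be reused interactively. It assumes, as reported in that comment, that `DT[<unnamed list>, on=key(DT)]` matches the list elements to the `on=` columns by position. `cj_unnamed` and `f2` are hypothetical names introduced only for illustration; they are not part of data.table, and this approximates the "new argument" idea without changing CJ itself.

```r
library(data.table)

# Hypothetical helper, NOT part of data.table: build the cross join with CJ(),
# then drop the data.table class and the column names, giving the kind of
# unnamed list that the comment above reports is accepted as i with on=.
cj_unnamed = function(...) unname(unclass(CJ(...)))

DT = unique(data.table(datasets::CO2)[, .(Plant, Type, Treatment)])
setkey(DT, Plant, Type)

# Same shape as f() above, with CJ() wrapped by the helper so that i's
# names no longer have to match key(DT).
f2 = function(myplants, mytypes) DT[cj_unnamed(myplants, mytypes), on = key(DT)]

res = f2(c("Qn1", "Qc2"), "Quebec")
res
```

The trade-off is the one noted in the thread: the wrapper is easy to write inside functions, but it is still extra typing compared with passing `CJ(...)` directly during interactive exploration.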
jangorecki changed the title from "[Request] Don't inspect i=CJ(...) names in DT[CJ(...), on=] joins" to "Don't inspect i=CJ(...) names in DT[CJ(...), on=] joins" on Apr 6, 2020.
jangorecki added the joins label on Apr 6, 2020.