An R function I'd find useful: diagnostics of tables joins---column overlap to start, multiple matches, # of non-matches, missing keys & how they were handled, maybe a venn diagram of column names, etc. Something like this already exist? Joining is a high risk data cleaning task
I'd advocate for a new diagnostics logical argument (whose default behaviour could fall back to an option), since verbose already returns quite a lot of unrelated information.
Worth noting that the tidylog package largely provides this behaviour for dplyr joins.
library(dplyr, warn.conflicts=FALSE)
library(tidylog, warn.conflicts=FALSE)
x=data.frame(A=1:5, B=6:10)
y=data.frame(A= c(1, 4), C=LETTERS[c(1, 4)])
left_join(x, y)
#> Joining, by = "A"
#> left_join: added one column (C)
#>            > rows only in x   3
#>            > rows only in y  (0)
#>            > matched rows     2
#>            >                 ===
#>            > rows total       5
#>   A  B    C
#> 1 1  6    A
#> 2 2  7 <NA>
#> 3 3  8 <NA>
#> 4 4  9    D
#> 5 5 10 <NA>
As per John's tweet, additional data.table-specific information like keys would be valuable here too. Doesn't resolve, but would go quite far towards heading off issues like #4888 and #4891.
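To make the request more concrete, here is a rough sketch of the kind of summary such a diagnostics option might report. It uses only existing data.table calls (anti-joins via `x[!y, on = ]`, `duplicated()` with `by`, and `key()`). The helper name `join_diagnostics` and the names of the fields it returns are hypothetical and not part of data.table; this is an illustration, not a proposed implementation.

```r
library(data.table)

# Hypothetical helper (NOT part of data.table): summarise a prospective
# join of x and y on the columns in `cols` without performing the join.
join_diagnostics <- function(x, y, cols = intersect(names(x), names(y))) {
  x <- as.data.table(x)
  y <- as.data.table(y)
  list(
    join_columns      = cols,
    shared_columns    = intersect(names(x), names(y)),   # column-name overlap
    rows_only_in_x    = nrow(x[!y, on = cols]),          # anti-join: x rows with no match in y
    rows_only_in_y    = nrow(y[!x, on = cols]),          # anti-join: y rows with no match in x
    duplicated_keys_x = sum(duplicated(x, by = cols)),   # repeated key rows in x (multiple matches)
    duplicated_keys_y = sum(duplicated(y, by = cols)),   # repeated key rows in y (multiple matches)
    missing_keys_x    = sum(!complete.cases(x[, cols, with = FALSE])),  # rows with NA in a join column
    missing_keys_y    = sum(!complete.cases(y[, cols, with = FALSE])),
    key_x             = key(x),                          # NULL when no key is set
    key_y             = key(y)
  )
}

# Same data as the tidylog example; integer keys are used on both sides
# here to sidestep any type coercion in the join columns.
x <- data.table(A = 1:5, B = 6:10)
y <- data.table(A = c(1L, 4L), C = LETTERS[c(1, 4)])
diag <- join_diagnostics(x, y)
str(diag)
```

On this data the sketch should agree with the tidylog counts above: three rows only in `x` and none only in `y`.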
Prompted by this tweet by @johnjosephhorton (who, I hope, doesn't mind being tagged).
Created on 2021-12-27 by the reprex package (v2.0.1)