Improve helpfulness of warning message during on-assignment type coer… #2989
Maybe consider changing tone instead of removing the sentence? Like "Setting the column type as intended up front is the best way to get around this if your RHS has the correct type." or something. Fwiw, for me...
... was the most useful part of the message as a new user and probably something I wouldn't think of from reading the new message.
... gets at the point too indirectly and no longer contains a recommended fix. I don't find the twitter complaint ("it's rude because I know better already") convincing as an argument to remove it when other users might know less than they do. Also, the first pitfall you highlight
...has its own (error) message already:
Also, you refer to it as "this error", though it is a warning, which may be confusing.
…r truncation and informs user the first truncated item.
@franknarf1 and @MichaelChirico, please review the latest refinement before merge. (The drop-down under reviewers won't let me pick either of you: Michael because you submitted the PR, I guess, and Frank because you're not a project member, it seems?! But you can both add your reviews, I believe.)
Looks good, but the last sentence of the new message refers to class instead of type (comment further down).
(I've never used the GitHub review interface, so please let me know if I've made a mistake. Thanks)
src/assign.c
s2 = (char *)type2char(TYPEOF(thisvalue));
if (isReal(thisvalue)) s3="; may have truncated precision"; else s3="";
warning("Coerced '%s' RHS to '%s' to match the column's type%s. Either change the target column ['%s'] to '%s' first (by creating a new '%s' vector length %d (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to '%s' (e.g. 1L, NA_[real|integer]_, as.*, etc.) to make your intent clear and for speed. A common source of this error is mismatch of return types when assigning by group, especially in edge cases. Another common pitfall comes from wrong assumptions about your table's column types; check print(x, class = TRUE) or sapply(x, class) to inspect your initial LHS types and be sure they're as expected.", s2, s1, s3, CHAR(STRING_ELT(names, coln)), s2, s2, LENGTH(VECTOR_ELT(dt,0)), s1);
warning("Coerced %s RHS to %s to match the type of the target column (column %d named '%s'). If the target column's type %s is correct, it's best for efficiency to avoid the coercion and create the RHS as type %s. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.%s() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `%s`:=as.%s(`%s`)]. If the RHS of := has nrow(DT) elements, then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with print(x,class=TRUE) and sapply(x,class).",
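For reference, the type postfixes that the new message points to can be checked directly in base R. This is only a minimal sketch; the name rhs_types is illustrative and not part of the PR:

```r
# Literal suffixes and typed NAs fix the RHS type before := ever sees the value.
rhs_types <- c(
  int_literal = typeof(0L),          # integer literal (L suffix)
  dbl_literal = typeof(0),           # plain numeric literal is a double
  plain_na    = typeof(NA),          # bare NA is logical
  int_na      = typeof(NA_integer_), # typed integer NA
  dbl_na      = typeof(NA_real_)     # typed double NA
)
print(rhs_types)
```

Picking the literal or NA that already matches the target column's type is what lets the assignment skip the coercion the warning describes.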
Column types can be observed with print(x,class=TRUE) and sapply(x,class).
Some users won't know types for non-atomic classes (integer = IDate, factor; double = Date). Could change sapply(x,class) to sapply(x,typeof), but I'm not sure how to change the reference to print. Maybe some rewording or an extra sentence could work. Alternately, maybe print could have another option, like verbose = TRUE, that prints maximal info (somewhat substituting for str and similar to tables()):
library(data.table)
DT = data.table(id = 1:2, d = as.IDate(Sys.Date()) + 0:1)
setkey(DT, id)
setindex(DT, d, id)
print(DT, verbose = TRUE) # fake code
# id d
# <int> <IDat>
# <int>
# 1: 1 2018-08-10
# 2: 2 2018-08-11
#
# key: id
# indices:
# - d, id
(... Also displaying new statistics from #2879)
Anyway, I guess the type/class distinction will only matter in rare cases, like...
library(data.table)
DT = data.table(id = 1:2, d = as.IDate(Sys.Date()) + 0:1)
DT[1, d := "1999-01-01"]
# gets coerced to NA_integer_
DT[1, d := 1999-01-01]
# worse, if you forget the quotes (and the class/type), 1999-01-01 is evaluated as arithmetic (1997) and gets silently handled as an int
In contrast, sub-assigning a character to a factor works as expected.
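To make the class/type distinction concrete, here is a small base-R sketch. It assumes a plain data.frame as a stand-in for a data.table and uses Date rather than IDate so that it runs without data.table; the column names are made up for illustration:

```r
# A Date column and a factor column: classes differ from the storage types.
df <- data.frame(
  id = 1:2,
  d  = as.Date("2018-08-10") + 0:1,
  f  = factor(c("a", "b"))
)

classes <- sapply(df, class)   # what sapply(x, class) reports
types   <- sapply(df, typeof)  # the underlying storage type of each column

print(rbind(class = classes, type = types))
```

Here class reports Date and factor, while typeof shows the double and integer storage underneath, which is the level at which the coercion warning operates.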
Awesome! Much more thorough overhaul 👍
…cion.
Original tweet & some discussion here:
https://twitter.com/sarahbeeysian/status/1021359529789775872