Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can infinite values be supported in IDate? #2256

Closed
franknarf1 opened this issue Jul 7, 2017 · 8 comments · Fixed by #5117
Closed

Can infinite values be supported in IDate? #2256

franknarf1 opened this issue Jul 7, 2017 · 8 comments · Fixed by #5117

Comments

@franknarf1
Copy link
Contributor

I'm taking the max or min of an empty IDate vector, but finding my code breaks since the return value is sometimes float, sometimes integer:

library(data.table)
DT = data.table(d = as.IDate(Sys.Date()), g = 1:2, cond = c(TRUE, FALSE))
DT[, max(d[cond]), by=g]

Error in `[.data.table`(DT, , max(d[cond]), by = g) : 
  Column 1 of result for group 2 is type 'double' but expecting type 'integer'. Column types must be consistent for each group.
In addition: Warning message:
In max.default(integer(0), na.rm = FALSE) :
  no non-missing arguments to max; returning -Inf

Since R's integer storage mode doesn't allow Inf or -Inf, I'm guessing IDate cannot be used for this..? My workaround is dropping to Date class:

DT[, {
  d = as.Date(d)
  max(d[cond])
}, by=g]
@MichaelChirico
Copy link
Member

just adding that i've run into this issue as well, had to write a wrapper function for the special case.

@dlksmc
Copy link

dlksmc commented Jan 12, 2018

I'm also dealing with this issue. If you want to stick with IDate you can wrap it with an as.IDate:

DT[, as.IDate(max(d[cond])), by=g]

@jangorecki
Copy link
Member

related discussion on r-devel https://stat.ethz.ch/pipermail/r-devel/2019-March/077435.html

@jangorecki jangorecki changed the title [Question] Can infinite values be supported in IDate? Can infinite values be supported in IDate? Mar 6, 2019
@MichaelChirico
Copy link
Member

MichaelChirico commented Sep 11, 2019

We could do something like this (and analogously for min)?

max.IDate = function(x, na.rm = FALSE) {
  if (!length(x)) {
    out = as.IDate(-.Machine$integer.max)
    warning("No non-missing arguments to max; returning the minimum integer date, ", .Machine$integer.max, " days before epoch time: ", out)
    return(out)
  }
  return(as.IDate(max(as.integer(x), na.rm = na.rm)))
}

Otherwise, change the warning to give some advice about adding conditions like:

if (any(idx <- !is.na(x))) max(x[idx])

.Machine$integer.max is the best we can do -- supporting Inf in IDate columns won't be possible.

@shrektan
Copy link
Member

A little off-topic but I really think max(double()) returns -Inf is annoying. Personally, I never ever need it returns an infinite number. I always prefer NA for such case - if there's no number then we can't find the "max" -> so it's Not / Applicable.

So personally I avoid to use the na.rm argument in max(). If I have to, I would write a small function like function(x) {x = na.omit(x); if (length(x)) max(x) else NA_real_ }.

Maybe we should provide an alternative to the base::max(), which never returns an Inf...

@MichaelChirico
Copy link
Member

This is a mathematical consideration, from ?max:

The minimum and maximum of a numeric empty set are +Inf and -Inf (in this order!) which ensures transitivity, e.g., min(x1, min(x2)) == min(x1, x2). For numeric x max(x) == -Inf and min(x) == +Inf whenever length(x) == 0 (after removing missing values if requested). However, pmax and pmin return NA if all the parallel elements are NA even for na.rm = TRUE.

Also note that in this case, the issue doesn't come from na.rm, but rather from subsetting the input and getting empty output.

I think the behavior is a bit annoying but reasonable. As you said, if you anticipate an empty set into min, you code around it -- the behavior here is a shield against cases when the empty set is unanticipated. I'd rather get hit with a warning then have data.table do something silently in such a case.

@franknarf1
Copy link
Contributor Author

franknarf1 commented Sep 12, 2019

I like the idea of using +/- the machine max int, though I guess further related changes would be desirable for consistency:

  • print.IDate would print both as NA like print.Date does
  • as.IDate would map +/- Inf as well as other doubles outside the int range to these values
  • as.Date would map these values to +/- Inf
  • +.IDate/-.IDate would need to ensure that the results are still ints

In math-like jargon, the goal is that the space of IDates is closed under +, min, max and that NAs in an IDate only mean missing data (while inexpressibly large and small IDates are distinguished from each other and from NA).

A little off-topic but I really think max(double()) returns -Inf is annoying.

Maybe I have just gotten used to it, but I find the behavior makes sense.

@MichaelChirico
Copy link
Member

sounds good. will do this but it'll be breaking in case anyone was relying on the bounds for other purposes. will have to wait for next release

@MichaelChirico MichaelChirico added the breaking-change issues whose solution would require breaking existing behavior label Sep 12, 2019
@MichaelChirico MichaelChirico added this to the 1.13.0 milestone Sep 12, 2019
@mattdowle mattdowle modified the milestones: 1.12.7, 1.12.9 Dec 8, 2019
@mattdowle mattdowle modified the milestones: 1.13.1, 1.13.3 Oct 17, 2020
@mattdowle mattdowle removed the breaking-change issues whose solution would require breaking existing behavior label Aug 27, 2021
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants