Confusing behavior with DT[, min(var):max(var)] #2069

Closed
franknarf1 opened this issue Mar 21, 2017 · 5 comments · Fixed by #4460

@franknarf1
Contributor

With this data...

library(data.table)

DT = data.table(
    id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), 
    t = c(4L, 9L, 14L, 1L, 4L, 9L, 5L, 14L, 18L, 1L, 3L, 12L, 3L, 5L, 7L, 4L, 10L, 13L)
)


    id  t
 1:  1  4
 2:  1  9
 3:  1 14
 4:  2  1
 5:  2  4
 6:  2  9
 7:  3  5
 8:  3 14
 9:  3 18
10:  4  1
11:  4  3
12:  4 12
13:  5  3
14:  5  5
15:  5  7
16:  6  4
17:  6 10
18:  6 13

I expected DT[, min(t):max(t)] to be the same as with(DT, min(t):max(t)) -- just a vector of integers counting over that interval. I got something quite different:

>     DT[, min(id):max(id)]
    id
 1:  1
 2:  1
 3:  1
 4:  2
 5:  2
 6:  2
 7:  3
 8:  3
 9:  3
10:  4
11:  4
12:  4
13:  5
14:  5
15:  5
16:  6
17:  6
18:  6
>     DT[, seq(min(id), max(id))]
[1] 1 2 3 4 5 6

>     DT[, min(t):max(t)]
     t
 1:  4
 2:  9
 3: 14
 4:  1
 5:  4
 6:  9
 7:  5
 8: 14
 9: 18
10:  1
11:  3
12: 12
13:  3
14:  5
15:  7
16:  4
17: 10
18: 13
>     DT[, seq(min(t), max(t))]
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

Maybe this is a side-effect of the move to make DT[, j] behave more like DF[, j]? I didn't check to see if it also happened in older versions.

@renkun-ken
Member

This behavior looks quite inconsistent with data.table's evaluation rules.

@MichaelChirico
Member

Seems like a bug to me.

In particular, note:

DT[, dput(min(id):max(id))]
# 1:6
# [1] 1 2 3 4 5 6

Indeed it's something funky with scoping rules, as this fixes things to work as expected:

DT[, (min(id):max(id))]
# [1] 1 2 3 4 5 6
DT[, (min(t):max(t))]
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

But it certainly seems like an unnatural workaround...

@MichaelChirico
Member

MichaelChirico commented Mar 22, 2017

Here's the culprit:

# root here is ":"
root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
if (root == ":" || ...) with=FALSE

The logic isn't robust enough to allow for a:b as a seq(a, b) construct -- it's automatically interpreted as cola:colb.

I'm not sure if there's a simple way to fix the logic to accommodate this and still set with = FALSE automatically?

And anyway, I still have no idea why/how the output is what it is, given that DT[ , 1:6] is an error.
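
To illustrate why wrapping the expression in parentheses sidesteps the check quoted above, here is a minimal base-R sketch of the same root test. `root_of` is a hypothetical helper name used only for illustration, not a data.table function; it just mirrors the `root = ...` line from the snippet.

```r
# Hypothetical helper mirroring the quoted `root = ...` line; not part of data.table.
root_of = function(jsub) {
  if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
}

root_of(quote(min(id):max(id)))        # ":"    -> matches the `root == ":"` check
root_of(quote((min(id):max(id))))      # "("    -> no match, so j is evaluated normally
root_of(quote(seq(min(id), max(id))))  # "seq"  -> no match
root_of(quote(dput(min(id):max(id))))  # "dput" -> no match
```

Only the first form has `:` as the outermost call, which would explain why the `seq()`, `dput()` and parenthesized variants all return the plain integer vector.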

@franknarf1
Contributor Author

franknarf1 commented Mar 22, 2017

@MichaelChirico I'm guessing it's somehow becoming DT[, id:id]. I can't figure out why though...

This produces the same result:

DT[, sum(t):paste(t)]

and

DT[, sum(t):paste(t):max(t):min(t)]
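
One hypothesis that would be consistent with every output in this thread (an assumption, not something confirmed here): once `with=FALSE` is set, the expression is evaluated with each column name standing for its column position, so `id` is 1 and `t` is 2. The base-R sketch below imitates that with a plain list; `cols` is a hypothetical stand-in, not data.table's actual internals.

```r
# Assumption: in the `:` branch, each column name evaluates to its position.
cols = list(id = 1L, t = 2L)

eval(quote(min(id):max(id)), cols)                # 1, i.e. select column `id`
eval(quote(min(t):max(t)), cols)                  # 2, i.e. select column `t`
eval(quote(sum(t):paste(t)), cols)                # 2 (`:` coerces "2" to a number)
eval(quote(sum(t):paste(t):max(t):min(t)), cols)  # 2
```

Under the same assumption, `DT[ , 1:6]` would ask for columns 1 through 6 of a two-column table, which would account for the error.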

@renkun-ken
Copy link
Member

What about interpreting only symbol:symbol as column selection, and treating everything else (e.g. call:call) as a typical j?
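
A rough base-R sketch of what such a rule could look like. `is_col_range` is a hypothetical name, and this is not necessarily what the eventual fix implemented.

```r
# Hypothetical predicate: TRUE only when j is literally `symbol:symbol`.
is_col_range = function(jsub) {
  is.call(jsub) &&
    identical(jsub[[1L]], as.name(":")) &&
    length(jsub) == 3L &&
    is.name(jsub[[2L]]) &&
    is.name(jsub[[3L]])
}

is_col_range(quote(id:t))                # TRUE  -> column selection
is_col_range(quote(min(id):max(id)))     # FALSE -> evaluate as ordinary j
is_col_range(quote(sum(t):paste(t)))     # FALSE
is_col_range(quote(1:6))                 # FALSE (numeric literals are not symbols)
```

Note that a rule this strict would also stop treating numeric ranges such as `1:6` as column selection, so a real implementation would need to decide how to handle that case.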
