Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patterns() did not consider the position of provided cols in the original DT #6498

Closed
hongyuanjia opened this issue Sep 15, 2024 · 10 comments · Fixed by #6510
Closed

patterns() did not consider the position of provided cols in the original DT #6498

hongyuanjia opened this issue Sep 15, 2024 · 10 comments · Fixed by #6510
Assignees
Labels
reshape dcast melt

Comments

@hongyuanjia
Copy link
Contributor

Please see the MRE below. patterns() returns the matched positions of input cols, which are 1 and 2, resulting using id_1 and id_2 as the measure.vars:

library(data.table)

dt <- data.table(
    id_1 = 1:3,
    id_2 = letters[1:3],
    letter_1 = LETTERS[1:3],
    letter_2 = LETTERS[4:6]
)

melt(dt, measure.vars = patterns("_\\d$", cols = paste0("letter_", 1:2)))
#> Warning in melt.data.table(dt, measure.vars = patterns("_\\d$", cols =
#> paste0("letter_", : 'measure.vars' [id_1, id_2] are not all of the same type.
#> By order of hierarchy, the molten data value column will be of type
#> 'character'. All measure variables not of type 'character' will be coerced too.
#> Check DETAILS in ?melt.data.table for more on coercion.
#>    letter_1 letter_2 variable  value
#>      <char>   <char>   <fctr> <char>
#> 1:        A        D     id_1      1
#> 2:        B        E     id_1      2
#> 3:        C        F     id_1      3
#> 4:        A        D     id_2      a
#> 5:        B        E     id_2      b
#> 6:        C        F     id_2      c
@hongyuanjia hongyuanjia changed the title patterns() did not consider the position of cols provided in the original DT patterns() did not consider the position of provided cols in the original DT Sep 15, 2024
@Anirban166
Copy link
Member

Can confirm the MRE, but I wonder if it's by intended design that patterns() is the primary driver of column selection, while cols just acts as a filter on top of it without getting precedence?

@tdhock
Copy link
Member

tdhock commented Sep 20, 2024

yes it is intended and working as documented in ?patterns:

     ‘patterns’ returns the matching indices in the argument ‘cols’
     corresponding to the regular expression patterns provided. 

For this example I would suggest using measure() instead of patterns()

> melt(dt, measure.vars=measure(letter, pattern="letter_([12])"))
    id_1   id_2 letter  value
   <int> <char> <char> <char>
1:     1      a      1      A
2:     2      b      1      B
3:     3      c      1      C
4:     1      a      2      D
5:     2      b      2      E
6:     3      c      2      F

please close if that is OK for you, or clarify.

@tdhock
Copy link
Member

tdhock commented Sep 20, 2024

or

> melt(dt, measure.vars=measure(value.name, number=as.integer, pattern="(.*)_([12])"))
   number     id letter
    <int> <char> <char>
1:      1      1      A
2:      1      2      B
3:      1      3      C
4:      2      a      D
5:      2      b      E
6:      2      c      F
Message d'avis :
Dans melt.data.table(dt, measure.vars = measure(value.name, number = as.integer,  :
  'measure.vars' [id_1, id_2] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'character'. All measure variables not of type 'character' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.

@tdhock tdhock self-assigned this Sep 20, 2024
@tdhock tdhock added the reshape dcast melt label Sep 20, 2024
@hongyuanjia
Copy link
Contributor Author

@tdhock I am totally fine with the measure approach. But if this is the intended behavior, what would be the use cases of the cols then?

@tdhock
Copy link
Member

tdhock commented Sep 20, 2024

I don't think cols is meant to be provided.
related: #6077

@tdhock
Copy link
Member

tdhock commented Sep 20, 2024

I guess it could be supported if we change return value of patterns from indices to names (when cols are names), but in general I think it is better if the user just provides a more expressive regex

> melt(dt, measure.vars = patterns("letter"))
    id_1   id_2 variable  value
   <int> <char>   <fctr> <char>
1:     1      a letter_1      A
2:     2      b letter_1      B
3:     3      c letter_1      C
4:     1      a letter_2      D
5:     2      b letter_2      E
6:     3      c letter_2      F

@hongyuanjia
Copy link
Contributor Author

The measure approach is more flexible and featue-rich. If current behavior is the intended behavior, I would suggest either remove the cols parameter or at least document that it is not intended to be used directly to avoid confusing to users.

@Anirban166
Copy link
Member

I agree, it's redundant to have a second column filter so maybe omit/document (just patterns() should be enough for just about anything; may it be a regex like ^letter_ for the case here, or something more non-trivial)

@tdhock
Copy link
Member

tdhock commented Sep 20, 2024

After looking at the patterns code, adding this feature was really simple, and it did not cause any existing tests to break, so I created a PR, #6510, can you please review and tell me if that is a sufficient change in docs/behavior?

@hongyuanjia
Copy link
Contributor Author

Thanks! I believe #6510 is the right move.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reshape dcast melt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants