Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add broadcasting of selectors to the minilanguage #2918

Merged
merged 11 commits into from
Nov 1, 2021
Merged

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Oct 20, 2021

@nalimilan + @pdeffebach: I am sharing the preview of the functionality making e.g. Not(:x1) .=> sum work in the minilanguage.

The core mechanism should be working for intended cases.

What is todo:

  • implement expansion engine
  • inject the inner mechanics to select, subset etc. functions that should call it
  • add tests of intended use cases
  • add tests for target functions
  • add tests of corner cases (i.e. to check if we are good if we pass something that is not intended)
  • add NEWS.md
  • add docstring
  • add documentation

@bkamins bkamins added this to the 1.3 milestone Oct 20, 2021
@bkamins bkamins linked an issue Oct 20, 2021 that may be closed by this pull request
@bkamins bkamins added the bug label Oct 20, 2021
@bkamins
Copy link
Member Author

bkamins commented Oct 20, 2021

Additionally I have caught two bugs in the existing implementation:

julia> df = DataFrame(a=1,b=2,c=3)
1×3 DataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     1      2      3

julia> select(view(df, 1:1, 1:2), [:x1, :x2] .=> [sum length])
ERROR: ArgumentError: Unrecognized column selector: Pair{Symbol, _A} where _A[:x1 => sum :x1 => length; :x2 => sum :x2 => length]

julia> select(groupby(df, :a), Symbol[] .=> [sum length])
ERROR: MethodError: no method matching select(::GroupedDataFrame{DataFrame}, ::Matrix{Any})

(both are fixed)

@bkamins bkamins marked this pull request as ready for review October 21, 2021 09:43
@bkamins bkamins requested a review from nalimilan October 21, 2021 09:43
docs/src/man/split_apply_combine.md Outdated Show resolved Hide resolved
src/abstractdataframe/selection.jl Show resolved Hide resolved
src/abstractdataframe/selection.jl Outdated Show resolved Hide resolved
src/abstractdataframe/selection.jl Show resolved Hide resolved
src/abstractdataframe/selection.jl Show resolved Hide resolved
@@ -27,8 +27,11 @@ function _combine_prepare(gd::GroupedDataFrame,
keepkeys::Bool, ungroup::Bool, copycols::Bool,
keeprows::Bool, renamecols::Bool)
for cei in cs
@assert cei isa Union{Pair, Base.Callable, ColumnIndex, MultiColumnIndex,
AbstractVecOrMat{<:Pair}}
if !(cei isa AbstractMatrix && isempty(cei))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is again because of broadcasting returning Matrix{Any}? It's somewhat ugly to have to handle this special case here. Would there be any way to fix this elsewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - this is the Matrix{Any} case special handling. This is essentially the only place we can handle this in a short way. The point is that we want to do AbstractVecOrMat{<:Pair} check below (not to allow arbitrary matrices to leak further unless they are empty as then it does not matter what eltype they have - and thus the additional check).

test/select.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Oct 27, 2021

CI failure here is unrelated.

@@ -67,6 +67,10 @@ const TRANSFORMATION_COMMON_RULES =
`cols => function`. In particular the following expression `function => target_cols`
is not a valid transformation specification.

Note! If `cols` or `target_cols` is one of `All`, `Cols`, `Between`, or
`Not` and is used in a broadcasting context are properly expanded to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a syntax problem here. Just copy the text from above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have forgotten to sync the docstrings. Fixed.

src/abstractdataframe/selection.jl Show resolved Hide resolved
@bkamins bkamins merged commit cccf74d into main Nov 1, 2021
@bkamins bkamins deleted the bk/broadcast_selectors branch November 1, 2021 11:59
@bkamins
Copy link
Member Author

bkamins commented Nov 1, 2021

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for All, Between and Not broadcating
2 participants