api: plan for narwhals.stable.v2
#1657
I think I'd also like to remove row-order-dependent expressions:
- arg_max
- arg_min
- cum_count
- cum_max
- cum_min
- cum_prod
- cum_sum
- diff
- drop_nulls
- ewm_mean
- filter
- gather_every
- head
- is_first_distinct
- is_last_distinct
- mode
- rolling_mean
- rolling_std
- rolling_sum
- rolling_var
- sample
- shift
- sort
- unique
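These can all stay available for
All of these would of course stay available in
In #1689 (comment), we find:
So, it would take more planning to be able to do this safely.
Example (maybe?) of how this could be done:
import polars as pl

def shift_by(expr, by):
    # Row indices that would sort the frame by `by`
    idx = pl.arg_sort_by(by)
    # Reorder by `by`, shift, then restore the original row order
    return pl.col(expr).gather(idx).shift().sort_by(idx)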
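To check that the gather / shift / sort_by chain in shift_by really returns values aligned to the original rows, here is a plain-Python sketch of the same permutation logic. It is only an illustration: it uses lists instead of Polars expressions, and the sample data (t, x) is made up.

```python
def shift_by(values, by):
    """Shift `values` by one position in the order defined by `by`,
    returning results aligned to the original row order."""
    # Counterpart of arg_sort_by: row indices ordered by `by`
    idx = sorted(range(len(by)), key=by.__getitem__)
    # Counterpart of gather(idx): values laid out in `by` order
    ordered = [values[i] for i in idx]
    # Counterpart of shift(): the first slot has no predecessor
    shifted = ([None] + ordered[:-1]) if ordered else []
    # Counterpart of sort_by(idx): scatter back to the original positions
    out = [None] * len(values)
    for pos, row in enumerate(idx):
        out[row] = shifted[pos]
    return out


t = [3, 1, 2]      # ordering column (hypothetical data)
x = [30, 10, 20]   # column to shift
result = shift_by(x, t)
print(result)  # [20, None, 10]
```

The row with t=3 gets the value from t=2, the row with t=1 has no predecessor, and the row with t=2 gets the value from t=1, so the result does not depend on the physical row order of the input.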
-
Something else to discuss is is_ordered_categorical.
The signature means we need to have an (eager) Series to determine whether a column is ordered categorical, which is problematic for lazy implementations. What we could have instead is:
is_ordered_categorical(frame, column_name) -> bool
Then, the logic would go:
EDIT: The above change might not actually be necessary. In Altair, they need the unique categories:
So, we could do something like this there:
if (
    # Getting the unique categories would require materialising a LazyFrame,
    # so we only auto-infer ordinal in the eager case.
    isinstance(df, nw.DataFrame)
    and nw.is_ordered_categorical(df[col_name])
    and not (categories := df[col_name].cat.get_categories()).is_empty()
):
    return "ordinal", categories.to_list()
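For the frame-level signature floated earlier in this comment, a rough sketch of the idea is that orderedness could be read from the schema alone, so nothing has to be collected. Everything below is a stand-in for illustration: Categorical and FakeLazyFrame are hypothetical classes, not Narwhals types, and it assumes the backend exposes an ordered flag in its schema, which may not hold for every backend.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Categorical:
    """Hypothetical stand-in for a categorical dtype with an `ordered` flag."""
    ordered: bool = False


@dataclass
class FakeLazyFrame:
    """Hypothetical lazy frame: exposes a schema and never materialises rows."""
    schema: dict = field(default_factory=dict)

    def collect_schema(self) -> dict:
        return self.schema


def is_ordered_categorical(frame, column_name: str) -> bool:
    # Only the schema is inspected, so no data needs to be collected.
    dtype = frame.collect_schema()[column_name]
    return isinstance(dtype, Categorical) and dtype.ordered


lf = FakeLazyFrame(
    {"size": Categorical(ordered=True), "colour": Categorical(), "n": int}
)
print(is_ordered_categorical(lf, "size"))    # True
print(is_ordered_categorical(lf, "colour"))  # False
print(is_ordered_categorical(lf, "n"))       # False
```

This only answers the "is it ordered?" question. Getting the unique categories themselves, as Altair needs, would still require materialising data in the lazy case.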
-
I think we can aim for narwhals.stable.v2 some time in 2025.
The main things I'd like to achieve are:
- Series and have a defined row order (pandas, polars.DataFrame, pyarrow.Table, modin.DataFrame, cudf.DataFrame)
- collect, adding/removing/subsetting columns, filtering rows. Initially, at least, row-order-dependent operations (such as nw.cum_sum) will be left out. This will include polars.LazyFrame, DuckDBPyRelation, pyspark.DataFrame, ibis.Table, dask.DataFrame
Candidate changes:
- from_native. It will get simpler once we remove interchange-level things, but it's still fairly complicated. Not sure how much I like it to be completely honest, but I also haven't come up with anything better
Usage of narwhals.stable.v1 should remain unaffected.
Tentative date: March 2025
Why remove support for the dataframe interchange protocol?