Add `PydanticModel` to `pandera.engines.polars_engine` (polars engine support for pydantic models) #1874
Comments
It would also be great to have some kind of portability between |
I've just noted the supported feature matrix across engines which is very useful (it's on the front page, so my bad). Consider this issue a +1 support for Pydantic integration with |
Yes! mind making a ticket for this?
Great idea, let me look into this early next year 💡
In a way, this is the reason... the pydantic + pandas-pandera integration is our first attempt at unlocking pydantic models on pandas, but it's not the most performant implementation.
A more principled approach would be to have some sort of translation layer from pydantic-native types into pandera-native types (which pandera in turn translates to some underlying framework like polars). I think some constrained set of mappings from pydantic types to pandera dtypes would be a good start to unlocking this functionality. To start, can you give example pydantic models that you would like to "just work" with pandera polars schemas?
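As a rough illustration of what such a constrained mapping could look like, here is a minimal sketch. It is not pandera's API: `PY_TO_POLARS`, `unwrap_optional`, `translate`, and the `Record` class are all hypothetical names, and to stay dependency-free the sketch reads annotations from a plain annotated class (standing in for a `pydantic.BaseModel` subclass) and emits polars dtype names as strings rather than real dtype objects.

```python
import datetime as dt
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Hypothetical, deliberately small mapping from Python annotation types
# to polars dtype names. A real translation layer would cover more types.
PY_TO_POLARS = {
    int: "Int64",
    float: "Float64",
    str: "String",
    bool: "Boolean",
    dt.date: "Date",
    dt.datetime: "Datetime",
}


def unwrap_optional(annotation):
    """Return (inner_type, nullable). Only Optional[X] is treated as nullable."""
    if get_origin(annotation) is Union:
        args = get_args(annotation)
        non_none = [a for a in args if a is not type(None)]
        if len(args) == 2 and len(non_none) == 1:
            return non_none[0], True
    return annotation, False


def translate(model):
    """Map each annotated field to a (dtype_name, nullable) pair."""
    columns = {}
    for name, annotation in get_type_hints(model).items():
        inner, nullable = unwrap_optional(annotation)
        if inner not in PY_TO_POLARS:
            # Fail loudly instead of guessing a dtype for unsupported types.
            raise TypeError(f"no dtype mapping for field {name!r}: {annotation!r}")
        columns[name] = (PY_TO_POLARS[inner], nullable)
    return columns


class Record:  # stand-in for a pydantic.BaseModel subclass (hypothetical fields)
    name: str
    xcoord: int
    score: Optional[float] = None


spec = translate(Record)
```

The resulting `spec` is a plain dictionary describing columns, so the model is still defined only once; turning that dictionary into an actual pandera polars schema is the part this sketch leaves open.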
Just wanted to suggest that perhaps narwhals can be part of the solution as the dataframe compatibility layer. Recently projects have replaced the pandas dependency with narwhals, too.
@dkapitan would love to get narwhals as a backend for pandera. Mind making an issue?
@cosmicBboy will do! I am getting my head around a 'composable data stack/pipeline for FHIR' (healthcare) and plan to refine the use case and issues in the coming week.
See #1894.
I would like to use Pydantic models in Pandera schemas but for polars, not pandas.
The current example shows the following:
Where `PydanticModel` is imported from `pandas_engine`. This doesn't work with `polars`, and if I switch the import to the `polars_engine` module, the object does not exist in that module.
I have been considering this note:
Which suggests that maybe this is not the best way of doing things and I may lose out on `polars` vectorisation and speed benefits by doing this. Maybe that is why the feature has not been developed yet.
To add further context, what I would like is to be able to define my data model and constraints once and only once. I am currently defining my models inheriting from `SQLModel`, since I want the functionality that brings. I have also been generating mock data with `polyfactory`, which needs models that inherit from pydantic's `BaseModel`. I then want to use my data model to validate data files on ingest, and ideally I don't want to loop through applying the validation row by row - I want to validate in a vectorised way.
Alternatives I've considered:
1. Defining a `pa.DataFrameModel` in a manner consistent with the docs on use with `polars`.
2. Converting the `pydantic.BaseModel` into a `pa.DataFrameModel` dynamically - I don't know whether this is possible or desired, or whether something exists already for this.
While these alternatives allow me to take advantage of the full vectorisation, option 1 requires maintaining the data model in two different places, which could result in sync issues. Option 2 requires building something new, which may be of some use to the package API.
Is this feature on the roadmap (polars support for pydantic models)? Or does anyone have any advice which may help me achieve my goals? Thanks!