Ability to data profile node outputs for creating data quality checks #165

skrawcz · 2022-08-02T17:00:03Z

Is your feature request related to a problem? Please describe.
Data profiling is a way to help bootstrap creating data quality checks.
Data profiling is also a way to facilitate data exploration, by providing summary statistics over data.

Describe the solution you'd like
A user should be able to profile their whole DAG, or a chosen set of nodes, and get summary statistics for each node's output.
Those statistics could then be used to bootstrap data quality checks, e.g. check_output() decorators, but the profiling output should also be usable on its own.
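
To make the idea concrete, here is a rough sketch of what "profile, then bootstrap checks" could look like. It is not an existing Hamilton API: `profile_values`, `profile_outputs`, and `suggest_checks` are hypothetical names, node outputs are modeled as plain lists instead of pandas objects, and the suggested keys (`range`, `allow_nans`) are only assumed to line up with keyword arguments that check_output() accepts.

```python
import math
from statistics import mean


def profile_values(values):
    """Summary statistics for one node's output (None or NaN counts as missing)."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


def profile_outputs(outputs):
    """Profile every node in a {node_name: values} mapping (hypothetical helper)."""
    return {name: profile_values(values) for name, values in outputs.items()}


def suggest_checks(profile):
    """Turn one profile into suggested check arguments.

    The key names are an assumption about what check_output() takes;
    a person should review the suggestions before adopting them.
    """
    suggestions = {"allow_nans": profile["missing"] > 0}
    if profile["min"] is not None:
        suggestions["range"] = (profile["min"], profile["max"])
    return suggestions


# In practice these would be the results of executing the DAG for the
# requested nodes; here they are hard-coded sample data.
outputs = {
    "spend": [10.0, 20.0, 40.0],
    "signups": [1.0, None, 5.0],
}

profiles = profile_outputs(outputs)  # standalone output
suggested = {name: suggest_checks(p) for name, p in profiles.items()}
```

Keeping `profiles` separate from `suggested` reflects the requirement above: the summary statistics are useful for exploration by themselves, and the check suggestions are just one consumer of them.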

Describe alternatives you've considered
I haven't considered many options, but a few libraries already do data profiling.

Additional context
Systems like whylogs and Great Expectations use profiling to help with the user experience.
Standalone libraries like https://github.com/capitalone/DataProfiler also exist.

#149 does a little to prototype in this area too.

elijahbenizzy · 2023-02-26T17:13:15Z

We are moving repositories! Please see the new version of this issue at DAGWorks-Inc/hamilton#40. Also, please give us a star and update any of your internal links.

Note that everything else (Slack community, PyPI packages, etc.) will not change at all.

skrawcz added the enhancement, product idea, and data quality labels Aug 2, 2022

skrawcz mentioned this issue Aug 2, 2022

Help bootstrap check_output() decorator. #166

Closed

HamiltonRepoMigrationBot mentioned this issue Feb 26, 2023

Ability to data profile node outputs for creating data quality checks DAGWorks-Inc/hamilton#40

Closed

elijahbenizzy closed this as completed Feb 26, 2023
