Infer single distinct value columns as category instead of binary #2467

Merged
merged 3 commits into from
Sep 9, 2022

arnavgarg1
Contributor

This fixes an issue where columns with a single distinct value (say, all 1s) were being inferred as binary features. That caused downstream problems when precomputing the fill value for missing data, which failed with the following error:

Unable to determine False value for column X with distinct values: [1].
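To illustrate why a constant column breaks the binary path, here is a minimal Python sketch of a "find the False value" step. It is hypothetical and not Ludwig's actual code: the function name `find_false_value` and the `FALSE_STRINGS` set are assumptions, and the sketch only recognizes conventional boolean strings. A column whose only distinct value is 1 has no value that maps to False, so the lookup raises the error quoted above.

```python
# Hypothetical sketch, not Ludwig's actual implementation.
# Assumption: a value counts as "False" if its lowercased string form is in this set.
FALSE_STRINGS = {"false", "f", "no", "n", "0", "off"}


def find_false_value(column_name, distinct_values):
    """Return the distinct value of a binary column that should be treated as False.

    Raises ValueError when no distinct value maps to False, e.g. a column
    whose only distinct value is 1.
    """
    # Sort by string form so mixed types compare safely and the result is deterministic.
    for value in sorted(distinct_values, key=str):
        if str(value).strip().lower() in FALSE_STRINGS:
            return value
    raise ValueError(
        f"Unable to determine False value for column {column_name} "
        f"with distinct values: {distinct_values}."
    )
```

With `[0, 1]` the sketch returns `0`; with `[1]` there is nothing to treat as False, so it raises.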

This fix changes type inference so that these columns are inferred as categorical features and are also excluded from the dataset, since a categorical feature that always takes the same value adds nothing to a machine learning model.
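A minimal sketch of the revised rule, assuming type inference starts from the set of non-missing distinct values in a column. The function `infer_low_cardinality_type` and its `(type, exclude)` return value are hypothetical illustrations, not Ludwig's API, and the remaining inference rules (numeric, text, etc.) are omitted.

```python
import math

# Hypothetical sketch of the inference rule described in this PR; the function
# name and the (type, exclude) return value are assumptions, not Ludwig's API.


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_low_cardinality_type(values):
    """Return (inferred_type, exclude) for a column with at most two distinct values.

    Returns (None, False) when other inference rules would apply.
    """
    distinct = {v for v in values if not _is_missing(v)}
    if len(distinct) == 1:
        # A constant column used to be inferred as binary; it is now inferred
        # as category and excluded, since it carries no information.
        return "category", True
    if len(distinct) == 2:
        return "binary", False
    # Other rules (numeric, text, high-cardinality category, ...) omitted.
    return None, False
```

For example, `[1, 1, 1]` yields `("category", True)`, while `[0, 1, 1]` is still inferred as binary.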

github-actions bot commented Sep 9, 2022

Unit Test Results

6 files (+1), 6 suites (+1), 3h 3m 50s ⏱️ (+59m 24s)
3 376 tests (+1): 3 298 passed ✔️ (+1), 78 skipped 💤 (±0), 0 failed (±0)
10 128 runs (+119): 9 870 passed ✔️ (+97), 258 skipped 💤 (+22), 0 failed (±0)

Results for commit cec510c. ± Comparison against base commit d37abe3.

♻️ This comment has been updated with latest results.

arnavgarg1 merged commit e60626f into master Sep 9, 2022
arnavgarg1 deleted the automl_type_inference branch September 9, 2022 15:37
tgaddair pushed a commit that referenced this pull request Sep 9, 2022

* Added fix for converting single distinct value columns to category instead of binary

* remove print statements

* Fix failing test