Fix the bug in the discretize method caused by the misalignment of column order #90

ankh1999 · 2023-11-15T11:53:33Z

Bug description: The elements of continuous_columns passed to the discretize method are not necessarily in the same order as the corresponding columns in the data. However, the discretize_all method returns continuous_edges assuming that order. When the two orders differ, line 59, pd.Categorical.from_codes, consistently raises: ValueError: codes need to be between -1 and len(categories)-1.

To reproduce the issue, in the example provided in the "Advanced discretizing continuous data" section, replace continuous_columns = ["mpg", "displacement", "horsepower", "weight", "acceleration"] with continuous_columns = ["mpg", "displacement", "horsepower", "acceleration", "weight"], or change the order in any way that differs from the order in the df. This will consistently trigger the error.
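The mismatch can be illustrated without the library. The sketch below is not the bnlearn code: the column list is an assumed layout of the example df, and the `edges(...)` strings are placeholders for real bin edges. It only shows how edges produced in DataFrame order end up attached to the wrong columns when they are paired by position with a differently ordered continuous_columns.

```python
# Illustrative column order of the DataFrame (an assumption, not read from the real data).
df_columns = ["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration"]

# Caller-supplied order from the reproduction above: the last two names are swapped.
continuous_columns = ["mpg", "displacement", "horsepower", "acceleration", "weight"]

# Stand-in for continuous_edges: one entry per continuous column, in DataFrame order.
# The strings are placeholders for real bin edges.
edges_in_df_order = [f"edges({name})" for name in df_columns if name in continuous_columns]

# Pairing by position against the caller's order attaches the wrong edges to some columns.
paired = dict(zip(continuous_columns, edges_in_df_order))
mismatched = [name for name, edges in paired.items() if edges != f"edges({name})"]

print(mismatched)  # ['acceleration', 'weight']
```

With mismatched edges, a column's values can fall outside the bins assigned to it, which is consistent with the out-of-range codes reported by pd.Categorical.from_codes.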

To fix the issue, this PR sorts continuous_columns into the order in which the columns appear in the data.
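A minimal sketch of that reordering, written in plain Python so it runs on its own. The helper name `sort_by_dataframe_order` and the column list are hypothetical and are not taken from the merged commit. With a real DataFrame, `list(df.columns)` would be passed as the second argument.

```python
def sort_by_dataframe_order(continuous_columns, all_columns):
    """Return continuous_columns reordered to follow their position in all_columns.

    Hypothetical helper for illustration; raises KeyError for unknown names.
    """
    position = {name: index for index, name in enumerate(all_columns)}
    missing = [name for name in continuous_columns if name not in position]
    if missing:
        raise KeyError(f"columns not found in the data: {missing}")
    return sorted(continuous_columns, key=position.__getitem__)


# Illustrative column order of the DataFrame (an assumption, not read from the real data).
df_columns = ["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration"]

# Order that triggers the error in the reproduction.
requested = ["mpg", "displacement", "horsepower", "acceleration", "weight"]

print(sort_by_dataframe_order(requested, df_columns))
# ['mpg', 'displacement', 'horsepower', 'weight', 'acceleration']
```

Sorting by position rather than alphabetically keeps the result aligned with the data regardless of how the columns are named.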

erdogant · 2023-11-15T21:47:47Z

great fix!

erdogant · 2023-11-15T21:51:19Z

I published a new release!

Fix the bug in the discretize method

6bee863

erdogant merged commit c1e4f27 into erdogant:master Nov 15, 2023
2 checks passed
