Fix the bug in the discretize method caused by the misalignment of column order #90
Bug description: The elements of continuous_columns passed to the discretize method are not necessarily in increasing order of their column indices in the data. However, the discretize_all method returns continuous_edges assuming this order. As a result, an error is consistently raised at line 59, pd.Categorical.from_codes, with the message: ValueError: codes need to be between -1 and len(categories)-1.
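As a rough illustration of the failure mode (not the library's actual code), the sketch below computes bin codes from one column's edges and then pairs them with the labels of another column that has fewer bins. The arrays, edges, and label names are made up for the example; only np.digitize and pd.Categorical.from_codes are real calls.

```python
import numpy as np
import pandas as pd

# Hypothetical data: edges for two different columns (assumed values).
values = np.array([1.0, 5.0, 9.0])
edges_a = [2.0, 4.0, 6.0, 8.0]  # 4 edges, so 5 bins (codes 0..4)
edges_b = [5.0]                 # 1 edge, so 2 bins (codes 0..1)

# Codes are computed against column A's edges ...
codes = np.digitize(values, edges_a)

# ... but the categories come from column B, which has only 2 bins.
labels_b = ["low", "high"]

try:
    pd.Categorical.from_codes(codes, categories=labels_b)
    error_message = None
except ValueError as exc:
    # Codes larger than len(categories) - 1 are rejected by pandas.
    error_message = str(exc)

print(error_message)
```

When edges and columns are paired in different orders, a column can end up with codes that exceed the number of categories it was given, which is the condition pandas rejects here.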
To reproduce the issue, in the example provided in the "Advanced discretizing continuous data" section, replace
continuous_columns = ["mpg", "displacement", "horsepower", "weight", "acceleration"]
with
continuous_columns = ["mpg", "displacement", "horsepower", "acceleration", "weight"]
or change the order in any other way so that it differs from the column order in the df. This will consistently trigger the error.
To fix the issue, sort continuous_columns into the correct order, i.e. the column order of the data.
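A minimal sketch of the sorting idea, assuming the data is a pandas DataFrame: reorder the requested names by their position in df.columns before discretizing. The helper name sort_by_dataframe_order and the column list of the example DataFrame are hypothetical and are not taken from the patch itself.

```python
import pandas as pd


def sort_by_dataframe_order(df, continuous_columns):
    """Return continuous_columns reordered to match their position in df.columns."""
    position = {name: i for i, name in enumerate(df.columns)}
    missing = [c for c in continuous_columns if c not in position]
    if missing:
        raise KeyError(f"columns not found in DataFrame: {missing}")
    return sorted(continuous_columns, key=position.__getitem__)


# Illustrative frame; only the column order matters for this example.
df = pd.DataFrame(
    columns=["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration"]
)

# The order from the reproduction steps, with the last two names swapped.
shuffled = ["mpg", "displacement", "horsepower", "acceleration", "weight"]
ordered = sort_by_dataframe_order(df, shuffled)
print(ordered)
# ['mpg', 'displacement', 'horsepower', 'weight', 'acceleration']
```

Sorting by position rather than alphabetically keeps the names aligned with any per-column results that are produced in DataFrame order.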