feat: Refactoring Param system and adding new ones #1444

Merged (22 commits) on Mar 30, 2022

svotaw
Collaborator

@svotaw svotaw commented Mar 18, 2022

Summary

Modifications to parameters for the LightGBM and Vowpal Wabbit APIs

  1. Create common, unit-testable utilities (core/utils/ParamsStringBuilder) for creating native library command-line argument strings
  2. Refactor VowpalWabbit and LightGBM to use the utilities
  3. Add more explicit setters for some requested LightGBM parameters
  4. Add the ability for LightGBM users to set custom args that aren't defined explicitly by SynapseML (as Vowpal Wabbit does now)
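
As a rough illustration of the idea in items 1 and 4, the sketch below builds a native argument string from optional values and lets extra user-supplied arguments be added only when the name is not already present. The class `ArgStringBuilder` and its methods are hypothetical and are not the actual `ParamsStringBuilder` API from this PR; which side should win when a name is set twice is a design choice, and this sketch simply keeps the first value written.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch only: NOT the SynapseML ParamsStringBuilder API.
final class ArgStringBuilder(prefix: String = "", delimiter: String = "=") {
  // Ordered (name, value) pairs collected so far.
  private val pairs = ArrayBuffer.empty[(String, String)]

  /** Adds the argument only when a value is present. */
  def append[T](name: String, value: Option[T]): ArgStringBuilder = {
    value.foreach(v => pairs += (name -> v.toString))
    this
  }

  /** Adds the argument only when no argument with that name exists yet. */
  def appendIfAbsent(name: String, value: String): ArgStringBuilder = {
    if (!pairs.exists(_._1 == name)) pairs += (name -> value)
    this
  }

  /** Renders all arguments as one space-separated string. */
  def result: String =
    pairs.map { case (n, v) => s"$prefix$n$delimiter$v" }.mkString(" ")
}

object ArgStringBuilderDemo {
  def main(args: Array[String]): Unit = {
    // LightGBM-style "name=value" arguments.
    val lightgbmStyle = new ArgStringBuilder()
      .append("is_enable_sparse", Some(true))
      .append("max_bin", Option.empty[Int])        // unset, so omitted
      .appendIfAbsent("is_enable_sparse", "false") // already set, so ignored
      .appendIfAbsent("min_data_in_leaf", "20")    // custom arg, added
      .result
    println(lightgbmStyle)

    // Vowpal Wabbit-style "--name value" arguments.
    val vwStyle = new ArgStringBuilder(prefix = "--", delimiter = " ")
      .append("learning_rate", Some(0.5))
      .result
    println(vwStyle)
  }
}
```

Keeping the string assembly in one small class like this is what makes it unit testable without loading either native library.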

Tests

Added unit tests for new arg string generation utilities

@svotaw svotaw changed the title Refactoring Param system and adding new ones feat: Refactoring Param system and adding new ones Mar 19, 2022
@svotaw
Collaborator Author

svotaw commented Mar 19, 2022

/azp run

@azure-pipelines

Commenter does not have sufficient privileges for PR 1444 in repo microsoft/SynapseML

Collaborator

@mhamilton723 mhamilton723 left a comment

This is awesome! Minor question on the "out" folder; we should also run this by Ilya and Markus C if possible.

@mhamilton723
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Mar 21, 2022

Codecov Report

Merging #1444 (260409c) into master (753b29a) will decrease coverage by 0.00%.
The diff coverage is 90.06%.

@@            Coverage Diff             @@
##           master    #1444      +/-   ##
==========================================
- Coverage   84.52%   84.52%   -0.01%     
==========================================
  Files         291      295       +4     
  Lines       14530    14717     +187     
  Branches      710      691      -19     
==========================================
+ Hits        12282    12440     +158     
- Misses       2248     2277      +29     
Impacted Files Coverage Δ
...rosoft/azure/synapse/ml/lightgbm/SharedState.scala 88.46% <ø> (ø)
...zure/synapse/ml/lightgbm/TaskTrainingMethods.scala 100.00% <ø> (ø)
...t/azure/synapse/ml/lightgbm/LightGBMDelegate.scala 50.00% <60.00%> (ø)
...re/synapse/ml/lightgbm/params/LightGBMParams.scala 78.43% <67.46%> (-3.42%) ⬇️
...crosoft/azure/synapse/ml/vw/VowpalWabbitBase.scala 74.85% <86.95%> (-3.27%) ⬇️
...re/synapse/ml/core/utils/ParamsStringBuilder.scala 96.96% <96.96%> (ø)
...osoft/azure/synapse/ml/lightgbm/LightGBMBase.scala 95.74% <100.00%> (+1.17%) ⬆️
...azure/synapse/ml/lightgbm/LightGBMClassifier.scala 91.39% <100.00%> (+0.28%) ⬆️
...oft/azure/synapse/ml/lightgbm/LightGBMRanker.scala 62.66% <100.00%> (+2.10%) ⬆️
.../azure/synapse/ml/lightgbm/LightGBMRegressor.scala 75.00% <100.00%> (+0.86%) ⬆️
... and 9 more

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 753b29a...260409c. Read the comment docs.

* (including the unshown values in LibSVM/sparse matrices)
* Set to false to use na for representing missing values.
*/
case class DatasetParams(isEnableSparse: Option[Boolean],
Contributor

This is awesome! Can you add tests for all of these new parameters? It's not always as simple as adding them here; sometimes the distributed version of LightGBM does not support them. For example, forcedsplits_filename apparently only works in the single-node case (https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.rst#forcedsplits_filename). I only found that out the hard way, after adding it everywhere in Scala and then realizing it did nothing in the distributed case.

Collaborator Author

I added some tests for the new sparse parameters. I also did some code analysis of the LightGBM code, and I don't see any limitations of any of the new parameters. When looking at the one you mentioned (forcedsplits_filename), it does indeed show in the validation code that you can't run it under the types of parallelism we use (data and voting).

I'm not sure I see the utility of keeping those tests. They take time, and once we know it works it should work always (if they ever make a breaking change we would break anyways). Can I just remove them? They weren't fancy, they just proved that LightGBM had the right value for the parameters and that the fit succeeded.
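
The forcedsplits_filename limitation discussed in this thread could be surfaced early with a small check like the one below. This is only a sketch: the object and method names are hypothetical, and the set of unsupported modes ("data" and "voting") is taken from the comment above rather than verified against the LightGBM source. `tree_learner` and `forcedsplits_filename` are the native LightGBM parameter names.

```scala
// Hypothetical helper, not part of SynapseML or LightGBM.
object DistributedParamCheck {
  // Assumption from the review discussion: forced splits are rejected
  // under these tree_learner modes.
  private val unsupportedLearners = Set("data", "voting")

  /** Returns a warning message when forced splits are combined with an
    * unsupported tree_learner mode, and None otherwise. */
  def forcedSplitsWarning(params: Map[String, String]): Option[String] = {
    // LightGBM's documented default for tree_learner is "serial".
    val learner = params.getOrElse("tree_learner", "serial")
    params.get("forcedsplits_filename")
      .filter(_.nonEmpty)
      .filter(_ => unsupportedLearners.contains(learner))
      .map(file =>
        s"forcedsplits_filename=$file is not supported with tree_learner=$learner")
  }
}
```

A caller could log the returned message or fail fast before handing the arguments to the native library.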

Contributor

"I'm not sure I see the utility of keeping those tests."
It's useful to make sure that:
1.) The parameters actually do work right now, and with future code changes they will still work
2.) With new native lightgbm jar version updates in the future they still work, and names haven't changed, etc
3.) They serve as a good example for others trying to understand how the parameters work and what effect they have by looking at the test cases
At least that is my view anyway. Without tests we don't really know if they work for sure.
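
A cheap form of the test described in point 2 is to parse the generated argument string and assert that each native parameter name and value is present, so a renamed parameter shows up as a failure. The sketch below uses plain `assert` rather than a test framework; the object name and the hard-coded `generated` string are illustrative stand-ins for whatever the real code under test produces.

```scala
// Hypothetical check, written without any test framework.
object ParamStringChecks {
  /** Splits a "name=value name=value" string into a map; tokens without
    * "=" are ignored. */
  def parse(argString: String): Map[String, String] =
    argString.trim.split("\\s+").filter(_.contains("=")).map { kv =>
      val Array(k, v) = kv.split("=", 2) // limit 2 keeps "=" inside values
      k -> v
    }.toMap

  def main(args: Array[String]): Unit = {
    // Stand-in for the string the code under test would generate.
    val generated = "is_enable_sparse=true zero_as_missing=false"
    val parsed = parse(generated)

    // Fails if a native parameter name changes or a value is wrong.
    assert(parsed.get("is_enable_sparse").contains("true"))
    assert(parsed.get("zero_as_missing").contains("false"))
    println("parameter string checks passed")
  }
}
```

A check like this only proves the string is built correctly; confirming that the native library honors the value still needs a fit-based test of the kind discussed above.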

@mhamilton723
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 22, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 22, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 22, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 23, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 23, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 24, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 24, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 28, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 28, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 29, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 29, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@svotaw
Collaborator Author

svotaw commented Mar 30, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

4 participants