-
Notifications
You must be signed in to change notification settings - Fork 834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Refactoring Param system and adding new ones #1444
Conversation
/azp run |
Commenter does not have sufficient privileges for PR 1444 in repo microsoft/SynapseML |
lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/LightGBMClassifier.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! Minor question on "out" folder and also we should send this by ilya and Markus C too if possible
out/production/SynapseML/com/microsoft/azure/synapse/ml/automl/FindBestModel.txt
Outdated
Show resolved
Hide resolved
vw/src/main/scala/com/microsoft/azure/synapse/ml/vw/VowpalWabbitBase.scala
Outdated
Show resolved
Hide resolved
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Codecov Report
@@ Coverage Diff @@
## master #1444 +/- ##
==========================================
- Coverage 84.52% 84.52% -0.01%
==========================================
Files 291 295 +4
Lines 14530 14717 +187
Branches 710 691 -19
==========================================
+ Hits 12282 12440 +158
- Misses 2248 2277 +29
Continue to review full report at Codecov.
|
* (including the unshown values in LibSVM/sparse matrices) | ||
* Set to false to use na for representing missing values. | ||
*/ | ||
case class DatasetParams(isEnableSparse: Option[Boolean], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! Can you add tests for all of these new parameters? It's not always as simple as adding them here, and sometimes the distributed version of lightgbm does not support them (like forcedsplits_filename, which apparently only works in single node case https://github.com/Microsoft/LightGBM/blob/master/docs/Parameters.rst#forcedsplits_filename and I only found out the hard way after adding it everywhere in scala but then realizing it just didn't do anything for distributed case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some tests for the new sparse parameters. I also did some code analysis of the LightGBM code, and I don't see any limitations of any of the new parameters. When looking at the one you mentioned (forcedsplits_filename), it does indeed show in the validation code that you can't run it under the types of parallelism we use (data and voting).
I'm not sure I see the utility of keeping those tests. They take time, and once we know it works it should work always (if they ever make a breaking change we would break anyways). Can I just remove them? They weren't fancy, they just proved that LightGBM had the right value for the parameters and that the fit succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"I'm not sure I see the utility of keeping those tests."
It's useful to make sure that:
1.) The parameters actually do work right now, and with future code changes they will still work
2.) With new native lightgbm jar version updates in the future they still work, and names haven't changed, etc
3.) They serve as a good example for others trying to understand how the parameters work and what effect they have by looking at the test cases
At least that is my view anyway. Without tests we don't really know if they work for sure.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
1 similar comment
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Summary
Modifications to parameters for LightBGM and Vowpal Wabbit APIs
Tests
Added unit tests for new arg string generation utilities