Request to add parameter use_missing, zero_as_missing #902

Closed
fucusy opened this issue Aug 4, 2020 · 8 comments

fucusy commented Aug 4, 2020

Is your feature request related to a problem? Please describe.
I'm always frustrated when dealing with missing values in Spark. In our existing pipeline we represent missing values with 0, but zero_as_missing is not in TrainParams, so I don't know how to specify it.

Describe the solution you'd like
Add setZeroAsMissing to TrainParams and LightGBMRanker

AB#1761983

@imatiach-msft
Contributor

@fucusy ah, this should be very easy to add. You can already specify it through any of the string parameters as a hack, e.g. any_string_param="value, zero_as_missing=True"

@imatiach-msft
Contributor

In another issue, a user recently used something like this:

objective='huber, bin_construct_sample_cnt=200000, min_gain_to_split=0, min_child_weight=0.001, min_data_in_leaf=20, tree_learner=data, num_threads=0'

ffineis commented Apr 9, 2021

Hey @imatiach-msft - I've been trying to use this parameter string injection with scala mmlspark, specifically using setObjective:

String objective = "lambdarank, sigmoid=2";
LightGBMRanker ranker = new LightGBMRanker();
ranker.setObjective(objective);

But I'm getting the following exception:

[LightGBM] [Fatal] Parameter sigmoid should be of type double, got "2,"
[LightGBM] [Info] Finished linking network in 0.039144 seconds
[LightGBM] [Info] Finished linking network in 0.039447 seconds
    [ERROR] Exception in task 1.0 in stage 4317.0 (TID 213901)
    java.lang.Exception: Booster call failed in LightGBM with error: Parameter sigmoid should be of type double, got "2,"
    	at com.microsoft.ml.spark.lightgbm.LightGBMUtils$.validate(LightGBMUtils.scala:30)
    	at com.microsoft.ml.spark.lightgbm.TrainUtils$.createBooster(TrainUtils.scala:162)
    	at com.microsoft.ml.spark.lightgbm.TrainUtils$.translate(TrainUtils.scala:389)
...

Got the same exception when setting objective = "lambdarank, sigmoid=2.0".
Any tips? Does this string injection workaround only work for pyspark?

@imatiach-msft
Contributor

@ffineis hmm it should just work, I'm surprised you are seeing the error for
objective = "lambdarank, sigmoid=2.0"

@imatiach-msft
Contributor

@ffineis my mistake, it looks like you need to remove the comma, see:

https://github.com/Azure/mmlspark/blob/8d4c405daec9adbe4482ba20849de6596e217bef/src/main/scala/com/microsoft/ml/spark/lightgbm/TrainParams.scala#L52

num_machines=$numMachines objective=$objective verbosity=$verbosity 

maybe try this:
objective = "lambdarank sigmoid=2.0"
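
For anyone reading later, here is a rough sketch of why the separator matters. It assumes the parameter string is built by joining key=value pairs with spaces (as in the template line quoted above) and is then tokenized on whitespace. The helper names `build_param_string` and `parse_param_string` are hypothetical, the field list is abbreviated, and this is a simplified model rather than LightGBM's actual parser. It does not try to reproduce the exact error message from the earlier comment.

```python
def build_param_string(objective: str, verbosity: int = 1, num_machines: int = 1) -> str:
    # Hypothetical helper: mimics the shape of the space-separated template
    # quoted in the thread, with only three fields for brevity.
    return f"num_machines={num_machines} objective={objective} verbosity={verbosity}"


def parse_param_string(params: str) -> dict:
    # Assumed parsing model: split on whitespace, then split each token at
    # the first '='. Tokens without '=' are ignored.
    parsed = {}
    for token in params.split():
        key, sep, value = token.partition("=")
        if sep:
            parsed[key] = value
    return parsed


with_comma = parse_param_string(build_param_string("lambdarank, sigmoid=2.0"))
with_space = parse_param_string(build_param_string("lambdarank sigmoid=2.0"))

# With a comma, the comma stays attached to the preceding value.
print(repr(with_comma["objective"]))
# With a space, both values come through clean.
print(repr(with_space["objective"]), repr(with_space["sigmoid"]))
```

Under this model, whitespace is the only separator, so a comma is treated as part of whatever value precedes it. A value that must parse as a number would then fail type checking.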

ffineis commented Apr 12, 2021

@imatiach-msft awesome, this works!! Thanks so much!

@imatiach-msft
Contributor

Closing, as the params useMissing and zeroAsMissing have been added in PR #1444.
