
[BUG] 'com.microsoft.azure.synapse.ml.lightgbm' has no attribute 'LightGBMClassificationModel' #1701

Open
2 of 19 tasks
sibyl1956 opened this issue Oct 31, 2022 · 7 comments
sibyl1956 commented Oct 31, 2022

SynapseML version

0.10.1

System information

Language version: Python 3.8.10, Scala 2.12
Spark Version: Apache Spark 3.2.1
Spark Platform: Databricks

Describe the problem

When trying to load a pipeline model for LightGBM, I encountered this error message:
'com.microsoft.azure.synapse.ml.lightgbm' has no attribute 'LightGBMClassificationModel'

But I had run from synapse.ml.lightgbm import LightGBMClassificationModel before trying to load the pipeline model.

Code to reproduce issue

from pyspark.ml.pipeline import PipelineModel
from synapse.ml.lightgbm import LightGBMClassificationModel, LightGBMClassifier
clf = PipelineModel.load(model_savepath)

Other info / logs

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<command-2087039020756525> in <module>
      1 # Load model
      2 from pyspark.ml.pipeline import PipelineModel
----> 3 clf = PipelineModel.load(model_savepath)

/databricks/spark/python/pyspark/ml/util.py in load(cls, path)
    461     def load(cls, path):
    462         """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
--> 463         return cls.read().load(path)
    464 
    465 

/databricks/spark/python/pyspark/ml/pipeline.py in load(self, path)
    258             return JavaMLReader(self.cls).load(path)
    259         else:
--> 260             uid, stages = PipelineSharedReadWrite.load(metadata, self.sc, path)
    261             return PipelineModel(stages=stages)._resetUid(uid)
    262 

/databricks/spark/python/pyspark/ml/pipeline.py in load(metadata, sc, path)
    394             stagePath = \
    395                 PipelineSharedReadWrite.getStagePath(stageUid, index, len(stageUids), stagesDir)
--> 396             stage = DefaultParamsReader.loadParamsInstance(stagePath, sc)
    397             stages.append(stage)
    398         return (metadata['uid'], stages)

/databricks/spark/python/pyspark/ml/util.py in loadParamsInstance(path, sc)
    719         else:
    720             pythonClassName = metadata['class'].replace("org.apache.spark", "pyspark")
--> 721         py_type = DefaultParamsReader.__get_class(pythonClassName)
    722         instance = py_type.load(path)
    723         return instance

/databricks/spark/python/pyspark/ml/util.py in __get_class(clazz)
    630         m = __import__(module)
    631         for comp in parts[1:]:
--> 632             m = getattr(m, comp)
    633         return m
    634 

AttributeError: module 'com.microsoft.azure.synapse.ml.lightgbm' has no attribute 'LightGBMClassificationModel'
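For context, the failing lookup in the traceback comes from PySpark's class-name mapping when it reads each stage back. A minimal sketch (condensed from the `pyspark.ml.util.DefaultParamsReader` logic shown in the traceback above; this is a simplification, not the actual source):

```python
# Sketch of how a saved stage's stored class name is resolved when
# PipelineModel.load walks the pipeline metadata (per the traceback above).

def resolve_python_class_name(java_class: str) -> str:
    # PySpark only rewrites its own package prefix; any other prefix
    # (e.g. SynapseML's "com.microsoft.azure.synapse.ml.*") is kept
    # verbatim and later fed to __import__/getattr, which is where the
    # AttributeError above is raised.
    return java_class.replace("org.apache.spark", "pyspark")

# A built-in stage maps cleanly to an importable Python class path:
print(resolve_python_class_name("org.apache.spark.ml.PipelineModel"))
# → pyspark.ml.PipelineModel

# A SynapseML stage keeps its JVM package name, so the reader tries to
# import that name as a Python module path:
print(resolve_python_class_name(
    "com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel"))
# → com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel
```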

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations
@sibyl1956 sibyl1956 added the bug label Oct 31, 2022
@github-actions

Hey @sibyl1956 👋!
Thank you so much for reporting the issue/feature request 🚨.
Someone from SynapseML Team will be looking to triage this issue soon.
We appreciate your patience.

Contributor

ppruthi commented Nov 7, 2022

@svotaw -- could you take a look at this issue ? Thanks !

@svotaw svotaw self-assigned this Nov 13, 2022
Collaborator

svotaw commented Nov 13, 2022

Can you give more context here? How did you save the model? What was the code to create the original Pipeline?

@svotaw svotaw removed the triage label Nov 16, 2022

anor4k commented Apr 28, 2023

I'm having the same issue.
Here's the code I used to train and save the model:

from synapse.ml.lightgbm import LightGBMRegressor
from synapse.ml.train import TrainRegressor, TrainedRegressorModel
from pyspark.ml.pipeline import PipelineModel

model = TrainRegressor(
    model=LightGBMRegressor(**model_params),
    inputCols=features,
    labelCol=target
)

trained_model = model.fit(df_train)
trained_model.getModel().save('trained_model_pipeline')

loaded_model = PipelineModel.load('trained_model_pipeline')

Running that last line gives me the same error as the OP. Running on SynapseML 0.11.1, PySpark 3.2.3.

I can save the TrainedRegressorModel and use TrainedRegressorModel.load to load the model correctly, but PipelineModel.load seems like a more general way to load models, and I would prefer using that.

@tbrandonstevenson

Here is an anecdotal experience, for whatever it is worth:

I had the same problem and was able to get the pipeline to load by flattening the pipeline stages. The load was erroring when the first stage in my pipeline was itself a pipeline of feature transformations. When I removed this nested pipeline structure, I was able to load the saved pipeline.
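A hypothetical helper for that flattening step might look like the following (the function name and the `stages` attribute check are illustrative assumptions, not SynapseML API; in PySpark a fitted `PipelineModel` does expose its sub-stages via `.stages`):

```python
# Hypothetical helper (names are mine, not part of SynapseML) that
# recursively flattens nested pipeline-model stages into one flat list,
# per the workaround described above.

def flatten_stages(stages):
    flat = []
    for stage in stages:
        # A nested PipelineModel exposes its sub-stages via a `stages`
        # attribute; plain transformers and models do not.
        if hasattr(stage, "stages"):
            flat.extend(flatten_stages(stage.stages))
        else:
            flat.append(stage)
    return flat
```

The flat list could then be rewrapped before saving, e.g. `PipelineModel(stages=flatten_stages(model.stages))`.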

@grzegorz-karas

For a pyspark.ml.Pipeline in which all stages are Java stages (estimators and transformers from the Spark MLlib library), the model can be saved and loaded without problems.

WORKS:

pipe = Pipeline(
    stages=[
        SomePysparkMLibTransformer, # an instance of JavaMLWritable
        LightGBMClassifier(**model_params),
    ]
)

The error occurred when one of the transformers was a custom stage and not a Java stage.

DOESN'T WORK:

pipe = Pipeline(
    stages=[
        SomeCustomTransformer, # NOT an instance of JavaMLWritable
        LightGBMClassifier(**model_params),
    ]
)

In this case the PipelineModel.write method returns a non-Java writer. The classes synapse.ml.lightgbm.LightGBMClassifier and synapse.ml.lightgbm.LightGBMRegressor inherit the correct Java reader (pyspark.ml.util.JavaMLReadable) and writer (pyspark.ml.util.JavaMLWritable). The problem is the superclass synapse.ml.core.schema.Utils.ComplexParamsMixin, which inherits only from pyspark.ml.util.MLReadable.

I could bypass the problem by wrapping the estimator in a pyspark.ml.Pipeline. In this situation the write method of the last stage returns a JavaMLWriter rather than a PipelineModelWriter.

pipe = Pipeline(
    stages=[
        SomeCustomTransformer, # NOT an instance of JavaMLWritable
        Pipeline(
            stages=[
                LightGBMClassifier(**model_params),
            ]
        )
    ]
)
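The writer-selection behaviour described above can be sketched as follows (a simplified model condensed from the description in this thread, not the actual pyspark.ml.pipeline source):

```python
# Simplified model of why a single non-Java stage changes how the whole
# pipeline is serialized. The class names here are stand-ins.

class MLWritable: ...
class JavaMLWritable(MLWritable): ...

class JavaStage(JavaMLWritable): ...    # e.g. an MLlib transformer
class CustomStage(MLWritable): ...      # a pure-Python custom transformer

def choose_writer(stages):
    # The Java writer is used only when *every* stage is JavaMLWritable;
    # otherwise each stage is written (and later read back) individually,
    # which is the per-stage code path that triggers the AttributeError
    # reported in this issue.
    if all(isinstance(s, JavaMLWritable) for s in stages):
        return "JavaMLWriter"
    return "PipelineSharedReadWrite"

print(choose_writer([JavaStage(), JavaStage()]))    # → JavaMLWriter
print(choose_writer([CustomStage(), JavaStage()]))  # → PipelineSharedReadWrite
```

Wrapping the Java stages in an inner Pipeline, as in the snippet above, keeps the Java-only group together so the inner pipeline serializes via its Java writer.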


dsmith111 commented Sep 5, 2024

Is this bug still being considered? Implementing

pipeline = Pipeline(
    stages=[
        custom_transformer,
        PipelineModel(stages=[lgbm_model]),
        custom_transformer
        ]
    )

seems like it should only be a temporary workaround.
