
feat: Added model type GBM (LightGBM tree learner), as an alternative to ECD #2027

Merged: 101 commits, Jun 29, 2022

Conversation

@jppgks (Contributor) commented May 12, 2022

This PR introduces an additional model type, GBM (tree learner), as an alternative to ECD.

Limitations:

  • supports only a single output feature of type binary, number, or category

In scope:

  • adding model_type to the Ludwig config (see the config sketch after this list)
  • adding type to the trainer section of the Ludwig config
  • introducing a BaseModel class, implemented by both ECD and GBM
  • moving trainers into a separate folder
  • implementing GBM training functionality in LightGBMTrainer and LightGBMRayTrainer
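
A minimal sketch of the new config surface described above, written as a Python dict passed to LudwigModel. The "gbm" and "lightgbm_trainer" string values are assumptions based on this PR's description, not the final schema.

from ludwig.api import LudwigModel

config = {
    "model_type": "gbm",  # new top-level field; "ecd" as the default is an assumption
    "input_features": [{"name": "x", "type": "number"}],
    "output_features": [{"name": "y", "type": "binary"}],  # single output feature only
    "trainer": {"type": "lightgbm_trainer"},  # new `type` field in the trainer section (assumed value)
}

model = LudwigModel(config)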

Out of scope:

  • support for non-trainable feature encoders (e.g. pretrained text embeddings); deferred to a follow-up PR

Todo:

  • update/add tests reflecting new model_type functionality
  • replace occurrences of ECD with AbstractModel
  • add tests for new GBM functionality
  • finish GBM implementation

Help needed:

  • Is there an alternative to having a separate trainer registry per backend (local, ray)? OK for now.
  • How to correctly update the schema for model_type and trainer type? (Thanks, @ksbrar!)

@github-actions bot commented May 12, 2022

Unit Test Results

6 files ±0    6 suites ±0    2h 28m 51s ⏱️ +10m 54s
2,901 tests +21    2,855 ✔️ +22    46 💤 ±0    0 ❌ −1
8,703 runs +63    8,561 ✔️ +64    142 💤 ±0    0 ❌ −1

Results for commit ca7db50. ± Comparison against base commit f654e82.

♻️ This comment has been updated with latest results.

@justinxzhao (Contributor) left a comment:
Hey @jppgks, here's a first round of comments. Mostly minor nits with a question about how to organize AbstractModel.

Resolved review thread on tests/integration_tests/test_lightgbm.py (outdated)
@@ -69,6 +71,8 @@
{name: base_type.preprocessing_defaults() for name, base_type in base_type_registry.items()}
)

default_model_type = MODEL_ECD
@justinxzhao (Contributor):
Why do we need both default_model_type and register_ray_trainer(default=True)?

@jppgks (Contributor, author) replied:
@justinxzhao default_model_type here refers to the default for the model_type field in the config. register_trainer and register_ray_trainer are used to register trainers that support certain model types, and to mark which trainer to use by default for a given model type. A sketch of the pattern follows.
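
A minimal sketch of the trainer-registry pattern described in the reply above. The names mirror ludwig/trainers/registry.py in spirit, but the exact signatures here are assumptions.

trainer_registry = {}  # model_type -> {trainer_name: trainer_cls}
default_trainers = {}  # model_type -> name of the default trainer

def register_trainer(name, model_types, default=False):
    def wrap(cls):
        for model_type in model_types:
            trainer_registry.setdefault(model_type, {})[name] = cls
            if default:
                default_trainers[model_type] = name
        return cls
    return wrap

@register_trainer("lightgbm_trainer", ["gbm"], default=True)
class LightGBMTrainer:
    pass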

Resolved review threads: ludwig/trainers/registry.py (×2), ludwig/trainers/base.py, ludwig/models/abstractmodel.py (×2, outdated)

Returns:
A dictionary of output {feature name}::{tensor_name} -> output tensor.
"""
Contributor:

Should we raise a NotImplementedError if this method isn't implemented?

@jppgks (Contributor, author) replied May 26, 2022:

Because the model base class uses the ABCMeta metaclass, the current behavior when forward() is not implemented is as follows:

>>> from ludwig.models.base import BaseModel
>>> class Test(BaseModel):
...   pass
... 
>>> Test()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Test with abstract methods forward

logger = logging.getLogger(__name__)


class AbstractModel(LudwigModule, metaclass=ABCMeta):
Contributor:

For an abstract class, there's a lot of default implementation.

Perhaps overkill (curious to hear others' thoughts), but a more decoupled design would be: define a truly implementation-free abstract class, AbstractModel; define a base class, BaseModel, that provides default implementations for several methods; and then derive GBM and ECD from BaseModel. A sketch of this hierarchy follows.
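
A minimal sketch of the decoupled hierarchy suggested above. The class and method names are illustrative assumptions, not the code in this PR.

from abc import ABCMeta, abstractmethod

class AbstractModel(metaclass=ABCMeta):
    """Implementation-free interface."""

    @abstractmethod
    def forward(self, inputs):
        ...

class BaseModel(AbstractModel):
    """Default implementations shared by concrete models."""

    def reset_metrics(self):
        pass  # shared no-op default

class ECD(BaseModel):
    def forward(self, inputs):
        return inputs  # stand-in for the encoder-combiner-decoder pass

class GBM(BaseModel):
    def forward(self, inputs):
        return inputs  # stand-in for gradient-boosted tree inference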

Resolved review thread on ludwig/models/gbm.py (outdated)
@jppgks (Contributor, author) commented May 24, 2022

Minimal reproducible example for the hanging process.

Run the script below with --lgbm_ludwig to reproduce the hanging behavior, and without --lgbm_ludwig to call LightGBM directly.

Resolved in 7c4793b; the issue was that CheckpointManager.close() was never called. A sketch of the failure mode follows.
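
A minimal sketch of the failure mode described above: a CheckpointManager whose close() is never called can leave a non-daemon background thread alive, which keeps the process from exiting. The stub class here stands in for Ludwig's; the try/finally shape is an assumption about the fix, not the exact diff in 7c4793b.

import threading

class CheckpointManager:
    def __init__(self):
        self._stop = threading.Event()
        # Non-daemon worker thread: as long as it runs, the process cannot exit.
        self._worker = threading.Thread(target=self._stop.wait)
        self._worker.start()

    def close(self):
        self._stop.set()
        self._worker.join()

manager = CheckpointManager()
try:
    pass  # ... training loop ...
finally:
    manager.close()  # without this call, the process hangs at exit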

# Reset the metrics at the start of the next epoch
self.model.reset_metrics()

self.callback(lambda c: c.on_epoch_start(self, progress_tracker, save_path))
Contributor:

How do tree models behave in steps-based vs. epochs-based training?

@jppgks (Contributor, author) replied:

Tree models have no concept of epochs or steps. There's a separate trainer config for tree-based models; you can control the number of boosting rounds through it (see the sketch below).
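
A minimal sketch of the tree-trainer section mentioned in the reply above. num_boost_round is LightGBM's native name for the number of boosting rounds; whether Ludwig's trainer schema exposes it under this exact key, and the "lightgbm_trainer" type value, are assumptions.

trainer_config = {
    "type": "lightgbm_trainer",  # assumed registry name for the GBM trainer
    "num_boost_round": 100,      # boosting rounds take the place of epochs/steps
}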

@ShreyaR (Contributor) left a comment:

Amazing work! Added a few small comments but feel free to merge otherwise.

Resolved review thread on ludwig/data/utils.py (outdated)
@@ -27,7 +27,7 @@

@register_encoder("passthrough", [CATEGORY, NUMBER, VECTOR], default=True)
class PassthroughEncoder(Encoder):
-    def __init__(self, input_size, **kwargs):
+    def __init__(self, input_size=1, **kwargs):
@ShreyaR (Contributor):
Why do we set the default input size to 1 here? Is it possible that this causes issues for a combiner downstream if the expected input size doesn't match the actual input size?

@jppgks (Contributor, author) replied:

@ShreyaR see the comment from @w4nderlust: #2027 (comment)

Not sure if I can sensibly answer your second question. Maybe @w4nderlust has some input here?
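
A minimal sketch of the concern raised above: if an encoder reports a default input_size of 1 but the actual feature width differs, a downstream module sized from the reported value fails at runtime. The names here are illustrative, not Ludwig's classes.

import torch

reported_size = 1           # default taken on faith by the downstream combiner
actual = torch.randn(8, 4)  # batch of 8 with a true feature width of 4

linear = torch.nn.Linear(reported_size, 16)  # sized from the reported value
try:
    linear(actual)
except RuntimeError as e:
    print(e)  # shape mismatch between the 4-wide input and the 1-wide layer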

from ludwig.decoders.base import Decoder
from ludwig.decoders.registry import register_decoder
from ludwig.utils.torch_utils import Dense, get_activation

logger = logging.getLogger(__name__)


@register_decoder("passthrough", [BINARY, CATEGORY, NUMBER, SET, VECTOR, SEQUENCE, TEXT])
class PassthroughDecoder(Decoder):
    def __init__(self, input_size: int = 1, num_classes: int = None, **kwargs):
Contributor:

See comment below for default input size.

Resolved review thread on ludwig/models/base.py
@justinxzhao (Contributor) left a comment:

This is a big PR and LGTM with some small nits.

It would be good to get this merged soon so that we don't have to worry about hairy merge conflicts.

assert torch.allclose(of1_w, of2_w)

# Test saving and loading the model explicitly
with tempfile.TemporaryDirectory() as tmpdir:
Contributor:

tmpdir is a built-in pytest fixture

The same comment ("tmpdir is a built-in pytest fixture.") was left on four more tempfile.TemporaryDirectory() usages in the tests (number, category, binary, and text output features).
@jppgks (Contributor, author) replied:
I was actually using the fixture before, but ran into issues, so I switched to this style in accordance with test_ray.py. Both styles are sketched below.
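
A minimal sketch contrasting the two styles discussed above. pytest injects tmpdir as a test argument and cleans it up automatically, while tempfile manages the directory explicitly.

import tempfile

def test_with_fixture(tmpdir):  # pytest provides and cleans up tmpdir
    tmpdir.join("model.txt").write_text("weights", encoding="utf-8")

def test_with_tempfile():  # the style used in this PR, matching test_ray.py
    with tempfile.TemporaryDirectory() as tmpdir:
        with open(f"{tmpdir}/model.txt", "w") as f:
            f.write("weights")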

@connor-mccorm (Contributor) left a comment:
Very awesome job. Just had one small comment on the schema naming but besides that looks awesome and am very impressed with all the work!

Resolved review thread on ludwig/schema/trainer.py (outdated)
@jppgks changed the title from "Integrating GBM models" to "feat: add Tree learner" on Jun 28, 2022
Resolved review thread on ludwig/schema/trainer.py (outdated)
@tgaddair changed the title from "feat: add Tree learner" to "feat: Added model type GBM (LightGBM tree learner), as an alternative to ECD" on Jun 28, 2022
Labels: feature (New feature or request)
8 participants