
feat: Adding feature type shared parameter capability for hyperopt #2133

Merged: 66 commits merged into master from hyperopt_shared_parameters on Jun 29, 2022

Conversation

@arnavgarg1 (Contributor) commented Jun 13, 2022

This change enables using the defaults keyword within hyperopt to set default parameters for all features of a given type. This helps define search spaces more concisely for datasets with a large number of features, while also reducing the search space to allow for a deeper search during hyperopt.

For example, a user can now add

"defaults.text.num_filters": {"space": "choice", "categories": [64, 128, 256, 512]}

to their hyperopt search space so that all text features (inputs and outputs) use the sampled num_filters value for that particular trial.

The API for this change is as follows: defaults.<feature_type>.<parameter>, where:

  • feature_type: a valid input or output feature type, e.g., text, category, or sequence
  • parameter: the parameter to search over for that particular feature type, e.g., embedding_size for category features

Example Configuration:

hyperopt:
    goal: minimize
    parameters:
        trainer.learning_rate:
            lower: 0.0001
            upper: 0.01
            space: loguniform
        defaults.text.num_filters:
            space: choice
            categories: [64, 128, 256, 512]
        defaults.category.embedding_size:
            space: choice
            categories: [64, 128, 256, 512]
        ....
    ....
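To make the mechanics concrete, here is a minimal sketch of how a sampled defaults.<feature_type>.<parameter> value could be applied to a config. This is an illustration only, not Ludwig's actual implementation, and the helper name substitute_default is hypothetical:

def substitute_default(config: dict, parameter_name: str, sampled_value) -> dict:
    """Apply a sampled defaults.* hyperopt parameter to every matching feature."""
    # e.g. "defaults.text.num_filters" -> ("defaults", "text", "num_filters")
    _, feature_type, param = parameter_name.split(".", 2)
    for section in ("input_features", "output_features"):
        for feature in config.get(section, []):
            if feature.get("type") == feature_type:
                feature[param] = sampled_value
    return config

# Example: a trial samples num_filters=256 for all text features.
config = {
    "input_features": [
        {"name": "title", "type": "text"},
        {"name": "body", "type": "text"},
        {"name": "genre", "type": "category"},
    ],
    "output_features": [{"name": "label", "type": "category"}],
}
substitute_default(config, "defaults.text.num_filters", 256)
assert all(f["num_filters"] == 256 for f in config["input_features"] if f["type"] == "text")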

Note: The trial status table that gets logged every few seconds will show the parameter under its defaults.<feature_type>.<parameter> name (e.g., defaults.text.num_filters), but under the hood the sampled value is applied to each matching feature correctly. This is unfortunately not something we can control, but I think it is okay to leave as-is, since the table width would become too large if we added a column for every feature whose default parameter was updated in a trial.

[Screenshot: trial status table, Jun 13, 2022]

@arnavgarg1 arnavgarg1 self-assigned this Jun 13, 2022
@arnavgarg1 arnavgarg1 marked this pull request as draft June 13, 2022 20:36
@arnavgarg1 (Contributor, Author)

Working on writing some tests, so I will update this PR, but I wanted to create it now so we can test whether it works.

@github-actions bot commented Jun 13, 2022

Unit Test Results

6 files ±0 · 6 suites ±0 · 2h 11m 2s ⏱️ (−17m 17s)
2,887 tests +7: 2,841 passed ✔️ +7 · 46 skipped 💤 ±0 · 0 failed ±0
8,661 runs +21: 8,519 passed ✔️ +21 · 142 skipped 💤 ±0 · 0 failed ±0

Results for commit 1c9f65a. ± Comparison against base commit a53f9f8.

♻️ This comment has been updated with latest results.

@arnavgarg1 arnavgarg1 changed the title feat: Shared Parameters For Hyperopt enh: Shared Parameters For Hyperopt Jun 13, 2022
@arnavgarg1 arnavgarg1 marked this pull request as ready for review June 14, 2022 00:19
@arnavgarg1: This comment was marked as outdated.

Review threads on ludwig/hyperopt/execution.py and ludwig/hyperopt/run.py (outdated, resolved)
@arnavgarg1 (Contributor, Author)

> Not sure where my comment went, but I have an issue with vocab_size. It is an internal-only parameter, derived from most_common and preprocessing; it's not something to hyperopt over, but the PR searches over it. We should choose another parameter to search over.

This should be fixed now!

@w4nderlust (Collaborator)

> This should be fixed now!

In test_hyperopt I can see it though.

@arnavgarg1 (Contributor, Author)

> In test_hyperopt I can see it though.

Good catch! I've updated all hyperopt tests in test_hyperopt to use parameters that are user-facing now :)

@w4nderlust (Collaborator)

> Good catch! I've updated all hyperopt tests in test_hyperopt to use parameters that are user-facing now :)

Just to provide additional clarification: in the tests, when we define features like category_feature(), we provide a vocab_size. The reason is that that function has two purposes: it creates a dictionary that can be used both by synthesize_data and as part of a Ludwig config. vocab_size is a parameter for synthesize_data (it determines how many categories there will be in the generated data) and not a parameter for the Ludwig config (because it is determined internally by Ludwig).

So, its use in, say, tests/integration_tests/test_hyperopt.py lines 89 and 92 is completely fine, and indeed helps generate data with a small vocabulary, which in turn helps make those tests faster (although I think for category features it should be a minimum of 3). That usage should be kept, so I suggest reintroducing those uses of vocab_size.

What I was arguing should be removed (and sorry if I wasn't more precise about it before) is the presence of vocab_size as a parameter in the hyperopt section of the test config, since there it is basically a config override of an internal-only parameter. embedding_size, on the other hand, is a good replacement for vocab_size, because it is actually a parameter we expect users of hyperopt to be able to change.

Another unrelated comment:

num_filters_search_space = [128, 512]
embedding_size_search_space = [128, 512]

Those are reasonable values a user may set, although in these tests we want to make them as small as possible to keep the tests quick. Here I suggest something like 4 and 8 as the two options.
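The distinction above can be summarized in a short sketch, assuming Ludwig's test helper category_feature() from tests/integration_tests/utils (values are illustrative): vocab_size stays in the feature definition, where it only controls data synthesis, while embedding_size, a real user-facing parameter, goes into the hyperopt search space with deliberately tiny options.

from tests.integration_tests.utils import category_feature  # Ludwig test helper

# vocab_size keeps the synthesized data small (minimum 3 for category
# features, per the note above); it is NOT something hyperopt should
# search over.
input_features = [category_feature(vocab_size=3)]

# embedding_size is a user-facing config parameter, so it belongs in the
# search space, with small options to keep the test fast.
hyperopt_parameters = {
    "defaults.category.embedding_size": {"space": "choice", "categories": [4, 8]},
}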

@arnavgarg1 (Contributor, Author)

Re: tests/integration_tests/test_hyperopt.py

That makes a lot of sense, @w4nderlust, thanks for the in-depth clarification! I've added vocab_size back everywhere a categorical feature is created, but removed it from every hyperopt search space. I've also reduced the sample space to [4, 8] to speed up the tests.

@arnavgarg1 arnavgarg1 changed the title feat: Shared Parameters For Hyperopt feat: Adding feature type shared parameter capability for hyperopt Jun 25, 2022
@arnavgarg1 (Contributor, Author)

@justinxzhao @ShreyaR @w4nderlust Does this PR look good to go? Happy to make any additional changes if needed!

Review threads on tests/integration_tests/test_hyperopt.py (outdated, resolved)
@tgaddair (Collaborator) left a comment


An open question in my mind is whether it's necessary to specify the input_features section, or whether we want to just specify the feature type.

@arnavgarg1 (Contributor, Author) commented Jun 27, 2022

> An open question in my mind is whether it's necessary to specify the input_features section, or whether we want to just specify the feature type.

@tgaddair I think that's a good question.

Ideally we should allow users to do both, so they can control the level of granularity. If they want to set defaults for only a specific feature group (say, input_features vs. output_features), they can do that; otherwise they could just write something like defaults.text.cell_type.

For now, the way I've implemented it technically gives users both levels of control, because they can copy and paste the same entry in their configs and just replace input_features with output_features, which is only a little additional work. If we see requests to drop this distinction and allow setting defaults for inputs and outputs together in one go, it can be added in a v2.

Another reason to keep this distinction is that the parameters for input and output features of the same type (e.g., text) can differ, so it's neater to specify them separately. We could theoretically infer from the parameter name whether a parameter is for the encoder or the decoder, but I think that would be more error-prone in terms of usability. It could be confusing for an end user to specify these things in one place without any indication of whether the parameter applies to the encoder, the decoder, or both, or to have it apply only to, say, the output feature of that type.

Happy to discuss this further if you feel differently!
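For illustration, here are the two granularity levels under discussion, written as Python dicts of hyperopt parameters. The feature-group-specific form is hypothetical syntax and, per the resolution below, is not the shape that ultimately shipped:

# Per-feature-group granularity (hypothetical syntax):
granular = {
    "defaults.input_features.text.cell_type": {"space": "choice", "categories": ["rnn", "gru"]},
    "defaults.output_features.text.cell_type": {"space": "choice", "categories": ["rnn", "gru"]},
}

# Feature-type-only granularity: one entry applies to both input and
# output text features.
shared = {
    "defaults.text.cell_type": {"space": "choice", "categories": ["rnn", "gru"]},
}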

@tgaddair (Collaborator)

@arnavgarg1, a few things I would say as a counter-point to the above:

  1. In most cases the user will not have more than one output feature (at the extreme, a handful), so I wouldn't over-optimize for the case where the user has many input and output features that each need their own distinct defaults. The 99% scenario is the user having a lot of input features; any output-feature-specific logic would then naturally belong to the specific output feature.
  2. Once we implement the hierarchical encoder and decoder sections, it should be easy for users to separate the input- and output-feature-specific sections (sketched below).
  3. This would be consistent with the existing behavior of the global preprocessing section (which we wish to replace with this defaults section).
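A hedged sketch of point 2: once encoder and decoder sections become hierarchical, input- vs. output-specific defaults separate naturally by nesting. The exact schema shown here is speculative, not a confirmed Ludwig config shape:

defaults_section = {
    "text": {
        "encoder": {"num_filters": 256},  # would apply to text input features
        "decoder": {"cell_type": "gru"},  # would apply to text output features
    }
}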

@arnavgarg1 (Contributor, Author)

@tgaddair Makes sense, I'll update this to not be feature group (input or output) specific for now and make sure both input and output features use the same shared parameters.

@arnavgarg1 arnavgarg1 merged commit 236cf54 into master Jun 29, 2022
@arnavgarg1 arnavgarg1 deleted the hyperopt_shared_parameters branch June 29, 2022 17:31
Labels: feature (New feature or request)
Projects: None yet

Successfully merging this pull request may close these issues:

  • [Hyperopt] Allow default shared parameter search spaces

5 participants