feat: Adding feature type shared parameter capability for hyperopt #2133
Working on writing some tests so will update this PR, but wanted to create the PR so we can test if it works.
This should be fixed now!
In test_hyperopt I can see it though
Good catch, I've updated all hyperopt tests in test_hyperopt to use parameters that are user-facing now :)
Just to provide additional clarification: in the tests, when we define features like So, its use in, say, tests/integration_tests/test_hyperopt.py lines 89 and 92 is completely fine, and indeed helps generate data with a small vocabulary, which in turn helps make those tests faster (although I think for category features, it should be minimum 3). So that should be kept. So I suggest reintroducing those uses of What I was arguing for removal, and sorry if I wasn't more precise about it before, is the presence of Another unrelated comment:
Those are reasonable values a user may set, although in these tests we want them as small as possible to keep the tests quick. Here I suggest something like 4 and 8 as the two options.
That makes a lot of sense @w4nderlust , thanks for the in-depth clarification! I've added
@justinxzhao @ShreyaR @w4nderlust Does this PR look good to go? Happy to make any additional changes if needed!
Open question in my mind is whether it's necessary to specify the input_features
section, or if we want to just specify the feature type?
@tgaddair I think that's a good question. Ideally we should allow users to do both so they can control the level of granularity. If they want to only set defaults for a specific feature group (say

For now, doing it in the way I've implemented it technically gives users both levels of control because the users can just copy and paste the same thing in their configs but just replace

Another reason to consider this distinction is that the parameters for input features and output features of the same type (e.g. text) can be different, so it's just neater to specify it with this distinction. We could theoretically infer whether the parameter is for the encoder or the decoder based on the parameter names if we want to, but I think it might be more error-prone in terms of usability. It could be confusing for an end user to specify these things in one place without any distinction of whether the parameter works for the encoder or the decoder or both, or only have it apply to, say, the output feature of that type.

Happy to discuss this further if you feel differently!
@arnavgarg1, a few things I would say as a counter-point to the above:
@tgaddair Makes sense, I'll update this to not be feature group (input or output) specific for now and make sure both input and output features use the same shared parameters.
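To illustrate the behavior agreed on here, below is a minimal sketch of applying one sampled value to every feature of a given type, inputs and outputs alike. The function name `apply_shared_default` and the feature names are hypothetical, chosen only for illustration; this is not the PR's actual implementation.

```python
import copy


def apply_shared_default(config, feature_type, parameter, value):
    """Return a copy of config with `parameter` set on every feature of `feature_type`.

    Hypothetical helper for illustration only; not part of Ludwig's API.
    """
    updated = copy.deepcopy(config)
    # Walk both sections so inputs and outputs share the same sampled value.
    for section in ("input_features", "output_features"):
        for feature in updated.get(section, []):
            if feature.get("type") == feature_type:
                feature[parameter] = value
    return updated


# Assumed example config: feature names are made up.
config = {
    "input_features": [
        {"name": "question", "type": "text"},
        {"name": "topic", "type": "category"},
    ],
    "output_features": [{"name": "answer", "type": "text"}],
}

trial_config = apply_shared_default(config, "text", "cell_type", "gru")
print(trial_config["input_features"][0])
print(trial_config["output_features"][0])
```

The original config is left untouched (the helper works on a deep copy), so each trial can start from the same base configuration.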
This change enables using the `defaults` keyword within hyperopt to set default parameters for feature groups. This makes it possible to define search spaces more concisely for datasets with a large number of features, while also reducing the size of the search space to allow for a deeper search during hyperopt.

For example, a user can now add `defaults.text.cell_type` to their hyperopt search space so that all text features (inputs and outputs) use the sampled `cell_type` for that particular trial.

The API for this change is `defaults.<feature_type>.<parameter>`, where:

- `feature_type`: a valid input or output feature type, e.g., text, category, sequence, etc.
- `parameter`: the parameter to search over for that particular feature type, e.g., `embedding_size` for category features

Example Configuration:
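A minimal sketch of what such a configuration could look like, written as a Python dict. The feature names (`title`, `body`, `label`), the `space`/`categories` search-space keys, and the candidate values are assumptions for illustration; only the `defaults.<feature_type>.<parameter>` key format comes from the description in this PR. The hypothetical `parse_default_key` helper simply shows how such a key decomposes.

```python
# Hypothetical config: feature names and search-space values are assumptions.
config = {
    "input_features": [
        {"name": "title", "type": "text"},
        {"name": "body", "type": "text"},
    ],
    "output_features": [{"name": "label", "type": "category"}],
    "hyperopt": {
        "parameters": {
            # One shared entry instead of separate per-feature entries.
            "defaults.text.cell_type": {
                "space": "choice",
                "categories": ["rnn", "gru", "lstm"],
            },
        },
    },
}


def parse_default_key(key):
    """Split 'defaults.<feature_type>.<parameter>' into (feature_type, parameter).

    Illustrative helper, not part of Ludwig's API.
    """
    prefix, feature_type, parameter = key.split(".", 2)
    if prefix != "defaults":
        raise ValueError(f"not a shared default key: {key!r}")
    return feature_type, parameter


for key in config["hyperopt"]["parameters"]:
    print(key, "->", parse_default_key(key))
```

With a single `defaults.text.cell_type` entry, both text features in this sketch would share one sampled `cell_type` per trial, rather than each adding its own dimension to the search space.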
_Note: The `status` output that gets logged every few seconds will show the parameter as `defaults.text.cell_type` in the printed trial status table, but under the hood the affected features are updated correctly. This is unfortunately not something we can control, but I think it is okay to leave it this way, because the table would become too wide if we added a column for each feature whose default parameter was updated._