Allow user to specify huggingface link or local path to pretrained lora weights #3572
Conversation
Just one additional comment; it shouldn't be blocking if we want to land this.
tests/integration_tests/test_llm.py
Outdated
}
config_obj = ModelConfig.from_dict(config)
assert config_obj.input_features[0].preprocessing.max_sequence_length is None
assert config_obj.output_features[0].preprocessing.max_sequence_length is None


def test_load_pretrained_adapter_weights():
A couple of tests we should add (possibly in a follow-up PR), sketched below:
- Checking a null input
- Checking an invalid weights path
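A rough sketch of what those two tests could look like, assuming the field ends up named `pretrained_adapter_weights` and that an invalid path only fails once the LLM is actually built rather than at config-validation time. The test names, the tiny base model, and the broad `Exception` are all placeholders, not the final implementation:

```python
import pytest

from ludwig.schema.model_config import ModelConfig


def _llm_config(pretrained_adapter_weights):
    return {
        "model_type": "llm",
        "base_model": "hf-internal-testing/tiny-random-GPTJForCausalLM",
        "input_features": [{"name": "instruction", "type": "text"}],
        "output_features": [{"name": "output", "type": "text"}],
        "adapter": {"type": "lora", "pretrained_adapter_weights": pretrained_adapter_weights},
    }


def test_pretrained_adapter_weights_none():
    # A null value should validate and simply mean "train the adapter from scratch".
    config_obj = ModelConfig.from_dict(_llm_config(None))
    assert config_obj.adapter.pretrained_adapter_weights is None


def test_pretrained_adapter_weights_invalid_path():
    # The schema only sees a string, so the failure is expected once the adapter
    # is actually loaded; pytest.raises(Exception) is a stand-in for whatever
    # peft/huggingface_hub raises for an unresolvable repo id or path.
    from ludwig.models.llm import LLM

    config_obj = ModelConfig.from_dict(_llm_config("not-a-real-org/not-a-real-adapter"))
    with pytest.raises(Exception):
        LLM(config_obj)
```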
ludwig/schema/llms/peft.py
Outdated
target_modules: Optional[list] = schema_utils.List(
    str,
    default=None,
    allow_none=True,
    description="List of modules to apply Lora to. If None, apply to all modules.",
)
Is this needed?
I recall this causing an error if this wasn't set.
Got it! It would be good to know exactly what the error was so we can understand it and leave a comment to explain it; that might be useful when we come back to this in the future.
If I recall correctly, there was an error involving target_modules not being a parameter of a LoraConfig.
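A hedged sketch of where an error like that can come from, assuming the schema fields are forwarded as keyword arguments to peft's `LoraConfig` (the `not_a_lora_field` kwarg below is made up purely for illustration; `target_modules` itself is a valid field in current peft releases):

```python
from peft import LoraConfig

# target_modules is accepted by LoraConfig, so passing an explicit list works:
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["q_proj", "v_proj"])
print(lora_config.target_modules)

# LoraConfig is a dataclass, so any field it does not define is rejected up
# front, which matches the "not a parameter of a LoraConfig" error described:
try:
    LoraConfig(r=8, not_a_lora_field=True)
except TypeError as e:
    print(e)  # __init__() got an unexpected keyword argument 'not_a_lora_field'
```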
ludwig/trainers/trainer.py
Outdated
try:
    loss.requires_grad = True
except RuntimeError:
    pass
Why do we have to add this?
This was a workaround I had because, when the LoRA weights were loaded in, some of the loss functions did not have `requires_grad` set to `True`. However, without the try-except block, this would try to set `requires_grad` to `True` for some intermediate loss functions, which isn't valid.
Questions:
- Why would loading lora weights have loss functions with `requires_grad != True`?
- What does the training error look like when some loss functions have `requires_grad != True`?
- Do you have an example of an intermediate loss function that raises an error when you try to set `requires_grad=True`?
At the least, we should add a comment explaining why this is here, e.g.:
"When loading adapter weights from huggingface or a local path, some of the loss functions do not have `requires_grad=True`. `requires_grad=True` is necessary for back-propagation, but we wrap this in a try/except because some intermediate losses like __ raise an error if `requires_grad` is explicitly set to `True` in this way."
Can you elaborate a bit more? What are the intermediate loss functions? I'm also not totally sure how wrapping the model with the adapter is causing these issues?
I agree with @justinxzhao and have the same questions as well
- I'm not quite sure why this was the case. My hypothesis is that, when LoRA weights are loaded, they overwrite some of the parameters of certain layers.
- When some loss functions have `requires_grad != True`, training stops and errors out.
- I don't have an example of this, but I think my terminology was incorrect here. Specifically, `requires_grad` can only be changed on leaf variables, so if there is a loss function that is not a leaf node, setting `requires_grad=True` would cause an error (sketched below).
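A small PyTorch sketch of the leaf-variable restriction described in the last point (the shapes and the cross-entropy loss are arbitrary stand-ins, not Ludwig code):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3, requires_grad=True)
targets = torch.randint(0, 3, (4,))

# A loss attached to the autograd graph is a non-leaf tensor; assigning
# requires_grad on it raises a RuntimeError, which is what the except branch
# in the trainer workaround swallows.
attached_loss = F.cross_entropy(logits, targets)
try:
    attached_loss.requires_grad = True
except RuntimeError as e:
    print("non-leaf loss:", e)

# A loss computed from detached inputs has requires_grad=False and is a leaf
# by convention, so the assignment succeeds; this is the case the workaround
# is meant to fix.
detached_loss = F.cross_entropy(logits.detach(), targets)
detached_loss.requires_grad = True
print(detached_loss.requires_grad)  # True
```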
By default, when a PEFT pretrained adapter is loaded in, is it set to training mode or inference mode? Does that maybe have something to do with requires_grad not being set?
Ah, definitely worth checking if the module is being loaded in eval mode, which would explain `requires_grad=False`.
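A hedged sketch of that check: load the pretrained adapter with peft and inspect whether the LoRA parameters come back trainable. The base model id and adapter location are placeholders, and whether the inference-mode default is actually what Ludwig hits here is exactly the open question above, not a confirmed conclusion:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder base model
adapter = "some-org-or-local-path/lora-adapter"  # placeholder adapter location

# peft loads adapters for inference by default (adapter weights frozen);
# is_trainable=True is the switch that keeps the LoRA weights trainable.
model = PeftModel.from_pretrained(base, adapter, is_trainable=True)

lora_params = [(name, p.requires_grad) for name, p in model.named_parameters() if "lora_" in name]
print(all(requires_grad for _, requires_grad in lora_params))
```

If the frozen parameters turn out to be the root cause, flipping that flag (or the equivalent in Ludwig's adapter initialization) would be a cleaner fix than forcing `requires_grad` on the loss.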
ludwig/constants.py
Outdated
@@ -282,6 +282,7 @@
 GENERATION = "generation"
 PROMPT = "prompt"
 ADAPTER = "adapter"
+PRETRAINED_WEIGHTS = "pretrained_weights"
nit, might be more clear to call it `pretrained_adapter_weights` since pretrained weights also come from the model! So just to avoid confusion.
On it
ludwig/models/llm.py
Outdated
if param_name is None:
    continue
When would `param_name` be None? This dictionary is the parameters for the PEFT adapter that we already have in the schema, right? E.g., for LoRA, it will have `r`, `alpha`, `bias`, etc. If so, I'd assume the value can be None, but the name being None feels a bit odd.
You're correct. This should be `param_value`.
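For reference, a sketch of the intent being described, i.e. skip entries whose value is None so that unset schema fields fall back to peft's defaults when the `LoraConfig` is built (the dict below is illustrative, not the exact Ludwig code):

```python
from peft import LoraConfig

schema_params = {"r": 8, "lora_alpha": 16, "lora_dropout": 0.05, "target_modules": None, "bias": "none"}

kwargs = {}
for param_name, param_value in schema_params.items():
    if param_value is None:
        continue  # leave unset fields to peft's defaults
    kwargs[param_name] = param_value

lora_config = LoraConfig(**kwargs)
print(lora_config.target_modules)  # None, i.e. peft's own default applies
```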
ludwig/config_validation/checks.py
Outdated
@@ -477,7 +477,7 @@ def check_llm_finetuning_output_feature_config(config: "ModelConfig"):  # noqa:
     if config.model_type != MODEL_LLM:
         return

-    if config.trainer.type != "finetune":
+    if config.trainer.type != "finetune" and config.adapter.pretrained_adapter_weights is not None:
Should this be an OR? Why does specifying `pretrained_adapter_weights` no longer require that the first output feature be TEXT?
Or is it that we want to make it so that using the none trainer type doesn't require an output feature?
if config.trainer.type == "none":
    return
CC: @arnavgarg1
I think this was an oversight on my part. I was trying to go through the code to see where my change might break something down the line, and I might have gotten a little overzealous.
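For completeness, a sketch of what the suggested early return would look like in context (the function name and constants come from the diff above; the trailing comment stands in for the existing body of the check):

```python
from ludwig.constants import MODEL_LLM


def check_llm_finetuning_output_feature_config(config: "ModelConfig"):  # noqa: F821
    if config.model_type != MODEL_LLM:
        return

    if config.trainer.type == "none":
        # The NoneTrainer used for zero-shot inference is valid without the
        # fine-tuning output-feature requirement.
        return

    # ... existing check that the first output feature is TEXT goes here ...
```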
ludwig/config_validation/checks.py
Outdated
@@ -493,6 +493,9 @@ def check_llm_finetuning_trainer_config(config: "ModelConfig"):  # noqa: F821
     if config.model_type != MODEL_LLM:
         return

+    if config.trainer.type == "none" and config.adapter.pretrained_adapter_weights is not None:
Should this be more simply
if config.trainer.type == "none":
    # The NoneTrainer for ZS is valid.
    return
But in this case, we would load in untrained LoRA weights if pretrained adapter weights weren't specified in the config, right? Would that be a problem?
Thanks!
Allows the user to specify a huggingface link or local path to adapter weights. Tested using Arnav's Code Alpaca V3 model (loaded the V3 adapter weights using this change and ran the results through human-eval; the scores matched those of the original V3 model).
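Roughly what using the feature could look like once merged, as a Python-dict config passed to the Ludwig API. The key name follows the rename discussed above (`pretrained_adapter_weights`); the base model, adapter id, and trainer choice are placeholders, not part of this PR:

```python
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "facebook/opt-350m",  # placeholder base model
    "input_features": [{"name": "instruction", "type": "text"}],
    "output_features": [{"name": "output", "type": "text"}],
    "adapter": {
        "type": "lora",
        # Either a Hugging Face repo id or a local directory containing the
        # trained adapter (adapter_config.json plus the adapter weights):
        "pretrained_adapter_weights": "some-org/code-alpaca-lora-adapter",
    },
    "trainer": {"type": "none"},  # no further fine-tuning; use the adapter as-is
}

model = LudwigModel(config)
# From here, the usual LudwigModel train/predict calls apply.
```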