Allow user to specify huggingface link or local path to pretrained lora weights #3572

Merged · 18 commits from pretrained_adapter_weights into master · Sep 12, 2023

Conversation

@Infernaught (Contributor) commented Aug 31, 2023:

Allows the user to specify a Hugging Face link or a local path to adapter weights. Tested using Arnav's Code Alpaca V3 model (loaded the V3 adapter weights using this change and ran the results through human-eval -- scores matched those of the original V3 model).
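
For context, a minimal sketch of how a user might point a LoRA adapter at pretrained weights after this change; the base model and adapter id below are illustrative placeholders, and the config is trimmed to the essentials rather than a complete example:

```python
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "facebook/opt-350m",  # illustrative base model
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "adapter": {
        "type": "lora",
        # New in this PR: a Hugging Face repo id or a local directory
        # containing previously trained LoRA adapter weights.
        "pretrained_adapter_weights": "username/my-lora-adapter",  # hypothetical id
    },
}

model = LudwigModel(config)
```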

Resolved review threads on ludwig/models/llm.py and ludwig/schema/llms/peft.py.
github-actions bot commented Sep 1, 2023:

Unit Test Results

6 files ±0 · 6 suites ±0 · 1h 12m 13s ⏱️ (-17m 25s)
34 tests (-2,792): 29 passed (-2,759), 5 skipped (-7), 0 failed (-26)
88 runs (-2,781): 72 passed (-2,750), 16 skipped (-5), 0 failed (-26)

Results for commit effcce0. ± Comparison against base commit 63f4924.

♻️ This comment has been updated with latest results.

@jeffkinnison (Contributor) left a comment:

Just one additional comment; it shouldn't be blocking if we want to land this.

}
config_obj = ModelConfig.from_dict(config)
assert config_obj.input_features[0].preprocessing.max_sequence_length is None
assert config_obj.output_features[0].preprocessing.max_sequence_length is None


def test_load_pretrained_adapter_weights():
Contributor:

A couple of tests we should add (possibly in a follow-up PR; a rough sketch follows this list):

  • Checking a null input
  • Checking an invalid weights path
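
A hedged sketch of what those two tests might look like; the helper, config fields, and base model below are illustrative rather than the merged test code, and the second test only checks that the value survives schema validation (a fuller version would build the model and expect a load-time error):

```python
from ludwig.schema.model_config import ModelConfig


def _llm_config(pretrained_adapter_weights):
    # Minimal illustrative LLM config; only the fields relevant to the
    # discussion are filled in, and the base model is arbitrary.
    return {
        "model_type": "llm",
        "base_model": "facebook/opt-125m",
        "input_features": [{"name": "prompt", "type": "text"}],
        "output_features": [{"name": "response", "type": "text"}],
        "adapter": {
            "type": "lora",
            "pretrained_adapter_weights": pretrained_adapter_weights,
        },
    }


def test_null_pretrained_adapter_weights():
    # A null value should simply mean "no pretrained adapter".
    config_obj = ModelConfig.from_dict(_llm_config(None))
    assert config_obj.adapter.pretrained_adapter_weights is None


def test_invalid_pretrained_adapter_weights_path():
    # An arbitrary string passes schema validation; the failure for a bad
    # path would surface later, when the adapter weights are actually loaded.
    config_obj = ModelConfig.from_dict(_llm_config("/not/a/real/path"))
    assert config_obj.adapter.pretrained_adapter_weights == "/not/a/real/path"
```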

Comment on lines 76 to 81
target_modules: Optional[list] = schema_utils.List(
str,
default=None,
allow_none=True,
description="List of modules to apply Lora to. If None, apply to all modules.",
)
Contributor:

Is this needed?

@Infernaught (Author):

I recall this causing an error if this wasn't set.

@arnavgarg1 (Contributor) commented Sep 6, 2023:

Got it! It would be good to know exactly what the error was so we can understand it and leave a comment explaining it - that might be useful when we come back to this in the future.

@Infernaught (Author):

If I recall correctly, there was an error involving target_modules not being a parameter of a LoraConfig.

Comment on lines 275 to 278
try:
loss.requires_grad = True
except RuntimeError:
pass
Contributor:

Why do we have to add this?

@Infernaught (Author):

This was a workaround I added because, when the LoRA weights were loaded in, some of the loss functions did not have requires_grad set to True. However, without the try/except block, this would also try to set requires_grad to True for some intermediate loss functions, which isn't valid.

Contributor:

Questions:

  1. Why would loading LoRA weights result in loss functions with requires_grad != True?
  2. What does the training error look like when some loss functions have requires_grad != True?
  3. Do you have an example of an intermediate loss function that raises an error when you try to set requires_grad=True?

At a minimum, we should add a comment explaining why this is here, e.g.:

"When loading adapter weights from huggingface or a local path, some of the loss functions do not have requires_grad=True. requires_grad=True is necessary for back-propagation, but we wrap this in a try/except because some intermediate losses like __ raise an error if requires_grad is explicitly set to True in this way."

Contributor:

Can you elaborate a bit more? What are the intermediate loss functions? I'm also not totally sure how wrapping the model with the adapter is causing these issues.

Contributor:

I agree with @justinxzhao and have the same questions as well.

@Infernaught (Author):

  1. I'm not quite sure why this was the case. My hypothesis is that, when the LoRA weights are loaded, they overwrite some of the parameters of certain layers.
  2. When some loss functions have requires_grad != True, training stops and errors out.
  3. I don't have an example of this, but I think my terminology was incorrect here. Specifically, requires_grad can only be changed on leaf variables, so if there is a loss that is not a leaf node, setting requires_grad=True would cause an error.
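
A minimal, self-contained PyTorch sketch of that leaf-variable restriction (an illustration of the behavior being described, not code from this PR):

```python
import torch

# requires_grad can be toggled freely on a leaf tensor.
leaf = torch.zeros(3)
leaf.requires_grad = True  # OK: leaf tensor

# A tensor produced by operating on a tensor that requires grad is a non-leaf
# (intermediate) tensor; assigning requires_grad on it raises a RuntimeError,
# which is what the try/except in the PR is guarding against.
intermediate = leaf.sum() * 2.0
try:
    intermediate.requires_grad = True
except RuntimeError as e:
    print(f"Cannot set requires_grad on a non-leaf tensor: {e}")
```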

Contributor:

By default, when a PEFT pretrained adapter is loaded in, is it set to training mode or inference mode? Does that maybe have something to do with requires_grad not being set?

@justinxzhao (Contributor) commented Sep 7, 2023:

Ah, definitely worth checking if the module is being loaded in eval mode, which would explain requires_grad=False. Some useful references:
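
As a hedged illustration of that suggestion (separate from the references above, and not the code in this PR): peft's PeftModel.from_pretrained loads adapters for inference by default and exposes an is_trainable flag; the base model and adapter location below are placeholders.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# By default, from_pretrained loads the adapter in inference mode, which
# leaves the LoRA parameters with requires_grad=False. Passing
# is_trainable=True keeps them trainable for further fine-tuning.
model = PeftModel.from_pretrained(
    base,
    "some-user/some-lora-adapter",  # hypothetical HF repo id or local path
    is_trainable=True,
)

# Sanity check: at least some parameters should still require gradients.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(f"{len(trainable)} trainable parameter tensors")
```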

@@ -282,6 +282,7 @@
GENERATION = "generation"
PROMPT = "prompt"
ADAPTER = "adapter"
PRETRAINED_WEIGHTS = "pretrained_weights"
Contributor:

Nit: it might be clearer to call it pretrained_adapter_weights, since pretrained weights also come from the model - just to avoid confusion.

@Infernaught (Author):

On it

Comment on lines 237 to 238
if param_name is None:
continue
@arnavgarg1 (Contributor) commented Sep 6, 2023:

When would param_name be None? This dictionary holds the parameters for the PEFT adapter that we already have in the schema, right? E.g., for LoRA, it will have r, alpha, bias, etc. If so, I'd assume a value can be None, but a name being None feels a bit odd.

@Infernaught (Author):

You're correct. This should be param_value.
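
A minimal sketch of the corrected pattern, assuming the loop is building keyword arguments for peft's LoraConfig from a schema-style dict (the dict contents and names here are illustrative, not the exact PR code):

```python
from peft import LoraConfig

# Hypothetical dict produced from the adapter schema; entries whose value is
# None (e.g. target_modules) are skipped so LoraConfig falls back to its own
# defaults instead of receiving explicit Nones.
schema_params = {"r": 8, "lora_alpha": 16, "bias": "none", "target_modules": None}

config_kwargs = {}
for param_name, param_value in schema_params.items():
    if param_value is None:  # skip on the *value*, not the name
        continue
    config_kwargs[param_name] = param_value

lora_config = LoraConfig(**config_kwargs)
print(lora_config)
```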

@Infernaught changed the title from "Add functionality for pretrained lora weights" to "Allow user to specify huggingface link or local path to pretrained lora weights" on Sep 7, 2023.
@@ -477,7 +477,7 @@ def check_llm_finetuning_output_feature_config(config: "ModelConfig"): # noqa:
if config.model_type != MODEL_LLM:
return

-    if config.trainer.type != "finetune":
+    if config.trainer.type != "finetune" and config.adapter.pretrained_adapter_weights is not None:
Contributor:

Should this be an OR? Why does specifying pretrained_adapter_weights no longer require that the first output feature be TEXT?

Or is it that we want to make it so that using the none trainer type doesn't require an output feature?

if config.trainer.type == "none":
    return

CC: @arnavgarg1

@Infernaught (Author):

I think this was an oversight on my part. I was trying to go through the code to see where my change might break something down the line, and I might have gotten a little overzealous.

@@ -493,6 +493,9 @@ def check_llm_finetuning_trainer_config(config: "ModelConfig"): # noqa: F821
if config.model_type != MODEL_LLM:
return

if config.trainer.type == "none" and config.adapter.pretrained_adapter_weights is not None:
Contributor:

Should this more simply be:

if config.trainer.type == "none":
    # The NoneTrainer for ZS is valid.
    return

@Infernaught (Author):

But in this case, we would load in untrained LoRA weights if pretrained adapter weights weren't specified in the config, right? Would that be a problem?

Additional resolved review threads on ludwig/models/llm.py and tests/integration_tests/test_llm.py.
@justinxzhao (Contributor) left a comment:

Thanks!

@Infernaught merged commit 6178b48 into master on Sep 12, 2023, and deleted the pretrained_adapter_weights branch.