QLoRA with bias + Llama 3.2 Vision QLoRA configs #1726
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1726. Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 09491e6 with merge base 17ba37d. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Brilliant work. The kind of work they sing about in old minstrel songs.
One question.
```diff
@@ -81,7 +81,7 @@ enable_activation_offloading: False
 dtype: bf16

 # Logging
-output_dir: /tmp/full-llama3.2-vision-finetune
+output_dir: /tmp/lora-llama3.2-vision-finetune
```
whoops
```diff
@@ -59,9 +55,10 @@ def test_state_dict(self, dtype):
         assert isinstance(state_dict["weight"], NF4Tensor)

     @pytest.mark.parametrize("dtype", [torch.bfloat16, torch.float32])
     def test_output_dtype(self, dtype):
+    @pytest.mark.parametrize("bias", [True, False])
```
What is the point of adding bias to this test? The dtype isn't changing and you're only checking the dtype?
Agreed it's pretty trivial, but I'd like to at least build FrozenNF4Linear with bias somewhere in our unit tests, and the overhead of this unit test is tiny.
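For context, here is a minimal sketch of the kind of parametrized case being discussed, assuming FrozenNF4Linear is importable from torchtune.modules.low_precision and accepts the bias/dtype keywords described in its docstring (both assumptions, not the PR's exact test):

```python
# Hedged sketch only: the import path and constructor keywords are assumptions.
import pytest
import torch

from torchtune.modules.low_precision import FrozenNF4Linear


@pytest.mark.parametrize("bias", [True, False])
def test_output_dtype_with_bias(bias):
    # Build the frozen NF4 linear with and without a bias and check that the
    # forward output stays in bf16 either way.
    layer = FrozenNF4Linear(in_dim=64, out_dim=128, bias=bias, dtype=torch.bfloat16)
    x = torch.randn(2, 64, dtype=torch.bfloat16)
    assert layer(x).dtype == torch.bfloat16
```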
```python
    use_bias=use_bias,
    quantize_base=True,
)
# fixed_init_model(qlora_linear, dtype=torch.bfloat16)
```
Did you mean to comment this out?
Oh yeah oops, lemme update
Thank you Evan for adding this! I added mostly questions, but overall it looks good. Could you also run a test with LoRA applied to MLP and output as well, since those also have bias values?
```python
    Args:
        in_dim (int): input dimension
        out_dim (int): output dimension
        device (Optional[torch.device]): device to use for the underlying weight. If ``None``, uses the default
            device given by `torch.get_default_device()`.
        bias (bool): whether to include bias in the original linear layer. Default: False
```
nit: remove "original"
```diff
@@ -123,6 +119,8 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
         """
         if self._quantize_base:
             out = linear_nf4(input=x, weight=self.weight)
+            if self.use_bias:
```
Why do we use torchao linear_nf4 here and not our own?
Not sure I follow... we don't have our own linear_nf4. Even our FrozenNF4Linear is just a wrapper around their linear_nf4.
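To make that concrete, here is a minimal sketch of the forward path under discussion, assuming torchao's to_nf4/linear_nf4 and a bias kept in bf16 (a sketch of the idea, not the PR's exact code):

```python
# Sketch: NF4 matmul via torchao, with the bias left in bf16 and added afterwards.
import torch
from torchao.dtypes.nf4tensor import linear_nf4, to_nf4

weight = torch.randn(128, 64, dtype=torch.bfloat16)
bias = torch.randn(128, dtype=torch.bfloat16)
x = torch.randn(2, 64, dtype=torch.bfloat16)

nf4_weight = to_nf4(weight)                   # 4-bit NF4 quantized copy of the weight
out = linear_nf4(input=x, weight=nf4_weight)  # torchao handles the quantized matmul
out = out + bias                              # bias stays in bf16, outside the NF4 path
```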
I meant, why do we not reuse the FrozenNF4Linear class here so we don't have to define this bias solution in multiple places?
Discussed a bit offline. It's a nice idea but going to punt on it for now
```yaml
# Model arguments
model:
  _component_: torchtune.models.llama3_2_vision.qlora_llama3_2_vision_11b
```
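As a side note, here is a rough sketch of how a `_component_` entry like this gets resolved at runtime, assuming torchtune's config.instantiate utility as used by the recipes; the lora_attn_modules value below is an illustrative assumption, and the shipped config passes additional builder arguments:

```python
# Hedged sketch: turning the YAML model section into a module via config.instantiate.
from omegaconf import OmegaConf

from torchtune import config

cfg = OmegaConf.create(
    {
        "model": {
            "_component_": "torchtune.models.llama3_2_vision.qlora_llama3_2_vision_11b",
            "lora_attn_modules": ["q_proj", "v_proj"],  # illustrative, not the shipped values
        }
    }
)
model = config.instantiate(cfg.model)  # builds the QLoRA Llama 3.2 Vision 11B model
```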
Can you confirm that this is identical to lora except for this line? Whenever you do a merge you should re-check that assumption.
Yep, did this
```python
        self.weight.requires_grad_(False)
        if self.bias is not None:
            self.bias.requires_grad_(False)
        self.nf4_weight = to_nf4(self.weight)
```
nit: do we want this "self." here?
Yeah good point, maybe not strictly necessary
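A hedged sketch of the alternative being floated here, i.e. registering the quantized tensor directly instead of keeping a separate self.nf4_weight attribute (illustrative only; the helper name is made up):

```python
# Illustrative sketch: freeze the layer and swap in the NF4 weight in place, rather
# than holding an intermediate self.nf4_weight attribute.
import torch
from torch import nn
from torchao.dtypes.nf4tensor import to_nf4


def freeze_and_quantize_(linear: nn.Linear) -> None:
    linear.weight.requires_grad_(False)
    if linear.bias is not None:
        linear.bias.requires_grad_(False)  # bias stays in its original (e.g. bf16) dtype
    # Register the quantized tensor as the module's weight; no extra attribute needed.
    linear.weight = nn.Parameter(to_nf4(linear.weight), requires_grad=False)


layer = nn.Linear(64, 128, bias=True, dtype=torch.bfloat16)
freeze_and_quantize_(layer)  # layer.weight is now an NF4Tensor parameter; bias untouched
```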
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #1726      +/-   ##
==========================================
- Coverage   70.25%   67.89%   -2.36%
==========================================
  Files         309      308       -1
  Lines       16285    16301      +16
==========================================
- Hits        11441    11068     -373
- Misses       4844     5233     +389
```

☔ View full report in Codecov by Sentry.
```python
    encoder = Llama3VisionEncoder(clip=clip, projection_head=projection_head)

    if quantize_base:
        # For QLoRA, we reparametrize 4-bit tensors to bf16, and offload to CPU on the fly
        # so as to not increase peak memory
        encoder._register_state_dict_hook(
            partial(reparametrize_as_dtype_state_dict_post_hook, offload_to_cpu=True)
        )

    return encoder
```
A bunch of miscellaneous linter changes in this file; this is the only one of substance.
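Conceptually, the hook registered in that diff rewrites NF4 weights at state_dict() time. A rough sketch of the behavior (an illustration of the idea, not torchtune's actual reparametrize_as_dtype_state_dict_post_hook):

```python
# Rough sketch of a reparametrize-as-dtype state_dict post-hook: dequantize NF4
# tensors back to a dense dtype (and optionally offload them to CPU) so saved
# checkpoints don't require NF4-aware loading. Illustrative only.
import torch
from torchao.dtypes.nf4tensor import NF4Tensor


def reparametrize_state_dict_sketch(
    module, state_dict, *args, dtype=torch.bfloat16, offload_to_cpu=True
):
    for key, value in state_dict.items():
        if isinstance(value, NF4Tensor):
            dense = value.to(dtype)  # dequantize to bf16 (assumed NF4Tensor behavior)
            if offload_to_cpu:
                dense = dense.cpu()  # avoid holding both copies on the GPU
            state_dict[key] = dense
    return state_dict
```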
```python
new_key = prefix + "embedding.weight"
state_dict[new_key] = state_dict[key]
del state_dict[key]
if state_dict:
```
@pbontrager this is my hack to support DoRA. Lmk if any concerns
I see why you do this. This seems like a general safety check that would be good to have in all of the load_state_dict hooks, since this case could come up anytime strict=False is used. I'll approve this, but could you add this change here or in a follow-up in the other load hooks?
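For that follow-up, here is a sketch of the kind of guard being asked for (names are illustrative, not the exact torchtune hooks):

```python
# Illustrative sketch of the "empty state_dict" guard: with strict=False loads (as in
# the DoRA flow), a module's slice of the incoming state_dict may be empty, so key
# remapping should bail out early instead of assuming the keys exist.
def load_state_dict_hook_sketch(module, state_dict, prefix, *args, **kwargs):
    old_key = prefix + "weight"
    if old_key in state_dict:
        state_dict[prefix + "embedding.weight"] = state_dict.pop(old_key)
    if not state_dict:
        # Nothing (else) was passed for this module; skip further fixups.
        return
    # ... remaining key remapping for a non-empty state_dict ...
```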
After opening pytorch/ao#979 on torchao, it was pointed out to me that I was overcomplicating things... we can just keep the bias in bf16, which is apparently a pretty standard thing to do (ref).

So this PR does exactly that: it just lets the bias stay in the higher precision for our FrozenNF4Linear, LoRALinear, and DoRALinear when we set quantize_base=True.

Test plan
Added test cases for LoRALinear and FrozenNF4Linear with bias=True.
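Roughly, the new LoRALinear case exercises a construction like the sketch below (argument names follow the diff above; the sizes, dtype handling, and defaults are assumptions):

```python
# Hedged sketch of the added LoRALinear case: bias enabled together with quantize_base.
import torch

from torchtune.modules.peft import LoRALinear

torch.set_default_dtype(torch.bfloat16)  # base weight, bias, and LoRA params in bf16
qlora_linear = LoRALinear(
    in_dim=64,
    out_dim=128,
    rank=4,
    alpha=1.0,
    use_bias=True,
    quantize_base=True,
)
x = torch.randn(2, 64, dtype=torch.bfloat16)
assert qlora_linear(x).dtype == torch.bfloat16
```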
Fun fact I discovered while writing the LoRALinear test: if x, weight, and bias are all bf16, then F.linear(x, weight, bias) != F.linear(x, weight) + bias (repro here). So I left that test case out.
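For the curious, here is a hedged repro of that mismatch (not the linked repro; the exact deltas depend on hardware and backend):

```python
# Sketch: in bf16, the fused-bias path of F.linear can round differently from adding
# the bias as a separate op, so exact equality between the two is not guaranteed.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(32, 64, dtype=torch.bfloat16)
weight = torch.randn(128, 64, dtype=torch.bfloat16)
bias = torch.randn(128, dtype=torch.bfloat16)

fused = F.linear(x, weight, bias)     # bias folded into the matmul kernel
unfused = F.linear(x, weight) + bias  # bias added as a separate bf16 elementwise op

print(torch.equal(fused, unfused))    # may be False
print((fused - unfused).abs().max())  # rounding-level difference, if any
```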
E2E tests:

Loss curves from both these runs (compared to analogous LoRA runs as a baseline):

Also ran with DoRA applied to MLP and output layers as a sanity check: