1/n - remove TiedEmbeddingTransformerDecoder from qwen #1547
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1547
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit b26b4fc with merge base d7fae96.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Great to see this. I remember the lambda having some weird interactions with FSDP in the past, but that may not be the case with FSDP2. As long as you're able to test on a distributed recipe and get identical loss, I have no concerns. Stamping to unblock.
@@ -10,8 +10,8 @@
import torch.nn.functional as F
from torch import nn
from torchtune.modules import MultiHeadAttention
PS: I forgot to update the type hint of TransformerDecoder to say that output can now be a callable. To avoid rerunning tests, this will come in a follow-up gemma PR.
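For reference, a rough sketch of what the updated annotation could look like (the alias name and exact Union form are assumptions, not the final signature):

```python
from typing import Callable, Union

import torch
from torch import nn

# Sketch only: the decoder's `output` argument would accept either a standard
# linear head or any callable that maps hidden states to logits.
OutputProjection = Union[nn.Linear, Callable[[torch.Tensor], torch.Tensor]]
```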
This looks great. One other request for a sanity check prior to landing: please make sure that you're able to save the checkpoint and then resume training properly (especially for a distributed run). Other than that, no concerns from me!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1547      +/-   ##
==========================================
- Coverage   73.36%   73.35%    -0.02%
==========================================
  Files         287      288        +1
  Lines       14142    14151        +9
==========================================
+ Hits        10375    10380        +5
- Misses       3767     3771        +4

☔ View full report in Codecov by Sentry.
torchtune/modules/__init__.py (Outdated)
@@ -32,7 +33,7 @@
     "KVCache",
     "RotaryPositionalEmbeddings",
     "RMSNorm",
-    "Fp32LayerNorm",
+    "TiedLinear" "Fp32LayerNorm",
formatting weird?
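For context: without a comma, Python concatenates the adjacent string literals, so `__all__` would export a single merged name instead of two. A minimal illustration in plain Python (not torchtune code):

```python
# Adjacent string literals are concatenated at parse time, so the missing comma
# silently produces one bogus export instead of two separate ones.
exports_with_bug = ["RMSNorm", "TiedLinear" "Fp32LayerNorm"]
exports_fixed = ["RMSNorm", "TiedLinear", "Fp32LayerNorm"]

print(exports_with_bug)  # ['RMSNorm', 'TiedLinearFp32LayerNorm']
print(exports_fixed)     # ['RMSNorm', 'TiedLinear', 'Fp32LayerNorm']
```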
two nits
@@ -516,6 +516,10 @@ def forward(
        return output


@deprecated(
Tell them to use TransformerDecoder WITH TiedLinear.
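Agreed; the notice could point at the replacement explicitly, something along these lines (the `msg` keyword and the exact wording are assumptions, not the final text):

```python
from torch import nn

# `deprecated` is the decorator applied in the diff above; the message wording
# here is illustrative only.
@deprecated(
    msg="TiedEmbeddingTransformerDecoder is deprecated. Use TransformerDecoder "
    "with TiedLinear(tok_embeddings) as the output projection instead."
)
class TiedEmbeddingTransformerDecoder(nn.Module):
    ...
```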
Co-authored-by: Felipe Mello <[email protected]>
Context
What is the purpose of this PR?
We don't need TiedEmbeddingTransformerDecoder if we pass output_proj as a lambda.
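As a rough sketch of the idea (the `TiedLinearSketch` name and exact API are illustrative, not the torchtune implementation added in this PR):

```python
import torch
import torch.nn.functional as F
from torch import nn


class TiedLinearSketch(nn.Module):
    """Illustrative only: reuse the token-embedding weight as the output head."""

    def __init__(self, tok_embeddings: nn.Embedding):
        super().__init__()
        self.tok_embeddings = tok_embeddings

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No separate weight is allocated; the embedding matrix is applied as the
        # output projection, so it stays tied to the embeddings by construction.
        return F.linear(x, self.tok_embeddings.weight)


tok_embeddings = nn.Embedding(num_embeddings=1000, embedding_dim=64)
output_proj = TiedLinearSketch(tok_embeddings)
# Equivalently, the same tying can be expressed as a lambda:
# output_proj = lambda x: F.linear(x, tok_embeddings.weight)
```

Either form keeps a single copy of the weight, which is the guarantee the dedicated TiedEmbeddingTransformerDecoder class existed to provide.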
Changelog
Test plan
tune run --nnodes 1 --nproc_per_node 8 full_finetune_distributed --config qwen2/1.5B_full batch_size=8 max_steps_per_epoch=20 metric_logger=torchtune.training.metric_logging.WandBLogger gradient_accumulation_steps=1 epochs=1
Resume from checkpoint: ran it twice, the second time resuming from the first run's checkpoint.
tune run --nnodes 1 --nproc_per_node 8 lora_finetune_distributed --config qwen2/0.5B_lora batch_size=8 max_steps_per_epoch=20 metric_logger=torchtune.training.metric_logging.WandBLogger gradient_accumulation_steps=1 epochs=2 compile=True
Also added a check to the transformer to verify that the weights were still tied:
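A minimal version of such a check (the function and parameter names here are assumptions, not necessarily the exact code used in the run) might look like:

```python
import torch
from torch import nn


def assert_weights_tied(tok_embeddings: nn.Embedding, output_weight: torch.Tensor) -> None:
    """Fail loudly if the output projection no longer shares the embedding weight."""
    # Comparing data_ptr() checks that both tensors share the same storage, which
    # also catches a silent copy (e.g. after a bad checkpoint load).
    assert output_weight.data_ptr() == tok_embeddings.weight.data_ptr(), (
        "output projection is no longer tied to the token embeddings"
    )
```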