E2E Tutorial #690
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/690
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit ac0aae2 with merge base a783aca: This comment was automatically generated by Dr. CI and updates every 15 minutes.
will depend on factors such as the model, amount and nature of training data, your hardware
setup and the end task for which the model will be used
- Evaluate the model on some benchmarks to validate model quality
- Run some generations to make sure the model output looks reasonable
I would switch "run generations" and "evaluate model". I find it faster and easier to run generations to check reasonable output first.
docs/source/examples/e2e_flow.rst
Outdated
TorchTune, and how TorchTune makes it easy to use popular tools and libraries from the ecosystem.

We'll use the Llama2 7B model for this tutorial. You can find a complete set of models supported
by TorchTune `here <https://github.com/pytorch/torchtune/blob/main/README.md#introduction>`_.
We should change this eventually. The official set of models should live in the docs.
Also not seeing Gemma2b in that link.
Yeah, we need to update the README to include Gemma2; that's next on deck!
I can add it here - #668
freezes the base LLM and adds a very small percentage of learnable parameters. This helps keep
memory associated with gradients and optimizer state low. Using TorchTune, you should be able to
fine-tune a Llama2 7B model with LoRA in less than 16GB of GPU memory using bfloat16 on a
RTX 3090/4090. For more information on how to use LoRA, take a look at our
Add link for RTX 3090/4090
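For a rough sense of what "a very small percentage of learnable parameters" means in the passage above, here is a minimal back-of-the-envelope sketch. The rank, the choice of adapted projections, and the rounded base parameter count are illustrative assumptions, not values taken from this PR's config.

```python
def lora_params(in_dim: int, out_dim: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x in_dim), B is (out_dim x rank)."""
    return rank * in_dim + out_dim * rank


# Assumed Llama2-7B-like shapes (illustrative, not read from a config)
HIDDEN = 4096
NUM_LAYERS = 32
RANK = 8                           # assumed LoRA rank
ADAPTED_PROJECTIONS_PER_LAYER = 2  # assumed: query and value projections only
BASE_PARAMS = 7_000_000_000        # rounded size of the frozen base model

# Total trainable parameters across all adapted projections
trainable = NUM_LAYERS * ADAPTED_PROJECTIONS_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)
fraction = trainable / BASE_PARAMS

print(f"trainable LoRA parameters: {trainable:,}")
print(f"fraction of base model:    {fraction:.4%}")
```

Under these assumptions only the adapter weights need gradients and optimizer state, which is why the memory footprint stays low.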
docs/source/examples/e2e_flow.rst
Outdated
The "merged weights" (see the :ref:`LoRA Tutorial <lora_finetune_label>` for more details)
are split across two checkpoint files similar to the source checkpoints from the HF Hub.
In fact the keys would be identical between these checkpoints. For more details see the
checkpointing tutorial. We also have a third checkpoint file which is much smaller in size
Link to checkpointing tutorial
Run Evaluation using EleutherAI's Eval Harness
----------------------------------------------

We've fine-tuned a model. But how well does this model really do? Let's run some Evaluations!
Suggested change:
- We've fine-tuned a model. But how well does this model really do? Let's run some Evaluations!
+ We've fine-tuned a model. But how well does this model really do? Let's run some evaluations!
not sure about this one!
docs/source/examples/e2e_flow.rst
Outdated
Once the config is updated, let's kick off evaluation! We'll use the
``truthfulqa_mc2`` task which is also the default in the config.
Link to what the TruthfulQA MC2 task is and also add a sentence explaining it.
docs/source/examples/e2e_flow.rst
Outdated
[evaluator.py:324] Running loglikelihood requests
[eleuther_eval.py:195] Eval completed in 121.27 seconds.
[eleuther_eval.py:197] truthfulqa_mc2: {'acc,none': 0.48919958543950917 ...}
Stay tuned, I want to change our output for this.
docs/source/examples/e2e_flow.rst
Outdated
[eleuther_eval.py:195] Eval completed in 121.27 seconds.
[eleuther_eval.py:197] truthfulqa_mc2: {'acc,none': 0.48919958543950917 ...}

So seems like our fine-tuned model gets ~48% on this task. Which is pretty good.
Suggested change:
- So seems like our fine-tuned model gets ~48% on this task. Which is pretty good.
+ So seems like our fine-tuned model gets ~48% on this task, which is pretty good.
I feel like this line is awkward. Why not just combine with the subsequent sentences plus a bit describing what truthfulqa_mc2 is actually doing? Basically something like:
The Truthful QA dataset measures a model's propensity to be truthful when answering questions. Specifically, we will evaluate our model on the truthfulqa_mc2 task, which measures the model's zero-shot accuracy on a question followed by one or more true responses and one or more false responses. We can run evaluation on our downloaded checkpoints first as a baseline
... (hopefully the command should just be a config change to the directory)
Now, we evaluate our fine-tuned model
...
We can see that the fine-tuned model yields an 8% overall improvement in zero-shot accuracy on this task.
+1 to Evan's paragraphs, much more compelling and informative
Updated!
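To make the proposed description of ``truthfulqa_mc2`` concrete, here is a minimal sketch of the per-question score as we understand it: the share of probability mass the model assigns to the true answers, out of all listed answers. The log-likelihood values are made up for illustration, and the function name is hypothetical rather than the harness's own API.

```python
import math


def mc2_score(ll_true: list[float], ll_false: list[float]) -> float:
    """Normalized probability mass assigned to the true answers for one question."""
    p_true = [math.exp(ll) for ll in ll_true]
    p_false = [math.exp(ll) for ll in ll_false]
    return sum(p_true) / (sum(p_true) + sum(p_false))


# Hypothetical per-answer log-likelihoods for a single question:
# two true reference answers and two false ones.
example_true = [math.log(0.3), math.log(0.1)]
example_false = [math.log(0.4), math.log(0.2)]

score = mc2_score(example_true, example_false)
print(f"per-question MC2 score: {score:.3f}")
```

The task-level number reported in the log would then be the mean of this score over all questions.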
Once the config is updated, let's kick off quantization! We'll use the default
quantization method from the config.
Wondering if for each of these recipes (where it's relevant), we want to point to something like "for all available quantization/(fine-tuning) methods available in TorchTune, see here"
.. code-block:: yaml

  checkpointer:
Just curious, I am seeing a lot of checkpointer across all these various tasks. Wondering if for some of the cases we want to just override things via CLI? E.g. in some cases (though not here) can't we just set checkpointer.checkpoint_dir=<checkpoint_dir> and call it a day? Or is that too much of a black box/defeats the purpose of showing off tune cp
Nope, this is a good point. I modified this in the first one but am leaving it as is where we have a ton of files to change.
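To illustrate the idea of a dotted ``key=value`` override discussed in this thread, here is a minimal sketch of how such an override could be merged into a nested config. The ``apply_override`` helper and the config values are hypothetical and written for illustration only; this is not how TorchTune's config system is implemented.

```python
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of config with one 'a.b.c=value' override applied."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    node = updated
    for name in parents:
        # Walk down the nesting, creating intermediate sections if missing
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


# Hypothetical config contents, standing in for a recipe's YAML file
base = {"checkpointer": {"checkpoint_dir": "/tmp/original", "output_dir": "/tmp/out"}}
overridden = apply_override(base, "checkpointer.checkpoint_dir=/tmp/finetuned")

print(overridden["checkpointer"])
```

The trade-off raised above still applies: an override keeps the command short, while an explicit config copy makes every changed field visible.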
@@ -53,7 +53,7 @@ def _setup_model(
        with utils.set_default_dtype(self._dtype), self._device:
            model = config.instantiate(model_cfg)

        model.load_state_dict(model_state_dict, assign=True)
why's this change needed when using HF?
docs/source/examples/e2e_flow.rst
Outdated
--------

Fine-tuning an LLM is almost never itself the end goal. Usually this is one step in a much
larger worfklow. An example workflow might look something like this:
Suggested change:
- larger worfklow. An example workflow might look something like this:
+ larger workflow. An example workflow might look something like this:
docs/source/examples/e2e_flow.rst
Outdated
setup and the end task for which the model will be used
- Evaluate the model on some benchmarks to validate model quality
- Run some generations to make sure the model output looks reasonable
- Quantize the model for efficient inference followed by optionally exporting it for specific
is this a standard offramp or should this bullet be about more general offramping?
Indeed, the bridge is pretty cool! Seems like our LLM knows what it's talking
about.
Should we add some intuition on what we should be looking for when checking the generated output? How do we know something is off?
too subjective I think
docs/source/examples/e2e_flow.rst
Outdated
-----------------------------------------

We saw that the generation recipe took around 11.6 seconds to generate 300 tokens.
One technique commonly used to speed up inference is quantization. TorchTune provides
explain briefly what quantization is
Yeah, this and/or provide a link to a reference on 4-bit weights-only quantization
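As a starting point for the brief explanation requested here, this is a minimal sketch of symmetric 4-bit weight-only quantization: floats are mapped to integers in [-8, 7] with a single shared scale, then mapped back at inference time. It is a simplified illustration with made-up weights, not the quantization method that the recipe in this PR actually uses.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric 4-bit quantization: integers in [-8, 7] plus one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the signed 4-bit range
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the 4-bit integers."""
    return [q * scale for q in quantized]


# Made-up weights for illustration
weights = [0.7, -0.3, 0.1, 0.0]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)

for w, r in zip(weights, restored):
    print(f"original {w:+.3f} -> restored {r:+.3f}")
```

Each weight now needs 4 bits instead of 16, at the cost of a rounding error of at most half a quantization step per value. Real schemes typically use one scale per group of weights rather than one per tensor.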
docs/source/examples/e2e_flow.rst
Outdated
.. note::
  Unlike the fine-tuned checkpoints, this outputs a single checkpoint file. This is
  because our quantization APIs currently don't support any conversion across formats.
  As a result you won't be able to use these quantized models outside of TorchTune.
but can you export these quantized models with executorch? or HF inference? (sorry not familiar with our offramp options)
For now, unfortunately no. Quantized models are tied to a particular model definition, so we don't have any way to use these outside tune.
One high level comment - will this replace the first fine-tuning tutorial? It seems there's a lot of overlap between the two
docs/source/examples/e2e_flow.rst
Outdated
------------------------------------------------

As we mentioned above, one of the benefits of handling of the checkpoint
conversion is that users can directly work with standard formats. This helps
😃
Suggested change:
- conversion is that users can directly work with standard formats. This helps
+ conversion is that you can directly work with standard formats. This helps
Actually personally I'm 50/50 on these whole first two sentences. It's a nice sentiment but I feel you show the benefits below, no need to spell it out here too.
I mainly use it as a flow thing - otherwise it's too much of an abrupt change?
docs/source/examples/e2e_flow.rst
Outdated
sd_1 = torch.load('/tmp/Llama-2-7b-hf/hf_model_0001_0.pt', mmap=True, map_location='cpu')
sd_2 = torch.load('/tmp/Llama-2-7b-hf/hf_model_0002_0.pt', mmap=True, map_location='cpu')
Should the folder here be <checkpoint_dir>? (I might just be misunderstanding)
nope you're right! good catch
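Once the two shards are loaded as in the snippet under review, combining them is a small step. Here is a minimal sketch that merges sharded state dicts and fails loudly if a key appears in more than one shard. The ``merge_shards`` helper is hypothetical, and plain dicts with placeholder strings stand in for the tensors that ``torch.load`` would return, so the sketch runs without PyTorch or the checkpoint files.

```python
from typing import Mapping


def merge_shards(shards: list[Mapping[str, object]]) -> dict[str, object]:
    """Combine sharded state dicts into one, failing on duplicate keys."""
    merged: dict[str, object] = {}
    for shard in shards:
        overlap = merged.keys() & shard.keys()
        if overlap:
            raise ValueError(f"duplicate keys across shards: {sorted(overlap)}")
        merged.update(shard)
    return merged


# Stand-ins for the two loaded shards; keys and values are illustrative only
sd_1 = {
    "model.embed_tokens.weight": "tensor-a",
    "model.layers.0.self_attn.q_proj.weight": "tensor-b",
}
sd_2 = {
    "model.norm.weight": "tensor-c",
    "lm_head.weight": "tensor-d",
}

merged = merge_shards([sd_1, sd_2])
print(f"{len(merged)} keys after merging")
```

The duplicate-key check is a design choice for safety: silently overwriting a weight from one shard with another would be hard to debug later.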
Context
Adding an e2e tutorial which captures the complete flow within TorchTune. This tutorial is a good overview of the library's capabilities, so I'm adding it to the Getting Started section instead of burying it in the tutorials section.
Changelog
Test plan
Tutorial Rendered
Updates to the entry page rendered