update docs #1602
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1602       +/-   ##
===========================================
- Coverage   72.27%   26.81%   -45.46%
===========================================
  Files         290      290
  Lines       14552    14570      +18
===========================================
- Hits        10517     3907    -6610
- Misses       4035    10663    +6628

mistral
-------

All models from `Mistral AI family <https://mistral.ai/technology/#models>`_.

Request Access on `Hugging Face <https://huggingface.co/mistralai/Mistral-7B-v0.3>`__.
Important: You need to request access on `Hugging Face <https://huggingface.co/mistralai/Mistral-7B-v0.1>`__ to download this model.
Where did we land on this? Why not just use 0.2 if the arch is the same?
I can run it and see if the loss goes down, but that's not the most robust test.
Eh np, fine to just keep it as is for now
.. autosummary::
    :toctree: generated/
    :nosignatures:
.. .. autosummary::
So these we will just expose later on?
I think it makes sense. Joe was against it in the past, and Philip suggested the same. While we are not ready for MM, it doesn't make much sense to expose it, IMO.
@@ -5,11 +5,10 @@ LoRA Single Device Finetuning
=============================

This recipe supports finetuning on next-token prediction tasks using parameter efficient fine-tuning techniques (PEFT)
- such as `LoRA <https://arxiv.org/abs/2106.09685>`_ and `QLoRA <https://arxiv.org/abs/2305.14314>`_. These techniques
+ such as :ref:`glossary_lora` and :ref:`glossary_qlora`. These techniques
nit: why not keep the links to the original papers here?
Because we have pages for them that explain the details and how to use them, and those pages have the links to the papers.
For a deeper understanding of the different levers you can pull when using this recipe,
see our documentation for the different PEFT training paradigms we support:

* :ref:`glossary_lora`
* :ref:`glossary_qlora`
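
For readers of this thread who want a quick picture of what the LoRA glossary entry covers, here is a minimal sketch in plain Python. It is not torchtune's implementation or API: the function names `lora_param_counts`, `matvec`, and `lora_forward` are hypothetical and exist only for illustration. It assumes the standard LoRA formulation, where a frozen weight `W` is augmented with a trainable low-rank update `B @ A` scaled by `alpha / rank`.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Hypothetical helper: full finetuning trains d_out * d_in weights, while
    LoRA trains A (rank x d_in) and B (d_out x rank).
    """
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x) with W treated as frozen."""
    base = matvec(W, x)                # frozen pretrained path
    update = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 65536
```

With `B` initialized to zeros, as in the LoRA paper, `lora_forward` returns the same output as the frozen layer, so training starts from the pretrained model's behavior.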

Many of our other memory optimization features can be used in this recipe, too:
Nit: I would at least keep this first sentence as a lead-in to the "You can learn more about all of our memory optimization features". Otherwise I agree that it's nice to drop the full list here.
Left a bunch of small comments, but no major concerns
Co-authored-by: ebsmothers <[email protected]>
using a variety of :ref:`memory optimization features <memory_optimization_overview_label>`. Our fine-tuning recipes support all of our models and all our dataset types.
This includes continued pre-training, and various supervised finetuning paradigms, which can be customized through our datasets. Check out our
:ref:`dataset tutorial <dataset_tutorial_label>` for more information.
Our recipes include:
Oh Felipe you minimalist
LGTM, some small nits pls address.
Co-authored-by: Salman Mohammadi <[email protected]>
Co-authored-by: Felipe Mello <[email protected]> Co-authored-by: ebsmothers <[email protected]> Co-authored-by: Salman Mohammadi <[email protected]>
Context
What is the purpose of this PR? Is it to
Update:
Removed unnecessary paragraphs so it's leaner
Test plan
Built the docs after the changes and double-checked them