Add support for Paged Optimizers (Adam, Adamw), 8-bit optimizers, and new optimizers: LARS, LAMB and LION #3588
Merged
Unit Test Results: 6 files ±0, 6 suites ±0, 1h 40m 56s (+16m 20s). Results for commit 0158d03, compared against base commit 3d2ff0b.
arnavgarg1 changed the title from "[WIP] Add support for Paged Optimizers (Adam, Adamw), 8-bit optimizers, and new optimizers: LARS, LAMB and LION" to "Add support for Paged Optimizers (Adam, Adamw), 8-bit optimizers, and new optimizers: LARS, LAMB and LION" on Sep 6, 2023
tgaddair reviewed on Sep 6, 2023
arnavgarg1 requested review from w4nderlust, justinxzhao, jeffkinnison and Infernaught on September 6, 2023 20:46
tgaddair approved these changes on Sep 6, 2023
arnavgarg1 commented on Sep 6, 2023 (twice)
justinxzhao approved these changes on Sep 7, 2023
This PR adds support for a variety of new optimizers, including paged and 8-bit variants, all of which are useful for fine-tuning. I will add better parameter descriptions in a follow-up PR once I better understand the underlying papers for LAMB, LARS and LION. I will also add tests once PR #3578 lands.
What are paged optimizers?
Paged optimizers use the NVIDIA unified memory feature, which performs automatic page-to-page transfers between the CPU and GPU so that GPU processing stays error-free when the GPU occasionally runs out of memory. The feature works like regular memory paging between CPU RAM and the disk: paged memory is allocated for the optimizer states, which are automatically evicted to CPU RAM when the GPU runs out of memory and paged back into GPU memory when they are needed in the optimizer update step.
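To make the evict-and-page-back behaviour described above more concrete, here is a toy, pure-Python sketch. It is not how bitsandbytes or CUDA unified memory is implemented: the class name `PagedStatePool`, the fixed `gpu_capacity` budget and the least-recently-used eviction policy are all assumptions made purely for illustration (the real eviction decisions are made by the driver).

```python
from collections import OrderedDict


class PagedStatePool:
    """Toy model of paged optimizer state (hypothetical, for illustration only).

    A small "GPU" pool holds a limited number of state blocks; anything that
    does not fit is evicted to a "CPU" dict and paged back in on next access.
    """

    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity  # max blocks resident on the "GPU"
        self.gpu = OrderedDict()          # resident blocks, least recently used first
        self.cpu = {}                     # blocks evicted to "CPU RAM"
        self.page_ins = 0
        self.evictions = 0

    def access(self, name: str, default: float = 0.0) -> float:
        """Return the state for `name`, paging it into the GPU pool if needed."""
        if name in self.gpu:
            self.gpu.move_to_end(name)    # mark as most recently used
            return self.gpu[name]
        if name in self.cpu:
            value = self.cpu.pop(name)    # page the block back in from CPU RAM
            self.page_ins += 1
        else:
            value = default               # first touch: allocate fresh state
        if len(self.gpu) >= self.gpu_capacity:
            # The "GPU" is full: evict the least recently used block (assumed policy).
            victim, victim_value = self.gpu.popitem(last=False)
            self.cpu[victim] = victim_value
            self.evictions += 1
        self.gpu[name] = value
        return value

    def update(self, name: str, value: float) -> None:
        """Write new state for `name`, making sure it is resident first."""
        self.access(name)
        self.gpu[name] = value


# Two "optimizer steps" over three parameters with room for only two states.
pool = PagedStatePool(gpu_capacity=2)
for _ in range(2):
    for param in ("w1", "w2", "w3"):
        momentum = pool.access(param)
        pool.update(param, 0.9 * momentum + 0.1)
```

The point of the sketch is only that the update loop never fails for lack of space: states that do not fit are parked in CPU memory and come back when the update step touches them, at the cost of extra transfers.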
What new optimizers are being added to Ludwig?
Here's a summary of the new optimizers that will be added with this PR:
Here's an example of how you can set these different variants:
Regular AdamW
8-Bit AdamW
Paged AdamW
Paged AdamW 8-bit
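The config snippets for these four variants are not reproduced above. As a rough sketch only, the following builds a Ludwig-style config dict for each one. The optimizer type strings (`adamw`, `adamw_8bit`, `paged_adamw`, `paged_adamw_8bit`) and the `trainer.optimizer.type` layout are assumptions inferred from the variant names, so check them against the Ludwig documentation before relying on them; the helper `trainer_config` is hypothetical.

```python
# Assumed mapping from the variant labels above to optimizer type strings.
# These names are NOT confirmed by the text of this PR description.
ASSUMED_OPTIMIZER_TYPES = {
    "Regular AdamW": "adamw",
    "8-Bit AdamW": "adamw_8bit",
    "Paged AdamW": "paged_adamw",
    "Paged AdamW 8-bit": "paged_adamw_8bit",
}


def trainer_config(optimizer_type: str) -> dict:
    """Build the trainer section of a config dict for one optimizer variant.

    Hypothetical helper: the nesting trainer -> optimizer -> type is an
    assumption about the config layout.
    """
    return {"trainer": {"optimizer": {"type": optimizer_type}}}


# One config fragment per variant, keyed by its label.
configs = {
    label: trainer_config(opt_type)
    for label, opt_type in ASSUMED_OPTIMIZER_TYPES.items()
}
```

Under these assumptions, switching between variants only changes the `type` string, so the rest of a training config can stay the same.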
All of this has been made possible through a deeper, direct integration with the bitsandbytes library. Note that the 8-bit and paged optimizers only work on GPU machines, and they are not compatible with DeepSpeed.