
Update lr_schedules.py #4563

Merged: 14 commits merged into microsoft:master on Nov 10, 2023

Conversation

CoinCheung
Contributor

add cosine annealing scheduler

This scheduler is widely used in image classification tasks, and many LLMs (e.g., LLaMA) also use it.
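For readers unfamiliar with the schedule, here is a minimal sketch of the warmup-plus-cosine-annealing shape being added. It is a standalone helper, not the PR's actual class: warmup_min_ratio and warmup_num_steps follow the names used later in this thread, while total_num_steps and cos_min_ratio are illustrative names.

```python
import math

def warmup_cosine_multiplier(step, total_num_steps, warmup_num_steps=100,
                             warmup_min_ratio=0.01, cos_min_ratio=0.0):
    """Return a multiplier applied to the optimizer's base lr.

    Phase 1: linear warmup from warmup_min_ratio up to 1.0 over warmup_num_steps.
    Phase 2: cosine annealing from 1.0 down to cos_min_ratio over the remaining steps.
    """
    if step < warmup_num_steps:
        # Warm up on the ratio, not on absolute lr values.
        return warmup_min_ratio + (1.0 - warmup_min_ratio) * step / warmup_num_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_num_steps) / max(1, total_num_steps - warmup_num_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return cos_min_ratio + (1.0 - cos_min_ratio) * cosine
```

With a base lr of 1e-3, the lr at any step is simply 1e-3 * warmup_cosine_multiplier(step, ...).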

@CoinCheung
Contributor Author

@microsoft-github-policy-service agree

@tjruwase
Contributor

@CoinCheung, thanks for the PR. A few items to address:

- To fix formatting issues, use this guide.
- Please add a unit test: example.
- Inspect the failing CI tests.
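As an illustration only, here is the kind of endpoint check such a unit test might cover, written against the standalone helper sketched above rather than the real scheduler class (an actual DeepSpeed test would construct the scheduler with an optimizer instead):

```python
import math

def test_warmup_cosine_endpoints():
    total, warmup = 1000, 100
    # Start of warmup: multiplier equals warmup_min_ratio.
    assert math.isclose(warmup_cosine_multiplier(0, total, warmup, warmup_min_ratio=0.1), 0.1)
    # End of warmup: multiplier reaches 1.0, i.e. the optimizer's peak lr.
    assert math.isclose(warmup_cosine_multiplier(warmup, total, warmup, warmup_min_ratio=0.1), 1.0)
    # End of training: multiplier has annealed down to cos_min_ratio.
    assert math.isclose(warmup_cosine_multiplier(total, total, warmup, cos_min_ratio=0.1), 0.1)
```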

@CoinCheung CoinCheung requested a review from mrwyattii as a code owner October 26, 2023 03:41
@CoinCheung
Contributor Author

@tjruwase @wjessup @dfyz @manuelciosici I have no experience with PRs for DeepSpeed. What is the status of this now? Is there any further work that needs to be done on my end?

@tjruwase
Contributor

@CoinCheung, thanks for making the changes. We will review and merge once the CI passes.

@CoinCheung
Contributor Author

Hi @tjruwase,

I have made some fixes. Would you please launch the CI tests one more time?

@CoinCheung
Contributor Author

@tjruwase Would you please launch CI one more time?

@CoinCheung
Contributor Author

Hi @jeffra @mrwyattii, I think the problem is not with my change; the failure is an inference error, while my change only touches the training learning rate scheduler. Can this fix be merged, or is there anything else I need to commit?
[screenshot of the failing CI test]

@tjruwase
Contributor

@CoinCheung, sorry for the delay. It seems the issue is with our CI system. Please bear with us while we resolve the problem.

@tjruwase tjruwase added this pull request to the merge queue Oct 31, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 31, 2023
@CoinCheung
Contributor Author

Hi @tjruwase, what is the status of this thread?
[screenshot of the CI status]

@tjruwase
Contributor

tjruwase commented Nov 7, 2023

@CoinCheung, I have restarted CI. Let's see how it goes.

@CoinCheung
Contributor Author

Hi @tjruwase,

Is this failure associated with my changes?
[screenshot of a CI test failure]

@tjruwase
Contributor

tjruwase commented Nov 8, 2023

@CoinCheung, no, I don't think it is related to your changes.

@CoinCheung CoinCheung requested a review from tjruwase November 10, 2023 01:11
@tjruwase tjruwase added this pull request to the merge queue Nov 10, 2023
Merged via the queue into microsoft:master with commit 4388a60 Nov 10, 2023
15 checks passed
@kmn1024

kmn1024 commented Nov 16, 2023

kmn1024 added a commit to kmn1024/axolotl that referenced this pull request Nov 16, 2023
@tjruwase
Contributor

> Should WarmupCosineLR inherit from WarmupLR? https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/runtime/lr_schedules.py#L774

Yes, you are correct. It should.

@CoinCheung, are you able to refactor your changes? Thanks!

@CoinCheung
Contributor Author

CoinCheung commented Nov 17, 2023

Hi @tjruwase @kmn1024, I do not think WarmupCosineLR can inherit from WarmupLR in this case.

They use different methods to determine the learning rates. For WarmupCosineLR, I use ratios of the original lr values, which I think is the more principled approach, while WarmupLR uses specific lr values.

For example, with WarmupLR, setting warmup_min_lr=1e-5 and warmup_num_steps=100 makes the scheduler ramp the lr from 1e-5 to the max lr within 100 steps.
With WarmupCosineLR, setting warmup_min_ratio=0.1 and warmup_num_steps=100, and assuming lr=1e-3 when the optimizer was defined, the scheduler ramps the lr from 0.1 * lr = 1e-4 to the max lr within 100 steps. We never pass specific lr values to the scheduler.
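To make the contrast concrete, a hedged sketch of the two parameterizations as scheduler config dicts; warmup_max_lr and total_num_steps are illustrative names, and only warmup_min_lr, warmup_min_ratio, and warmup_num_steps are taken from the discussion above:

```python
# WarmupLR: the scheduler itself carries absolute lr values, duplicating
# information that already lives in the optimizer.
warmup_lr_params = {
    "warmup_min_lr": 1e-5,     # starting lr, absolute value
    "warmup_max_lr": 1e-3,     # peak lr, absolute value (illustrative name)
    "warmup_num_steps": 100,
}

# WarmupCosineLR: the scheduler only describes the shape of the curve; the
# peak lr stays with the optimizer (1e-3 here), so warmup starts at 0.1 * 1e-3 = 1e-4.
warmup_cosine_params = {
    "warmup_min_ratio": 0.1,
    "warmup_num_steps": 100,
    "total_num_steps": 1000,   # illustrative name for the decay horizon
}
```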

The reason I feel using ratios is better: we do not need to set specific lr values in both the optimizer and the scheduler. When we define an optimizer, we need to choose the learning rate. When we define a scheduler, all we need to do is determine the shape of the learning rate curve, not its specific values. When you want to keep the shape of the lr curve and only tune the peak lr, you only need to change one place. This follows the principle that each module does only its own work, and their settings do not affect each other.

In my experience tuning models, this approach is less likely to cause mistakes where I change the optimizer lr but forget to change the scheduler.

Also, in some papers, if I recall correctly, the authors state that they use a cosine lr schedule to train their model, with the learning rate annealing from max_lr to 0.1 * max_lr. I think many other people accept this way of specifying learning rates.

@tjruwase
Contributor

tjruwase commented Nov 17, 2023

@CoinCheung, thanks for your response. I agree with the differences you identify between WarmupLR and WarmupCosineLR, but to me these differences are simply in the implementation and logic. At a high level they are similar because they both provide two phases of lr changes: (1) an initial phase of warmup/increase, and (2) a final phase of no change or decay. Looking more closely, we see significant similarity or duplication in many of the methods, including step, state_dict, load_state_dict, get_last_lr, and _format_param. These similarities suggest to me opportunities for code refactoring and reuse.
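A minimal sketch of what such a shared base might look like, using the method names from the comment above; this is purely illustrative and not DeepSpeed's actual code (the _get_lrs hook name is an assumption):

```python
class _TwoPhaseLRBase:
    """Shared bookkeeping; subclasses only decide how lrs are computed per step."""

    def __init__(self, optimizer, last_batch_iteration=-1):
        self.optimizer = optimizer
        self.last_batch_iteration = last_batch_iteration
        self._last_lr = [group["lr"] for group in optimizer.param_groups]

    def _get_lrs(self, iteration):
        # WarmupLR would return absolute values; WarmupCosineLR would return
        # ratio * the optimizer's original lrs (one entry per param group).
        raise NotImplementedError

    def step(self, last_batch_iteration=None):
        if last_batch_iteration is None:
            last_batch_iteration = self.last_batch_iteration + 1
        self.last_batch_iteration = last_batch_iteration
        self._last_lr = self._get_lrs(last_batch_iteration)
        for group, lr in zip(self.optimizer.param_groups, self._last_lr):
            group["lr"] = lr

    def get_last_lr(self):
        return self._last_lr

    def state_dict(self):
        return {"last_batch_iteration": self.last_batch_iteration}

    def load_state_dict(self, sd):
        self.last_batch_iteration = sd["last_batch_iteration"]
```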

@CoinCheung
Contributor Author

CoinCheung commented Nov 17, 2023

@tjruwase Would it be acceptable to change the args (the init args used to define the scheduler object) of WarmupLR? It has only one subclass, WarmupDecayLR, and I think its usage frequency is not very high.

mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024
add cosine annealing scheduler

this scheduler is widely used in image classification task, and many llm
(e.g. llama) use this also.

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>