Variable Layer Dropout #1

Draft
wants to merge 4 commits into
base: paper_default
from

mostafaelhoushi
Collaborator

What does this PR do? Please describe:
It enables specifying a different dropout rate for each layer.
You can set the per-layer drop probabilities as follows:

# all layers will have drop_p set to 0.2 (same as before)
>>> model.layers.drop_p = 0.2

# all layers will have drop_p set to 0.2 (same result, expressed with get_values)
>>> model.layers.drop_p = get_values(scale_period=len(model.layers.drop_p), max_val=0.2) 

# every other layer has a dropout of 0.2. This implements the [LayerDrop](https://paperswithcode.com/paper/reducing-transformer-depth-on-demand-with-1) paper.
>>> model.layers.drop_p = get_values(scale_period=len(model.layers.drop_p), max_val=0.2, slice_str="1::2")

# dropout increasing linearly from 0 to 0.2 across layers. This implements the [Progressive Layer Dropout](https://arxiv.org/abs/2010.13369) paper.
>>> model.layers.drop_p = get_values(scale_type="linear", scale_period=len(model.layers.drop_p), max_val=0.2)

# dropout increasing exponentially from 0 to 0.2 across layers. This implements the training recipe of the [LayerSkip](https://arxiv.org/abs/2404.16710) paper.
>>> model.layers.drop_p = get_values(scale_type="exp", scale_period=len(model.layers.drop_p), max_val=0.2)
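
To make the schedules above concrete, here is a minimal sketch of how per-layer values of this kind could be computed. It is not the code in this PR: `sketch_get_values` is a hypothetical stand-in, the default `scale_type="uniform"` is an assumed name, and the exponential curve (`2**(i/(L-1)) - 1`, which runs from 0 at the first layer to 1 at the last) is one plausible choice rather than a confirmed one.

```python
from typing import List, Optional


def sketch_get_values(
    scale_period: int,
    max_val: float,
    scale_type: str = "uniform",  # assumed default name, not confirmed by the PR
    slice_str: Optional[str] = None,
) -> List[float]:
    """Hypothetical stand-in for get_values; not the PR's implementation."""
    if scale_period < 1:
        raise ValueError("scale_period must be at least 1")
    # Index of the last layer; clamped to 1 to avoid dividing by zero
    # when there is only a single layer.
    last = max(scale_period - 1, 1)
    if scale_type == "uniform":
        scales = [1.0] * scale_period
    elif scale_type == "linear":
        scales = [i / last for i in range(scale_period)]
    elif scale_type == "exp":
        # Assumed curve: 0 at the first layer, 1 at the last layer.
        scales = [2.0 ** (i / last) - 1.0 for i in range(scale_period)]
    else:
        raise ValueError(f"unknown scale_type: {scale_type!r}")
    values = [max_val * s for s in scales]
    if slice_str is not None:
        # Parse "start:stop:step" and zero out every layer the slice skips.
        parts = [int(p) if p else None for p in slice_str.split(":")]
        selected = set(range(scale_period)[slice(*parts)])
        values = [v if i in selected else 0.0 for i, v in enumerate(values)]
    return values


# Every other layer, starting from the second one, gets the full rate.
print(sketch_get_values(4, 0.2, slice_str="1::2"))  # [0.0, 0.2, 0.0, 0.2]
```

Under these assumptions, the linear and exponential schedules both start at 0 for the first layer and reach `max_val` at the last; they differ only in how quickly the rate grows in between.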

!closes facebookresearch#640

Does your PR introduce any breaking changes? If yes, please list them:
No. I ensured that setting a single value for drop_p works by overriding setters and getters.

Check list:

  • Was the content of this PR discussed and approved via a GitHub issue? (no need for typos or documentation improvements)
  • Did you read the contributor guideline?
  • Did you make sure that your PR does only one thing instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (no need for typos, documentation, or minor internal changes)
