Minor formatting fix on model_parallel docs (#16565)
tupini07 authored Jan 30, 2023
1 parent 8fc4fb1 commit d634846
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions docs/source-pytorch/advanced/model_parallel.rst
@@ -48,6 +48,7 @@ When Shouldn't I use an Optimized Distributed Strategy?
=======================================================

Sharding techniques help when model sizes are fairly large; roughly 500M+ parameters is where we've seen benefits. However, in the following cases, we recommend sticking to ordinary distributed strategies (a short strategy-selection sketch follows the list below):
+
* When your model is small (e.g., ResNet50, around 80M parameters), unless you are using unusually large batch sizes or inputs.
* When running on a slow network/interconnect: these strategies add heavy inter-device communication, so training can be much slower than expected, and it is up to you to determine whether the tradeoff is worth it.
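
As a minimal sketch of that guidance (not part of this commit's diff): the strategy names `"ddp"` and `"deepspeed_stage_2"` are real PyTorch Lightning `Trainer` options, but availability depends on your installed version, and `TinyModel` and the 500M threshold are purely illustrative.

```python
import torch
from torch import nn
import pytorch_lightning as pl


class TinyModel(pl.LightningModule):
    """Illustrative stand-in; any LightningModule works the same way."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


model = TinyModel()
num_params = sum(p.numel() for p in model.parameters())

# Per the docs above: below roughly 500M parameters, plain DDP is usually
# the better default, since sharded strategies add extra inter-device
# communication; above that, a sharded strategy can start to pay off.
strategy = "ddp" if num_params < 500_000_000 else "deepspeed_stage_2"

trainer = pl.Trainer(accelerator="gpu", devices=4, strategy=strategy)
# trainer.fit(model, train_dataloaders=...)  # dataloader omitted here
```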

