From d634846b5e4f309032b51b90c93ce6faf826ca15 Mon Sep 17 00:00:00 2001
From: Andrea Tupini
Date: Mon, 30 Jan 2023 11:40:03 -0600
Subject: [PATCH] Minor formatting fix on model_parallel docs (#16565)

---
 docs/source-pytorch/advanced/model_parallel.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/source-pytorch/advanced/model_parallel.rst b/docs/source-pytorch/advanced/model_parallel.rst
index 789490e76118d..fd1610a232dda 100644
--- a/docs/source-pytorch/advanced/model_parallel.rst
+++ b/docs/source-pytorch/advanced/model_parallel.rst
@@ -48,6 +48,7 @@ When Shouldn't I use an Optimized Distributed Strategy?
 =======================================================
 
 Sharding techniques help when model sizes are fairly large; roughly 500M+ parameters is where we've seen benefits. However, in the following cases, we recommend sticking to ordinary distributed strategies
+
 * When your model is small (ResNet50 of around 80M Parameters), unless you are using unusually large batch sizes or inputs.
 * Due to high distributed communication between devices, if running on a slow network/interconnect, the training might be much slower than expected and then it's up to you to determine the tradeoff here.
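
For context on the docs section touched by this patch, the following is a minimal, illustrative sketch (not part of the patch itself) of how the guidance above might translate into a Trainer configuration: plain DDP for small models, a sharded strategy only once the model is large enough to benefit. `MyLightningModule` is a hypothetical module, and the strategy strings (`"ddp"`, `"deepspeed_stage_2"`) assume a pytorch_lightning release from around the time of this patch; check the installed version's documentation for the exact names it accepts.

```python
# Illustrative sketch (not part of the patch): pick a Trainer strategy
# following the rule of thumb in the docs section above.
import pytorch_lightning as pl


def pick_strategy(num_parameters: int) -> str:
    # Below roughly 500M parameters, ordinary DDP is usually the better choice:
    # sharding adds inter-device communication without a clear memory benefit.
    if num_parameters < 500_000_000:
        return "ddp"
    # For larger models, a sharded strategy (here DeepSpeed ZeRO stage 2 as an
    # example) spreads optimizer state and gradients across devices.
    return "deepspeed_stage_2"


model = MyLightningModule()  # hypothetical LightningModule
n_params = sum(p.numel() for p in model.parameters())

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    strategy=pick_strategy(n_params),
)
trainer.fit(model)
```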