dont save moe lb-loss tensors if args.moe_loss_weight=0 #119

michael-go · 2024-07-11T12:28:37Z

Megablocks accumulates lb-loss tensors and expects the user to call clear_load_balancing_loss() to release the memory. In our case we compute the lb-loss outside of Megablocks, and we had a GPU memory leak before we noticed this behaviour.
We could call clear_load_balancing_loss() after every Megablocks forward(), but it is better to avoid accumulating these tensors at all when Megablocks' lb-loss calculation is not needed, which can already be signaled by passing 0 to Arguments.args.moe_loss_weight.

it takes GPU memory, and can also cause a leak if clear_load_balancing_loss() is not called
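
For readers who want to see the pattern in isolation, here is a minimal sketch of the behaviour described above. It is a pure-Python stand-in, not the Megablocks source: `ToyMoE`, `run_steps`, the `_LOAD_BALANCING_LOSS` buffer, and the simplified `Arguments` dataclass are hypothetical names used only for illustration, and plain lists stand in for GPU tensors. It assumes the guard is a simple `moe_loss_weight > 0` check made before saving.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a module-level buffer that keeps lb-loss inputs
# alive until the caller clears it.
_LOAD_BALANCING_LOSS = []


def save_load_balancing_loss(loss):
    """Append one layer's lb-loss inputs to the buffer."""
    _LOAD_BALANCING_LOSS.append(loss)


def get_load_balancing_loss():
    """Return everything accumulated since the last clear."""
    return _LOAD_BALANCING_LOSS


def clear_load_balancing_loss():
    """Drop all accumulated entries so their memory can be released."""
    _LOAD_BALANCING_LOSS.clear()


@dataclass
class Arguments:
    # Simplified, illustrative argument container; 0 means "no lb-loss wanted".
    moe_loss_weight: float = 0.1


class ToyMoE:
    """Toy layer that mimics only the save-or-skip decision."""

    def __init__(self, args):
        self.args = args
        self.training = True

    def forward(self, tokens_per_expert, scores):
        # Only retain the values when the lb-loss will actually be computed.
        # Without this guard, every forward() grows the buffer until the
        # caller remembers to call clear_load_balancing_loss().
        if self.training and self.args.moe_loss_weight > 0:
            save_load_balancing_loss((tokens_per_expert, scores))
        return scores


def run_steps(moe, steps):
    """Run several forward passes and report how many entries were retained."""
    for _ in range(steps):
        moe.forward([1, 2], [0.5, 0.5])
    return len(get_load_balancing_loss())
```

With `moe_loss_weight=0.0` the buffer stays empty no matter how many steps run, so nothing needs clearing. With a positive weight the buffer grows by one entry per forward pass, and the caller is still responsible for calling `clear_load_balancing_loss()` after consuming the loss.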

mvpatel2000

LGTM!

dont save moe lb-loss tensors if args.moe_loss_weight=0

b8b5c1c

it takes GPU memory, and can also cause a leak if clear_load_balancing_loss() is not called

mvpatel2000 approved these changes Jul 11, 2024

mihir-db approved these changes Jul 11, 2024

mihir-db merged commit d2774b2 into databricks:main Jul 11, 2024
