Sync Batch Norm & Sharded State
Keep stats and weights consistent.
The Small-Batch Problem
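In standard data-parallel training, each batch norm layer computes its mean and variance from its own GPU's local batch only. When the global batch is split across many GPUs, each per-device batch shrinks. Statistics estimated from so few samples get noisy, so every device normalizes with a different, unreliable mean and variance.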
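To see the effect, here is a toy sketch in plain Python that assumes no framework. A hypothetical 8-sample global batch for one channel is split across 4 simulated devices, leaving each device only 2 samples. The data and helper names (`batch_stats`, `shard`) are illustrative and not part of any library.

```python
# Hypothetical global batch for a single channel, split across 4 simulated GPUs.
global_batch = [3.0, 8.0, 1.0, 6.0, 7.0, 2.0, 5.0, 4.0]
NUM_DEVICES = 4

def batch_stats(xs):
    """Mean and biased variance, the statistics batch norm normalizes with in training."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var

def shard(xs, num_devices):
    """Split xs into contiguous, equal-size shards, one per simulated device."""
    size = len(xs) // num_devices
    return [xs[i * size:(i + 1) * size] for i in range(num_devices)]

global_stats = batch_stats(global_batch)
local_stats = [batch_stats(s) for s in shard(global_batch, NUM_DEVICES)]
# global_stats -> (4.5, 5.25)
# local_stats  -> [(5.5, 6.25), (3.5, 6.25), (4.5, 6.25), (4.5, 0.25)]
```

With only two samples per device, the local means range from 3.5 to 5.5 and the local variances from 0.25 to 6.25, while the full batch has mean 4.5 and variance 5.25.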
Enter SyncBatchNorm
SyncBatchNorm fixes this by computing the mean and variance across all GPUs together, so every device normalizes with statistics from the full global batch.
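A minimal sketch of the idea, again in plain Python with a simulated all-reduce. Each device shares only its sample count, sum, and sum of squares, and the summed totals give the exact global mean and variance. The helper names are hypothetical, and the learnable scale and shift and the running statistics are omitted. Real implementations such as PyTorch's `torch.nn.SyncBatchNorm` exchange per-device statistics through collective communication, not necessarily with this exact scheme.

```python
import math

def local_sums(shard):
    """Per-device sufficient statistics for one channel: (count, sum, sum of squares)."""
    return (len(shard), sum(shard), sum(x * x for x in shard))

def all_reduce_sum(per_rank):
    # Stand-in for a collective all-reduce: every rank ends up with the
    # element-wise sum of all ranks' tuples.
    return tuple(sum(values) for values in zip(*per_rank))

def sync_stats(shards):
    count, total, total_sq = all_reduce_sum([local_sums(s) for s in shards])
    mean = total / count
    # E[x^2] - E[x]^2: simple, but can lose precision for large-magnitude inputs.
    var = total_sq / count - mean * mean
    return mean, var

def normalize(shard, mean, var, eps=1e-5):
    # Each rank normalizes only its own samples, but with the shared global stats.
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in shard]

shards = [[3.0, 8.0], [1.0, 6.0], [7.0, 2.0], [5.0, 4.0]]
mean, var = sync_stats(shards)  # (4.5, 5.25), identical to the full-batch stats
outputs = [normalize(s, mean, var) for s in shards]
```

In PyTorch, an existing model's batch norm layers can be swapped for the synchronized version with `torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)` before the model is wrapped in `DistributedDataParallel`.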