Launch Jobs with torchrun
Spawn and coordinate worker processes.
Who Starts the Processes
The usual DDP setup runs one process per GPU, so something has to start those processes and tell each one its place in the group. You do not launch them by hand; torchrun does it for you.
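To make "spawning" concrete, here is a toy launcher written with only the Python standard library. It is not how torchrun is implemented. It only mimics the basic idea: start each worker as a separate OS process and give it its identity (RANK, LOCAL_RANK, WORLD_SIZE) through environment variables. The function `launch` and the inline worker body are hypothetical stand-ins for a real launcher and a real train.py.

```python
import os
import subprocess
import sys

# Hypothetical worker: a real job would run train.py; this one just reports
# the identity the launcher handed it through environment variables.
WORKER_CODE = (
    "import os; "
    "print(os.environ['RANK'], os.environ['LOCAL_RANK'], os.environ['WORLD_SIZE'])"
)


def launch(nproc: int) -> list[str]:
    """Start nproc worker processes on this machine and collect their output."""
    procs = []
    for rank in range(nproc):
        # Copy the parent environment and add this worker's identity.
        # On a single machine, the local rank equals the global rank.
        env = dict(os.environ, RANK=str(rank), LOCAL_RANK=str(rank), WORLD_SIZE=str(nproc))
        procs.append(
            subprocess.Popen(
                [sys.executable, "-c", WORKER_CODE],
                env=env,
                stdout=subprocess.PIPE,
                text=True,
            )
        )
    outputs = []
    for proc in procs:
        # Wait for each worker in rank order and surface any non-zero exit.
        out, _ = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"worker exited with code {proc.returncode}")
        outputs.append(out.strip())
    return outputs


if __name__ == "__main__":
    for line in launch(4):
        print(line)
```

All four workers run at the same time, and each prints a line such as `2 2 4`. A real launcher also has to pick a rendezvous address, handle multiple machines, and react when a worker fails. That bookkeeping is what you hand off to torchrun.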
Meet torchrun
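torchrun is PyTorch's launcher. You give it your script and the number of processes to start. It starts those processes, gives each one a rank, and sets environment variables (such as RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT) that your script uses to join the process group.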
torchrun --nproc_per_node=4 train.py
This starts four copies of train.py on the current machine, typically one per GPU on a four-GPU node.
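On the receiving side, each copy of train.py learns its role from the environment variables torchrun sets for it. The sketch below reads RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT into a small record using only the standard library. The `DistInfo` class and the `read_torchrun_env` function are hypothetical helpers for illustration. They are not part of PyTorch.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables torchrun exports to every worker process.
REQUIRED_VARS = ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


@dataclass(frozen=True)
class DistInfo:
    """Hypothetical container for one worker's place in the job."""
    rank: int          # global index of this process across all nodes
    local_rank: int    # index on this machine; usually selects the GPU
    world_size: int    # total number of processes in the job
    master_addr: str   # address the processes use to rendezvous
    master_port: int   # port the processes use to rendezvous


def read_torchrun_env(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Parse the launcher-provided variables, failing loudly if any are missing."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError(
            f"missing {', '.join(missing)}; was this script started with torchrun?"
        )
    return DistInfo(
        rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
        world_size=int(env["WORLD_SIZE"]),
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
    )


if __name__ == "__main__":
    try:
        info = read_torchrun_env()
        print(f"rank {info.rank}/{info.world_size} on local GPU {info.local_rank}")
    except RuntimeError as err:
        # Plain `python train.py` lands here because no launcher set the variables.
        print(err)
```

In a real training script you would typically follow this with `torch.cuda.set_device(info.local_rank)` and `torch.distributed.init_process_group(backend="nccl")`. When no `init_method` is given, `init_process_group` defaults to `env://`, which reads these same variables.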