Deep Learning Academy · Lesson

Launch Jobs with torchrun

Spawn and coordinate worker processes.

Who Starts the Processes

DistributedDataParallel (DDP) is normally run with one process per GPU, but who spawns those processes? You do not launch them by hand. torchrun is the tool that does it for you.

Meet torchrun

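torchrun is PyTorch's launcher. You give it your script and the number of processes to start on the machine. It spawns the workers and sets the environment variables each one needs to find the others: RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT.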

torchrun --nproc_per_node=4 train.py
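
From inside train.py, each worker can find out which one it is by reading the environment variables torchrun sets. Below is a minimal sketch of that step, not a complete training script. The names DistEnv and read_dist_env are hypothetical helpers made up for this illustration, and the fallback values for running without torchrun are an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistEnv:
    """Hypothetical container for the values torchrun exports to a worker."""
    rank: int        # global index of this worker across all nodes
    local_rank: int  # index of this worker on its own machine
    world_size: int  # total number of workers in the job


def read_dist_env(env=None) -> DistEnv:
    """Read RANK, LOCAL_RANK and WORLD_SIZE from the environment.

    The defaults (an assumption of this sketch) let the script also run
    as a plain single process, without torchrun.
    """
    env = os.environ if env is None else env
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    dist_env = read_dist_env()
    print(
        f"worker {dist_env.rank} of {dist_env.world_size} "
        f"(local rank {dist_env.local_rank})"
    )
    # A real training script would typically continue with something like:
    #   torch.distributed.init_process_group(backend="nccl")
    #   torch.cuda.set_device(dist_env.local_rank)
    # before wrapping the model in DistributedDataParallel.
```

Launched with the command above, each of the four workers would see a different RANK and LOCAL_RANK and the same WORLD_SIZE of 4.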
