Deep Learning Academy · Lesson

Data vs Model Parallelism

Two ways to split the work.

Why Scale Out at All

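One GPU is fine until your model no longer fits in its memory or your dataset takes too long to get through. Distributed training spreads the work across many GPUs, so you can cut training time or train a model that a single device cannot hold.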

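There are two core strategies: split the data across GPUs (data parallelism), or split the model itself (model parallelism). Each solves a different bottleneck: data parallelism helps when there is too much data to process quickly, and model parallelism helps when the model is too big for one GPU's memory.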
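To make the two strategies concrete, here is a minimal sketch in plain Python. It simulates "devices" as ordinary function calls, so no GPUs or deep learning library are involved. The toy model `y_hat = w * x`, the helper names (`grad_mse`, `data_parallel_grad`, `model_parallel_forward`), and the sample numbers are all illustrative assumptions, not part of any real framework.

```python
# Toy simulation: "devices" are simulated by plain function calls, not real GPUs.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error with respect to w for y_hat = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def data_parallel_grad(w, xs, ys, num_devices):
    """Data parallelism: every device has a full copy of w and one slice of the batch.

    Assumes the batch size divides evenly by num_devices.
    """
    shard = len(xs) // num_devices
    local_grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_devices)
    ]
    # Averaging the local gradients stands in for an all-reduce across devices.
    return sum(local_grads) / num_devices

def model_parallel_forward(x, stages):
    """Model parallelism: each device holds one stage; activations pass between them."""
    for stage in stages:
        x = stage(x)
    return x

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Splitting the data: with equal shards, the averaged gradient matches the full-batch one.
full = grad_mse(w, xs, ys)
split = data_parallel_grad(w, xs, ys, num_devices=2)

# Splitting the model: stage 0 lives on "device 0", stage 1 on "device 1".
stages = [lambda x: 3.0 * x, lambda x: x + 1.0]
out = model_parallel_forward(2.0, stages)
```

In the data-parallel half, the model is replicated and only the batch is divided. In the model-parallel half, the batch stays whole and the model's stages are divided. Real systems add communication costs that this sketch ignores.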