Mixed Precision with autocast & GradScaler
Half-precision math for big speedups.
What Mixed Precision Means
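By default, PyTorch does math in 32-bit floats (float32). Mixed precision runs many operations in a 16-bit format (float16 or bfloat16) instead, while keeping precision-sensitive operations in float32. A 16-bit value takes half the memory of a 32-bit one, and 16-bit math is usually faster on hardware that supports it.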
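As a rough illustration of the idea above, the sketch below compares the storage of the same tensor in 32-bit and 16-bit form, then runs one layer under `torch.autocast`. The layer sizes and variable names are made up for the example, and it assumes a PyTorch build where `torch.autocast` is available; it falls back to bfloat16 on the CPU when no GPU is present.

```python
import torch

# The same values stored as 32-bit and 16-bit floats.
x32 = torch.randn(1024, 1024)        # PyTorch's default dtype is float32
x16 = x32.to(torch.float16)

bytes32 = x32.element_size() * x32.nelement()
bytes16 = x16.element_size() * x16.nelement()
print(f"float32: {bytes32} bytes, float16: {bytes16} bytes")

# Pick a device and a 16-bit dtype (assumption: float16 on GPU, bfloat16 on CPU).
device_type = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device_type == "cuda" else torch.bfloat16

model = torch.nn.Linear(1024, 256).to(device_type)   # hypothetical toy layer
inputs = x32.to(device_type)

# Inside the autocast block, eligible ops such as the linear layer run in 16-bit.
with torch.autocast(device_type=device_type, dtype=dtype):
    out = model(inputs)

print(out.dtype)            # the 16-bit dtype chosen above
print(model.weight.dtype)   # the stored weights stay in float32
```

Note that autocast does not convert the model itself: the weights remain float32 and are cast on the fly for the operations that benefit from it.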
Why Half Precision Is Faster
Modern NVIDIA GPUs have dedicated Tensor Cores built for 16-bit matrix math. Feeding them half-precision data can speed up training by two to three times on models dominated by matrix multiplies and convolutions, usually with little loss in accuracy.
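To use half precision in an actual training loop, autocast is typically paired with a gradient scaler, because small float16 gradients can underflow to zero; the scaler multiplies the loss before the backward pass and unscales the gradients before the optimizer step. The following is a minimal sketch, not a definitive recipe: the toy model, data, and hyperparameters are invented for illustration, and the scaler is only enabled when a CUDA GPU is available (on the CPU the loop runs with bfloat16 and an inactive scaler).

```python
import torch

torch.manual_seed(0)

use_cuda = torch.cuda.is_available()
device_type = "cuda" if use_cuda else "cpu"
amp_dtype = torch.float16 if use_cuda else torch.bfloat16

# Newer PyTorch releases expose torch.amp.GradScaler; older ones only have
# torch.cuda.amp.GradScaler. Both accept the `enabled` keyword.
GradScaler = getattr(torch.amp, "GradScaler", None) or torch.cuda.amp.GradScaler
scaler = GradScaler(enabled=use_cuda)  # acts as a pass-through when disabled

# Hypothetical toy regression problem: predict the sum of each input row.
model = torch.nn.Linear(16, 1).to(device_type)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
inputs = torch.randn(64, 16, device=device_type)
targets = inputs.sum(dim=1, keepdim=True)

losses = []
for step in range(20):
    optimizer.zero_grad()
    # Forward pass in mixed precision.
    with torch.autocast(device_type=device_type, dtype=amp_dtype):
        preds = model(inputs)
        # Compute the loss on float32 values for numerical stability.
        loss = torch.nn.functional.mse_loss(preds.float(), targets)
    scaler.scale(loss).backward()  # scale the loss, then backpropagate
    scaler.step(optimizer)         # unscale gradients; skip the step if they overflowed
    scaler.update()                # adjust the scale factor for the next iteration
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The only changes from a plain float32 loop are the `autocast` block around the forward pass and the three `scaler` calls that replace `loss.backward()` and `optimizer.step()`.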