Mixed Precision Training with AMP
torch.cuda.amp.autocast(), GradScaler, FP16 vs BF16, memory savings, speed gains.
What is Mixed Precision
Mixed precision training runs most operations in 16-bit floating point (FP16) instead of 32-bit (FP32). FP16 uses half the memory and runs faster on modern GPU tensor cores, while a few sensitive operations stay in FP32 for stability.
The Benefits
Mixed precision gives you:
- ~2x memory savings, enabling larger batches or models
- Faster matrix multiplications on tensor cores
- Little to no loss in final model accuracy when done correctly
All lessons in this course
- Multi-GPU Training with DataParallel
- DistributedDataParallel (DDP)
- Mixed Precision Training with AMP
- Efficient Training with Hugging Face Accelerate