Mixed Precision: FP16, BF16, TF32
Trading precision for throughput.
Trading Bits for Speed
Tensor Cores get their speed from mixed precision: feeding smaller number formats in so the hardware can push far more math per cycle. ⚡
What FP32 Costs
Standard FP32 uses 32 bits per value. It is accurate but heavy, so moving and multiplying many FP32 numbers eats bandwidth and time.
All lessons in this course
- What Tensor Cores Compute
- Mixed Precision: FP16, BF16, TF32
- The WMMA Fragment API
- Numerical Stability Tradeoffs