0Pricing
CUDA Academy · Lesson

Mixed Precision: FP16, BF16, TF32

Trading precision for throughput.

Trading Bits for Speed

Tensor Cores get their speed from mixed precision: feeding smaller number formats in so the hardware can push far more math per cycle. ⚡

What FP32 Costs

Standard FP32 uses 32 bits per value. It is accurate but heavy, so moving and multiplying many FP32 numbers eats bandwidth and time.

All lessons in this course

  1. What Tensor Cores Compute
  2. Mixed Precision: FP16, BF16, TF32
  3. The WMMA Fragment API
  4. Numerical Stability Tradeoffs
← Back to CUDA Academy