CUDA Academy · Lesson

The WMMA Fragment API

Load fragments, multiply-accumulate them with mma_sync, and store the result.

The WMMA API

To program Tensor Cores directly, you use WMMA, the warp matrix multiply-accumulate API. It lives in the nvcuda::wmma namespace and is declared in the <mma.h> header.
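As a minimal sketch of the load, mma_sync, and store sequence, the kernel below multiplies a single 16x16x16 tile with FP16 inputs and an FP32 accumulator. The names `wmma_single_tile` and `run_single_tile_demo` are hypothetical, chosen for illustration; the tile shape and data layouts are assumptions, and the code needs a GPU with Tensor Cores and a build such as `nvcc -arch=sm_70` or newer.

```cuda
#include <cmath>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>

namespace wmma = nvcuda::wmma;

// One warp computes D = A * B for a single 16x16 tile.
// Assumed layouts: a is row-major, b is column-major, d is row-major.
__global__ void wmma_single_tile(const half* a, const half* b, float* d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);                 // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, a, 16);               // leading dimension = 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // acc = A * B + acc
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

// Hypothetical host helper: fills A and B with ones, so every output should be 16.
// Returns the maximum absolute error, or -1.0f if a CUDA call failed.
float run_single_tile_demo() {
    const int n = 16 * 16;
    std::vector<half> h_a(n, __float2half(1.0f));
    std::vector<half> h_b(n, __float2half(1.0f));
    std::vector<float> h_d(n, 0.0f);

    half* d_a = nullptr;
    half* d_b = nullptr;
    float* d_d = nullptr;
    cudaMalloc(&d_a, n * sizeof(half));
    cudaMalloc(&d_b, n * sizeof(half));
    cudaMalloc(&d_d, n * sizeof(float));
    cudaMemcpy(d_a, h_a.data(), n * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), n * sizeof(half), cudaMemcpyHostToDevice);

    wmma_single_tile<<<1, 32>>>(d_a, d_b, d_d);  // exactly one warp
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t copy_err =
        cudaMemcpy(h_d.data(), d_d, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_d);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_err = std::fmax(max_err, std::fabs(h_d[i] - 16.0f));
    }
    return max_err;
}
```

The accumulator fragment appears as both the first and last argument of `mma_sync`, which is how a running sum is kept when the same call is repeated over several input tiles.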

A Whole Warp Cooperates

WMMA is warp-wide: all 32 threads in a warp cooperate on one tile, and every thread in the warp must execute each WMMA call. You think in tiles, not in single threads.
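To illustrate thinking in tiles, here is a hedged sketch in which each block holds exactly one warp and that warp owns one 16x16 output tile. The kernel `wmma_gemm_one_warp_per_tile` and the helper `run_tiled_gemm_demo` are hypothetical names; the sketch assumes M, N, and K are multiples of 16, A is row-major, B is column-major, and C is row-major. It is meant to show the indexing, not to be fast.

```cuda
#include <cmath>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <mma.h>

namespace wmma = nvcuda::wmma;

// Launch with grid = (M/16, N/16) and 32 threads per block: one warp per output tile.
// No code branches on threadIdx; the whole warp executes every WMMA call together.
__global__ void wmma_gemm_one_warp_per_tile(const half* a, const half* b, float* c,
                                            int N, int K) {
    const int tile_row = blockIdx.x;  // which 16-row band of C this warp owns
    const int tile_col = blockIdx.y;  // which 16-column band of C this warp owns

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;
    wmma::fill_fragment(acc_frag, 0.0f);

    for (int k = 0; k < K; k += 16) {
        // A(i, k) is at a[i * K + k]; B(k, n) is at b[n * K + k] (column-major).
        wmma::load_matrix_sync(a_frag, a + tile_row * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, b + tile_col * 16 * K + k, K);
        wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
    }
    wmma::store_matrix_sync(c + tile_row * 16 * N + tile_col * 16, acc_frag, N,
                            wmma::mem_row_major);
}

// Hypothetical host helper: small integer inputs keep FP16 and FP32 arithmetic exact.
// Returns the maximum absolute error against a CPU reference, or -1.0f on a CUDA error.
float run_tiled_gemm_demo() {
    const int M = 32, N = 32, K = 48;
    std::vector<half> h_a(M * K), h_b(K * N);
    std::vector<float> h_c(M * N, 0.0f);
    for (int i = 0; i < M; ++i)
        for (int k = 0; k < K; ++k) h_a[i * K + k] = __float2half(float(i % 4));
    for (int n = 0; n < N; ++n)
        for (int k = 0; k < K; ++k) h_b[n * K + k] = __float2half(float(n % 3));

    half* d_a = nullptr;
    half* d_b = nullptr;
    float* d_c = nullptr;
    cudaMalloc(&d_a, h_a.size() * sizeof(half));
    cudaMalloc(&d_b, h_b.size() * sizeof(half));
    cudaMalloc(&d_c, h_c.size() * sizeof(float));
    cudaMemcpy(d_a, h_a.data(), h_a.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), h_b.size() * sizeof(half), cudaMemcpyHostToDevice);

    wmma_gemm_one_warp_per_tile<<<dim3(M / 16, N / 16), 32>>>(d_a, d_b, d_c, N, K);
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t copy_err = cudaMemcpy(h_c.data(), d_c, h_c.size() * sizeof(float),
                                      cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < M; ++i)
        for (int n = 0; n < N; ++n) {
            const float expected = float(K * (i % 4) * (n % 3));
            max_err = std::fmax(max_err, std::fabs(h_c[i * N + n] - expected));
        }
    return max_err;
}
```

Using one warp per block keeps the tile mapping easy to read; a tuned kernel would typically pack several warps into each block and stage tiles through shared memory.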
