CUDA Academy · Lesson

The WMMA Fragment API

Load, mma_sync, and store fragments.

The WMMA API

To program Tensor Cores directly from CUDA C++, you use WMMA (warp matrix multiply-accumulate), the API in the nvcuda::wmma namespace, declared in the mma.h header. 🧩
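Here is a minimal sketch of the basic fragment workflow: declare fragments, load A and B, run mma_sync, and store C. It assumes a GPU of compute capability 7.0 or newer, compiled with nvcc -arch=sm_70 or later. It also assumes the 16x16x16 shape with half inputs and a float accumulator. The kernel single_tile_wmma and the host check single_tile_max_error are hypothetical names used only for illustration.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

using namespace nvcuda;

// Assumed tile shape: 16x16x16, half inputs, float accumulator (sm_70+).
constexpr int WM = 16, WN = 16, WK = 16;

// One warp computes C = A * B for a single tile.
// A: WM x WK row-major, B: WK x WN column-major, C: WM x WN row-major.
__global__ void single_tile_wmma(const half* a, const half* b, float* c) {
    // Fragments are opaque: each of the 32 lanes holds an unspecified slice.
    wmma::fragment<wmma::matrix_a, WM, WN, WK, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, WM, WN, WK, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, WM, WN, WK, float> c_frag;

    // Every lane makes the same calls with the same arguments.
    wmma::fill_fragment(c_frag, 0.0f);               // C = 0
    wmma::load_matrix_sync(a_frag, a, WK);           // ldm = row length of A
    wmma::load_matrix_sync(b_frag, b, WK);           // ldm = column length of B
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C = A * B + C
    wmma::store_matrix_sync(c, c_frag, WN, wmma::mem_row_major);
}

// Hypothetical host check: max |GPU - CPU| over C, or -1.0f if a CUDA call failed.
float single_tile_max_error() {
    std::vector<float> fa(WM * WK), fb(WK * WN), hc(WM * WN);
    std::vector<half> ha(WM * WK), hb(WK * WN);
    // Small integers are exact in half, and these dot products stay exact in float.
    for (int i = 0; i < WM * WK; ++i) { fa[i] = float(i % 7 - 3); ha[i] = __float2half(fa[i]); }
    for (int i = 0; i < WK * WN; ++i) { fb[i] = float(i % 5 - 2); hb[i] = __float2half(fb[i]); }

    half *da = nullptr, *db = nullptr;
    float *dc = nullptr;
    cudaMalloc(&da, ha.size() * sizeof(half));
    cudaMalloc(&db, hb.size() * sizeof(half));
    cudaMalloc(&dc, hc.size() * sizeof(float));
    cudaMemcpy(da, ha.data(), ha.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), hb.size() * sizeof(half), cudaMemcpyHostToDevice);

    single_tile_wmma<<<1, 32>>>(da, db, dc);  // exactly one warp
    cudaError_t err = cudaGetLastError();     // reports any earlier failure too
    if (err == cudaSuccess)  // this copy waits for the kernel to finish
        err = cudaMemcpy(hc.data(), dc, hc.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    if (err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int m = 0; m < WM; ++m)
        for (int n = 0; n < WN; ++n) {
            float ref = 0.0f;
            for (int k = 0; k < WK; ++k)
                ref += fa[m * WK + k] * fb[n * WK + k];  // B stored column-major
            max_err = std::fmax(max_err, std::fabs(ref - hc[m * WN + n]));
        }
    return max_err;
}
```

The load and store calls take a leading dimension (ldm): the row length for row-major data and the column length for column-major data. With these small-integer inputs, calling single_tile_max_error() on a supported GPU should give an error of 0 or very close to it. A return value of -1.0f means a CUDA call failed.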

WMMA is warp-wide: all 32 threads in a warp must execute each WMMA call together, and each thread holds only part of the tile. You think in tiles, not in single threads.
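Thinking in tiles usually means giving each warp its own output tile. The sketch below makes several assumptions: compute capability 7.0 or newer, compiled with -arch=sm_70 or later; 16x16x16 fragments with half inputs and float accumulation; and 32x32 matrices. It multiplies two 32x32 matrices using one block of four warps. Each warp computes its tile coordinates from threadIdx.x / warpSize, then steps along K in chunks of 16, accumulating into a single fragment. The names tiled_wmma and tiled_max_error are hypothetical.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

using namespace nvcuda;

constexpr int M = 32, N = 32, K = 32;  // assumed problem size
constexpr int T = 16;                  // assumed WMMA tile edge (16x16x16)

// A: M x K row-major, B: K x N column-major, C: M x N row-major.
// Launched with (M/T)*(N/T) warps; warp w owns output tile w.
__global__ void tiled_wmma(const half* a, const half* b, float* c) {
    int warp = threadIdx.x / warpSize;  // same value for all 32 lanes of a warp
    int tile_row = warp / (N / T);
    int tile_col = warp % (N / T);

    wmma::fragment<wmma::matrix_a, T, T, T, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, T, T, T, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, T, T, T, float> acc;
    wmma::fill_fragment(acc, 0.0f);

    // Walk along K: acc += A(tile_row, kk) * B(kk, tile_col).
    for (int kk = 0; kk < K; kk += T) {
        wmma::load_matrix_sync(a_frag, a + tile_row * T * K + kk, K);
        wmma::load_matrix_sync(b_frag, b + tile_col * T * K + kk, K);
        wmma::mma_sync(acc, a_frag, b_frag, acc);
    }
    wmma::store_matrix_sync(c + tile_row * T * N + tile_col * T, acc, N,
                            wmma::mem_row_major);
}

// Hypothetical host check: max |GPU - CPU| over C, or -1.0f if a CUDA call failed.
float tiled_max_error() {
    std::vector<float> fa(M * K), fb(K * N), hc(M * N);
    std::vector<half> ha(M * K), hb(K * N);
    // Small integers keep every product and sum exact in half/float.
    for (int i = 0; i < M * K; ++i) { fa[i] = float(i % 7 - 3); ha[i] = __float2half(fa[i]); }
    for (int i = 0; i < K * N; ++i) { fb[i] = float(i % 5 - 2); hb[i] = __float2half(fb[i]); }

    half *da = nullptr, *db = nullptr;
    float *dc = nullptr;
    cudaMalloc(&da, ha.size() * sizeof(half));
    cudaMalloc(&db, hb.size() * sizeof(half));
    cudaMalloc(&dc, hc.size() * sizeof(float));
    cudaMemcpy(da, ha.data(), ha.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), hb.size() * sizeof(half), cudaMemcpyHostToDevice);

    tiled_wmma<<<1, (M / T) * (N / T) * 32>>>(da, db, dc);  // 4 warps, 128 threads
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)  // this copy waits for the kernel to finish
        err = cudaMemcpy(hc.data(), dc, hc.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    if (err != cudaSuccess) return -1.0f;

    float max_err = 0.0f;
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float ref = 0.0f;
            for (int k = 0; k < K; ++k)
                ref += fa[m * K + k] * fb[n * K + k];  // B stored column-major
            max_err = std::fmax(max_err, std::fabs(ref - hc[m * N + n]));
        }
    return max_err;
}
```

The code never indexes individual elements by lane. Each warp only chooses which tile pointer to pass, and WMMA distributes that tile across the warp's 32 lanes internally. Every tile pointer here lands on a 32-byte boundary, which WMMA loads and stores require.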