The WMMA Fragment API
Load, mma_sync, and store fragments.
The WMMA API
To program Tensor Cores directly, you use WMMA, the warp matrix multiply-accumulate API. It lives in the nvcuda::wmma namespace, which the <mma.h> header declares. 🧩
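As a rough illustration of what the namespace provides, the sketch below declares the three fragment types and uses one of them. The 16x16x16 tile shape, the half/float type combination, the layouts, the aliases FragA/FragB/FragC, and the kernel name zero_tile are all assumptions chosen for this example, not something the text prescribes. It is meant to compile as a translation unit with nvcc for compute capability 7.0 or newer.

```cuda
#include <cuda_fp16.h>  // half
#include <mma.h>        // declares nvcuda::wmma

using namespace nvcuda;

// Assumed tile shape for this sketch: 16x16x16.
constexpr int M = 16, N = 16, K = 16;

// A fragment is an opaque, warp-wide container for one tile operand.
// The aliases are hypothetical names used only in this example.
using FragA = wmma::fragment<wmma::matrix_a, M, N, K, half, wmma::row_major>;
using FragB = wmma::fragment<wmma::matrix_b, M, N, K, half, wmma::col_major>;
using FragC = wmma::fragment<wmma::accumulator, M, N, K, float>;

// Hypothetical kernel: zero an accumulator tile and write it to memory.
// Intended to be launched with one full warp (32 threads).
__global__ void zero_tile(float *out) {
    FragC acc;
    wmma::fill_fragment(acc, 0.0f);  // set every element of the tile to 0
    // Store the 16x16 tile row-major with a leading dimension of N.
    wmma::store_matrix_sync(out, acc, N, wmma::mem_row_major);
}
```

The first template argument says which role the fragment plays (matrix_a, matrix_b, or accumulator). Only the two input operands carry a layout in their type; the accumulator's layout is given when it is loaded or stored.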
A Whole Warp Cooperates
Every WMMA operation is warp-wide: all 32 threads of a warp execute the same call and cooperate on one tile, and each thread holds only a slice of each fragment. You reason about tiles, not individual threads.
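To make the warp-wide model concrete, here is a minimal sketch in which a single warp loads two tiles, multiplies them, and stores the result. It assumes a 16x16x16 tile with half inputs and a float accumulator, a GPU of compute capability 7.0 or newer, and managed memory for brevity; the kernel name tile_matmul is hypothetical. Both inputs are filled with ones, so the chosen layouts do not affect the result and every output element should be 16.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// Assumed tile shape for this sketch.
constexpr int M = 16, N = 16, K = 16;

// Hypothetical kernel: one warp computes D = A * B for a single tile.
// No threadIdx appears: all 32 threads make the same calls on the same tile.
__global__ void tile_matmul(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, M, N, K, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, M, N, K, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, M, N, K, float> acc;

    wmma::fill_fragment(acc, 0.0f);            // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, a, K);      // row-major A, leading dim K
    wmma::load_matrix_sync(b_frag, b, K);      // col-major B, leading dim K
    wmma::mma_sync(acc, a_frag, b_frag, acc);  // acc = A * B + acc
    wmma::store_matrix_sync(d, acc, N, wmma::mem_row_major);
}

int main() {
    half *a, *b;
    float *d;
    cudaMallocManaged(&a, M * K * sizeof(half));
    cudaMallocManaged(&b, K * N * sizeof(half));
    cudaMallocManaged(&d, M * N * sizeof(float));

    // All-ones inputs: each output element is a sum of K = 16 ones.
    for (int i = 0; i < M * K; ++i) a[i] = __float2half(1.0f);
    for (int i = 0; i < K * N; ++i) b[i] = __float2half(1.0f);

    // Launch exactly one warp: 1 block of 32 threads.
    tile_matmul<<<1, 32>>>(a, b, d);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("d[0] = %.1f\n", d[0]);  // 16.0 with all-ones inputs

    cudaFree(a);
    cudaFree(b);
    cudaFree(d);
    return 0;
}
```

The launch uses 32 threads because the WMMA calls are collective across the warp. Launching with fewer threads, or calling them from divergent code where only part of the warp participates, is not supported.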