CUDA Academy · Lesson

Warps, Lanes, and Masks

The 32-thread unit and active masks.

Threads Run in Warps

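The GPU does not schedule threads one by one. It groups consecutive threads of a block into warps of 32 and issues each instruction to a whole warp at once, so the threads of a warp march together through the same instruction stream. When threads in a warp take different branches, the warp runs each path in turn with the non-participating lanes masked off. On Volta and newer GPUs, independent thread scheduling relaxes strict lockstep, which is why warp-level intrinsics take an explicit mask of participating lanes.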
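A minimal sketch of how a 1-D block splits into warps, assuming a single block of 64 threads. Warps are formed from consecutive thread indices, so threads 0 through 31 form warp 0 and threads 32 through 63 form warp 1. The helpers `warp_of` and `lane_of` and the kernel `show_warps` are hypothetical names for illustration. The constant `kWarpSize` is set to 32 to match the device built-in `warpSize`, so the helpers can also run on the host.

```cuda
#include <cstdio>
#include <cassert>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;  // same value as the device built-in warpSize

// Warp index of a thread within a 1-D block: consecutive threads share a warp.
__host__ __device__ constexpr int warp_of(int tid) { return tid / kWarpSize; }

// Lane index (0..31) of a thread inside its warp.
__host__ __device__ constexpr int lane_of(int tid) { return tid % kWarpSize; }

// Each thread records which warp and which lane it landed in.
__global__ void show_warps(int* warp_out, int* lane_out) {
    int tid = threadIdx.x;  // 1-D block, so this is the linear thread index
    warp_out[tid] = warp_of(tid);
    lane_out[tid] = lane_of(tid);
}

int main() {
    const int n = 64;  // one block holding exactly two full warps
    int *d_warp = nullptr, *d_lane = nullptr;
    cudaMalloc(&d_warp, n * sizeof(int));
    cudaMalloc(&d_lane, n * sizeof(int));

    show_warps<<<1, n>>>(d_warp, d_lane);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int h_warp[n], h_lane[n];
    cudaMemcpy(h_warp, d_warp, sizeof(h_warp), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_lane, d_lane, sizeof(h_lane), cudaMemcpyDeviceToHost);

    // The device results should agree with the host-side helpers.
    for (int t = 0; t < n; ++t)
        assert(h_warp[t] == warp_of(t) && h_lane[t] == lane_of(t));

    const int samples[] = {0, 31, 32, 63};  // both edges of each warp
    for (int t : samples)
        printf("thread %2d -> warp %d, lane %2d\n", t, h_warp[t], h_lane[t]);

    cudaFree(d_warp);
    cudaFree(d_lane);
    return 0;
}
```

Threads 31 and 32 are neighbors in `threadIdx.x` but land in different warps; that boundary is where warp-level operations stop.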

Why 32 Matters

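On NVIDIA hardware a warp is always 32 threads, and device code can read this value as the built-in `warpSize`. The warp is the real unit of execution, so good kernels think in groups of 32 rather than single threads. Choosing block sizes that are multiples of 32 avoids under-populated warps, whose missing lanes waste execution resources.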
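A sketch of what thinking in groups of 32 looks like, assuming a single 1-D block launched with 40 threads on purpose. The helpers `round_up_to_warp` and `lanes_mask` and the kernel `record_active` are hypothetical names for illustration. `__activemask()` is a real CUDA intrinsic (CUDA 9 and later) that returns a bitmask of the lanes currently active in the calling warp. The second warp holds only 8 real threads, so at most its low 8 bits can be set. Because `__activemask()` is only a snapshot and does not force the warp to converge, the code compares it against the lanes that exist rather than assuming an exact value.

```cuda
#include <cstdio>
#include <cassert>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;  // fixed on NVIDIA GPUs; device code can also read warpSize

// Round a requested thread count up to a whole number of warps.
__host__ __device__ constexpr int round_up_to_warp(int n) {
    return (n + kWarpSize - 1) / kWarpSize * kWarpSize;
}

// Bitmask with one bit per lane that exists in a warp holding `lanes` threads.
__host__ __device__ constexpr unsigned lanes_mask(int lanes) {
    return lanes >= kWarpSize ? 0xffffffffu : (1u << lanes) - 1u;
}

// Lane 0 of each warp stores the warp's active mask as seen at kernel entry.
__global__ void record_active(unsigned* masks) {
    unsigned active = __activemask();  // snapshot only, not a synchronization point
    if (threadIdx.x % kWarpSize == 0)
        masks[threadIdx.x / kWarpSize] = active;
}

int main() {
    const int threads = 40;  // deliberately NOT a multiple of 32
    const int warps = round_up_to_warp(threads) / kWarpSize;  // 2 warps; the 2nd is partial

    unsigned* d_masks = nullptr;
    cudaMalloc(&d_masks, warps * sizeof(unsigned));
    record_active<<<1, threads>>>(d_masks);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned h_masks[warps];
    cudaMemcpy(h_masks, d_masks, sizeof(h_masks), cudaMemcpyDeviceToHost);

    for (int w = 0; w < warps; ++w) {
        int lanes = threads - w * kWarpSize;  // threads that really exist in warp w
        if (lanes > kWarpSize) lanes = kWarpSize;
        unsigned existing = lanes_mask(lanes);
        // No lane outside the existing ones can ever be active.
        assert((h_masks[w] & ~existing) == 0u);
        printf("warp %d: active 0x%08x, existing lanes 0x%08x\n", w, h_masks[w], existing);
    }

    cudaFree(d_masks);
    return 0;
}
```

With 40 threads, the second warp carries 24 lanes that never do any work. Launching 32 or 64 threads per block instead keeps every warp full.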
