Warps, Lanes, and Masks
The 32-thread unit and active masks.
Threads Run in Warps
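The GPU does not schedule threads one by one. It groups them into warps of 32 threads, and each warp normally issues one instruction at a time for all of its threads. A thread's position within its warp is its lane, numbered 0 to 31. The active mask is a 32-bit value with one bit per lane, marking which lanes are executing the current instruction.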
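To make the warp and lane numbering concrete, here is a minimal sketch, assuming a one-dimensional block. The names kWarpSize, warp_of, lane_of, show_warps, and launch_show_warps are hypothetical and chosen for illustration; __activemask() is the CUDA intrinsic that returns the mask of lanes currently active in the calling warp.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumption: 32 matches the built-in warpSize on the target GPU.
constexpr int kWarpSize = 32;

// Hypothetical helpers: which warp a thread of a 1-D block falls in,
// and which lane (0..31) it occupies inside that warp.
__host__ __device__ inline int warp_of(int thread_in_block) {
    return thread_in_block / kWarpSize;
}
__host__ __device__ inline int lane_of(int thread_in_block) {
    return thread_in_block % kWarpSize;
}

__global__ void show_warps() {
    int tid = threadIdx.x;
    // One bit per lane; a set bit means that lane is active right now.
    unsigned mask = __activemask();
    // Only lane 0 reports, so each warp prints a single line.
    if (lane_of(tid) == 0) {
        printf("warp %d starts at thread %d, active mask 0x%08x\n",
               warp_of(tid), tid, mask);
    }
}

// Launch one block of 64 threads, i.e. two warps, and wait for it to finish.
cudaError_t launch_show_warps() {
    show_warps<<<1, 64>>>();
    return cudaDeviceSynchronize();
}
```

The helpers are marked __host__ __device__ so the same index arithmetic can be checked on the CPU: thread 37 of a block, for example, sits in warp 1 at lane 5.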
Why 32 Matters
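On every NVIDIA GPU to date a warp is 32 threads, and device code can read the value from the built-in warpSize variable. The warp is the real unit of execution, so good kernels think in groups of 32, not single threads. A block of 100 threads, for example, still occupies four warps, with 28 lanes of the last warp left idle.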
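One practical consequence is that block sizes are usually chosen as multiples of 32. The sketch below shows the arithmetic, assuming a one-dimensional block; kWarpSize, warps_per_block, round_up_to_warp, idle_lanes, and device_warp_size are hypothetical names introduced here, while cudaDeviceGetAttribute and cudaDevAttrWarpSize come from the CUDA runtime API.

```cuda
#include <cuda_runtime.h>

// Assumption: 32 matches the warp size reported by the device.
constexpr int kWarpSize = 32;

// Number of warps needed to hold `threads` threads (rounds up).
__host__ __device__ constexpr int warps_per_block(int threads) {
    return (threads + kWarpSize - 1) / kWarpSize;
}

// Smallest multiple of the warp size that is >= `threads`.
__host__ __device__ constexpr int round_up_to_warp(int threads) {
    return warps_per_block(threads) * kWarpSize;
}

// Lanes in the last warp that have no thread assigned to them.
__host__ __device__ constexpr int idle_lanes(int threads) {
    return round_up_to_warp(threads) - threads;
}

// Ask the runtime for the warp size of a device instead of assuming 32.
// Returns -1 if the query fails (for example, when no GPU is present).
int device_warp_size(int device) {
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(&value, cudaDevAttrWarpSize, device);
    return err == cudaSuccess ? value : -1;
}
```

Because the helpers are constexpr, the 100-thread case can be checked at compile time: warps_per_block(100) is 4, round_up_to_warp(100) is 128, and idle_lanes(100) is 28.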