CUDA Academy · Lesson

Warps, Lanes, and Masks

The 32-thread unit and active masks.

Threads Run in Warps

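The GPU does not schedule threads one by one. It groups the threads of each block into warps of 32, and the scheduler issues one instruction at a time for a whole warp. A thread's position inside its warp is its lane, numbered 0 to 31. While all 32 lanes follow the same path they advance in lockstep. When they branch differently, the warp runs each path with only some of its lanes switched on, and the set of lanes currently switched on is the active mask.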
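To make the grouping concrete, here is a minimal sketch in which each thread records its warp index, its lane index, and the active mask it observes inside a branch taken only by even lanes. The names `record_lanes`, `LaneInfo`, and `run_record_lanes` are hypothetical, a single 1D block is assumed, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical demo kernel; assumes a single 1D block.
__global__ void record_lanes(int *warp_id, int *lane_id, unsigned *branch_mask)
{
    int tid  = threadIdx.x;
    int lane = tid % warpSize;        // position inside the warp, 0..31
    warp_id[tid]     = tid / warpSize; // which warp of the block this thread is in
    lane_id[tid]     = lane;
    branch_mask[tid] = 0u;            // odd lanes keep 0 because they skip the branch
    if (lane % 2 == 0) {
        // Bit i is set when lane i is executing this instruction together
        // with the calling thread. The caller's own bit is always set.
        branch_mask[tid] = __activemask();
    }
}

// Hypothetical container for the values copied back to the host.
struct LaneInfo {
    std::vector<int> warp;
    std::vector<int> lane;
    std::vector<unsigned> mask;
};

// Hypothetical host helper: launches one block of `threads` threads
// (assumed to be between 1 and 1024) and copies the results back.
// CUDA return codes are ignored here; real code should check them.
LaneInfo run_record_lanes(int threads)
{
    LaneInfo out;
    out.warp.resize(threads);
    out.lane.resize(threads);
    out.mask.resize(threads);

    int *d_warp = nullptr;
    int *d_lane = nullptr;
    unsigned *d_mask = nullptr;
    cudaMalloc(&d_warp, threads * sizeof(int));
    cudaMalloc(&d_lane, threads * sizeof(int));
    cudaMalloc(&d_mask, threads * sizeof(unsigned));

    record_lanes<<<1, threads>>>(d_warp, d_lane, d_mask);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.warp.data(), d_warp, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(out.lane.data(), d_lane, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(out.mask.data(), d_mask, threads * sizeof(unsigned), cudaMemcpyDeviceToHost);

    cudaFree(d_warp);
    cudaFree(d_lane);
    cudaFree(d_mask);
    return out;
}
```

Calling `run_record_lanes(64)` launches two warps: thread 31 reports warp 0, lane 31, and thread 32 reports warp 1, lane 0. Inside the branch each even lane sees a mask with its own bit set. In practice the mask often covers all even lanes at once, but `__activemask()` only reports which lanes happen to be executing together, so that should not be relied on.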

Why 32 Matters

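On every NVIDIA GPU to date a warp is 32 threads, a value device code can read from the built-in warpSize variable. The warp is the real unit of execution, so good kernels are designed in groups of 32, not single threads. A block of 100 threads, for example, still occupies four warps, and the last one runs with 28 lanes idle.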
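The arithmetic behind "think in groups of 32" can be sketched with a few host-side helpers. The names `kWarpSize`, `warps_per_block`, `idle_lanes`, and `round_up_to_warp` are hypothetical, and the warp size is hard-coded as an assumption.

```cuda
// Assumption: 32 lanes per warp, as stated in the lesson. Host code can
// query the device's warp size at run time instead of hard-coding it.
constexpr int kWarpSize = 32;

// Number of warps a block of `threads` threads occupies (rounded up,
// because a partially filled warp still takes a whole warp).
constexpr int warps_per_block(int threads)
{
    return (threads + kWarpSize - 1) / kWarpSize;
}

// Lanes in the last warp that have no thread assigned to them.
constexpr int idle_lanes(int threads)
{
    return warps_per_block(threads) * kWarpSize - threads;
}

// Smallest multiple of the warp size that is >= threads.
constexpr int round_up_to_warp(int threads)
{
    return warps_per_block(threads) * kWarpSize;
}
```

With these helpers, a block size of 100 gives 4 warps with 28 idle lanes, and rounding up suggests 128 threads. A block size of 256 gives 8 full warps with no idle lanes.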

All lessons in this course

  1. Warps, Lanes, and Masks
  2. __shfl_down_sync for Reductions
  3. Ballot and Vote Functions
  4. Cooperative Groups