CUDA Academy · Lesson

__shfl_down_sync for Reductions

Warp reductions without barriers.

Shuffle Moves Registers

A shuffle lets one lane read another lane's register value directly. Data hops between threads with no shared memory and no barrier in between.
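To make the register hop concrete, here is a minimal sketch using __shfl_down_sync, the intrinsic this lesson is named after. The kernel name read_neighbor and the host helper run_read_neighbor are made up for illustration, and the launch assumes a single block of exactly 32 threads (one full warp). Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// Hypothetical demo kernel: each lane reads the register of the lane one
// position above it. Assumes a launch of exactly one warp (32 threads).
__global__ void read_neighbor(int *out) {
    int lane = threadIdx.x;   // one warp per block here, so thread index == lane index
    int val  = 100 + lane;    // held in a register, never written to shared memory

    // Full mask: all 32 lanes take part. No __shared__ array, no __syncthreads().
    out[lane] = __shfl_down_sync(0xffffffffu, val, 1);
}

// Hypothetical host helper: launches one warp and copies the 32 results back.
void run_read_neighbor(int *h_out) {
    int *d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(int));
    read_neighbor<<<1, 32>>>(d_out);
    cudaMemcpy(h_out, d_out, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}
```

Lane 0 ends up with lane 1's value (101), lane 30 with lane 31's value (131). Lane 31 has no lane above it, so it gets its own value (131) back.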

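Meet __shfl_down_sync

The workhorse for reductions is __shfl_down_sync. Each lane reads val from the lane offset positions above it: lane i receives the value held by lane i + offset. Lanes whose source would fall past the end of the warp get their own val back unchanged. The mask argument names the lanes taking part; 0xffffffff means all 32.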

float v = __shfl_down_sync(mask, val, offset);
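Here is one way that call can be turned into a full warp sum: halve the offset each step (16, 8, 4, 2, 1) so that after five shuffles lane 0 holds the total. This is a sketch under stated assumptions, not the only way to write it. The names warp_reduce_sum, sum_one_warp, and sum32_on_gpu are hypothetical, all 32 lanes are assumed active (full mask), and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: sums val across a full warp.
// Assumes all 32 lanes are active, hence the full mask.
__inline__ __device__ float warp_reduce_sum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        // Each lane adds the value held by the lane `offset` positions above it.
        val += __shfl_down_sync(0xffffffffu, val, offset);
    }
    return val;  // only lane 0 is guaranteed to hold the complete sum
}

// Hypothetical kernel: one warp sums 32 inputs with no shared memory and no barrier.
__global__ void sum_one_warp(const float *in, float *out) {
    float total = warp_reduce_sum(in[threadIdx.x]);
    if (threadIdx.x == 0) {
        *out = total;
    }
}

// Hypothetical host helper: sums exactly 32 floats on the GPU.
float sum32_on_gpu(const float *h_in) {
    float *d_in = nullptr, *d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_in, 32 * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, 32 * sizeof(float), cudaMemcpyHostToDevice);
    sum_one_warp<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Only lane 0 writes the result. The upper lanes pick up partial sums along the way, so their final values should not be used.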
