__shfl_down_sync for Reductions
Warp reductions without barriers.
Shuffle Moves Registers
A shuffle lets one lane read a register value directly from another lane in the same warp. Data moves between threads without a round trip through shared memory and without a __syncthreads() barrier.
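As a rough illustration of a register-to-register read, the sketch below has every lane fetch a value from the lane next to it using the general __shfl_sync intrinsic. The kernel name neighborKernel, the helper runNeighborRead, and the choice of lane * 10 as the per-lane value are all assumptions made for this example, and it assumes a single full warp of 32 threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each lane reads the register of the next lane.
__global__ void neighborKernel(int* out) {
    int lane = threadIdx.x % 32;
    int mine = lane * 10;  // per-lane value held in a register
    // Read `mine` from lane (lane + 1) % 32; no shared memory is involved.
    int theirs = __shfl_sync(0xffffffffu, mine, (lane + 1) % 32);
    out[threadIdx.x] = theirs;
}

// Hypothetical host helper: returns what the given lane received.
int runNeighborRead(int lane) {
    int h_out[32] = {0};
    int* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(h_out));
    neighborKernel<<<1, 32>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out[lane];
}

int main() {
    printf("lane 0 received %d\n", runNeighborRead(0));
    printf("lane 31 received %d\n", runNeighborRead(31));
    return 0;
}
```

Lane 0 should receive lane 1's value and lane 31 should wrap around to lane 0's value, all without a barrier in the kernel.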
Meet __shfl_down_sync
The workhorse for reductions is __shfl_down_sync. Each lane reads a value from the lane that sits offset positions higher in the warp. A lane whose source would fall past the end of the warp gets its own value back.
float v = __shfl_down_sync(mask, val, offset);
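One common way to turn this call into a full warp sum is to halve the offset on each step, so that after five steps lane 0 holds the total. The sketch below is a minimal version of that pattern, not a definitive implementation: the names warpReduceSum, sumKernel, and runWarpSum are hypothetical, and it assumes a single block of exactly 32 threads with every lane active (mask 0xffffffff).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sum `val` across the 32 lanes of a warp; only lane 0 holds the full total.
__inline__ __device__ float warpReduceSum(float val) {
    const unsigned mask = 0xffffffffu;  // assumption: all 32 lanes participate
    for (int offset = 16; offset > 0; offset >>= 1) {
        // Add the value held by the lane `offset` positions higher.
        val += __shfl_down_sync(mask, val, offset);
    }
    return val;
}

__global__ void sumKernel(const float* in, float* out) {
    float total = warpReduceSum(in[threadIdx.x]);
    if (threadIdx.x == 0) *out = total;  // lane 0 writes the result
}

// Hypothetical host helper: sums 0..31 on one warp and returns the result.
float runWarpSum() {
    float h_in[32], h_out = 0.0f;
    for (int i = 0; i < 32; ++i) h_in[i] = static_cast<float>(i);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    sumKernel<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    printf("warp sum = %.1f\n", runWarpSum());  // prints "warp sum = 496.0"
    return 0;
}
```

The loop contains no __syncthreads() call and uses no shared memory. The upper lanes end up with partial sums that are not meaningful, which is why only lane 0 writes the output.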