Avoiding Bank Conflicts
Why padding can speed up shared memory access.
Shared Memory Has Banks
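Shared memory is split into 32 banks, matching the 32 threads in a warp. When each thread in the warp accesses a different bank, all 32 accesses are serviced at once. When several threads access different words in the same bank, those accesses are serialized, which is called a bank conflict.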
How Addresses Map
Consecutive 4-byte words land in consecutive banks, wrapping around after 32, so a word's bank is its word index modulo 32. Word 0 is in bank 0, word 31 is in bank 31, and word 32 is in bank 0 again.
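The mapping above can be checked with a small host-side model. The following is a minimal sketch in plain C++, not a GPU kernel: it assumes the 32 banks and 4-byte words described in the text, and the names bank_of_word and max_threads_per_bank are hypothetical helpers introduced only for illustration. It models a warp in which thread t reads one column of row t from a row-major tile, and counts how many threads land in the same bank.

```cpp
#include <algorithm>
#include <array>

// Parameters taken from the text: 32 banks, one 4-byte word per bank slot.
constexpr int kNumBanks = 32;
constexpr int kWarpSize = 32;

// Bank holding a given 4-byte word index (word 0 -> bank 0, word 32 -> bank 0).
constexpr int bank_of_word(int word_index) {
    return word_index % kNumBanks;
}

// Hypothetical helper: thread t reads column `col` of row t in a row-major
// tile whose rows are `row_width` words long. Returns the largest number of
// threads that fall into one bank (1 means the access is conflict-free).
int max_threads_per_bank(int row_width, int col) {
    std::array<int, kNumBanks> hits{};  // per-bank access counts, zeroed
    for (int t = 0; t < kWarpSize; ++t) {
        ++hits[bank_of_word(t * row_width + col)];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

With a row width of 32 words, every thread in the column read lands in the same bank, so max_threads_per_bank(32, 0) is 32. Padding each row by one word shifts each row's start by one bank, so max_threads_per_bank(33, 0) is 1. In a kernel, this would correspond to declaring a tile as `__shared__ float tile[32][33]` rather than `tile[32][32]`, at the cost of one unused word per row.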