CUDA Academy · Lesson

Avoiding Bank Conflicts

Why padding can speed up shared-memory access.

Shared Memory Has Banks

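Shared memory is split into 32 banks, matching the 32 threads in a warp. When every thread in a warp accesses a different bank, all 32 accesses are serviced at once. When several threads access different words in the same bank, those accesses are serialized, which is called a bank conflict.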

Consecutive 4-byte words land in consecutive banks, wrapping around after 32, so the bank of a word is its index modulo 32. Word 0 is in bank 0, word 1 is in bank 1, and word 32 is in bank 0 again.
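
The mapping above can be checked on the host with a small model. This is only a sketch, not NVIDIA's implementation: the names bank_of_word and conflict_degree are hypothetical, and it assumes 32 banks of 4-byte words, a 32-thread warp, and that thread t reads word t * stride.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

constexpr int kNumBanks = 32;  // assumption: 32 banks, as stated in the lesson
constexpr int kWarpSize = 32;  // assumption: 32 threads per warp

// Bank holding a given 4-byte word: consecutive words map to
// consecutive banks and wrap around after kNumBanks.
constexpr int bank_of_word(std::size_t word_index) {
    return static_cast<int>(word_index % kNumBanks);
}

// Largest number of threads in one warp that land in the same bank
// when thread t reads word t * stride_in_words. A result of 1 means
// no conflict; a result of N means an N-way conflict.
// Requires stride >= 1 so that every thread reads a distinct word
// (threads reading the same word do not conflict, which this
// model does not cover).
int conflict_degree(std::size_t stride_in_words) {
    assert(stride_in_words >= 1);
    std::array<int, kNumBanks> hits{};  // per-bank access counts, zeroed
    for (int t = 0; t < kWarpSize; ++t) {
        ++hits[bank_of_word(t * stride_in_words)];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, a stride of 1 word gives a degree of 1 (no conflict), a stride of 2 gives 2, and a stride of 32 puts all 32 threads in bank 0. A stride of 33, which is what a 32-word row padded by one extra word produces, brings the degree back to 1.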