CUDA Academy · Lesson

Avoiding Bank Conflicts

Why padding can speed up shared memory access.

Shared Memory Has Banks

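Shared memory is split into 32 banks, the same number as there are threads in a warp. When every thread in a warp touches a different bank, all 32 accesses are served in a single pass. When several threads touch different words in the same bank, those accesses are serialized. That serialization is a bank conflict.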

How Addresses Map

Consecutive 4-byte words land in consecutive banks, wrapping around after 32 words, so word i lives in bank i mod 32. Word 0 is in bank 0, word 31 is in bank 31, and word 32 is in bank 0 again.
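The mapping can be checked with a few lines of arithmetic. The sketch below is plain host-side C++ (it should also compile under nvcc), not device code: it only models the rule described above, assuming 32 banks of 4-byte words and a 32-thread warp. The names bank_of_word, conflict_degree, and check_lesson_examples are hypothetical helpers introduced for illustration, not part of any CUDA API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Assumed model from the lesson: 32 banks, each one 4-byte word wide,
// and 32 threads per warp.
constexpr std::size_t kNumBanks = 32;
constexpr std::size_t kWarpSize = 32;

// Bank that holds the 4-byte word at index `word` in shared memory.
constexpr std::size_t bank_of_word(std::size_t word) {
    return word % kNumBanks;
}

// Largest number of threads in one warp that land in the same bank when
// thread t accesses word t * stride. A result of 1 means conflict-free;
// 32 means the access is fully serialized.
// Assumes stride >= 1, so every thread touches a distinct word.
inline std::size_t conflict_degree(std::size_t stride) {
    std::array<std::size_t, kNumBanks> hits{};  // per-bank access counts
    for (std::size_t t = 0; t < kWarpSize; ++t) {
        ++hits[bank_of_word(t * stride)];
    }
    return *std::max_element(hits.begin(), hits.end());
}

// The two examples from the text: word 0 and word 32 share bank 0.
inline void check_lesson_examples() {
    assert(bank_of_word(0) == 0);
    assert(bank_of_word(32) == 0);
}
```

Under this model, walking down a column of a 32x32 array of 4-byte values is a stride of 32 words, so conflict_degree(32) is 32: every thread hits bank 0. Padding each row to 33 words makes the stride 33, and conflict_degree(33) is 1, which is why one extra column of padding can remove the conflict.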
