CUDA Academy · Lesson

Declaring __shared__ Arrays

Fast scratchpad memory per block.

A Scratchpad Per Block

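Every thread block gets a small, fast pool of on-chip memory called shared memory. Think of it as a scratchpad that all threads in the block can write to and read from together; it lasts only as long as the block runs, and other blocks cannot see it. To place an array there, prefix its declaration inside the kernel with the __shared__ qualifier.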
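To make the scratchpad idea concrete, here is a minimal sketch of a kernel that declares a fixed-size __shared__ array and lets each thread read a slot that another thread in the same block wrote. The names reverseTiles, reverseTilesOnGpu, and TILE are made up for illustration, the input length is assumed to be a multiple of TILE, and CUDA error checking is left out for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 64;  // hypothetical tile size: one element per thread

// Each block reverses its own TILE-element slice of `data` in place.
__global__ void reverseTiles(int *data)
{
    // One copy of this array exists per block; every thread in the block sees it.
    __shared__ int tile[TILE];

    int local  = threadIdx.x;
    int global = blockIdx.x * TILE + local;

    tile[local] = data[global];             // each thread fills one slot
    __syncthreads();                        // wait until the whole tile is written
    data[global] = tile[TILE - 1 - local];  // read a slot written by another thread
}

// Hypothetical host helper: copies the input to the GPU, runs the kernel,
// and returns the result. Assumes input.size() is a multiple of TILE.
std::vector<int> reverseTilesOnGpu(const std::vector<int> &input)
{
    assert(!input.empty() && input.size() % TILE == 0);
    int n = static_cast<int>(input.size());
    size_t bytes = n * sizeof(int);

    int *d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, input.data(), bytes, cudaMemcpyHostToDevice);

    reverseTiles<<<n / TILE, TILE>>>(d);    // one thread per element

    std::vector<int> out(n);
    cudaMemcpy(out.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

The array size here is a compile-time constant, which is what a static __shared__ declaration requires. The __syncthreads() call is needed because a thread reads a slot that a different thread wrote.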

Why It Is So Fast

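Shared memory sits right on the streaming multiprocessor, so its latency is roughly 100x lower than that of uncached global memory, provided the threads do not run into bank conflicts. Use it to stage data that a block reads more than once, instead of hammering slow off-chip DRAM for every access. 🚀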
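As one possible illustration of staging reused data, the sketch below computes a three-point sum in which every input element is needed by up to three threads. Each block loads its slice into a __shared__ array once, plus one extra "halo" element on each side, and the repeated reads then hit the scratchpad. The names sum3, sum3OnGpu, BLOCK, and RADIUS are hypothetical, error checking is omitted, and on GPUs with hardware caches the actual gain may be smaller than the latency figure suggests.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK  = 128;  // hypothetical threads per block
constexpr int RADIUS = 1;    // neighbors needed on each side

// out[i] = in[i-1] + in[i] + in[i+1], with out-of-range neighbors treated as 0.
// Must be launched with exactly BLOCK threads per block.
__global__ void sum3(const float *in, float *out, int n)
{
    // The block's slice plus one halo element on each side.
    __shared__ float tile[BLOCK + 2 * RADIUS];

    int g = blockIdx.x * BLOCK + threadIdx.x;  // index into global memory
    int s = threadIdx.x + RADIUS;              // index into the tile

    // Each thread loads its own element from global memory once.
    tile[s] = (g < n) ? in[g] : 0.0f;
    // The first and last threads also load the halo elements.
    if (threadIdx.x == 0)
        tile[0] = (g >= 1 && g - 1 < n) ? in[g - 1] : 0.0f;
    if (threadIdx.x == BLOCK - 1)
        tile[s + 1] = (g + 1 < n) ? in[g + 1] : 0.0f;
    __syncthreads();  // the tile must be complete before anyone reads neighbors

    // All three reads below are served from shared memory.
    if (g < n)
        out[g] = tile[s - 1] + tile[s] + tile[s + 1];
}

// Hypothetical host helper that runs sum3 over a host vector.
std::vector<float> sum3OnGpu(const std::vector<float> &input)
{
    assert(!input.empty());
    int n = static_cast<int>(input.size());
    size_t bytes = n * sizeof(float);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, input.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;  // round up to cover all elements
    sum3<<<blocks, BLOCK>>>(dIn, dOut, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

Without the tile, each thread would issue three global reads. With it, a block of 128 threads issues 130 global reads in total and takes the rest from on-chip memory.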
