CUDA Academy · Lesson

Declaring __shared__ Arrays

Fast scratchpad memory per block.

A Scratchpad Per Block

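Every thread block gets a small, fast pool of on-chip memory called shared memory. You declare it inside a kernel with the __shared__ qualifier. Every thread in the block can read and write it, and it lives only as long as the block runs. Think of it as a scratchpad the whole block shares. It is small: a statically declared shared array is capped at 48 KB per block.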
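As a minimal sketch of the declaration, assuming a fixed tile size of 256 threads per block, the kernel below gives each block its own `__shared__ int tile[BLOCK_SIZE]` and uses it to reverse that block's chunk of an array. `BLOCK_SIZE`, `reverseChunks`, and `reverseChunksOnDevice` are hypothetical names chosen for illustration. The `__syncthreads()` barrier is required so that no thread reads a slot before the thread that owns it has written it.

```cuda
#include <cuda_runtime.h>

// Hypothetical tile size for this sketch. A statically declared
// __shared__ array must have a size known at compile time.
constexpr int BLOCK_SIZE = 256;

// Each block reverses its own BLOCK_SIZE-element chunk of `data` in place.
// Assumes n is a multiple of BLOCK_SIZE and the kernel is launched with
// blockDim.x == BLOCK_SIZE.
__global__ void reverseChunks(int *data) {
    // One copy of this array exists per block, in on-chip shared memory.
    __shared__ int tile[BLOCK_SIZE];

    int lid = threadIdx.x;                      // index within the block
    int gid = blockIdx.x * BLOCK_SIZE + lid;    // index within the whole array

    tile[lid] = data[gid];                      // each thread fills one slot
    __syncthreads();                            // wait until the whole tile is written
    data[gid] = tile[BLOCK_SIZE - 1 - lid];     // read a slot another thread wrote
}

// Hypothetical host helper: copies h_data to the GPU, reverses each chunk,
// and copies the result back. Returns false if any CUDA call fails.
bool reverseChunksOnDevice(int *h_data, int n) {
    int *d_data = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    cudaError_t err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverseChunks<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_data);
        err = cudaGetLastError();               // catches launch errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err == cudaSuccess;
}
```

Because each block gets its own copy of `tile`, blocks never see each other's scratchpad: a 512-element array launched as two blocks comes back with elements 0 to 255 and 256 to 511 each reversed independently.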

Why It Is So Fast

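Shared memory sits on the streaming multiprocessor itself rather than in off-chip DRAM, so its latency is roughly 100x lower than that of uncached global memory, provided threads avoid bank conflicts. Stage data there whenever a block reads the same values more than once, instead of hammering slow off-chip DRAM. 🚀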
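One common way to cash in on that speed is data reuse. The sketch below assumes a hypothetical 3-point stencil, `out[i] = in[i-1] + in[i] + in[i+1]`, where every input value is needed by three neighboring threads. Each block loads its tile plus a one-element halo on each side into shared memory once, and all neighbor reads then hit the on-chip copy. `BLOCK_SIZE`, `RADIUS`, `stencil1d`, and `stencilOnDevice` are illustrative names, and treating out-of-range neighbors as zero is an assumption of this sketch.

```cuda
#include <cuda_runtime.h>

constexpr int BLOCK_SIZE = 256;  // hypothetical threads per block
constexpr int RADIUS = 1;        // hypothetical stencil radius (3-point stencil)

// out[i] = sum of in[i-RADIUS .. i+RADIUS], with out-of-range inputs treated as 0.
// Assumes the kernel is launched with blockDim.x == BLOCK_SIZE.
__global__ void stencil1d(const float *in, float *out, int n) {
    // The block's tile plus a halo of RADIUS elements on each side.
    __shared__ float tile[BLOCK_SIZE + 2 * RADIUS];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int lid = threadIdx.x + RADIUS;                   // index into tile

    // Every thread loads its own element once from global memory.
    tile[lid] = (gid < n) ? in[gid] : 0.0f;

    // The first RADIUS threads also load the left and right halo elements.
    if (threadIdx.x < RADIUS) {
        int left = gid - RADIUS;
        int right = gid + BLOCK_SIZE;
        tile[lid - RADIUS] = (left >= 0) ? in[left] : 0.0f;
        tile[lid + BLOCK_SIZE] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();  // the tile must be complete before anyone reads neighbors

    if (gid < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += tile[lid + k];  // neighbor reads come from shared memory
        out[gid] = sum;
    }
}

// Hypothetical host helper. Returns false if any CUDA call fails.
bool stencilOnDevice(const float *h_in, float *h_out, int n) {
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;  // round up
        stencil1d<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Without the shared tile, each input element would be fetched from global memory by up to three threads; here each block fetches its own elements once, plus two halo values.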
