CUDA Academy · Lesson

Overlapping Copy and Compute

Hiding transfers behind kernels.

The Big Idea

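The real speedup comes from doing two things at once: copying one chunk of data while the GPU computes on another. 🔀 This only works when the copies are issued with cudaMemcpyAsync from page-locked (pinned) host memory into non-default streams. A plain cudaMemcpy, or a copy from ordinary pageable memory, generally will not overlap with kernel execution.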
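A minimal sketch of the chunked pipeline, assuming a simple element-wise kernel. The names `scaleKernel`, `pipelinedScale`, and `CHECK`, the choice of four streams, and the 256-thread block size are illustrative assumptions, not part of this lesson. Each chunk's host-to-device copy, kernel, and device-to-host copy are queued in order in their own stream, so while one stream's kernel is running, another stream's copy can already be in flight on a copy engine. The caller is assumed to pass pinned memory (for example from `cudaMallocHost`).

```cuda
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort on any CUDA error so failures are not silently ignored.
#define CHECK(call)                                                   \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                   cudaGetErrorString(err_));                         \
      std::exit(1);                                                   \
    }                                                                 \
  } while (0)

// Hypothetical element-wise kernel: multiplies each element by `factor`.
__global__ void scaleKernel(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Scales h_data (assumed pinned, length n) in place on the GPU, split into
// kNumStreams chunks so one chunk's copies can overlap another chunk's kernel.
void pipelinedScale(float* h_data, int n, float factor) {
  const int kNumStreams = 4;  // illustrative choice, worth tuning
  float* d_data = nullptr;
  CHECK(cudaMalloc(&d_data, n * sizeof(float)));

  cudaStream_t streams[kNumStreams];
  for (int s = 0; s < kNumStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

  const int chunk = (n + kNumStreams - 1) / kNumStreams;
  for (int s = 0; s < kNumStreams; ++s) {
    int offset = s * chunk;
    int count = std::min(chunk, n - offset);
    if (count <= 0) break;  // later chunks would be empty too
    size_t bytes = count * sizeof(float);

    // Copy in, compute, copy out: ordered within stream s,
    // but free to overlap with work queued in the other streams.
    CHECK(cudaMemcpyAsync(d_data + offset, h_data + offset, bytes,
                          cudaMemcpyHostToDevice, streams[s]));
    int threads = 256;
    int blocks = (count + threads - 1) / threads;
    scaleKernel<<<blocks, threads, 0, streams[s]>>>(d_data + offset, count,
                                                    factor);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_data + offset, d_data + offset, bytes,
                          cudaMemcpyDeviceToHost, streams[s]));
  }

  // Wait for every chunk to finish, then release resources.
  for (int s = 0; s < kNumStreams; ++s) {
    CHECK(cudaStreamSynchronize(streams[s]));
    CHECK(cudaStreamDestroy(streams[s]));
  }
  CHECK(cudaFree(d_data));
}
```

To use it, allocate the host buffer with `cudaMallocHost`, call `pipelinedScale(h, n, 2.0f)`, and release the buffer with `cudaFreeHost`. Four streams is only a starting point: more chunks shorten the time the pipeline spends filling and draining, but each chunk adds launch and copy overhead, so the best count is worth measuring with events.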

Separate Hardware Engines

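A GPU has dedicated copy engines (DMA engines) that are separate from its compute units (the SMs). Overlap works because these run truly in parallel rather than taking turns. The number of copy engines matters: with one, a kernel can overlap a transfer in one direction at a time; with two, a host-to-device copy, a device-to-host copy, and a kernel can all run at once.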
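Whether a particular GPU can overlap at all is something you can check at runtime. This sketch reads the real `asyncEngineCount` and `concurrentKernels` fields of `cudaDeviceProp`; the helper names `copyEngineCount` and `reportOverlapSupport` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of asynchronous copy engines on `device`, or -1 on error.
// 0: copies cannot overlap kernels; 1: a copy in one direction can overlap a
// kernel; 2: copies in both directions can overlap each other and a kernel.
int copyEngineCount(int device) {
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
  return prop.asyncEngineCount;
}

// Prints the copy-engine count and concurrent-kernel support of every GPU.
void reportOverlapSupport() {
  int count = 0;
  if (cudaGetDeviceCount(&count) != cudaSuccess) count = 0;
  for (int d = 0; d < count; ++d) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
    std::printf("GPU %d (%s): %d copy engine(s), concurrent kernels: %s\n",
                d, prop.name, prop.asyncEngineCount,
                prop.concurrentKernels ? "yes" : "no");
  }
}
```

If `copyEngineCount` returns 0 for your device, splitting work across streams still orders it correctly but will not hide transfer time.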
