cudaMemcpyAsync in a Stream
Non-blocking transfers that overlap with other work.
Blocking by Default
Plain cudaMemcpy blocks the calling host thread until the copy is done. The thread just waits, doing nothing useful while bytes move.
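To make the blocking behavior concrete, here is a minimal sketch of a round trip using plain cudaMemcpy. The helper name blocking_round_trip, the CHECK macro, and the buffer size are assumptions made up for illustration; only the CUDA runtime calls come from the real API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical demo: copies n floats to the device and back with blocking
// copies, then reports whether the data survived the round trip.
bool blocking_round_trip(size_t n) {
    const size_t bytes = n * sizeof(float);

    // Ordinary pageable host buffers.
    float* h_src = static_cast<float*>(malloc(bytes));
    float* h_dst = static_cast<float*>(malloc(bytes));
    if (h_src == nullptr || h_dst == nullptr) exit(EXIT_FAILURE);
    for (size_t i = 0; i < n; ++i) h_src[i] = static_cast<float>(i);

    float* d_buf = nullptr;
    CHECK(cudaMalloc(reinterpret_cast<void**>(&d_buf), bytes));

    // The host thread does not get past this line until the runtime is
    // done with h_src.
    CHECK(cudaMemcpy(d_buf, h_src, bytes, cudaMemcpyHostToDevice));

    // Same here: h_dst is fully populated by the time the call returns.
    CHECK(cudaMemcpy(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost));

    bool ok = true;
    for (size_t i = 0; i < n; ++i) {
        if (h_dst[i] != h_src[i]) { ok = false; break; }
    }

    CHECK(cudaFree(d_buf));
    free(h_src);
    free(h_dst);
    return ok;
}

int main() {
    const bool ok = blocking_round_trip(1 << 20);
    printf("%s\n", ok ? "round trip matched" : "round trip mismatch");
    return ok ? 0 : 1;
}
```

Nothing between the two copies needs an explicit synchronization call, which is the convenience you pay for with an idle host thread.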
The Async Cousin
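Meet cudaMemcpyAsync. It queues the copy on a stream and returns control to the host thread right away, so the host can keep working while the transfer happens. For the copy to be truly asynchronous, the host buffer must be pinned (for example, allocated with cudaMallocHost); with pageable memory the call may still block.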
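A minimal sketch of the async version, assuming pinned host buffers from cudaMallocHost and a single user-created stream. The helper name async_round_trip, the CHECK macro, and the placeholder host loop are hypothetical and exist only for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical demo: queues an upload and a download on one stream, does
// unrelated host work, then waits and checks the data.
bool async_round_trip(size_t n) {
    const size_t bytes = n * sizeof(float);

    // Pinned (page-locked) host buffers, so the copies can run asynchronously.
    float* h_src = nullptr;
    float* h_dst = nullptr;
    CHECK(cudaMallocHost(reinterpret_cast<void**>(&h_src), bytes));
    CHECK(cudaMallocHost(reinterpret_cast<void**>(&h_dst), bytes));
    for (size_t i = 0; i < n; ++i) h_src[i] = static_cast<float>(i);

    float* d_buf = nullptr;
    CHECK(cudaMalloc(reinterpret_cast<void**>(&d_buf), bytes));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Both calls only enqueue work. Operations in the same stream run in
    // issue order, so the download starts after the upload has finished.
    CHECK(cudaMemcpyAsync(d_buf, h_src, bytes, cudaMemcpyHostToDevice, stream));
    CHECK(cudaMemcpyAsync(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost, stream));

    // Placeholder host work that touches neither h_src nor h_dst.
    unsigned long long host_work = 0;
    for (unsigned long long i = 0; i < 1000000ULL; ++i) host_work += i;

    // Wait for everything queued on the stream before reading h_dst.
    CHECK(cudaStreamSynchronize(stream));

    bool ok = (host_work > 0);
    for (size_t i = 0; i < n && ok; ++i) {
        if (h_dst[i] != h_src[i]) ok = false;
    }

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_src));
    CHECK(cudaFreeHost(h_dst));
    return ok;
}

int main() {
    const bool ok = async_round_trip(1 << 20);
    printf("%s\n", ok ? "round trip matched" : "round trip mismatch");
    return ok ? 0 : 1;
}
```

The one rule to remember: until cudaStreamSynchronize returns, the host should not modify h_src or read h_dst, because the transfers may still be in flight.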