Fusing Filters into One Kernel
Cutting launches and global traffic.
Why Fuse At All
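Running five filters as five separate kernels means five launches and five round-trips to global memory: each kernel writes its intermediate image out, and the next one reads it straight back in. Fusing the filters into one kernel reads each pixel once, keeps the intermediate values in registers, and writes the result once. That cuts both costs and often gives a large speedup, especially when the filters are memory-bound. 🚀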
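To make the idea concrete, here is a minimal sketch of a fused kernel. It assumes three simple per-pixel filters (`brighten`, `contrast`, `clamp01`) on a grayscale image stored as floats in [0, 1]. The filter names, the `fusedFilters` kernel, and the `runFused` host helper are all hypothetical and chosen only for illustration; they are not taken from the course code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-pixel filters, used only to illustrate fusion.
__host__ __device__ inline float brighten(float v, float offset) { return v + offset; }
__host__ __device__ inline float contrast(float v, float gain) { return (v - 0.5f) * gain + 0.5f; }
__host__ __device__ inline float clamp01(float v) { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }

// Fused kernel: one global read, three filters applied in registers, one global write.
// The unfused version would need three kernels and two intermediate images.
__global__ void fusedFilters(const float* in, float* out, int n, float offset, float gain) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i];          // single read from global memory
        v = brighten(v, offset);  // intermediate stays in a register
        v = contrast(v, gain);    // still in a register
        out[i] = clamp01(v);      // single write to global memory
    }
}

// Host helper (assumed name): copies pixels to the GPU, runs the fused kernel once,
// and copies the result back. Error checking is omitted to keep the sketch short.
std::vector<float> runFused(const std::vector<float>& pixels, float offset, float gain) {
    int n = static_cast<int>(pixels.size());
    std::vector<float> result(n);
    if (n == 0) return result;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, pixels.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up so every pixel is covered
    fusedFilters<<<blocks, threads>>>(dIn, dOut, n, offset, gain);

    cudaMemcpy(result.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

This works so directly because each filter here only needs its own pixel. Filters that read neighboring pixels, such as a blur, can still be fused, but they need extra care (for example shared-memory tiles with a halo), which this sketch does not cover.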
Launch Overhead Adds Up
Every kernel launch costs a few microseconds. On a tiny image that fixed launch overhead can dwarf the real work, so fewer launches means more useful time.
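One rough way to see this overhead on your own machine is to time many launches of a kernel that does nothing. The sketch below is an assumption-laden illustration, not a rigorous benchmark: `emptyKernel` and `averageLaunchMicroseconds` are hypothetical names, and timing back-to-back launches measures how fast launches can be issued and drained rather than the latency of one isolated launch. The number you get depends on the GPU, driver, and operating system.

```cuda
#include <cuda_runtime.h>

// A kernel with no work, so the measured time is almost entirely launch cost.
__global__ void emptyKernel() {}

// Hypothetical helper: average time per launch in microseconds,
// measured over `launches` back-to-back launches using CUDA events.
float averageLaunchMicroseconds(int launches) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not counted.
    emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i) {
        emptyKernel<<<1, 1>>>();
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms * 1000.0f / launches;  // convert to microseconds per launch
}
```

Multiplying the result by the number of kernels in an unfused pipeline gives a rough estimate of the fixed cost that fusion removes per image.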