Real-Time Online Inference
Serve low-latency predictions per request.
What Online Inference Is
Online inference answers one request at a time, the instant it arrives, so a user or service gets a fresh prediction right away. ⚡
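To make the one-request-at-a-time idea concrete, here is a minimal sketch in Python. The model is a hypothetical toy logistic scorer; the names `WEIGHTS`, `BIAS`, `predict_one`, and `handle_request`, along with the feature names and values, are assumptions made for illustration and do not come from the lesson.

```python
import math

# Hypothetical toy model: these weights and bias are made up for illustration.
WEIGHTS = {"age": 0.03, "num_visits": 0.2}
BIAS = -1.5


def predict_one(features: dict) -> float:
    """Score a single feature dict and return a probability in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(request: dict) -> dict:
    """Handle exactly one incoming request as soon as it arrives."""
    score = predict_one(request["features"])
    return {"request_id": request["request_id"], "score": score}


# One request in, one fresh prediction out.
response = handle_request(
    {"request_id": "r-1", "features": {"age": 30, "num_visits": 3}}
)
print(response)
```

In a real service, a function like `handle_request` would typically sit behind a web endpoint, with the model loaded once at startup rather than once per request.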
A User Is Waiting
Unlike batch scoring, online inference has a user or calling service waiting on the other end. The prediction must come back in milliseconds, not minutes, for the experience to feel responsive.
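One way to keep the millisecond requirement visible is to time every prediction against a latency budget. The sketch below assumes a 100 ms budget and a stand-in `toy_model`; both are hypothetical choices for illustration, not values from the lesson.

```python
import time

# Assumed latency budget for illustration; pick one that fits your product.
LATENCY_BUDGET_MS = 100.0


def timed_predict(predict_fn, features):
    """Run one prediction and report how long it took in milliseconds."""
    start = time.perf_counter()
    prediction = predict_fn(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    within_budget = elapsed_ms <= LATENCY_BUDGET_MS
    return prediction, elapsed_ms, within_budget


def toy_model(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)


prediction, elapsed_ms, within_budget = timed_predict(toy_model, [1.0, 2.0, 3.0])
print(f"prediction={prediction}, within_budget={within_budget}")
```

Logging `elapsed_ms` for each request makes it possible to track how often responses exceed the budget, which matters more to a waiting user than the average alone.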