MLOps Academy · Lesson

Real-Time Online Inference

Serve low-latency predictions per request.

What Online Inference Is

Online inference answers one request at a time, the instant it arrives, so a user or service gets a fresh prediction right away. ⚡

Unlike batch inference, here someone is waiting on the other end. To feel responsive, the prediction must come back in milliseconds, not minutes.
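To make the idea concrete, here is a minimal sketch of a per-request prediction path with a latency check. Everything in it is an assumption for illustration: the linear `predict` function stands in for a real model, `handle_request` is a hypothetical handler name, and the 50 ms budget is an example value, not a recommendation from this lesson.

```python
import time

# Hypothetical stand-in for a trained model: fixed weights, linear score.
WEIGHTS = [0.4, -0.2, 0.1]
BIAS = 0.05


def predict(features):
    """Score a single feature vector (one request, one prediction)."""
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handle_request(features, budget_ms=50.0):
    """Serve one request and report whether it met the latency budget.

    budget_ms is an assumed example budget, not a universal target.
    """
    start = time.perf_counter()
    score = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {
        "score": score,
        "latency_ms": latency_ms,
        "within_budget": latency_ms <= budget_ms,
    }


# One request arrives; the caller waits for exactly this response.
response = handle_request([1.0, 2.0, 3.0])
print(response)
```

In a real service the handler would sit behind a network endpoint and the model would be loaded once at startup, so each request pays only for the prediction itself.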