MLOps Academy · Lesson

Real-Time Online Inference

Serve low-latency predictions per request.

What Online Inference Is

Online inference answers one request at a time, the instant it arrives, so a user or service gets a fresh prediction right away. ⚡
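
To make this concrete, here is a minimal sketch of a per-request handler, not a production server. The model is a hypothetical stand-in (a tiny logistic scorer with made-up weights), and the names `WEIGHTS`, `BIAS`, `predict_one`, and `handle_request` are assumptions for illustration. The point it shows: the model is loaded once at startup, and each incoming request is scored on its own as soon as it arrives.

```python
import json
import math

# Hypothetical model parameters, loaded once at startup (illustrative values only).
WEIGHTS = {"age": 0.02, "clicks": 0.1}
BIAS = -1.0


def predict_one(features: dict) -> float:
    """Score a single feature dict with a logistic model; missing features count as 0."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(body: str) -> str:
    """Parse one JSON request body, score it immediately, and return a JSON response."""
    features = json.loads(body)
    return json.dumps({"score": predict_one(features)})


if __name__ == "__main__":
    print(handle_request('{"age": 50, "clicks": 0}'))
```

In a real service, `handle_request` would sit behind a web framework or RPC layer; the sketch leaves that out to keep the one-request, one-prediction flow visible.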

A User Is Waiting

Unlike batch scoring, online inference has someone waiting on the other end. The prediction must come back in milliseconds, not minutes, for the experience to feel responsive.
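
One way to keep that wait visible is to time every prediction against a latency budget. The sketch below assumes a budget of 100 ms, which is an illustrative number and not a figure from this lesson; `LATENCY_BUDGET_MS` and `timed_call` are hypothetical names.

```python
import time

# Assumed latency budget for illustration; choose a value that fits your product.
LATENCY_BUDGET_MS = 100.0


def timed_call(predict, features):
    """Run one prediction and report how long it took and whether it met the budget."""
    start = time.perf_counter()
    prediction = predict(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return prediction, elapsed_ms, elapsed_ms <= LATENCY_BUDGET_MS


if __name__ == "__main__":
    # Trivial stand-in for a model: sums the feature values.
    prediction, elapsed_ms, within_budget = timed_call(lambda f: sum(f.values()), {"a": 1, "b": 2})
    print(f"prediction={prediction} elapsed={elapsed_ms:.3f} ms within_budget={within_budget}")
```

Logging the elapsed time per request, not just an average, makes it easier to spot the slow requests a waiting user actually feels.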
