MLOps Academy · Lesson

Run Training as a Kubernetes Job

Execute batch training to completion on the cluster.

Training Is Not a Server

A model server runs forever, but a training run should start, finish, and stop. Kubernetes has a different object built for that: the Job. 🏁

A Job Runs to Completion

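A Job creates one or more Pods and tracks them until the required number exit successfully (one, by default). If a Pod fails, the Job starts a replacement, up to its retry limit. Once training succeeds, the Job is marked complete and its Pods stop running, so the CPU, memory, and GPU they requested are released back to the cluster.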

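apiVersion: batch/v1
kind: Job
metadata:
  name: train-ranker
spec:
  template:
    spec:
      restartPolicy: Never     # Jobs accept only Never or OnFailure
      containers:              # a Pod template needs at least one container
        - name: train
          image: registry.example.com/train-ranker:latest   # placeholder image, not from the lesson
          command: ["python", "train.py"]                   # placeholder entrypoint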
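
The "start, finish, and stop" behaviour can be made more explicit with a few optional fields on the Job spec. The variant below is a minimal sketch, not the lesson's own manifest: the image name, the command, and every numeric value are illustrative assumptions, while backoffLimit, activeDeadlineSeconds, and ttlSecondsAfterFinished are standard batch/v1 Job fields.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: train-ranker
spec:
  backoffLimit: 2                  # retry a failed Pod at most twice (illustrative value)
  activeDeadlineSeconds: 7200      # fail the Job if it runs longer than 2 hours (illustrative value)
  ttlSecondsAfterFinished: 3600    # delete the finished Job and its Pods after 1 hour (illustrative value)
  template:
    spec:
      restartPolicy: Never         # let the Job controller create a new Pod on failure
      containers:
        - name: train
          image: registry.example.com/train-ranker:latest   # hypothetical image
          command: ["python", "train.py"]                   # hypothetical entrypoint
```

Assuming the manifest is saved as job.yaml (a hypothetical file name), it can be submitted with kubectl apply -f job.yaml, waited on with kubectl wait --for=condition=complete job/train-ranker --timeout=2h, and inspected with kubectl logs job/train-ranker. Without ttlSecondsAfterFinished, the finished Job and its Pods stay in the cluster until deleted, which is handy for reading logs but leaves old objects behind.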
