AI Engineering Academy · Lesson

Schema Evolution and Backward Compatibility

Manage breaking schema changes in long-running extraction pipelines by versioning schemas, migrating historical extractions, and running parallel validation during transitions.

Why Schemas Change Over Time

Extraction schemas are not static. Business requirements evolve, new document types appear, and you discover fields you should have captured from the start. Changing a schema in a live pipeline creates a backward-compatibility problem: records extracted earlier follow the old schema, while new records follow the new one, and both must remain readable. Schema evolution is the practice of managing this transition safely.
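
To make the compatibility problem concrete, here is a minimal sketch, assuming Pydantic v2 (its model_validate method), that checks whether a record stored under an old schema still validates against a revised model. The model names, the reads_old_record helper, and the sample record are hypothetical and exist only for illustration.

```python
from pydantic import BaseModel, ValidationError

# Hypothetical record stored before the schema changed: it has no currency field.
old_record = {'vendor': 'Acme Corp', 'total_amount': 120.5}

class InvoiceRequiredCurrency(BaseModel):
    vendor: str
    total_amount: float
    currency: str  # new required field: old records do not have it

class InvoiceDefaultCurrency(BaseModel):
    vendor: str
    total_amount: float
    currency: str = 'USD'  # new field with a default: old records still validate

def reads_old_record(model: type[BaseModel], record: dict) -> bool:
    """Return True if model can validate a record written under the old schema."""
    try:
        model.model_validate(record)
        return True
    except ValidationError:
        return False

print(reads_old_record(InvoiceRequiredCurrency, old_record))  # False
print(reads_old_record(InvoiceDefaultCurrency, old_record))   # True
```

The general rule this illustrates: new fields that are optional or have defaults keep old data readable, while new required fields force a migration of existing records.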

Versioning Your Schemas

Assign a version number to each schema and store it alongside every extracted record. When you change the schema, increment the version. This lets you query records by schema version, run migrations on old records, and maintain separate validation logic for each version. A single string field, schema_version, in every output model is sufficient.

from pydantic import BaseModel
from typing import Literal

class InvoiceV1(BaseModel):
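    schema_version: Literal['1.0'] = '1.0'  # pinned version; any other value fails validation
    vendor: str
    total_amount: float

class InvoiceV2(BaseModel):
    schema_version: Literal['2.0'] = '2.0'  # pinned version; any other value fails validation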
    vendor: str
    vendor_tax_id: str | None = None  # new field
    total_amount: float
    currency: str = 'USD'  # new field with default
