AI Engineering Academy · Lesson

Handling Partial and Missing Data

Design schemas with Optional fields and confidence scores, implement fallback extraction strategies for ambiguous documents, and log low-confidence extractions for human review.

The Reality of Incomplete Documents

Real-world documents rarely contain every field your schema expects. An invoice might be missing a PO number, a resume might omit dates, and a news article might not mention a location. Designing your extraction schema to handle partial and missing data gracefully is as important as extracting what is present.

Optional Fields in Pydantic

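Mark fields that might not appear in every document as Optional[type] and give them a None default. In Pydantic v2, Optional[type] only means the value may be None; the field is still required unless it has a default, so the None default is what allows a document to omit it. Together with a clear field description, that default also tells the extraction model it may leave the field empty rather than hallucinate a value when the information is absent. Always prefer None over an empty string for missing data, because it is easier to filter downstream.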

from pydantic import BaseModel, Field
from typing import Optional

class JobPosting(BaseModel):
    title: str
    company: str
    salary_min: Optional[float] = Field(None, description='Minimum salary if stated')
    salary_max: Optional[float] = Field(None, description='Maximum salary if stated')
    remote: Optional[bool] = Field(None, description='True if remote, False if on-site, None if unspecified')
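
To illustrate why None is easier to filter downstream, here is a minimal sketch that validates two extraction results against the schema above and then separates complete records from partial ones. The raw_results payloads and the missing_fields helper are hypothetical, added only for illustration; they are not part of Pydantic or of the lesson's own code.

```python
from typing import Optional

from pydantic import BaseModel, Field


class JobPosting(BaseModel):
    title: str
    company: str
    salary_min: Optional[float] = Field(None, description="Minimum salary if stated")
    salary_max: Optional[float] = Field(None, description="Maximum salary if stated")
    remote: Optional[bool] = Field(
        None, description="True if remote, False if on-site, None if unspecified"
    )


def missing_fields(posting: BaseModel) -> list[str]:
    """Hypothetical helper: names of the fields whose value is None."""
    return [name for name, value in posting.model_dump().items() if value is None]


# Hypothetical extraction results: the second posting states no salary
# and no remote policy, so those keys are simply absent.
raw_results = [
    {
        "title": "Data Engineer",
        "company": "Acme",
        "salary_min": 90000,
        "salary_max": 120000,
        "remote": True,
    },
    {"title": "ML Engineer", "company": "Globex"},
]

postings = [JobPosting.model_validate(item) for item in raw_results]

# Compare against None explicitly: a truthiness check would also drop
# legitimate values such as remote=False or a salary of 0.0.
with_salary = [p for p in postings if p.salary_min is not None]

# Map each partial record to the fields it is missing.
incomplete = {p.title: missing_fields(p) for p in postings if missing_fields(p)}
```

Here with_salary holds only the first posting, while incomplete records which fields the second one lacks. An empty-string convention would need a separate check for every field type, whereas a single `is not None` test works for strings, numbers, and booleans alike.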
