AI Engineering Academy · Lesson

Validating and Retrying Bad Outputs

Implement a validation layer that checks extracted data against business rules, automatically retries with corrective feedback when validation fails, and logs failure patterns.

Why LLM Outputs Need Validation

Even with structured outputs and Pydantic schemas, LLM extraction can produce outputs that are syntactically valid but semantically wrong. Typical examples are a confidence score of 1.5 (outside the 0-1 range), a price of -99.99, a date string that cannot be parsed, or a phone number with letters. All of these pass JSON parsing but fail your business rules.
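To make the gap concrete, here is a minimal sketch using only the Python standard library. The payload, the field names, the date format, and the `find_problems` helper are all hypothetical, chosen to mirror the four examples above; your own rules will differ.

```python
import json
import re
from datetime import datetime

# Hypothetical raw model output: valid JSON, but every value breaks a rule.
raw = ('{"confidence": 1.5, "price": -99.99, '
       '"date": "2024-13-45", "phone": "555-CALL-NOW"}')

record = json.loads(raw)  # parsing succeeds; nothing here checks meaning


def find_problems(rec: dict) -> list[str]:
    """Return one human-readable message per broken rule."""
    problems = []
    if not 0.0 <= rec["confidence"] <= 1.0:
        problems.append("confidence must be between 0 and 1")
    if rec["price"] < 0:
        problems.append("price must not be negative")
    try:
        # Assumed format for this sketch: YYYY-MM-DD
        datetime.strptime(rec["date"], "%Y-%m-%d")
    except ValueError:
        problems.append("date must be a real YYYY-MM-DD date")
    if not re.fullmatch(r"[0-9+\-() ]+", rec["phone"]):
        problems.append("phone may contain only digits, spaces, and + - ( )")
    return problems


problems = find_problems(record)
for message in problems:
    print(message)
```

Every check fires on this payload even though `json.loads` accepted it without complaint, which is the gap a validation layer is meant to close.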

Validation is a separate concern from extraction. Extraction asks: 'Did we get structured data?' Validation asks: 'Is the structured data correct and usable?' Both layers are necessary for a production-grade pipeline. Think of it as a two-stage filter: the LLM extracts, your validator accepts or rejects.
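The two-stage filter, combined with the retry and logging goals stated at the top of this lesson, might be sketched as follows. This is an illustration under assumptions, not a definitive implementation: `extract_with_retry`, `validate`, and `fake_extract` are hypothetical names, and `fake_extract` is a stub standing in for a real LLM call so the example runs on its own.

```python
import logging

logger = logging.getLogger("extraction")


def validate(record: dict) -> list[str]:
    """Stage two: return a list of rule violations (empty means accepted)."""
    errors = []
    if not 0.0 <= record.get("confidence", -1) <= 1.0:
        errors.append("confidence must be between 0 and 1")
    if record.get("price", -1) < 0:
        errors.append("price must not be negative")
    return errors


def extract_with_retry(text, extract, max_attempts=3):
    """Call the extractor, validate, and retry with corrective feedback."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        record = extract(text, feedback)   # stage one: the LLM extracts
        errors = validate(record)          # stage two: accept or reject
        if not errors:
            return record
        # Log each rejection so failure patterns can be analysed later.
        logger.warning("attempt %d rejected: %s", attempt, errors)
        feedback = "Fix these problems: " + "; ".join(errors)
    raise ValueError(f"no valid record after {max_attempts} attempts")


def fake_extract(text, feedback):
    """Hypothetical stub: wrong on the first call, corrected after feedback."""
    if feedback is None:
        return {"confidence": 1.5, "price": 19.99}
    return {"confidence": 0.9, "price": 19.99}


result = extract_with_retry("Widget, $19.99", fake_extract)
print(result)
```

Keeping `validate` independent of the extractor means the same rules can be reused when the model or prompt changes, and the error messages double as the corrective feedback sent back on retry.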

Layers of Validation

A robust output validation system operates at multiple levels:

Schema validation (Pydantic): correct field types, required fields present, enums match allowed values. Structured outputs handle this layer automatically.
Format validation: phone numbers match a regex, emails are valid, dates are parseable, amounts are within realistic ranges
Business logic validation: invoice total equals sum of line items, end date is after start date, quantity is a positive integer
Cross-field validation: a field's validity depends on another field's value (e.g., a discount amount cannot exceed the invoice total)
Semantic validation: extracted company name matches a known company in your database
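The layers above can be sketched as a chain of small check functions that run in order. This is a dependency-free illustration in plain Python rather than Pydantic; the invoice shape, the field names, the `KNOWN_COMPANIES` set, and every function name are assumptions made for the example.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical stand-in for a lookup against your company database.
KNOWN_COMPANIES = {"Acme Corp", "Globex"}


def check_schema(inv):
    expected = {"company": str, "start": str, "end": str,
                "total": str, "discount": str, "items": list}
    return [f"{name}: expected {kind.__name__}"
            for name, kind in expected.items()
            if not isinstance(inv.get(name), kind)]


def check_format(inv):
    errors = []
    for name in ("start", "end"):
        try:
            date.fromisoformat(inv[name])
        except ValueError:
            errors.append(f"{name}: not a valid ISO date")
    for raw in [inv["total"], inv["discount"], *inv["items"]]:
        try:
            if Decimal(raw) < 0:
                errors.append(f"amount {raw!r} is negative")
        except (InvalidOperation, TypeError, ValueError):
            errors.append(f"amount {raw!r} is not a number")
    return errors


def check_business(inv):
    errors = []
    line_sum = sum(Decimal(item) for item in inv["items"])
    if line_sum != Decimal(inv["total"]):
        errors.append(f"total {inv['total']} does not equal line item sum {line_sum}")
    if date.fromisoformat(inv["end"]) <= date.fromisoformat(inv["start"]):
        errors.append("end date must be after start date")
    return errors


def check_cross_field(inv):
    if Decimal(inv["discount"]) > Decimal(inv["total"]):
        return ["discount exceeds total"]
    return []


def check_semantic(inv):
    if inv["company"] not in KNOWN_COMPANIES:
        return [f"unknown company: {inv['company']}"]
    return []


LAYERS = [("schema", check_schema), ("format", check_format),
          ("business", check_business), ("cross-field", check_cross_field),
          ("semantic", check_semantic)]


def validate(inv):
    """Run layers in order and stop at the first one that fails,
    because later layers assume the earlier ones passed."""
    for name, check in LAYERS:
        errors = check(inv)
        if errors:
            return name, errors
    return None, []


invoice = {"company": "Acme Corp", "start": "2024-01-01", "end": "2024-01-31",
           "total": "15.50", "discount": "2.00", "items": ["10.00", "5.50"]}
print(validate(invoice))
print(validate({**invoice, "total": "20.00"}))
```

Returning the name of the failing layer alongside the messages makes it easy to count failures per layer in your logs, and stopping early keeps later checks from crashing on data that an earlier layer already rejected.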
