Self-Correction
Extract calculated_total and stated_total to catch drift.
The Trust Gap
When Claude extracts data from an invoice or report, the output can be perfectly valid JSON and still be wrong. A line item might be misread, two digits might be transposed, or a subtotal might silently drift away from the figures it is supposed to summarize.
Schema validation catches structural problems: missing required fields, wrong types, bad enums. It does not catch a number that is well-formed but inconsistent with the rest of the document.
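To make the gap concrete, here is a minimal sketch of an extraction that passes every structural check and is still wrong. The field names (vendor, currency, line_items, amount), the invoice values, and the hand-rolled structural_errors helper are all illustrative assumptions, not part of any particular schema library.

```python
from decimal import Decimal

# Hypothetical extraction result: well-formed, but the total has two digits swapped.
extraction = {
    "vendor": "Acme Supplies",
    "currency": "USD",
    "line_items": [
        {"description": "Widgets", "amount": "120.00"},
        {"description": "Shipping", "amount": "15.50"},
    ],
    "stated_total": "153.50",  # the line items actually add up to 135.50
}

ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}  # assumed enum, for illustration only
REQUIRED_FIELDS = {
    "vendor": str,
    "currency": str,
    "line_items": list,
    "stated_total": str,
}


def structural_errors(doc):
    """Return schema-style problems: missing fields, wrong types, bad enums."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"wrong type: {field}")
    if doc.get("currency") not in ALLOWED_CURRENCIES:
        errors.append("bad enum: currency")
    return errors


errors = structural_errors(extraction)

# The structural check is satisfied, yet the numbers disagree with each other.
line_sum = sum(Decimal(item["amount"]) for item in extraction["line_items"])
print(errors)                                  # no structural problems reported
print(line_sum, extraction["stated_total"])    # the two figures do not match
```

Nothing in the structural check looks at how the fields relate to one another, which is why the swapped digits go unnoticed.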
This lesson teaches self-correction: a technique where you extract enough information to let the system check its own arithmetic and catch drift before it reaches your database.
What 'Drift' Looks Like
Consider an invoice that lists line items and prints a total at the bottom. Two numbers are in play:
- stated_total — the total literally printed on the document
- calculated_total — the sum of the individual line items
On a clean document they match. But OCR noise, a misread digit, or a hallucinated line item can make them diverge. That divergence is drift.
The core idea of this lesson: if you capture only one of these two numbers, you can never detect the disagreement. Capture both, and the discrepancy becomes visible and machine-checkable.
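One way to make the comparison machine-checkable is sketched below. It assumes the extraction carries a line_items list with string amounts plus a stated_total, and it recomputes calculated_total in code rather than trusting a model-supplied sum. The check_drift name, the field layout, and the tolerance parameter are hypothetical choices for illustration.

```python
from decimal import Decimal


def check_drift(extraction, tolerance=Decimal("0.00")):
    """Compare the sum of the line items against the total printed on the document."""
    calculated_total = sum(
        (Decimal(item["amount"]) for item in extraction["line_items"]),
        Decimal("0"),
    )
    stated_total = Decimal(extraction["stated_total"])
    difference = calculated_total - stated_total
    return {
        "calculated_total": calculated_total,
        "stated_total": stated_total,
        "difference": difference,
        # Drift is any gap larger than the allowed tolerance.
        "drift": abs(difference) > tolerance,
    }


# Hypothetical extractions: same line items, different printed totals.
clean = {
    "line_items": [{"amount": "120.00"}, {"amount": "15.50"}],
    "stated_total": "135.50",
}
drifted = {
    "line_items": [{"amount": "120.00"}, {"amount": "15.50"}],
    "stated_total": "153.50",
}

clean_result = check_drift(clean)
drifted_result = check_drift(drifted)
print(clean_result["drift"], drifted_result["drift"])
```

Decimal is used instead of float so that currency amounts compare exactly. A record flagged with drift can then be routed to a retry or a human review instead of being written to the database.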