AI Prompt Engineering · Lesson

OCR and Document Analysis Prompts

Extracting text, tables, and structure from document images.

LLMs as Document Readers

Traditional OCR (Optical Character Recognition) relied on specialized software to extract characters from images. Vision LLMs can read the text and also interpret it: beyond extracting characters, they can follow document structure, read tables and handwriting, and interpret the text in context.

Common document analysis tasks:

  • Extracting text from scanned documents
  • Reading receipts, invoices, and forms
  • Parsing tables and charts
  • Transcribing handwritten notes
  • Reading printed labels and signs
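Each of these tasks usually calls for its own prompt wording. The sketch below shows one way to keep a prompt template per task. The template text, the `TASK_PROMPTS` dictionary, and the `build_prompt` helper are illustrative assumptions, not part of any library.

```python
# Hypothetical prompt templates, one per task in the list above.
# The wording is an assumption; adjust it to your documents.
TASK_PROMPTS = {
    "scanned_document": (
        "Extract all text from this scanned document. "
        "Preserve paragraphs and line breaks."
    ),
    "receipt": (
        "Read this receipt or invoice. List the vendor, date, "
        "each line item with its price, and the total."
    ),
    "table": (
        "Extract the table in this image as a Markdown table. "
        "Keep the header row and the column order."
    ),
    "handwriting": (
        "Transcribe the handwritten text in this image. "
        "Mark any word you cannot read as [illegible]."
    ),
    "label_or_sign": (
        "Read all printed text on the label or sign in this image, "
        "from top to bottom."
    ),
}


def build_prompt(task, extra_instructions=""):
    """Return the template for `task`, optionally followed by extra instructions."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task {task!r}; choose from {sorted(TASK_PROMPTS)}")
    prompt = TASK_PROMPTS[task]
    if extra_instructions.strip():
        # Append caller-specific constraints after the base template.
        prompt = f"{prompt} {extra_instructions.strip()}"
    return prompt
```

Keeping the templates in one place makes it easier to compare wordings across tasks and to add a constraint, such as an output language, without rewriting the whole prompt.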

Basic Text Extraction Prompt

For simple text extraction from a document image:

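import base64
import mimetypes

import anthropic

# Replace with your key, or omit api_key to read ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic(api_key='YOUR_API_KEY')

def extract_text(image_path, extraction_prompt):
    # Guess the media type from the file extension (e.g. image/png); default to JPEG.
    media_type = mimetypes.guess_type(image_path)[0] or 'image/jpeg'

    # The API expects the image as a base64-encoded string.
    with open(image_path, 'rb') as f:
        img_b64 = base64.standard_b64encode(f.read()).decode('utf-8')

    r = client.messages.create(
        model='claude-opus-4-5',
        max_tokens=1000,  # raise this for dense pages, or long output will be cut off
        messages=[{'role': 'user', 'content': [
            {'type': 'image', 'source': {'type': 'base64', 'media_type': media_type, 'data': img_b64}},
            {'type': 'text', 'text': extraction_prompt}
        ]}]
    )
    # The first content block of the reply holds the extracted text.
    return r.content[0].text

# Basic extraction
basic_prompt = 'Extract all text from this document image exactly as it appears. Preserve line breaks.'

# Structure-preserving extraction
structured_prompt = 'Extract all text from this document image. Preserve: paragraph structure, line breaks, and any visible formatting. Do not add any text not present in the image.'

print('Text extraction function and prompts defined.')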
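The same pattern extends to tables. As a minimal sketch, assuming the model is asked to return a Markdown table and complies, the reply can be turned into row dictionaries with a small parser. The `table_prompt` wording and the `parse_markdown_table` helper are hypothetical, and the parser handles only simple pipe-delimited tables.

```python
# Hypothetical prompt asking for a machine-readable table.
table_prompt = (
    "Extract the table in this image as a Markdown table. "
    "Use the first row as the header. Output only the table."
)


def parse_markdown_table(markdown):
    """Convert a simple pipe-delimited Markdown table into a list of row dicts."""
    rows = []
    for line in markdown.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore any text the model wrote outside the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row (cells made only of '-' and ':').
        # A data row consisting solely of dashes would also be skipped.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Pair each body row with the header names; extra cells are dropped by zip.
    return [dict(zip(header, row)) for row in body]


# Example use with the function defined earlier (the file name is a placeholder):
# rows = parse_markdown_table(extract_text('invoice.jpg', table_prompt))
```

Model output is not guaranteed to follow the requested format, so it is worth checking that the parsed rows are non-empty and have the expected columns before using them.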
