AI Prompt Engineering · Lesson

OCR and Document Analysis Prompts

Extracting text, tables, and structure from document images.

LLMs as Document Readers

Optical character recognition (OCR) traditionally relied on specialized software that converted images of text into characters. Vision-capable LLMs can read text from images and also interpret it: beyond extracting characters, they can parse structure, tables, and handwriting, and use the surrounding context to make sense of the content.

Common document analysis tasks:

Extracting text from scanned documents
Reading receipts, invoices, and forms
Parsing tables and charts
Transcribing handwritten notes
Reading printed labels and signs
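
Each of these tasks tends to call for a differently worded prompt. As a rough sketch, the tasks above could be mapped to prompts like the ones below. The task keys, the prompt wording, and the prompt_for helper are illustrative assumptions, not part of any library, and the wording should be tuned against real documents.

```python
# Hypothetical prompt wording for each task; tune against real documents.
TASK_PROMPTS = {
    'scanned_text': (
        'Extract all text from this scanned page exactly as it appears. '
        'Preserve line breaks.'
    ),
    'receipt_or_form': (
        'List every field label and its value from this document, one '
        '"label: value" pair per line. Write [blank] for empty fields.'
    ),
    'table': (
        'Transcribe the table in this image as pipe-separated rows, one row '
        'per line, keeping the original column order. Leave empty cells empty.'
    ),
    'handwriting': (
        'Transcribe the handwritten text in this image. Mark any word you '
        'cannot read as [illegible] instead of guessing.'
    ),
    'label_or_sign': (
        'Read the printed text on the label or sign in this image and return '
        'it verbatim, top to bottom.'
    ),
}

def prompt_for(task):
    """Return the prompt for a task, failing loudly on an unknown task name."""
    try:
        return TASK_PROMPTS[task]
    except KeyError:
        raise ValueError(
            f'unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}'
        ) from None

print(prompt_for('handwriting'))
```

Keeping the prompts in one table makes it easier to compare wording across tasks and to give the model an explicit way to flag unreadable content instead of guessing.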

Basic Text Extraction Prompt

For simple text extraction from a document image:

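import base64
import mimetypes

import anthropic

# Replace the placeholder, or omit api_key to read ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic(api_key='YOUR_API_KEY')

def extract_text(image_path, extraction_prompt):
    """Send one document image plus a prompt and return the model's text reply."""
    # Guess the media type from the file extension instead of assuming JPEG.
    media_type = mimetypes.guess_type(image_path)[0] or 'image/jpeg'
    with open(image_path, 'rb') as f:
        img_b64 = base64.standard_b64encode(f.read()).decode('utf-8')

    r = client.messages.create(
        model='claude-opus-4-5',
        max_tokens=1000,  # raise this for dense pages with a lot of text
        messages=[{'role': 'user', 'content': [
            {'type': 'image', 'source': {'type': 'base64', 'media_type': media_type, 'data': img_b64}},
            {'type': 'text', 'text': extraction_prompt}
        ]}]
    )
    return r.content[0].text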

# Basic extraction
basic_prompt = 'Extract all text from this document image exactly as it appears. Preserve line breaks.'

# Structure-preserving extraction
structured_prompt = 'Extract all text from this document image. Preserve: paragraph structure, line breaks, and any visible formatting. Do not add any text not present in the image.'

# Example call (assumes a local image file named scan.jpg):
# print(extract_text('scan.jpg', structured_prompt))
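
For receipts, invoices, and forms, plain text is often less useful than named fields. One possible approach, sketched below, is to ask for JSON and parse the reply defensively. The field names, RECEIPT_PROMPT, and parse_json_reply are hypothetical, and the sample reply is canned so the parsing step runs without an API call; in practice the reply would come from a call such as extract_text(image_path, RECEIPT_PROMPT).

```python
import json

# Hypothetical field names; adjust to the documents being processed.
RECEIPT_PROMPT = (
    'Read this receipt image and return only a JSON object with the keys '
    '"merchant", "date", "total", and "line_items" (a list of objects with '
    '"description" and "amount"). Use null for any field that is not legible. '
    'Do not guess values that are not visible in the image.'
)

def parse_json_reply(reply):
    """Parse a model reply that should contain one JSON object.

    Tolerates stray prose or a code fence around the object by taking the
    span from the first '{' to the last '}'.
    """
    start = reply.find('{')
    end = reply.rfind('}')
    if start == -1 or end < start:
        raise ValueError('no JSON object found in reply')
    return json.loads(reply[start:end + 1])

# Canned reply standing in for real model output.
sample_reply = (
    'Here is the extracted data:\n'
    '{"merchant": "Corner Cafe", "date": "2024-03-01", '
    '"total": "12.50", "line_items": []}'
)
receipt = parse_json_reply(sample_reply)
print(receipt['merchant'], receipt['total'])
```

Asking for null on illegible fields gives the model an alternative to inventing a value, and json.loads raises an error on malformed output, so bad replies fail visibly rather than passing through silently.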
