OCR and Document Analysis Prompts
Extracting text, tables, and structure from document images.
LLMs as Document Readers
OCR (Optical Character Recognition) traditionally required specialized software to extract text from images. Vision LLMs can now read text from images and also interpret it: beyond extracting characters, they can parse structure, tables, handwriting, and context.
Common document analysis tasks:
- Extracting text from scanned documents
- Reading receipts, invoices, and forms
- Parsing tables and charts
- Transcribing handwritten notes
- Reading printed labels and signs
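Each of these tasks usually calls for a differently worded prompt. The sketch below is one possible way to keep such prompts in a single place. The task names, the prompt wording, and the build_prompt helper are all hypothetical illustrations, not part of the lesson or of any library.

```python
# Hypothetical task-to-prompt table; the wording is illustrative only.
TASK_PROMPTS = {
    "scanned_text": "Extract all text from this scanned document exactly as it appears.",
    "receipt": "List the merchant, date, line items, and total shown on this receipt.",
    "table": "Transcribe the table in this image as CSV, one row per line.",
    "handwriting": "Transcribe the handwritten text. Mark unreadable words as [illegible].",
    "label": "Read the text on the label or sign in this image.",
}


def build_prompt(task: str, extra_instructions: str = "") -> str:
    """Return the prompt for a task, with optional extra instructions appended."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"Unknown task: {task!r}")
    prompt = TASK_PROMPTS[task]
    if extra_instructions:
        prompt = f"{prompt} {extra_instructions}"
    return prompt


print(build_prompt("handwriting", "Preserve line breaks."))
```

Keeping the prompts in a dictionary makes it easy to compare wordings side by side and to reject unknown task names early.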
Basic Text Extraction Prompt
For simple text extraction from a document image:
import base64
import mimetypes

import anthropic

# Replace with your key, or omit api_key to read ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic(api_key='YOUR_API_KEY')

def extract_text(image_path, extraction_prompt):
    """Send one image plus a prompt and return the model's text reply."""
    with open(image_path, 'rb') as f:
        img_b64 = base64.standard_b64encode(f.read()).decode('utf-8')
    # Guess the media type from the file extension; fall back to JPEG.
    media_type = mimetypes.guess_type(image_path)[0] or 'image/jpeg'
    r = client.messages.create(
        model='claude-opus-4-5', max_tokens=1000,
        messages=[{'role': 'user', 'content': [
            {'type': 'image', 'source': {'type': 'base64', 'media_type': media_type, 'data': img_b64}},
            {'type': 'text', 'text': extraction_prompt}
        ]}]
    )
    return r.content[0].text

# Basic extraction
basic_prompt = 'Extract all text from this document image exactly as it appears. Preserve line breaks.'
# Structure-preserving extraction
structured_prompt = 'Extract all text from this document image. Preserve: paragraph structure, line breaks, and any visible formatting. Do not add any text not present in the image.'
print('Text extraction functions defined.')
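Table parsing, one of the tasks listed earlier, can follow the same pattern: ask for CSV and parse the reply locally. The sketch below assumes the model returns plain CSV, possibly wrapped in a Markdown code fence. The names table_prompt, parse_csv_reply, and sample_reply are hypothetical, and sample_reply is a hand-written stand-in for a model reply so the parsing step runs without an API call.

```python
import csv
import io

FENCE = "`" * 3  # Markdown code-fence marker that a reply may include

# Hypothetical prompt; the wording is an illustration, not from the lesson.
table_prompt = (
    "Extract the table in this image as CSV. Output only the CSV, "
    "with the header row first. Leave unreadable cells empty."
)


def parse_csv_reply(reply: str) -> list[dict]:
    """Parse a CSV reply into row dicts, dropping any code-fence lines."""
    lines = [
        ln for ln in reply.strip().splitlines()
        if not ln.strip().startswith(FENCE)
    ]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    return [dict(row) for row in reader]


# Hand-written stand-in for a model reply (not real model output).
sample_reply = f"{FENCE}csv\nItem,Qty,Price\nCoffee,2,3.50\nBagel,1,2.25\n{FENCE}"
rows = parse_csv_reply(sample_reply)
print(rows)
```

With the lesson's helper, the reply would come from extract_text(image_path, table_prompt). All values arrive as strings, so numeric columns need converting, and a large table may be cut off if the reply reaches the max_tokens limit.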