Image Description and Captioning Prompts
Directing model focus: objects, relationships, mood, and technical details.
Vision Models and Prompting
Vision language models (VLMs) like GPT-4o and Claude can process images alongside text. The prompt you send with an image dramatically affects the quality, focus, and format of the model's description.
Without a guiding prompt, the model decides what to describe — which may not match what you need. A structured description prompt tells the model exactly which elements to attend to and how to organize its output.
Sending an Image with a Prompt
The Anthropic API accepts images as base64-encoded content or URLs. Here is the basic structure:
import anthropic, base64
client = anthropic.Anthropic(api_key='YOUR_API_KEY')
with open('image.jpg', 'rb') as f:
image_data = base64.standard_b64encode(f.read()).decode('utf-8')
response = client.messages.create(
model='claude-opus-4-5',
max_tokens=500,
messages=[{
'role': 'user',
'content': [
{
'type': 'image',
'source': {
'type': 'base64',
'media_type': 'image/jpeg',
'data': image_data
}
},
{
'type': 'text',
'text': 'Describe this image in detail.'
}
]
}]
)
print(response.content[0].text)All lessons in this course
- Image Description and Captioning Prompts
- Visual Question Answering
- Multi-Image Comparison Prompts
- OCR and Document Analysis Prompts