AI Prompt Engineering · Lesson

Image Description and Captioning Prompts

Directing model focus: objects, relationships, mood, and technical details.

Vision Models and Prompting

Vision-language models (VLMs) such as GPT-4o and Claude can process images alongside text. The prompt you send with an image strongly affects the quality, focus, and format of the model's description.

Without a guiding prompt, the model decides what to describe, and its choice may not match what you need. A structured description prompt tells the model which elements to attend to and how to organize its output.
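To make the idea of a structured description prompt concrete, here is a minimal sketch that assembles one from a list of focus areas. The four areas mirror this lesson's subtitle; the names `FOCUS_INSTRUCTIONS` and `build_description_prompt`, and the exact instruction wording, are illustrative assumptions rather than a prescribed template.

```python
# Hypothetical instruction text for each focus area (wording is an assumption).
FOCUS_INSTRUCTIONS = {
    "objects": "List the main objects and people visible in the image.",
    "relationships": "Describe how the objects are positioned and how they interact.",
    "mood": "Describe the overall mood or atmosphere (lighting, color, expression).",
    "technical": "Note technical details such as framing, camera angle, and focus.",
}


def build_description_prompt(focus_areas, max_sentences_per_section=2):
    """Return a numbered prompt asking the model to cover each focus area in order."""
    unknown = [area for area in focus_areas if area not in FOCUS_INSTRUCTIONS]
    if unknown:
        raise ValueError(f"Unknown focus areas: {unknown}")

    lines = [
        "Describe this image using the numbered sections below.",
        f"Use at most {max_sentences_per_section} sentences per section.",
    ]
    # One numbered line per focus area, so the output is easy to organize and parse.
    for number, area in enumerate(focus_areas, start=1):
        lines.append(f"{number}. {area.capitalize()}: {FOCUS_INSTRUCTIONS[area]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_description_prompt(["objects", "mood"]))
```

The returned string is meant to be sent as the text part of a request, in place of a generic instruction such as "Describe this image in detail."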

Sending an Image with a Prompt

The Anthropic API accepts images either as base64-encoded content or as URLs. Here is the basic structure, using a base64-encoded local file:

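import base64

import anthropic

client = anthropic.Anthropic(api_key='YOUR_API_KEY')

# Read the image file and encode its bytes as a base64 string.
with open('image.jpg', 'rb') as f:
    image_data = base64.standard_b64encode(f.read()).decode('utf-8')

response = client.messages.create(
    model='claude-opus-4-5',
    max_tokens=500,
    messages=[{
        'role': 'user',
        # The content list pairs an image block with a text block (the prompt).
        'content': [
            {
                'type': 'image',
                'source': {
                    'type': 'base64',
                    # media_type must match the actual file format.
                    'media_type': 'image/jpeg',
                    'data': image_data
                }
            },
            {
                'type': 'text',
                'text': 'Describe this image in detail.'
            }
        ]
    }]
)
# The response content is a list of blocks; the first one holds the description text.
print(response.content[0].text)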
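The example above hardcodes a JPEG file and the base64 source type. As a hedged sketch, the two helpers below build the image content block for either input style mentioned earlier: a local file (base64) or a URL. The helper names and the `MEDIA_TYPES` table are hypothetical, and the `'url'` source shape is an assumption about the API's URL image format that should be checked against the current Anthropic documentation.

```python
import base64
from pathlib import Path

# Assumed mapping from file extension to media type; extend it only with
# formats the API documentation lists as supported.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_file(path):
    """Build a base64 image content block from a local file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported image extension: {suffix!r}")
    data = base64.standard_b64encode(path.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": MEDIA_TYPES[suffix],
            "data": data,
        },
    }


def image_block_from_url(url):
    """Build a URL image content block (source shape is an assumption)."""
    return {"type": "image", "source": {"type": "url", "url": url}}
```

Either block can take the place of the image dictionary in the `content` list, followed by the text block that carries the prompt.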
