AI Prompt Engineering · Lesson

Visual Question Answering

Asking specific questions about image content, quantities, and attributes.

Visual Question Answering (VQA) is the task of answering natural-language questions about an image. Unlike image description, which aims to cover the whole scene, VQA directs the model to answer one specific question.

VQA prompts are precise and direct, and they often require counting, identifying, comparing, or reasoning about visual content. The wording of the prompt largely determines whether you get a specific, usable answer or a vague, general response.
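As a rough illustration of the question types listed above, the sketch below builds one prompt per type. The names `QUESTION_TEMPLATES`, `ANSWER_FORMATS`, and `build_vqa_prompt` are hypothetical, and the wordings are only examples of how a question can be narrowed until it has a direct answer.

```python
# Hypothetical templates, one per question type named in the lesson.
QUESTION_TEMPLATES = {
    'count': 'How many {target} are visible in the image?',
    'identify': 'What is the {target} in the image?',
    'compare': 'Which is larger in the image: {target} or {other}?',
    'reason': 'Based only on what is visible, why might {target} be happening?',
}

# Hypothetical answer-format instructions that keep the reply short and usable.
ANSWER_FORMATS = {
    'count': 'Answer with a single integer.',
    'identify': 'Answer in five words or fewer.',
    'compare': 'Answer with the name of one item only.',
    'reason': 'Answer in one sentence.',
}


def build_vqa_prompt(kind, **fields):
    """Return the question text followed by its answer-format instruction."""
    question = QUESTION_TEMPLATES[kind].format(**fields)
    return f'{question}\n\n{ANSWER_FORMATS[kind]}'


# Print two of the generated prompts to inspect their wording.
print(build_vqa_prompt('count', target='red cars'))
print(build_vqa_prompt('compare', target='the dog', other='the cat'))
```

Pairing each question type with its own answer format is one possible design. It keeps the constraint on the reply (an integer, a single name, one sentence) next to the question that needs it.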

Basic VQA Prompt Structure

A VQA prompt pairs an image with a specific question. The key is making the question precise enough to produce a direct, usable answer:

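import base64
import mimetypes

import anthropic

# 'YOUR_API_KEY' is a placeholder; replace it with a real key before running.
client = anthropic.Anthropic(api_key='YOUR_API_KEY')

def ask_about_image(image_path, question, answer_format='Direct answer. No extra explanation.'):
    # Read the image and base64-encode it for the API request.
    with open(image_path, 'rb') as f:
        img_b64 = base64.standard_b64encode(f.read()).decode('utf-8')

    # Infer the media type from the file extension instead of assuming JPEG.
    media_type = mimetypes.guess_type(image_path)[0] or 'image/jpeg'

    # Append the answer-format instruction to the question.
    prompt = f'{question}\n\n{answer_format}'

    r = client.messages.create(
        model='claude-opus-4-5', max_tokens=150,
        messages=[{'role': 'user', 'content': [
            {'type': 'image', 'source': {'type': 'base64', 'media_type': media_type, 'data': img_b64}},
            {'type': 'text', 'text': prompt}
        ]}]
    )
    # The first content block of the response holds the text answer.
    return r.content[0].text

# Example VQA call (replace image.jpg with a real image path):
# print(ask_about_image('image.jpg', 'How many people are in this image?'))
print('VQA function defined. Ready for image questions.')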
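Even with a strict answer format, a reply can arrive with extra words or punctuation. The following is a minimal sketch of post-processing a counting answer, assuming the model was asked to reply with a single integer. `parse_count` is a hypothetical helper written for this example and is not part of the Anthropic SDK.

```python
import re


def parse_count(answer_text):
    """Return the first integer found in the reply, or None if there is none."""
    match = re.search(r'\d+', answer_text)
    return int(match.group()) if match else None


# Hypothetical usage with the lesson's ask_about_image function:
# reply = ask_about_image('image.jpg', 'How many people are visible?',
#                         'Answer with a single integer.')
# count = parse_count(reply)
```

Returning `None` when no number is found lets the caller tell a missing answer apart from a count of zero.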
