AI Engineering Academy · Lesson

Crafting the Augmented Prompt

Learn to inject retrieved context into the LLM prompt effectively, structure citations, tell the model to refuse when the answer is not in the context, and prevent prompt leakage.

The Prompt Is Where RAG Happens

After retrieval, the augmented prompt is where RAG succeeds or fails. You have retrieved the top-K document chunks most relevant to the user's query; now you must place them in the LLM's context in a form the model can read, attribute, and reason over. A poorly structured prompt wastes even the best retrieval results, while a well-structured one can still produce precise, grounded answers when retrieval is imperfect.

Basic Prompt Structure for RAG

A minimal RAG prompt has three parts: a system instruction telling the model to use only the provided context, a context block containing the retrieved chunks, and the user question. Separating these clearly reduces confusion about which text is the question versus the supporting evidence. Use explicit delimiters and labels so the model treats them as distinct sections.

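def build_rag_prompt(user_question, retrieved_chunks):
    """Return (system_msg, user_msg) for a minimal RAG prompt.

    Each item in retrieved_chunks is expected to be a dict with a "text" key.
    """
    # Label each chunk so the model can tell the documents apart.
    context_text = '\n\n'.join(
        f'[Document {i}]: {chunk["text"]}'
        for i, chunk in enumerate(retrieved_chunks, start=1)
    )

    # System instruction: restrict the model to the provided context.
    system_msg = (
        'You are a helpful assistant. Answer the user question '
        'using ONLY the information in the documents below. '
        'Do not use any outside knowledge.'
    )
    # Context block first, then the question, each under its own label.
    user_msg = f'Documents:\n{context_text}\n\nQuestion: {user_question}'
    return system_msg, user_msg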
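
The same structure can be taken one step further with explicit delimiters, citation labels, and a refusal instruction. The block below is a minimal sketch, not a definitive template: the function name build_delimited_rag_prompt, the XML-style tag names, the refusal sentence, and the sample chunks are all assumptions made for illustration.

```python
def build_delimited_rag_prompt(user_question, retrieved_chunks):
    """Hypothetical variant of build_rag_prompt with delimiters and a refusal rule."""
    # Wrap each chunk in a numbered tag so its boundaries are unambiguous
    # and the model has an id it can cite.
    context_text = '\n'.join(
        f'<document id="{i}">\n{chunk["text"]}\n</document>'
        for i, chunk in enumerate(retrieved_chunks, start=1)
    )

    # Assumed wording: restrict the model to the context, ask for citations,
    # and give it an exact sentence to use when the answer is missing.
    system_msg = (
        'Answer the user question using ONLY the text inside the '
        '<documents> block. Cite the id of each document you rely on, '
        'for example [1]. If the documents do not contain the answer, '
        'reply exactly: "I cannot answer this from the provided documents."'
    )
    # Keep the evidence and the question in separate, labeled sections.
    user_msg = (
        f'<documents>\n{context_text}\n</documents>\n\n'
        f'<question>\n{user_question}\n</question>'
    )
    return system_msg, user_msg


# Illustrative chunks, invented for this example.
chunks = [
    {'text': 'The warranty covers parts for two years.'},
    {'text': 'Labor is covered for ninety days.'},
]
system_msg, user_msg = build_delimited_rag_prompt(
    'How long is labor covered?', chunks
)
print(user_msg)
```

Giving the model a fixed refusal sentence makes refusals easy to detect in application code, and the numbered tags give it something concrete to cite. Whether tags or bracketed labels work better depends on the model, so treat the delimiter choice as something to test.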
