AI Prompt Engineering · Lesson

Beyond Naive RAG

Limitations of basic retrieval.

What Naive RAG Does

Naive RAG is the baseline: chunk documents, embed them, store the vectors, embed the query, retrieve the top-k chunks by cosine similarity, stuff those chunks into the prompt, and generate. It is a strong starting point, but it fails in predictable ways at scale.

Understanding those failure modes is the prerequisite for the advanced techniques (re-ranking, compression, query rewriting) covered in this course.

def naive_rag(query, k=5):
    # embed, vector_store, and llm are assumed to be provided by your stack
    q = embed(query)                              # query vector
    chunks = vector_store.search(q, k)            # top-k by cosine
    context = '\n\n'.join(c.text for c in chunks)
    return llm(f'Context:\n{context}\n\nQ: {query}')
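To make the pipeline above concrete, here is a minimal runnable sketch that uses only the standard library. It is not a real embedding setup: `embed` is a toy bag-of-words stand-in for an embedding model, `search` is a brute-force scan instead of a vector store, the generation step is left out (the sketch stops at the assembled prompt), and the sample documents are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query_vec, index, k):
    """Brute-force top-k: return the k chunks most similar to the query, best first."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, chunks):
    """Stuff the retrieved chunks into the prompt, as in naive_rag."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQ: {query}"

# Hypothetical sample corpus, one chunk per document.
docs = [
    "The refund window is 30 days from delivery.",
    "Shipping takes 5 business days.",
    "Our office is closed on public holidays.",
]
index = [(d, embed(d)) for d in docs]      # "store vectors"

query = "how long is the refund window"
top = search(embed(query), index, k=1)     # retrieve top-k by cosine
prompt = build_prompt(query, top)          # this string would be sent to the LLM
```

With this toy corpus the refund chunk is retrieved, because it shares the most words with the query. A real embedding model captures more than word overlap, but the control flow is the same.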

Retrieval Recall vs Precision

Naive top-k ranks purely by vector similarity, which conflates true relevance with surface-level semantic closeness. This creates a tension: a small k risks missing the chunk that contains the answer (low recall), while a large k floods the context with distractors (low precision).

The embedding similarity that drives retrieval is only a coarse proxy for true relevance, and that gap is the root of several downstream problems.
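The trade-off can be measured directly. The sketch below computes precision and recall at several values of k for one hypothetical ranking. The document IDs and the set of relevant documents are invented for illustration, and `precision_recall_at_k` is a helper name introduced here, not part of any library.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision@k and recall@k for a single query (assumes k >= 1)."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    precision = hits / k                  # share of retrieved chunks that are relevant
    recall = hits / len(relevant_ids)     # share of relevant chunks that were retrieved
    return precision, recall

# Hypothetical retrieval order (best first) and ground-truth relevant chunks.
ranked = ["d7", "d2", "d9", "d4", "d1", "d3", "d8", "d5"]
relevant = {"d2", "d1"}

for k in (1, 3, 5, 8):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

In this made-up ranking, recall only reaches 1.0 at k=5, and pushing k to 8 adds nothing but distractors, so precision drops from 0.40 to 0.25.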

# The core dilemma
# small k -> may miss the gold chunk (recall problem)
# large k -> distractors crowd context (precision + cost problem)
# Advanced RAG decouples 'retrieve many' from 'use few'
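One way to picture "retrieve many, use few" is a two-stage pipeline: a cheap first stage fetches a wide pool, then a more careful second stage keeps only the best few. The sketch below is an assumption-heavy toy, not a production re-ranker. `overlap_score` stands in for vector similarity, `content_score` is a hypothetical second-stage scorer, and the stopword list and chunks are invented for the example.

```python
STOPWORDS = {"what", "is", "the", "for"}  # tiny illustrative list, not a real one

def overlap_score(query, chunk):
    """Cheap first-stage score: number of shared words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def content_score(query, chunk):
    """Hypothetical second-stage score: shared words after dropping stopwords."""
    content_words = set(query.lower().split()) - STOPWORDS
    return len(content_words & set(chunk.lower().split()))

def retrieve_then_rerank(query, chunks, fetch_k, use_k):
    # Stage 1 (recall-oriented): fetch a wide pool with the cheap score.
    pool = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:fetch_k]
    # Stage 2 (precision-oriented): re-score the pool and keep only a few.
    return sorted(pool, key=lambda c: content_score(query, c), reverse=True)[:use_k]

chunks = [
    "what is the window for the lobby",   # distractor with heavy surface overlap
    "the refund window is 30 days",       # the chunk that answers the query
    "shipping takes five days",
]
query = "what is the refund window for orders"

naive_top1 = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:1]
reranked_top1 = retrieve_then_rerank(query, chunks, fetch_k=2, use_k=1)
```

Here the naive top-1 is the lobby distractor, because it shares more surface words with the query. Fetching two candidates and re-scoring them surfaces the refund chunk while still passing only one chunk to the prompt.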
