AI Engineering Academy · Lesson

The RAG Architecture: Indexing and Retrieval

Map out the two phases of RAG: the offline indexing phase that chunks, embeds, and stores documents, and the online retrieval phase that finds relevant context for each query.

RAG Has Two Distinct Phases

A RAG system operates in two fundamentally different phases that run at different times. The offline indexing phase processes your documents once (or when they change) and prepares a searchable index. The online retrieval phase runs in real time for every user query. Understanding this split is essential for designing systems that are both fast at query time and maintainable over time.
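To make the split concrete, here is a minimal sketch of the two phases as two separate functions. It is an illustration under loud assumptions, not a production design: `embed`, `cosine`, `build_index`, and `retrieve` are hypothetical names, and the "embedding" is a toy bag-of-words count standing in for a real embedding model and vector store.

```python
import math
from collections import Counter

def embed(text):
    # ASSUMPTION: toy stand-in for an embedding model (bag-of-words counts).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(chunks):
    # Offline phase: embed every chunk once and store the vector with its text.
    return [{'text': c, 'vector': embed(c)} for c in chunks]

def retrieve(query, index, k=2):
    # Online phase: embed the query and rank the stored chunks by similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda e: cosine(q, e['vector']), reverse=True)
    return [e['text'] for e in ranked[:k]]

# Indexing runs once, ahead of time.
index = build_index([
    'Employees accrue vacation days monthly.',
    'Expense reports are due within 30 days.',
    'Remote work requires manager approval.',
])

# Retrieval runs for every incoming query and reuses the stored vectors.
top = retrieve('when are expense reports due', index, k=1)
print(top)
```

The point of the sketch is the division of labor: `build_index` does the expensive work up front, so `retrieve` only has to embed one short query and compare it against vectors that already exist.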

The Indexing Phase: Step One — Load

Indexing starts with document loading: reading raw files from your source systems. Documents can be PDFs, Word files, HTML pages, Markdown files, database rows, or any other text source. Each document is loaded into memory as plain text, preserving structure where possible. Keep metadata such as the source path and page number alongside the text so that a retrieved passage can later be traced back to where it came from. Libraries like pypdf, python-docx, and unstructured handle the format-specific parsing.

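from pypdf import PdfReader

def load_pdf(path):
    # Read a PDF and return one record per page, with metadata for tracing sources.
    reader = PdfReader(path)
    pages = []
    for i, page in enumerate(reader.pages):
        # Image-only pages can yield no text; fall back to an empty string.
        text = page.extract_text() or ''
        # Page numbers are stored 1-based to match what a reader sees.
        pages.append({'text': text, 'page': i + 1, 'source': path})
    return pages

docs = load_pdf('company_policy.pdf')
print(f'Loaded {len(docs)} pages')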
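Other formats can be loaded into the same record shape so that later steps do not need to care where the text came from. As a rough sketch using only the standard library, a plain-text or Markdown loader might look like the following. The function name `load_text_file` and the sample file `handbook.md` are hypothetical, and setting `page` to `None` for unpaginated formats is an assumption made here for illustration.

```python
import tempfile
from pathlib import Path

def load_text_file(path):
    # Read a plain-text or Markdown file and return it in the same
    # record shape as a PDF page. ASSUMPTION: 'page' is None because
    # these formats have no page numbers.
    text = Path(path).read_text(encoding='utf-8')
    return [{'text': text, 'page': None, 'source': str(path)}]

# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'handbook.md'
    sample.write_text('# Handbook\n\nWelcome aboard.', encoding='utf-8')
    records = load_text_file(sample)

print(f'Loaded {len(records)} record(s)')
```

Returning a list of dictionaries with the same keys for every format means the rest of the indexing phase can process PDF pages and text files through one code path.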
