The RAG Architecture: Indexing and Retrieval
Map out the two phases of RAG: the offline indexing phase that chunks, embeds, and stores documents, and the online retrieval phase that finds relevant context for each query.
RAG Has Two Distinct Phases
A RAG system operates in two fundamentally different phases that run at different times. The offline indexing phase processes your documents once (or when they change) and prepares a searchable index. The online retrieval phase runs in real time for every user query. Understanding this split is essential for designing systems that are both fast at query time and maintainable over time.
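To make the split concrete, here is a minimal sketch of both phases in plain Python. It rests on loud assumptions: `embed` is a toy bag-of-words counter standing in for a real embedding model, the index is an in-memory list rather than a vector database, and `build_index`, `retrieve`, the sample documents, and the chunk size are hypothetical names and values chosen only for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(documents, chunk_size=12):
    """Offline phase: chunk each document, embed each chunk, store it."""
    index = []
    for source, text in documents.items():
        words = text.split()
        for start in range(0, len(words), chunk_size):
            chunk = " ".join(words[start:start + chunk_size])
            index.append({"source": source, "text": chunk, "vector": embed(chunk)})
    return index

def retrieve(index, query, top_k=2):
    """Online phase: embed the query and rank stored chunks by similarity."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda e: cosine(query_vector, e["vector"]), reverse=True)
    return ranked[:top_k]

# Hypothetical sample documents, keyed by file name.
documents = {
    "vacation.txt": "Employees accrue fifteen vacation days per year and may carry over five unused days.",
    "expenses.txt": "Submit expense reports within thirty days and attach receipts for every purchase.",
}

index = build_index(documents)  # run once, or again when documents change
results = retrieve(index, "how many vacation days do employees get", top_k=1)  # run per query
print(results[0]["source"], "->", results[0]["text"])
```

Note that `build_index` does all the expensive work up front, while `retrieve` only embeds one short query and compares it against vectors that already exist. That asymmetry is what keeps query-time latency low.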
The Indexing Phase: Step One — Load
Indexing starts with document loading: reading raw files from your source systems. Documents can be PDFs, Word files, HTML pages, Markdown files, database rows, or any text source. Each document is loaded into memory as plain text, preserving structure where possible. Libraries like pypdf, python-docx, and unstructured handle the heavy lifting of format-specific parsing.
from pypdf import PdfReader

def load_pdf(path):
    """Read a PDF and return one record per page."""
    reader = PdfReader(path)
    pages = []
    for i, page in enumerate(reader.pages):
        # Scanned or image-only pages may yield no text; store an empty string
        text = page.extract_text() or ''
        # Keep the 1-based page number and source path as metadata
        pages.append({'text': text, 'page': i + 1, 'source': path})
    return pages

docs = load_pdf('company_policy.pdf')
print(f'Loaded {len(docs)} pages')
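The same record shape can be produced for other sources. As a hedged sketch, the loader below reads plain-text and Markdown files with the standard library's `pathlib`. The helper names `load_text_file` and `load_folder`, the file patterns, and the choice to treat each file as a single "page" are assumptions made for illustration, not part of any library.

```python
from pathlib import Path

def load_text_file(path):
    """Load a plain-text or Markdown file as a single record."""
    text = Path(path).read_text(encoding="utf-8")
    # Assumption: a text file has no pages, so the whole file counts as page 1.
    return [{"text": text, "page": 1, "source": str(path)}]

def load_folder(folder, patterns=("*.md", "*.txt")):
    """Recursively load every file in `folder` that matches one of `patterns`."""
    records = []
    for pattern in patterns:
        # Sort for a deterministic order across runs and platforms.
        for path in sorted(Path(folder).rglob(pattern)):
            records.extend(load_text_file(path))
    return records
```

Keeping every loader's output in the same `text` / `page` / `source` shape means the later indexing steps do not need to know which format a record came from.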