Parent-Child and Small-to-Big Retrieval
Store small child chunks for precise retrieval but return their larger parent chunks to the LLM for richer context, balancing retrieval precision with generation quality.
The Precision vs Context Dilemma
RAG systems face a tension. Small chunks retrieve with high precision because each one focuses on a single idea, but they lack the surrounding context the LLM needs to generate a complete answer. Large chunks provide rich context but reduce retrieval precision: a chunk that covers many ideas matches many queries weakly instead of one query strongly. Parent-child chunking addresses this dilemma by searching over small chunks and returning large ones.
The Parent-Child Architecture
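In parent-child chunking, you create two layers of chunks from the same document. Child chunks are small (e.g., 1-3 sentences) and are embedded and indexed for retrieval. Parent chunks are larger sections (e.g., entire paragraphs or pages) that are stored separately and are not searched directly. Each child keeps a reference to the parent it was cut from, so when a child is retrieved, you look up its parent and return that to the LLM instead. If several retrieved children share a parent, return that parent only once.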
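The two-layer layout can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the sentence splitter is naive, and all names (`build_index`, `retrieve_parents`, the `parents` dictionary) are hypothetical and chosen only for this example.

```python
import re
from collections import Counter
from math import sqrt


def split_sentences(text):
    # Naive sentence splitter (assumption: good enough for this demo).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(text):
    # Toy bag-of-words vector. Assumption: swap in a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(parents, sentences_per_child=1):
    # Cut each parent into small children; each child remembers its parent ID.
    children = []
    for parent_id, text in parents.items():
        sentences = split_sentences(text)
        for i in range(0, len(sentences), sentences_per_child):
            child_text = " ".join(sentences[i:i + sentences_per_child])
            children.append(
                {"parent_id": parent_id, "text": child_text, "vector": embed(child_text)}
            )
    return children


def retrieve_parents(query, children, parents, top_k=3):
    # Search over children, then return their parents, each parent at most once.
    query_vector = embed(query)
    ranked = sorted(children, key=lambda c: cosine(query_vector, c["vector"]), reverse=True)
    seen, results = set(), []
    for child in ranked[:top_k]:
        parent_id = child["parent_id"]
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results


# Parents are stored separately, keyed by ID; only children are indexed.
parents = {
    "p1": "Parent chunks hold whole paragraphs. They give the model surrounding context.",
    "p2": "Child chunks are short. Each child is embedded and indexed for search.",
}
children = build_index(parents)
context = retrieve_parents("What is embedded and indexed for search?", children, parents, top_k=1)
```

Here the query matches a single sentence in `p2`, but `context` holds the full text of `p2`, so the LLM would also see the neighboring sentence. In a real system the children would typically live in a vector store and the parents in a separate document store keyed by the same IDs.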