AI Engineering Academy · Lesson

Parent-Child and Small-to-Big Retrieval

Store small child chunks for precise retrieval but return their larger parent chunks to the LLM for richer context, balancing retrieval precision with generation quality.

The Precision vs Context Dilemma

RAG systems face a tension. Small chunks are retrieved with high precision because each one focuses on a single idea, but they lack the surrounding context the LLM needs to generate a complete answer. Large chunks provide rich context but reduce retrieval precision: a single embedding has to represent several ideas at once, so the chunk matches many queries weakly instead of one query strongly. Parent-child chunking addresses this tension by separating the unit you search over from the unit you pass to the LLM.

The Parent-Child Architecture

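In parent-child chunking, you create two layers of chunks from the same document. Child chunks are small (e.g., 1-3 sentences) and are embedded and indexed for retrieval. Parent chunks are larger sections (e.g., entire paragraphs or pages) that are stored separately and are not searched directly. Each child keeps a reference to the parent it was cut from. When a child is retrieved, you follow that reference and return its parent to the LLM instead. If several retrieved children share a parent, that parent only needs to be returned once.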
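The two-layer idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: parents are taken to be paragraphs separated by blank lines, children are single sentences, and a toy bag-of-words cosine similarity stands in for a real embedding model. The names embed, build_index, retrieve_parents, and DOC are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Child:
    text: str       # small chunk that gets embedded and searched
    parent_id: int  # index of the larger parent chunk it came from


def embed(text):
    # Toy stand-in for an embedding model (assumption): word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(document):
    # Parents: paragraphs separated by blank lines (assumption).
    parents = [p.strip() for p in document.split("\n\n") if p.strip()]
    children = []
    for pid, parent in enumerate(parents):
        # Children: individual sentences, each pointing back to its parent.
        for sentence in re.split(r"(?<=[.!?])\s+", parent):
            if sentence:
                children.append(Child(sentence, pid))
    return parents, children


def retrieve_parents(query, parents, children, top_k=2):
    # Search over the small children only.
    q = embed(query)
    ranked = sorted(children, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    # Return each matching parent once, in order of its best child.
    seen, results = set(), []
    for child in ranked[:top_k]:
        if child.parent_id not in seen:
            seen.add(child.parent_id)
            results.append(parents[child.parent_id])
    return results


DOC = (
    "Cats are small mammals. They purr when content.\n\n"
    "Python is a programming language. It uses indentation for blocks."
)
parents, children = build_index(DOC)
context = retrieve_parents("Which language uses indentation?", parents, children)
```

Here both top-ranked children come from the second paragraph, so context holds that paragraph once rather than twice. In a real system the children would typically live in a vector store and the parents in a separate document store keyed by parent ID.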
