AI Agents · Lesson

Chunking Strategies (Fixed, Sentence, Semantic)

Trade-offs between fixed-size, sentence-aware, and semantic chunking for retrieval quality.

Why Chunk?

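Embedding an entire 200-page PDF as one vector does not work well: most embedding models limit their input length, and a single vector for that much text would be too abstract to match specific queries. Instead, you split the document into smaller chunks (each roughly 200-800 tokens), embed each chunk, and search at the chunk level.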
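To make the "search at the chunk level" idea concrete, here is a minimal sketch. It assumes a toy bag-of-words vector as a stand-in for a real embedding model; `toy_embed`, `cosine`, `best_chunk`, and the sample sentences are all hypothetical names and data chosen for illustration, not part of any library.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_chunk(query, chunks):
    """Return the chunk whose vector is most similar to the query vector."""
    q = toy_embed(query)
    return max(chunks, key=lambda c: cosine(q, toy_embed(c)))


# Illustrative data: three small "chunks" of one document.
chunks = [
    "Refunds are issued within 14 days of the return request.",
    "Shipping takes 3 to 5 business days within the country.",
    "Support is available by email on weekdays.",
]
whole_document = " ".join(chunks)
query = "how many days do refunds take"

top = best_chunk(query, chunks)
chunk_score = cosine(toy_embed(query), toy_embed(top))
document_score = cosine(toy_embed(query), toy_embed(whole_document))
print(top)
print(chunk_score > document_score)
```

In this toy setup the focused chunk scores higher against the query than the vector for the whole text, which is the dilution effect described above. A real embedding model would behave differently in the details, so treat the numbers as illustrative only.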

Fixed-Size Chunking

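The simplest approach is to split every N characters or tokens, letting consecutive chunks overlap so that text cut at a boundary still appears intact in one of them. The function below counts characters: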

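def fixed_chunks(text, chunk_size=1000, overlap=200):
    """Split text into windows of chunk_size characters that share overlap characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for i in range(0, len(text), step):
        chunks.append(text[i:i + chunk_size])
        if i + chunk_size >= len(text):
            break  # window reached the end; another chunk would only repeat the overlap
    return chunks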
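The text mentions splitting by tokens as well as by characters. The same sliding-window idea can be sketched at the token level. This version assumes whitespace-separated words as a rough stand-in for model tokens (a real pipeline would use the tokenizer of its embedding model), and `fixed_token_chunks` is a hypothetical name used only for this example.

```python
def fixed_token_chunks(text, chunk_size=200, overlap=50):
    """Split text into windows of chunk_size tokens that share overlap tokens.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how many tokens the window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Small demonstration: ten tokens, windows of four, one token shared.
words = " ".join(f"w{i}" for i in range(10))
demo = fixed_token_chunks(words, chunk_size=4, overlap=1)
for chunk in demo:
    print(chunk)
```

Each window begins with the last token of the previous one, so a phrase that straddles a boundary is still seen whole in at least one chunk, at the cost of storing some text twice.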
