Retrieval-Augmented Generation Pipelines
Assemble RAG flows that ground model responses in retrieved document context.
Why RAG?
Retrieval-Augmented Generation (RAG) grounds an LLM's answers in your own documents instead of relying solely on what the model memorized during training.
- Fresh and private data: answer questions about internal docs the model never saw.
- Less hallucination: the model can base its answer on retrieved context, and point back to it, rather than inventing facts.
- Cheaper than fine-tuning: you update a vector store, not model weights.
A Spring AI RAG pipeline has two phases: an ingestion phase (read → split → embed → store) and a query phase (embed question → retrieve → augment prompt → generate).
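To make the two phases concrete, here is a minimal plain-Java sketch (Java 17 assumed). It deliberately does not use Spring AI: the class RagSketch, the record Chunk, and the word-count "embedding" are hypothetical stand-ins chosen so the example runs with no dependencies. A real pipeline would use a dense-vector embedding model and a vector database, and would send the augmented prompt to a chat model for the final generate step.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the two RAG phases. None of these names are Spring AI types.
final class RagSketch {

    // A stored chunk together with its toy embedding.
    record Chunk(String text, Map<String, Integer> vector) {}

    private final List<Chunk> store = new ArrayList<>();

    // Toy "embedding": lower-cased word counts (an assumption for this sketch;
    // real embedding models return dense float vectors).
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity between two sparse word-count vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v * v).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v * v).sum());
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    // Ingestion phase: read -> split (one chunk per sentence here) -> embed -> store.
    void ingest(String document) {
        for (String sentence : document.split("(?<=[.!?])\\s+")) {
            if (!sentence.isBlank()) {
                store.add(new Chunk(sentence.trim(), embed(sentence)));
            }
        }
    }

    // Query phase, steps 1-2: embed the question, retrieve the topK most similar chunks.
    List<String> retrieve(String question, int topK) {
        Map<String, Integer> q = embed(question);
        return store.stream()
                .sorted(Comparator.comparingDouble((Chunk c) -> cosine(q, c.vector())).reversed())
                .limit(topK)
                .map(Chunk::text)
                .toList();
    }

    // Query phase, step 3: augment the prompt with the retrieved context.
    // Step 4 (generate) would pass this string to a chat model.
    String augment(String question, int topK) {
        return "Answer using only this context:\n"
                + String.join("\n", retrieve(question, topK))
                + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) {
        RagSketch rag = new RagSketch();
        rag.ingest("Refunds are processed within 14 days. "
                + "Support is available on weekdays. "
                + "Invoices are emailed monthly.");
        System.out.println(rag.augment("How long do refunds take?", 1));
    }
}
```

Word-count similarity only matches exact words, so it misses paraphrases that a learned embedding would catch. The control flow, however, has the same shape as the pipeline described above.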
The Pipeline at a Glance
Spring AI gives you composable building blocks for both phases. The core types you will assemble are:
- DocumentReader: loads raw sources (PDF, Markdown, JSON, web pages).
- DocumentTransformer: splits documents into chunks (e.g. TokenTextSplitter).
- EmbeddingModel: turns text into vectors.
- VectorStore: stores and similarity-searches those vectors.
- ChatClient with a RAG advisor: wires retrieval into the prompt automatically.
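The DocumentReader and DocumentTransformer belong to ingestion, and the ChatClient with its RAG advisor belongs to querying. The EmbeddingModel and VectorStore serve both phases: they embed and store chunks during ingestion, then embed the question and retrieve matching chunks at query time.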
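As an illustration of the splitting step, the sketch below cuts text into fixed-size chunks. It is a simplification under stated assumptions: it counts whitespace-separated words rather than model tokens, and ChunkSketch, split, and maxWords are hypothetical names for this example, not part of TokenTextSplitter or any Spring AI API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical fixed-size splitter; words approximate tokens for illustration only.
final class ChunkSketch {

    // Splits text into consecutive chunks of at most maxWords words each.
    static List<String> split(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        if (text.isBlank()) {
            return chunks; // nothing to split
        }
        String[] words = text.trim().split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Each printed line is one chunk that would be embedded and stored separately.
        split("Spring AI gives you composable building blocks for both phases", 4)
                .forEach(System.out::println);
    }
}
```

Smaller chunks make retrieval more precise but carry less surrounding context per hit, so chunk size is usually tuned against the documents and questions at hand.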