Designing the Production Architecture
Choose a capstone project, sketch the full architecture including RAG pipeline, agent layer, caching, observability, and API, then document the design decisions and trade-offs.
Choosing a Capstone Project
The capstone project ties together everything from the track: RAG, agents, streaming, caching, observability, and security. A good capstone project is meaningfully complex, requiring at least three distinct AI components, yet scoped small enough to ship in days rather than months. Classic examples include a production-grade document Q&A assistant, an autonomous research agent with human-in-the-loop oversight, and an enterprise data extraction pipeline.
Identifying System Components
Start by listing the distinct components your system needs. A production AI engineering project typically includes: an ingestion layer (document loading, chunking, embedding, vector store indexing), a retrieval layer (hybrid search, re-ranking), an agent layer (function calling, tool execution), an API layer (FastAPI backend, streaming endpoints), and an observability layer (tracing, metrics, alerts). Sketch the data flow between them before writing code.
# Component inventory for a Document QA Assistant:
COMPONENTS = [
    'document_ingestion',       # PDF/Word -> chunks -> embeddings -> pgvector
    'hybrid_retriever',         # BM25 + dense + RRF
    'reranker',                 # Cohere rerank
    'qa_agent',                 # GPT-4o with RAG + function calling
    'semantic_cache',           # Redis + embedding similarity
    'streaming_api',            # FastAPI StreamingResponse
    'tracing',                  # LangSmith or Langfuse
    'prompt_injection_filter',  # Input sanitization
    'eval_pipeline',            # Automated quality scoring
]
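One way to sketch the data flow before writing code is to record, for each component, which components feed it, and then check that the result is a valid acyclic graph. The snippet below is a minimal sketch under stated assumptions: the edges in DATA_FLOW are one plausible wiring for the Document QA Assistant, not a prescribed design, and DATA_FLOW and build_order are hypothetical names introduced here for illustration. It uses graphlib from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Assumed wiring (illustrative only): each key maps to the set of
# components whose output it consumes.
DATA_FLOW = {
    "document_ingestion": set(),
    "prompt_injection_filter": set(),
    "semantic_cache": {"prompt_injection_filter"},
    "hybrid_retriever": {"semantic_cache", "document_ingestion"},
    "reranker": {"hybrid_retriever"},
    "qa_agent": {"reranker"},
    "streaming_api": {"qa_agent"},
}


def build_order(flow):
    """Return components ordered so every upstream component comes first.

    Raises graphlib.CycleError if the sketch contains a circular dependency.
    """
    return list(TopologicalSorter(flow).static_order())


order = build_order(DATA_FLOW)
print(" -> ".join(order))
```

The resulting order doubles as a rough build sequence: components with no upstream inputs can be implemented and tested first. Cross-cutting pieces such as tracing and the eval pipeline are left out of this sketch because they observe every stage rather than sit on the request path.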