HyDE: Hypothetical Document Embeddings
Generate a fake 'ideal' answer with the LLM, embed it, and search with that vector. This often retrieves better than embedding the short query directly.
The Problem with Query Embeddings
User queries are usually short and look nothing like the documents you want to retrieve.
- Query: "fix my keyboard"
- Doc: "If your mechanical keyboard's keys are sticking, you can clean the switches with isopropyl alcohol..."
The two texts are about the same topic, but their embeddings can land far apart in vector space, so nearest-neighbor retrieval may miss the doc.
HyDE Idea
HyDE = Hypothetical Document Embeddings (Gao et al. 2022).
- Ask the LLM to write a fake "ideal answer" to the query
- Embed the fake answer (not the query)
- Use that vector to search the real corpus
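The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm` is a hypothetical placeholder that returns a canned answer instead of calling an LLM, and the two extra corpus entries are invented distractors.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(vector: Counter, corpus: List[str], k: int = 1) -> List[str]:
    """Return the k corpus documents most similar to the given vector."""
    ranked = sorted(corpus, key=lambda doc: cosine(vector, embed(doc)), reverse=True)
    return ranked[:k]


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stub: returns a canned 'ideal answer' for the demo."""
    return (
        "To fix sticking keys on a mechanical keyboard, clean the switches "
        "with isopropyl alcohol and let them dry."
    )


def hyde_search(
    query: str,
    corpus: List[str],
    generate: Callable[[str], str] = fake_llm,
    k: int = 1,
) -> List[str]:
    # Step 1: ask the LLM for a fake "ideal answer" to the query.
    hypothetical_doc = generate(f"Write a short passage that answers: {query}")
    # Step 2: embed the fake answer, not the query.
    vector = embed(hypothetical_doc)
    # Step 3: use that vector to search the real corpus.
    return search(vector, corpus, k)


corpus = [
    "If your mechanical keyboard's keys are sticking, you can clean the "
    "switches with isopropyl alcohol...",
    "Fix my bike: how to patch a flat tire at home.",  # invented distractor
    "My favorite piano keyboard pieces for beginners.",  # invented distractor
]

query = "fix my keyboard"
direct_top = search(embed(query), corpus)[0]  # baseline: embed the query itself
hyde_top = hyde_search(query, corpus)[0]      # HyDE: embed the fake answer
print("direct:", direct_top)
print("HyDE:  ", hyde_top)
```

In this hand-built toy corpus, the direct query matches the piano distractor best because it shares surface words like "my" and "keyboard", while the HyDE vector lands on the keyboard-cleaning doc. With a real embedding model and LLM the gap will differ; the point is only that the fake answer looks like a document, so it sits closer to real documents than the short query does.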