AI Agents · Lesson

HyDE: Hypothetical Document Embeddings

Have an LLM write a hypothetical 'ideal' answer to the query, embed that answer, and search with its embedding instead of the query's. This often retrieves better than embedding the short query alone.

The Problem with Query Embeddings

User queries are usually short and look nothing like the documents you want to retrieve.

Query: "fix my keyboard"
Doc: "If your mechanical keyboard's keys are sticking, you can clean the switches with isopropyl alcohol..."

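Both texts are about the same problem, but they differ so much in length, vocabulary, and form (a terse request versus an explanatory answer) that their embeddings can land far apart. Similarity search then misses the document.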
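A toy example makes the mismatch concrete. The sketch below scores the query against the relevant document and against a short, off-topic text that happens to reuse the query's words. It assumes a bag-of-words vector (the hypothetical helper `embed`) in place of a real embedding model, and the distractor sentence is invented for illustration. Dense models compare meaning rather than exact words, so their scores will differ. Treat this only as an illustration of a short query ranking a lookalike text above the document that actually answers it.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = "fix my keyboard"
docs = {
    # The document from the example above (its "..." continuation left out).
    "relevant": "If your mechanical keyboard's keys are sticking, you can clean "
                "the switches with isopropyl alcohol.",
    # Hypothetical off-topic text that reuses the query's words.
    "distractor": "How to fix my keyboard shortcuts in Vim",
}

q_vec = embed(query)
scores = {name: round(cosine(q_vec, embed(text)), 3) for name, text in docs.items()}
best = max(scores, key=scores.get)
print(scores)              # → {'relevant': 0.144, 'distractor': 0.612}
print(f"top hit: {best}")  # → top hit: distractor
```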

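HyDE (Hypothetical Document Embeddings; Gao et al., 2022) sidesteps this mismatch: instead of embedding the query, it embeds an LLM-written answer, which is shaped like the documents in the index.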
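The generate, embed, search loop can be sketched as follows. Several pieces are assumptions made for illustration: `fake_llm` is a hypothetical stand-in that returns canned answers instead of calling a real model, `embed` is a toy bag-of-words vector rather than a dense encoder, and the two-document corpus (including its distractor) is invented. Following the HyDE paper, the search vector is the mean of several hypothetical-answer embeddings plus the query's own embedding.

```python
import math
import re
from collections import Counter

def embed(text: str) -> dict[str, float]:
    # Toy stand-in for a dense encoder: lowercase word counts.
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors: list[dict[str, float]]) -> dict[str, float]:
    # Element-wise mean of sparse vectors.
    total: Counter = Counter()
    for vec in vectors:
        total.update(vec)
    return {k: v / len(vectors) for k, v in total.items()}

def fake_llm(query: str, n: int) -> list[str]:
    # Hypothetical stand-in for an LLM prompted with something like
    # "Write a passage that answers the question: {query}".
    canned = [
        "Sticking keys on a mechanical keyboard can often be fixed by cleaning the switches.",
        "To fix a keyboard, remove the keycaps and clean the switches with isopropyl alcohol.",
    ]
    return canned[:n]

def hyde_search(query: str, corpus: list[str], n: int = 2, k: int = 1) -> list[str]:
    hypotheticals = fake_llm(query, n)
    # Average the hypothetical-answer embeddings together with the query embedding.
    search_vec = mean_vector([embed(h) for h in hypotheticals] + [embed(query)])
    ranked = sorted(corpus, key=lambda doc: cosine(search_vec, embed(doc)), reverse=True)
    return ranked[:k]

corpus = [
    "If your mechanical keyboard's keys are sticking, you can clean the switches with isopropyl alcohol.",
    "How to fix my keyboard shortcuts in Vim",  # hypothetical distractor
]
query = "fix my keyboard"

# Baseline: embed the raw query and take the closest document.
plain_hit = max(corpus, key=lambda doc: cosine(embed(query), embed(doc)))
hyde_hit = hyde_search(query, corpus)[0]
print("plain:", plain_hit)  # → plain: How to fix my keyboard shortcuts in Vim
print("HyDE: ", hyde_hit)   # → the isopropyl-alcohol document
```

The hypothetical answers do not need to be factually correct; they only need to resemble the kind of document that answers the query, so that their embedding lands near the real one.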