6
Dense vector search is great for concepts, but terrible for exact names, IDs, or jargon.
- Hybrid Search: The industry standard. Combine vector search (for meaning) with keyword search (BM25, for exact terms) to get the best of both worlds.
- Reranking: Retrieve 50 documents quickly, then use a slower, highly accurate Cross-Encoder model to re-sort them and pick the top 5.
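A minimal sketch of the hybrid idea, merging a vector-search ranking and a BM25 ranking with Reciprocal Rank Fusion (one common way to combine the two lists; the doc IDs and rankings below are hypothetical):

```python
# Hybrid search sketch: fuse two ranked lists (vector + BM25) with
# Reciprocal Rank Fusion (RRF). Doc ids and hit lists are made up.

def rrf_fuse(rankings, k=60):
    """Combine several ranked lists of doc ids into one list, best first.

    Each doc scores 1 / (k + rank) per list it appears in, so documents
    that rank well in BOTH lists float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: vector search finds meaning, BM25 finds exact terms.
vector_hits = ["doc_overview", "doc_api", "doc_faq"]
bm25_hits = ["doc_api", "doc_errata", "doc_overview"]

fused = rrf_fuse([vector_hits, bm25_hits])
# "doc_api" wins: it ranked highly in both lists.
```

In production the two hit lists would come from your vector store and a BM25 engine; the fused top-k would then go to the cross-encoder reranker.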
7
Users write terrible search queries. Don't search using what they typed.
- HyDE (Hypothetical Document Embeddings): Ask the LLM to write a fake answer to the user's question, then search the vector database using that fake answer. It works shockingly well, because a fake answer looks far more like a real document than a question does, so it lands closer in embedding space.
- Query Expansion: Have the LLM rewrite the user's query into 3-4 different variations, search for all of them, and combine the results.
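Here is a toy HyDE sketch. The LLM call and the embedding model are stand-ins (`generate_hypothetical_answer` returns a canned string, and the "embedding" is a bag-of-words); the point is only to show that retrieval keys off the fake answer, not the raw query:

```python
# HyDE sketch: search with an embedding of a *hypothetical answer*,
# not of the user's query. LLM + embedder are stubbed for illustration.
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding"; a real system would call an
    # embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def generate_hypothetical_answer(query):
    # Stand-in for an LLM call like "Write a short answer to: {query}".
    return ("The refund policy allows returns within 30 days "
            "with the original receipt.")

def hyde_search(query, corpus):
    fake_answer = generate_hypothetical_answer(query)
    qvec = embed(fake_answer)  # embed the fake answer, not the query
    return max(corpus, key=lambda doc: cosine(qvec, embed(doc)))

corpus = [
    "Refunds: items may be returned within 30 days with a receipt.",
    "Shipping times vary by region and carrier.",
]
best = hyde_search("can I get my money back?", corpus)
```

Note that the raw query shares almost no words with the right document, but the hypothetical answer does, which is exactly the gap HyDE closes.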
8
Once you have the documents, you must force the LLM to stick to the script.
- Lost in the Middle: LLMs pay attention to the beginning and end of a prompt, but ignore the middle. Put your most important retrieved documents at the very top or very bottom.
- Strict Citations: Force the model to append `[Doc 1]` citations to every claim it makes, ensuring traceability.
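Both tricks above come down to how you assemble the final prompt. A minimal sketch (the document texts are hypothetical): top-ranked docs are placed at the edges of the context, weaker ones in the middle, and the instructions demand `[Doc N]` citations:

```python
# Prompt-assembly sketch: counter "lost in the middle" by placing the
# best documents first and last, and demand [Doc N] citations.

def build_prompt(question, ranked_docs):
    """ranked_docs: list of doc texts, best first."""
    # Interleave so the strongest docs sit at the start and end of the
    # context and the weakest land in the ignored middle.
    ordered = [None] * len(ranked_docs)
    front, back = 0, len(ranked_docs) - 1
    for i, doc in enumerate(ranked_docs):
        if i % 2 == 0:
            ordered[front] = doc
            front += 1
        else:
            ordered[back] = doc
            back -= 1

    context = "\n\n".join(
        f"[Doc {i + 1}] {doc}" for i, doc in enumerate(ordered))
    return (
        "Answer using ONLY the documents below. "
        "Append a [Doc N] citation to every claim you make.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

docs = [
    "Most relevant passage.",
    "Second passage.",
    "Third passage.",
    "Fourth passage.",
]
prompt = build_prompt("What does the policy say?", docs)
# The two strongest passages end up first and last in the context.
```

With four docs the order becomes first, third, fourth, second: the two best retrieved passages occupy the positions the model actually attends to.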
The Bottom Line: Naive RAG (just embedding the user's query) fails in production. Advanced retrieval requires query rewriting, hybrid search, and reranking to find the right context.