By Team & Stage
Solo developer / prototype:
LangChain or LlamaIndex + Chroma + OpenAI embeddings + GPT-4o. Deploy with FastAPI. Total cost: ~$0 to start.
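The prototype stack above reduces to one loop: embed documents, retrieve the nearest ones for a query, and hand them to the LLM. A minimal stdlib-only sketch of that loop is below; the bag-of-words vectors and in-memory list are illustrative stand-ins for OpenAI text-embedding-3-small and Chroma.

```python
# Toy retrieve step of a RAG pipeline (embed -> index -> retrieve).
# embed() stands in for a real embedding model; `index` stands in
# for a vector store like Chroma. The retrieved text would then be
# placed in the LLM prompt as context.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for an embedding model: bag-of-words term counts.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Chroma is an embedded vector store for prototypes.",
    "FastAPI serves the RAG endpoint over HTTP.",
]
index = [(d, embed(d)) for d in docs]  # in-memory "vector store"

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

print(retrieve("how do I serve the endpoint?")[0])
```

In the real stack, `retrieve` becomes a vector-store similarity search and the final print becomes an LLM call with the retrieved context in the prompt; the FastAPI layer just wraps this function in an HTTP endpoint.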
Startup / small team:
LangChain + Pinecone or Qdrant Cloud + OpenAI embeddings + GPT-4o. LangSmith for observability. Ragas for evaluation.
Enterprise / regulated:
A cloud provider solution (AWS Bedrock, Azure AI Search, or Google Vertex AI), or LangChain + self-hosted Qdrant/Weaviate + open-source embeddings (BGE). Full audit trail.
Just want it to work:
Vectara or Cohere RAG. Upload docs, get an API. Minimal engineering required.
The Recommended Stack (2024-2025)
For most teams building custom RAG:
# The "standard" RAG stack
Orchestration:  LangChain + LangGraph
Embeddings:     OpenAI text-embedding-3-small (or BGE/Nomic for self-hosted)
Vector Store:   Qdrant Cloud or Pinecone (or pgvector if already on Postgres)
LLM:            GPT-4o or Claude 3.5 Sonnet
Reranker:       Cohere Rerank or BGE Reranker
Observability:  LangSmith
Evaluation:     Ragas
Ingestion:      Unstructured (complex docs) or LangChain loaders (simple docs)
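The reranker line deserves a word: a reranker re-scores the retriever's top candidates against the query with a stronger (but slower) model, and only the best survivors reach the prompt. The sketch below fakes that score with token overlap; in the real stack it would come from Cohere Rerank or a BGE cross-encoder.

```python
# Toy rerank stage: re-order retriever candidates by a per-(query, doc)
# relevance score. overlap_score() is an illustrative stand-in for a
# cross-encoder reranker such as Cohere Rerank or BGE Reranker.
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def overlap_score(query: str, doc: str) -> float:
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q) if q else 0.0

def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    scored = sorted(candidates, key=lambda d: overlap_score(query, d), reverse=True)
    return scored[:top_n]

candidates = [
    "Pricing page for the enterprise plan.",
    "How to rotate API keys safely.",
    "Release notes for version 2.1.",
]
print(rerank("rotate my API keys", candidates, top_n=1))
```

The shape is the point: retrieval casts a wide, cheap net (top 20-50 by vector similarity); the reranker pays a per-pair cost to pick the few that actually answer the query.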
Start simple, add complexity only when needed. Begin with the basic stack. Measure retrieval quality with Ragas. If retrieval is the bottleneck, add hybrid search and reranking. If document parsing is failing, add Unstructured or LlamaParse. If you need agents, add LangGraph. Every component you add is a component you maintain.
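Measuring retrieval quality can start even simpler than Ragas: hand-label a small set of (question, relevant doc) pairs and compute hit rate at k. The sketch below uses a dummy keyword retriever as a stand-in for the real vector-store query; Ragas automates richer versions of this check with an LLM judge.

```python
# Hit rate @ k over a small labeled set: did the retriever return the
# known-relevant doc in its top k? A cheap baseline metric to run
# before (or alongside) Ragas.

def hit_rate_at_k(retrieve, labeled: list[tuple[str, str]], k: int = 3) -> float:
    hits = sum(1 for question, relevant_id in labeled
               if relevant_id in retrieve(question, k))
    return hits / len(labeled)

# Dummy keyword retriever standing in for the real vector-store query.
CORPUS = {"doc1": "reset a forgotten password", "doc2": "export billing data"}
def retrieve(question: str, k: int) -> list[str]:
    q = set(question.lower().split())
    ranked = sorted(CORPUS, key=lambda i: -len(q & set(CORPUS[i].split())))
    return ranked[:k]

labeled = [("how do I reset my password?", "doc1"),
           ("where can I export billing data?", "doc2")]
print(hit_rate_at_k(retrieve, labeled, k=1))
```

If this number is low, better generation prompts won't save you; that is the signal to reach for hybrid search and reranking.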