Ch 10 — RAG Solutions Landscape

Frameworks, platforms, and managed services for building RAG systems
Orchestration Frameworks
Libraries that wire together the RAG pipeline
What They Do
Orchestration frameworks provide the glue code between your LLM, vector store, embedding model, and retrieval logic. They offer pre-built chains, document loaders, text splitters, retrievers, and output parsers so you don't have to build everything from scratch.
LangChain
The most widely adopted RAG framework. Huge ecosystem of integrations (700+ components). LCEL for composable chains. LangGraph for agentic workflows.
Python · TypeScript · Open Source
LlamaIndex
Purpose-built for RAG. Stronger data ingestion and indexing primitives. Excellent for structured data, multi-modal, and complex query engines.
Python · TypeScript · Open Source
Haystack (deepset)
Pipeline-based framework with a visual pipeline editor. Strong focus on production readiness and evaluation. Used by enterprise teams.
Python · Open Source
Semantic Kernel (Microsoft)
Microsoft's SDK for AI orchestration. Deep Azure integration. Supports C#, Python, Java. Built-in memory and planner abstractions.
C# · Python · Java
When to Use a Framework
Use a framework when: You're building a standard RAG pipeline and want pre-built integrations for your LLM, vector store, and embedding model. Frameworks save weeks of boilerplate code.

Skip a framework when: You have a simple use case (single LLM call + one vector store) or need maximum control over every step. A thin wrapper around your LLM API and vector store client may be simpler.
LangChain vs LlamaIndex: LangChain is broader (agents, chains, tools, memory). LlamaIndex is deeper on data (ingestion, indexing, query engines). Many teams use both: LlamaIndex for data pipelines, LangChain/LangGraph for the application layer. They are complementary, not competing.
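The "thin wrapper" alternative can be sketched in a few lines. This is a toy, assuming nothing beyond the standard library: embed() is a stand-in bag-of-words function, Jaccard overlap stands in for cosine similarity, and the final prompt is where a real chat-completion call would go.

```python
# Minimal "thin wrapper" RAG sketch: retrieve top-k chunks, build a grounded
# prompt, then hand it to your LLM client. embed() is a placeholder for a
# real embedding model.

def embed(text: str) -> set[str]:
    # Stand-in "embedding": a bag of lowercase tokens.
    return set(text.lower().split())

def similarity(a: set[str], b: set[str]) -> float:
    # Jaccard overlap as a placeholder for cosine similarity.
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(docs, key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "Qdrant is a vector database written in Rust.",
    "LangChain provides chains and document loaders.",
    "pgvector adds vector search to PostgreSQL.",
]
query = "What is a vector database written in Rust?"
context = retrieve(query, docs)
prompt = build_prompt(query, context)
```

Swapping in a real embedding API and LLM client turns this into a working pipeline; frameworks earn their keep once you add loaders, splitters, reranking, and streaming on top.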
Vector Databases & Search
Where your embeddings live and get queried
Managed / Cloud-Native
Pinecone
Fully managed, serverless vector database. Zero infrastructure to manage. Fast hybrid search (dense + sparse). Widely used in production.
Managed · Serverless
Weaviate
Open-source vector database with a managed cloud option. Built-in vectorization modules. GraphQL API. Hybrid search with BM25.
Open Source · Cloud
Qdrant
High-performance open-source vector database written in Rust. Rich filtering, payload indexing, quantization. Qdrant Cloud for managed hosting.
Open Source · Rust
Milvus / Zilliz
Open-source vector database for massive scale (billions of vectors). GPU-accelerated indexing. Zilliz Cloud is the managed version.
Open Source · Cloud
Embedded / Add-on
Chroma
Lightweight, in-process vector store. Great for prototyping and small datasets. Python-native. Can run as a server too.
Open Source · Embedded
pgvector (PostgreSQL)
Vector search as a PostgreSQL extension. Keep vectors alongside your relational data. No separate infrastructure needed.
Open Source · SQL
FAISS (Meta)
In-memory similarity search library. Extremely fast. No persistence or API built-in. Best for batch processing and research.
Library · C++/Python
LanceDB
Serverless, embedded vector database built on Lance columnar format. Multi-modal support. Zero-copy access. Good for local development.
Open Source · Embedded
Decision guide:
Prototyping → Chroma or LanceDB
Already using PostgreSQL → pgvector
Production with managed infra → Pinecone or Qdrant Cloud
Massive scale (1B+ vectors) → Milvus/Zilliz
Self-hosted control → Qdrant or Weaviate
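Under the hood, every option above answers the same question: which stored vectors are closest to the query vector? A brute-force cosine scan, using only the standard library and toy 3-dimensional vectors, is essentially what a flat FAISS or Chroma index does; production databases layer ANN indexes (HNSW, IVF), filtering, and persistence on top.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn(query: list[float], index: dict[str, list[float]], k: int = 1):
    # Exhaustive scan: score every stored vector, return the top-k ids.
    scored = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return scored[:k]

index = {
    "doc-a": [0.9, 0.1, 0.0],
    "doc-b": [0.1, 0.8, 0.3],
    "doc-c": [0.0, 0.2, 0.9],
}
hits = knn([1.0, 0.0, 0.1], index, k=2)  # hits[0][0] is the closest doc id
```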
Embedding Models & Providers
Converting text to vectors for semantic search
Commercial APIs
OpenAI text-embedding-3
Two variants: small (1536d, cheap) and large (3072d, best quality). Matryoshka support for dimension reduction. Most widely used.
API · Matryoshka
Cohere Embed v3
Strong multilingual support (100+ languages). Input type parameter (search_document vs search_query). Compression to int8/binary.
API · Multilingual
Voyage AI
Specialized embedding models for code (voyage-code-3) and for legal/finance domains. High MTEB scores. Acquired by MongoDB in 2025.
API · Domain-Specific
Google Vertex AI
text-embedding-005 model. 768 dimensions. Task-type parameter. Integrated with Google Cloud ecosystem and Gemini.
API · Google Cloud
Open-Source / Self-Hosted
BGE (BAAI)
Beijing Academy of AI. BGE-large-en-v1.5 and BGE-M3 (multilingual, multi-granularity). Top MTEB scores. Apache 2.0 license.
Open Source · HuggingFace
E5 (Microsoft)
E5-mistral-7b-instruct for highest quality. E5-small/base/large for efficiency. Prefix-based query/document asymmetry.
Open Source · MIT
Nomic Embed
nomic-embed-text-v1.5. 768d, Matryoshka support, 8192 token context. Fully open-source (weights + training code + data). Apache 2.0.
Open Source · Long Context
Jina Embeddings v3
8192 token context. Late chunking support. Task-specific LoRA adapters. Multilingual. Available via API and self-hosted.
Open Source · API
Start with OpenAI text-embedding-3-small for prototyping (cheap, good quality). Move to open-source (BGE, Nomic) for cost control at scale. Use Cohere for multilingual. Use Voyage for code/legal domains. Always benchmark on your own data before choosing.
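The Matryoshka support mentioned for OpenAI text-embedding-3 and Nomic means a vector can be truncated to its leading dimensions and renormalized, trading a little quality for storage and query speed. A minimal sketch with a toy 8-dimensional vector (real models return 768 to 3072 dimensions):

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    # Keep only the leading dimensions, then renormalize to unit length
    # so cosine-similarity scores remain comparable.
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.4, 0.3, -0.2, 0.5, 0.1, -0.4, 0.2, 0.3]
short = truncate_embedding(full, 4)  # 4-d vector, unit length
```

With OpenAI's API the same effect is available via the `dimensions` request parameter; for Matryoshka-trained open models you truncate the returned vector yourself as above.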
End-to-End RAG Platforms
Managed services that handle the full pipeline
What They Offer
End-to-end platforms handle document ingestion, chunking, embedding, storage, retrieval, and generation as a managed service. You upload documents and get an API or chat interface. No need to choose individual components or manage infrastructure.
LangSmith (LangChain)
Observability, testing, and evaluation platform for LangChain apps. Trace every step, run evaluations, compare prompts. Not a full RAG platform but essential for debugging.
Observability · Eval
LlamaCloud
Managed parsing and indexing by LlamaIndex. LlamaParse for document parsing (PDFs, tables, images). Managed retrieval API. Pairs with LlamaIndex framework.
Managed · Parsing
Vectara
End-to-end RAG-as-a-service. Upload documents, get an API. Built-in chunking, embedding, hybrid search, reranking, and grounded generation. Hallucination detection included.
RAG-as-a-Service
Cohere RAG
Cohere's Command R+ model with built-in RAG capabilities. Grounded generation with inline citations. Connectors for data sources. Enterprise-focused.
LLM + RAG · Citations
Ragas
Open-source RAG evaluation framework. Metrics: faithfulness, answer relevancy, context recall, context precision. Works with any RAG pipeline. Essential for measuring quality.
Evaluation · Open Source
Unstructured
Document parsing and preprocessing platform. Handles PDFs, images, HTML, DOCX, PPTX. Hosted API or self-hosted. The de facto standard for document ingestion.
Ingestion · Open Source
Platforms trade flexibility for speed. Vectara and Cohere RAG get you to production fastest but limit customization. LlamaCloud and LangSmith augment your custom pipeline with managed services for the hardest parts (parsing, observability). For most teams, a framework (LangChain/LlamaIndex) + vector DB + evaluation (Ragas) is the sweet spot.
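To make the Ragas metrics concrete, here is a simplified context-precision computation. It assumes binary relevance labels for the retrieved chunks are already available (Ragas derives them with an LLM judge); the scoring rewards ranking relevant chunks near the top of the retrieved list.

```python
def context_precision(relevant: list[bool]) -> float:
    # Mean of precision@k taken at each position k holding a relevant chunk.
    # relevant[i] is True if the chunk at rank i+1 was judged relevant.
    score, hits = 0.0, 0
    for k, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k at this relevant position
    return score / hits if hits else 0.0

# The same two relevant chunks score higher ranked first than ranked last.
good = context_precision([True, True, False])
bad = context_precision([False, False, True])
```

Rank-awareness is the point: a retriever that buries its one relevant chunk at position 3 scores about 0.33 here, while one that surfaces relevant chunks first scores 1.0.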
Enterprise & Cloud Provider Solutions
RAG built into major cloud platforms
Amazon Bedrock Knowledge Bases
Fully managed RAG on AWS. Upload to S3, auto-chunking, auto-embedding (Titan or Cohere), stores in OpenSearch or Pinecone. Query via Bedrock API with any Bedrock model.
AWS · Managed
Azure AI Search + OpenAI
Azure's enterprise search service (formerly Cognitive Search) with vector search. Integrated with Azure OpenAI. Hybrid search (BM25 + vectors), semantic ranker, built-in chunking. Enterprise-grade.
Azure · Enterprise
Google Vertex AI Search
Google's enterprise search with RAG. Grounding with Google Search or your own data. Integrated with Gemini models. Auto-handles chunking, embedding, and retrieval.
GCP · Managed
Databricks + MosaicML
Vector search built into Databricks lakehouse. Embed with open-source models on Databricks GPUs. Unified data + AI platform. MLflow for experiment tracking.
Lakehouse · Data + AI
When to Choose Cloud Providers
Already on AWS: Bedrock Knowledge Bases is the path of least resistance. Tight S3 integration, IAM security, no new vendors.

Already on Azure: Azure AI Search + Azure OpenAI. Enterprise compliance (SOC2, HIPAA) built in. Best for regulated industries.

Already on GCP: Vertex AI Search with Gemini. Especially strong if you use BigQuery for analytics.

Data-heavy teams: Databricks if your data already lives in the lakehouse.
Cloud provider lock-in is real. These solutions are deeply integrated with their respective clouds. Migrating from Bedrock Knowledge Bases to Azure AI Search is a significant effort. If multi-cloud flexibility matters, use a framework (LangChain) with a standalone vector DB (Pinecone, Qdrant) instead.
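One common hedge against lock-in is to code against a small interface of your own and keep each vendor SDK behind an adapter. The sketch below uses illustrative names, not any real SDK; a PineconeStore or QdrantStore adapter would implement the same two methods around the vendor client.

```python
from typing import Protocol

class VectorStore(Protocol):
    """The narrow surface your application depends on."""
    def upsert(self, doc_id: str, vector: list[float], metadata: dict) -> None: ...
    def query(self, vector: list[float], top_k: int) -> list[str]: ...

class InMemoryStore:
    """Toy adapter used here so the example runs without any service."""
    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def upsert(self, doc_id: str, vector: list[float], metadata: dict) -> None:
        self._rows[doc_id] = vector

    def query(self, vector: list[float], top_k: int) -> list[str]:
        def dot(v: list[float]) -> float:
            return sum(a * b for a, b in zip(vector, v))
        ranked = sorted(self._rows, key=lambda d: dot(self._rows[d]), reverse=True)
        return ranked[:top_k]

store: VectorStore = InMemoryStore()
store.upsert("a", [1.0, 0.0], {})
store.upsert("b", [0.0, 1.0], {})
top = store.query([0.9, 0.1], top_k=1)
```

Migrating then means writing one new adapter rather than touching every call site. Frameworks like LangChain ship this abstraction for you, which is part of their multi-cloud appeal.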
Open-Source RAG Applications
Full-stack open-source RAG you can self-host
RAGFlow
Open-source RAG engine with deep document understanding. Template-based chunking for different document types. Built-in UI for document management and chat.
Open Source · Self-Host
Dify
Open-source LLM application platform with visual workflow builder. RAG pipeline included. Supports multiple LLMs and vector stores. No-code/low-code interface.
Open Source · No-Code
Anything LLM
All-in-one desktop and Docker app for RAG. Chat with documents locally. Supports many LLMs (OpenAI, Ollama, local models) and vector stores.
Open Source · Desktop
Verba (Weaviate)
Open-source RAG chatbot by Weaviate. Built on Weaviate vector store. Simple UI for document upload and chat. Good starting template.
Open Source · Weaviate
Quivr
Open-source "second brain" powered by RAG. Upload documents, chat with them. Supabase + pgvector backend. Active community.
Open Source · Supabase
PrivateGPT
Fully private RAG. Runs 100% locally with no data leaving your machine. Uses Ollama for local LLMs. Good for sensitive documents.
Open Source · Private
Open-source RAG apps are great for prototyping and internal tools. They get you a working RAG chatbot in hours, not weeks. But for production, you'll likely outgrow them and need a custom pipeline built with LangChain/LlamaIndex. Use these to validate the use case, then build custom when you need fine-grained control.
Choosing Your RAG Stack
A practical decision framework
By Team & Stage
Solo developer / prototype:
LangChain or LlamaIndex + Chroma + OpenAI embeddings + GPT-4o. Deploy with FastAPI. Total cost: ~$0 to start.

Startup / small team:
LangChain + Pinecone or Qdrant Cloud + OpenAI embeddings + GPT-4o. LangSmith for observability. Ragas for evaluation.

Enterprise / regulated:
Cloud provider solution (Bedrock/Azure AI Search/Vertex) or LangChain + self-hosted Qdrant/Weaviate + open-source embeddings (BGE). Full audit trail.

Just want it to work:
Vectara or Cohere RAG. Upload docs, get API. Minimal engineering required.
The Recommended Stack (2024-2025)
For most teams building custom RAG:
# The "standard" RAG stack
Orchestration:  LangChain + LangGraph
Embeddings:     OpenAI text-embedding-3-small (or BGE/Nomic for self-hosted)
Vector Store:   Qdrant Cloud or Pinecone (or pgvector if already on Postgres)
LLM:            GPT-4o or Claude 3.5 Sonnet
Reranker:       Cohere Rerank or BGE Reranker
Observability:  LangSmith
Evaluation:     Ragas
Ingestion:      Unstructured (complex docs) or LangChain loaders (simple docs)
Start simple, add complexity only when needed. Begin with the basic stack. Measure retrieval quality with Ragas. If retrieval is the bottleneck, add hybrid search and reranking. If document parsing is failing, add Unstructured or LlamaParse. If you need agents, add LangGraph. Every component you add is a component you maintain.
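"Start simple" applies to ingestion as well: a fixed-size character splitter with overlap is often enough before reaching for semantic or template-based chunking. A minimal sketch in the spirit of LangChain's character splitters, though not its API:

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    # Slide a fixed-size window across the text; each chunk repeats the
    # last `overlap` characters of the previous one so sentences cut at a
    # boundary still appear whole in at least one chunk.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

pieces = chunk("x" * 500, size=200, overlap=40)
```

Tune size and overlap against retrieval metrics (Ragas context recall/precision) rather than guessing; the right values depend on your documents and embedding model's context window.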