Ch 5 — Retrieval-Augmented Generation (RAG)
Give your LLM knowledge it was never trained on — documents, databases, and live data
High Level
A. The Problem RAG Solves: LLMs have a knowledge cutoff, and RAG fills the gap.

- LLM alone: trained on public data up to a cutoff date.
- Missing knowledge: your docs, recent data, and private databases sit outside that training set.
- The solution, RAG: retrieve the relevant documents and inject them into the prompt.
Step 1: Ingestion — loading and chunking documents

B. Document Ingestion: Load → Split → Embed → Store.

- Raw documents: PDFs, HTML, Markdown, CSV, databases.
- Text splitter: chunk the text into smaller, overlapping pieces.
- Embeddings: each chunk becomes a dense vector, e.g. [0.02, -0.15, ...].
- Vector store: FAISS, Chroma, Pinecone, Weaviate.
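The splitting step can be sketched as a plain-Python function. Real splitters (for example LangChain's RecursiveCharacterTextSplitter) prefer to break on separators like paragraphs and sentences, but this fixed-window sketch shows why overlap matters: a sentence cut at one chunk boundary still appears whole in the neighboring chunk. The `chunk_size` and `overlap` values are illustrative.

```python
def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap, so content
    cut at a boundary survives intact in an adjacent chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance by less than the window to create overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "RAG retrieves relevant chunks and injects them into the prompt."
chunks = split_text(doc, chunk_size=30, overlap=8)
```

Stitching the chunks back together (dropping each chunk's leading overlap) reproduces the original text, which is a quick sanity check that no content was lost.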
Step 2: Retrieval — finding relevant chunks at query time

C. Retrieval: user question → embed → similarity search → top-k chunks.

- User query: "What's our refund policy?"
- Query vector: embed the question with the same embedding model used at ingestion.
- Similarity search: cosine distance finds the top-k nearest chunk vectors.
- Retrieved chunks: the 3-5 most relevant document pieces.
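The similarity-search step reduces to a few lines of math. This sketch uses toy 3-dimensional vectors as stand-ins for real embeddings (which have hundreds of dimensions) and a hand-picked `query_vec` pretending to be the embedded question; a real vector store like FAISS does the same ranking with optimized index structures.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """store: list of (chunk_text, vector) pairs; returns the k most similar chunks."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "embeddings" (hypothetical values, not from a real model).
store = [
    ("Refunds are issued within 30 days.", [0.9, 0.1, 0.0]),
    ("Our office is in Berlin.",           [0.0, 0.2, 0.9]),
    ("Returns require a receipt.",         [0.8, 0.3, 0.1]),
]
query_vec = [0.85, 0.2, 0.05]  # pretend embedding of "What's our refund policy?"
hits = top_k(query_vec, store, k=2)
```

Both refund-related chunks outscore the unrelated one, which is the whole point: nearness in embedding space approximates relevance to the question.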
Step 3: Generation — LLM answers using retrieved context

D. Generation: stuff the retrieved chunks into the prompt, and the LLM answers.

- Retrieved chunks: context from your documents, combined with the user's query.
- Prompt template: "Answer based on this context: {ctx}"
- LLM: generates an answer grounded in the retrieved context.
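The "stuffing" step is just string templating. A minimal sketch, with a hypothetical template (the exact wording is up to you; the instruction to admit ignorance when the context lacks the answer is a common guard against hallucination):

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{ctx}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(chunks: list[str], question: str) -> str:
    ctx = "\n\n".join(chunks)  # "stuff" all retrieved chunks into one context block
    return PROMPT_TEMPLATE.format(ctx=ctx, question=question)

prompt = build_prompt(
    ["Refunds are issued within 30 days.", "Returns require a receipt."],
    "What's our refund policy?",
)
```

The resulting string is what actually gets sent to the LLM: the model never sees your vector store, only the text you inject here.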
The full RAG chain in LangChain

E. The Full RAG Chain: Retriever | Prompt | Model | Parser, composed with LCEL.

- Question: the user's natural-language query.
- Retriever: vector_store.as_retriever().
- Prompt: context and question are injected into the template.
- LLM: produces an answer grounded in the documents.
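LCEL's `|` operator is function composition: each stage's output feeds the next stage's input. To show the data flow without needing API keys, this sketch uses a minimal `Runnable` stand-in and stubbed stages; in real LangChain these would be a retriever, a ChatPromptTemplate, a chat model, and a StrOutputParser, but the pipeline shape is the same.

```python
class Runnable:
    """Minimal stand-in for LCEL composition via the | operator (not the real API)."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        # Chaining: run self first, then feed the result into the next stage.
        return Runnable(lambda x: other.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

# Stubbed stages: a canned retriever, a template, a fake model, a parser.
retriever = Runnable(lambda q: {"question": q,
                                "ctx": "Refunds are issued within 30 days."})
prompt    = Runnable(lambda d: f"Context: {d['ctx']}\nQuestion: {d['question']}")
model     = Runnable(lambda p: {"content": "Refunds are issued within 30 days."})
parser    = Runnable(lambda msg: msg["content"])

chain = retriever | prompt | model | parser
answer = chain.invoke("What's our refund policy?")
```

The real chain reads the same way, `retriever | prompt | model | parser`, which is exactly the composition named in the heading above.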
Advanced patterns: multi-query, reranking, RAG as a tool

F. Advanced RAG Patterns: beyond basic retrieve-and-generate.

- Multi-query: the LLM generates 3-5 variations of the question to widen recall.
- Reranking: a cross-encoder scores relevance more precisely than vector distance alone.
- RAG as a tool: an agent decides when to search the docs.
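Multi-query and reranking compose naturally: retrieve with every query variant, pool the results, then rescore the pool against the original question. This sketch stubs both the LLM rewriter (`multi_query_variants` returns hand-written variants) and the scorers (word overlap stands in for embedding similarity and for a real cross-encoder); all names and corpus strings are illustrative.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(a: str, b: str) -> int:
    return len(tokens(a) & tokens(b))

def multi_query_variants(question: str) -> list[str]:
    # Stub for the LLM call that rephrases the question (hypothetical variants).
    return [question,
            question.replace("refund", "money-back"),
            "How do I get a refund?"]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy retriever: word overlap stands in for vector similarity.
    return sorted(corpus, key=lambda doc: overlap(query, doc), reverse=True)[:k]

def rerank(question: str, candidates: list[str]) -> list[str]:
    # Stand-in for a cross-encoder, which scores (question, chunk) pairs jointly.
    unique = list(dict.fromkeys(candidates))  # dedupe, keep first-seen order
    return sorted(unique, key=lambda doc: overlap(question, doc), reverse=True)

corpus = [
    "Refund policy: refund requests are processed in 5 days.",
    "Our office is in Berlin.",
    "A money-back guarantee covers all plans.",
]
question = "What is the refund policy?"
candidates = [doc for q in multi_query_variants(question)
                  for doc in retrieve(q, corpus)]
ranked = rerank(question, candidates)
```

Note how the variants pull in the money-back chunk that the literal question would miss, while the final rerank against the original question puts the directly relevant chunk first. The third pattern, RAG as a tool, wraps this whole pipeline in a function that an agent can choose to call.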