Ch 5 — Retrieval-Augmented Generation (RAG)
Give your LLM knowledge it was never trained on — documents, databases, and live data
High Level
A. The Problem RAG Solves: LLMs have a knowledge cutoff, and RAG fills the gap.

- LLM alone: trained on public data up to a cutoff date.
- Missing knowledge: your docs, recent data, and private databases sit outside that training set.
- The solution, RAG: retrieve the relevant documents and inject them into the prompt.
Step 1: Ingestion — loading and chunking documents

B. Document Ingestion: Load → Split → Embed → Store.

- Raw documents: PDFs, HTML, Markdown, CSV, databases.
- Text splitter: chunk the text into smaller, overlapping pieces.
- Embeddings: each chunk becomes a dense vector, e.g. [0.02, -0.15, ...].
- Vector store: FAISS, Chroma, Pinecone, Weaviate.
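The splitting step can be sketched as a plain-Python function. Real splitters (for example LangChain's RecursiveCharacterTextSplitter) prefer to break on separators like paragraphs and sentences, but this fixed-window sketch shows why overlap matters: a sentence cut at one chunk boundary still appears whole in the neighboring chunk. The `chunk_size` and `overlap` values are illustrative.

```python
def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap, so content
    cut at a boundary survives intact in an adjacent chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance by less than the window to create overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "RAG retrieves relevant chunks and injects them into the prompt."
chunks = split_text(doc, chunk_size=30, overlap=8)
```

Stitching the chunks back together (dropping each chunk's leading overlap) reproduces the original text, which is a quick sanity check that no content was lost.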
Step 2: Retrieval — finding relevant chunks at query time

C. Retrieval: user question → embed → similarity search → top-k chunks.

- User query: "What's our refund policy?"
- Query vector: embed the question with the same embedding model used at ingestion.
- Similarity search: cosine distance finds the top-k nearest chunk vectors.
- Retrieved chunks: the 3-5 most relevant document pieces.
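The similarity-search step reduces to a few lines of math. This sketch uses toy 3-dimensional vectors as stand-ins for real embeddings (which have hundreds of dimensions) and a hand-picked `query_vec` pretending to be the embedded question; a real vector store like FAISS does the same ranking with optimized index structures.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """store: list of (chunk_text, vector) pairs; returns the k most similar chunks."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "embeddings" (hypothetical values, not from a real model).
store = [
    ("Refunds are issued within 30 days.", [0.9, 0.1, 0.0]),
    ("Our office is in Berlin.",           [0.0, 0.2, 0.9]),
    ("Returns require a receipt.",         [0.8, 0.3, 0.1]),
]
query_vec = [0.85, 0.2, 0.05]  # pretend embedding of "What's our refund policy?"
hits = top_k(query_vec, store, k=2)
```

Both refund-related chunks outscore the unrelated one, which is the whole point: nearness in embedding space approximates relevance to the question.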
Step 3: Generation — LLM answers using retrieved context

D. Generation: stuff the retrieved chunks into the prompt, and the LLM answers.

- Retrieved chunks: context from your documents, combined with the user's query.
- Prompt template: "Answer based on this context: {ctx}"
- LLM: generates an answer grounded in the retrieved context.
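The "stuffing" step is just string templating. A minimal sketch, with a hypothetical template (the exact wording is up to you; the instruction to admit ignorance when the context lacks the answer is a common guard against hallucination):

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{ctx}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(chunks: list[str], question: str) -> str:
    ctx = "\n\n".join(chunks)  # "stuff" all retrieved chunks into one context block
    return PROMPT_TEMPLATE.format(ctx=ctx, question=question)

prompt = build_prompt(
    ["Refunds are issued within 30 days.", "Returns require a receipt."],
    "What's our refund policy?",
)
```

The resulting string is what actually gets sent to the LLM: the model never sees your vector store, only the text you inject here.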
The full RAG chain in LangChain

E. The Full RAG Chain: Retriever | Prompt | Model | Parser, composed with LCEL.

- Question: the user's natural-language query.
- Retriever: vector_store.as_retriever().
- Prompt: context and question are injected into the template.
- LLM: produces an answer grounded in the documents.
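LCEL's `|` operator is function composition: each stage's output feeds the next stage's input. To show the data flow without needing API keys, this sketch uses a minimal `Runnable` stand-in and stubbed stages; in real LangChain these would be a retriever, a ChatPromptTemplate, a chat model, and a StrOutputParser, but the pipeline shape is the same.

```python
class Runnable:
    """Minimal stand-in for LCEL composition via the | operator (not the real API)."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        # Chaining: run self first, then feed the result into the next stage.
        return Runnable(lambda x: other.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

# Stubbed stages: a canned retriever, a template, a fake model, a parser.
retriever = Runnable(lambda q: {"question": q,
                                "ctx": "Refunds are issued within 30 days."})
prompt    = Runnable(lambda d: f"Context: {d['ctx']}\nQuestion: {d['question']}")
model     = Runnable(lambda p: {"content": "Refunds are issued within 30 days."})
parser    = Runnable(lambda msg: msg["content"])

chain = retriever | prompt | model | parser
answer = chain.invoke("What's our refund policy?")
```

The real chain reads the same way, `retriever | prompt | model | parser`, which is exactly the composition named in the heading above.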
Advanced patterns: multi-query, reranking, RAG as a tool

F. Advanced RAG Patterns: beyond basic retrieve-and-generate.

- Multi-query: the LLM generates 3-5 variations of the question to widen recall.
- Reranking: a cross-encoder scores relevance more precisely than vector distance alone.
- RAG as a tool: an agent decides when to search the docs.
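Multi-query and reranking compose naturally: retrieve with every query variant, pool the results, then rescore the pool against the original question. This sketch stubs both the LLM rewriter (`multi_query_variants` returns hand-written variants) and the scorers (word overlap stands in for embedding similarity and for a real cross-encoder); all names and corpus strings are illustrative.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap(a: str, b: str) -> int:
    return len(tokens(a) & tokens(b))

def multi_query_variants(question: str) -> list[str]:
    # Stub for the LLM call that rephrases the question (hypothetical variants).
    return [question,
            question.replace("refund", "money-back"),
            "How do I get a refund?"]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy retriever: word overlap stands in for vector similarity.
    return sorted(corpus, key=lambda doc: overlap(query, doc), reverse=True)[:k]

def rerank(question: str, candidates: list[str]) -> list[str]:
    # Stand-in for a cross-encoder, which scores (question, chunk) pairs jointly.
    unique = list(dict.fromkeys(candidates))  # dedupe, keep first-seen order
    return sorted(unique, key=lambda doc: overlap(question, doc), reverse=True)

corpus = [
    "Refund policy: refund requests are processed in 5 days.",
    "Our office is in Berlin.",
    "A money-back guarantee covers all plans.",
]
question = "What is the refund policy?"
candidates = [doc for q in multi_query_variants(question)
                  for doc in retrieve(q, corpus)]
ranked = rerank(question, candidates)
```

Note how the variants pull in the money-back chunk that the literal question would miss, while the final rerank against the original question puts the directly relevant chunk first. The third pattern, RAG as a tool, wraps this whole pipeline in a function that an agent can choose to call.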