Ch 6 — Retrieval Strategies — Under the Hood

BM25 internals, RRF math, cross-encoder architecture, and self-query parsing
A. BM25 Scoring Internals: The math behind keyword search
Step 1. The BM25 pipeline:
- Tokenize Query: split the query into terms.
- BM25 Formula: score each document with TF × IDF weighting.
- Ranked Results: sort documents by BM25 score.
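The scoring function behind this pipeline is the standard Okapi BM25 formula:

    score(q, d) = sum over t in q of  IDF(t) * f(t, d) * (k1 + 1) / (f(t, d) + k1 * (1 - b + b * |d| / avgdl))

where f(t, d) is the term frequency of t in d, |d| is the document length, avgdl is the average document length in the corpus, and k1 (typically 1.2 to 2.0) and b (typically 0.75) control term-frequency saturation and length normalization. A minimal self-contained sketch; the IDF variant and parameter defaults are the common Lucene-style choices, not anything mandated above:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query with BM25.

    corpus: list of tokenized documents, used for IDF and avgdl.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        n_t = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - n_t + 0.5) / (n_t + 0.5) + 1)  # Lucene-style IDF
        freq = tf[term]
        denom = freq + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * freq * (k1 + 1) / denom
    return score

# Ranked results = documents sorted by this score, descending:
# ranked = sorted(corpus, key=lambda d: bm25_score(query, d, corpus), reverse=True)
```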
Step 2. Sparse Vectors: SPLADE learns term weights via masked language modeling (MLM), producing sparse embeddings for neural keyword search.
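At query time, SPLADE-style retrieval reduces to a sparse dot product between learned per-term weights for the query and for the document. A minimal sketch, assuming the weight dicts were already produced by the model (in SPLADE they come from the MLM head with ReLU and log saturation; the dicts here are stand-ins for that output):

```python
def sparse_dot(query_weights: dict, doc_weights: dict) -> float:
    """Relevance = dot product over the vocabulary dimensions both sides share."""
    smaller, larger = sorted((query_weights, doc_weights), key=len)
    return sum(w * larger.get(term, 0.0) for term, w in smaller.items())

# e.g. sparse_dot({"retrieval": 1.8, "search": 0.9},
#                 {"search": 1.2, "index": 0.7})  ->  1.08
```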
B. Dense Retrieval Internals: Bi-encoder architecture and Maximal Marginal Relevance
Step 3. The dense retrieval pipeline:
- Bi-Encoder: separate query and document encoders embed each side independently.
- ANN Search: approximate nearest-neighbor search (e.g., HNSW) returns the top-k candidates.
- MMR: Maximal Marginal Relevance reorders candidates to balance relevance against diversity.
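MMR greedily picks the next document d maximizing lam * sim(d, q) - (1 - lam) * max over already-selected s of sim(d, s). A NumPy sketch, assuming unit-normalized embeddings so dot product equals cosine similarity; the lam default is an assumption:

```python
import numpy as np

def mmr(query_vec, doc_vecs, k=5, lam=0.7):
    """Return indices of k docs chosen by Maximal Marginal Relevance."""
    sim_to_query = doc_vecs @ query_vec   # relevance term
    sim_between = doc_vecs @ doc_vecs.T   # redundancy term
    selected = [int(np.argmax(sim_to_query))]
    while len(selected) < min(k, len(doc_vecs)):
        remaining = [i for i in range(len(doc_vecs)) if i not in selected]
        scores = [
            lam * sim_to_query[i]
            - (1 - lam) * max(sim_between[i][j] for j in selected)
            for i in remaining
        ]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```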
Step 4. Similarity Thresholds: score_threshold filtering removes low-confidence results before they reach the LLM.
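A sketch of the filter itself, assuming normalized similarity scores where higher means more similar (some stores return distances instead, which flips the comparison); the 0.75 cutoff is an assumption to tune against your own embedding model and data:

```python
def filter_by_threshold(scored_docs, score_threshold=0.75):
    """Drop low-confidence hits before they are stuffed into the prompt.

    scored_docs: (doc, similarity) pairs with similarity in [0, 1].
    """
    return [(doc, s) for doc, s in scored_docs if s >= score_threshold]
```

LangChain exposes the same idea as the `similarity_score_threshold` search type on a vector-store retriever.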
C. Hybrid Search & Reciprocal Rank Fusion: Merging dense and sparse result lists
Step 5. Reciprocal Rank Fusion:
- Dense List: candidates ranked by cosine similarity.
- Sparse List: candidates ranked by BM25.
- Rank Fusion: each list contributes 1/(k + rank) per document, and the summed scores set the fused order.
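The fusion step is small enough to write out in full; k = 60 is the constant from the original RRF paper (Cormack et al., 2009):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:                      # each: doc ids, best first
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# fused = rrf([dense_ids, sparse_ids])
```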
Step 6. Weighted Fusion: a convex combination with a single alpha parameter (Weaviate) or explicit per-retriever weights (LangChain EnsembleRetriever).
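A sketch of the alpha-style combination, with min-max normalization assumed so cosine and BM25 scores live on a comparable scale. Here alpha = 1 is pure dense and alpha = 0 pure sparse, mirroring Weaviate's parameter; note that LangChain's EnsembleRetriever instead applies its weights to rank-fusion contributions rather than raw scores:

```python
def weighted_fusion(dense_scores, sparse_scores, alpha=0.5):
    """hybrid(d) = alpha * dense(d) + (1 - alpha) * sparse(d), per-list normalized."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0               # guard against identical scores
        return {d: (s - lo) / span for d, s in scores.items()}

    dense, sparse = normalize(dense_scores), normalize(sparse_scores)
    fused = {
        d: alpha * dense.get(d, 0.0) + (1 - alpha) * sparse.get(d, 0.0)
        for d in dense.keys() | sparse.keys()
    }
    return sorted(fused, key=fused.get, reverse=True)
```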
D. Cross-Encoder Reranking: Deep relevance scoring with full attention
Step 7. The cross-encoder pipeline:
- Concat [q; d]: query and candidate document are concatenated into a single transformer input.
- Full Attention: every query token attends to every document token (and vice versa).
- Relevance Score: the model emits a single float per query-document pair.
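A sketch using the sentence-transformers CrossEncoder API; the checkpoint named here is one widely used MS MARCO-trained reranker, not the only choice:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates, top_n=5):
    """One full forward pass per (query, doc) pair, then re-sort by score."""
    scores = model.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]
```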
Step 8. Bi-Encoder vs Cross-Encoder: a bi-encoder's per-query cost is constant in corpus size (one query encoding compared against precomputed document vectors), while a cross-encoder costs one full forward pass per query-document pair. Scoring a million documents that way is infeasible online, which is why cross-encoder reranking is always a second pass over a small candidate set (say, the top 50-100 from first-stage retrieval).
E. Self-Query Retrieval & Routing: LLM-powered filter extraction and index selection
Step 9. The self-query pipeline:
- LLM Parser: an LLM splits the user question into a semantic query plus filters.
- Filter Object: structured metadata conditions applied at search time.
- Index Router: the query is routed to the best-matching index.
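A minimal sketch of the parsing step. The llm_complete callable, the prompt, and the JSON schema are all hypothetical stand-ins for whatever LLM client and filter grammar you actually use; LangChain's SelfQueryRetriever packages the same idea with a per-store filter grammar:

```python
import json

PARSE_PROMPT = """Extract a search query and metadata filters from the question.
Respond with JSON only: {{"query": "...", "filters": {{"field": "value"}}}}.
Question: {question}"""

def self_query(question, llm_complete):
    """llm_complete: hypothetical callable, prompt string -> completion string."""
    raw = llm_complete(PARSE_PROMPT.format(question=question))
    parsed = json.loads(raw)
    return parsed["query"], parsed.get("filters", {})

# "sci-fi movies from after 2015 about AI" might parse to
#   ("movies about AI", {"genre": "sci-fi", "year": {"$gt": 2015}})
```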
F. Latency Budget & Retrieval Evaluation: Measuring and optimizing the retrieval pipeline
Step 10. Production concerns:
- Latency Budget: total retrieval time is embed + search + rerank, and each stage gets a slice of it.
- Recall@k / MRR: retrieval quality metrics computed over a labeled evaluation set.
- Production Config: a recommended end-to-end pipeline configuration.
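Both metrics are a few lines each; a sketch over lists of retrieved and relevant doc ids:

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant docs appearing in the top-k (relevant non-empty)."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(all_retrieved, all_relevant):
    """Mean Reciprocal Rank: average 1/rank of the first relevant hit per query."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        hits = set(relevant)
        rank = next((i for i, d in enumerate(retrieved, 1) if d in hits), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(all_retrieved)
```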