Ch 11 — RAG in Production & Evaluation — Under the Hood

Deployment internals, Ragas metrics, tracing, feedback systems, and CI/CD for RAG

A. Production Deployment Internals: API design, async patterns, and scaling
Step 1: FastAPI (async endpoints) --queue--> Ingestion (batch pipeline) --scale--> Deploy (Docker / K8s)
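Here is a minimal sketch of the async ingestion pattern, assuming a single process. An async endpoint only enqueues the document and returns, while background workers drain the queue in batches. `asyncio.Queue` stands in for a durable broker. `IngestionService`, `IngestJob`, and `embed_and_index` are hypothetical names. In a FastAPI app, `submit` would be awaited from an `async def` route handler.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class IngestJob:
    doc_id: str
    text: str


@dataclass
class IngestionService:
    """Hypothetical service: the API layer enqueues jobs; background workers index them in batches."""
    batch_size: int = 4
    queue: asyncio.Queue = field(default_factory=asyncio.Queue)  # stand-in for a durable broker
    indexed: list = field(default_factory=list)

    async def submit(self, job: IngestJob) -> dict:
        # What an async endpoint handler would do: enqueue and reply at once (HTTP 202-style).
        await self.queue.put(job)
        return {"status": "accepted", "doc_id": job.doc_id}

    async def worker(self) -> None:
        while True:
            batch = [await self.queue.get()]  # wait for at least one job
            # Drain jobs that are already waiting, up to batch_size, so embedding calls are batched.
            while len(batch) < self.batch_size and not self.queue.empty():
                batch.append(self.queue.get_nowait())
            await self.embed_and_index(batch)
            for _ in batch:
                self.queue.task_done()

    async def embed_and_index(self, batch: list) -> None:
        await asyncio.sleep(0)  # placeholder for a batched embedding call + vector-store upsert
        self.indexed.extend(job.doc_id for job in batch)


async def main() -> IngestionService:
    service = IngestionService()
    workers = [asyncio.create_task(service.worker()) for _ in range(2)]
    for i in range(10):
        await service.submit(IngestJob(doc_id=f"doc-{i}", text="..."))
    await service.queue.join()  # block until every queued job has been marked done
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return service


service = asyncio.run(main())
```

Passing `maxsize` to `asyncio.Queue` adds backpressure, because `put` then waits when the queue is full instead of letting memory grow. This in-memory queue also loses jobs on restart. A Docker / K8s deployment would therefore usually run the workers as a separate, independently scaled service behind a persistent queue.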
Step 2: Streaming: SSE endpoint, LLM token streaming, time-to-first-token (TTFT) optimization
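The sketch below shows two parts of a streaming endpoint: framing each token as a Server-Sent Events message, and recording time to first token. It assumes word-level "tokens" from a fake LLM client; `fake_llm_tokens` and `stream_answer` are hypothetical. In FastAPI, an async generator like `stream_answer` can be returned through `fastapi.responses.StreamingResponse` with `media_type="text/event-stream"`.

```python
import asyncio
import time
from typing import AsyncIterator, Dict, List, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame. Each payload line needs its own 'data:' prefix,
    and a blank line terminates the event."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


async def fake_llm_tokens(answer: str) -> AsyncIterator[str]:
    # Stand-in for a streaming LLM client: yields one word-level "token" at a time.
    for word in answer.split(" "):
        await asyncio.sleep(0.01)
        yield word + " "


async def stream_answer(answer: str, metrics: Dict[str, float]) -> AsyncIterator[str]:
    start = time.perf_counter()
    async for token in fake_llm_tokens(answer):
        if "ttft_ms" not in metrics:
            # Time to first token: what the user perceives as responsiveness.
            metrics["ttft_ms"] = (time.perf_counter() - start) * 1000
        yield sse_event(token)
    yield sse_event("[DONE]", event="end")  # explicit end-of-stream event for the client


async def collect(answer: str):
    metrics: Dict[str, float] = {}
    frames: List[str] = [frame async for frame in stream_answer(answer, metrics)]
    return frames, metrics


frames, metrics = asyncio.run(collect("Paris is the capital of France."))
```

Everything that runs before the first `yield` counts toward TTFT, including retrieval, reranking, and prompt assembly. That is where optimization effort pays off.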

B. Ragas Evaluation Internals: how LLM-as-judge metrics work under the hood
Step 3: Faithfulness (claim verification) --score--> Relevancy (question match) --score--> Context (precision + recall)
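Ragas computes these metrics with an LLM judge that extracts claims and returns a verdict per claim or per retrieved chunk. The sketch below takes those verdicts as plain booleans, and embeddings as plain vectors, so the arithmetic applied on top is visible. It follows the metric definitions described in the Ragas documentation. The judge prompts and edge-case handling are not modeled and vary across Ragas versions.

```python
import math
from typing import Sequence


def faithfulness(claim_supported: Sequence[bool]) -> float:
    """Fraction of claims extracted from the answer that the judge marked as supported by the context."""
    if not claim_supported:
        raise ValueError("no claims extracted; score is undefined")
    return sum(claim_supported) / len(claim_supported)


def context_precision(chunk_relevant: Sequence[bool]) -> float:
    """Rank-aware: average precision@k over the ranks k that hold a relevant chunk,
    so relevant chunks ranked lower pull the score down."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant position
    return total / hits if hits else 0.0


def context_recall(reference_claim_attributed: Sequence[bool]) -> float:
    """Fraction of reference-answer claims that can be attributed to the retrieved context."""
    return sum(reference_claim_attributed) / len(reference_claim_attributed)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def answer_relevancy(question_vec: Sequence[float], generated_question_vecs: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between the real question and questions an LLM generated back from the answer."""
    return sum(cosine(question_vec, g) for g in generated_question_vecs) / len(generated_question_vecs)


# Example judge verdicts: 3 of 4 answer claims supported; relevant chunks at ranks 1 and 3.
print(faithfulness([True, True, False, True]))  # 0.75
print(context_precision([True, False, True]))   # (1/1 + 2/3) / 2, about 0.833
```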
Step 4: Building golden test sets: manual curation, synthetic generation, production sampling
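One way to combine the three sourcing strategies, sketched under the following assumptions:
- every example carries a `source` tag;
- manual examples take precedence over synthetic near-duplicates;
- sampled production queries enter the set unlabeled until a reviewer writes a reference answer.

`GoldenExample` and `build_golden_set` are hypothetical names, and the question normalization is deliberately crude.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class GoldenExample:
    question: str
    reference_answer: str  # empty until a human labels it
    source: str            # "manual", "synthetic", or "production"


def normalize(question: str) -> str:
    # Collapse case, whitespace, and a trailing "?" so near-identical questions deduplicate.
    return " ".join(question.lower().split()).rstrip("?")


def build_golden_set(manual: Sequence[GoldenExample], synthetic: Sequence[GoldenExample],
                     production_queries: Sequence[str], n_production: int, seed: int = 0) -> List[GoldenExample]:
    """Merge sources in priority order (manual, synthetic, production); the first copy of a question wins.
    Production queries are sampled reproducibly and queued for human-written reference answers."""
    rng = random.Random(seed)
    sampled = rng.sample(list(production_queries), min(n_production, len(production_queries)))
    production = [GoldenExample(q, reference_answer="", source="production") for q in sampled]
    seen, merged = set(), []
    for example in [*manual, *synthetic, *production]:
        key = normalize(example.question)
        if key not in seen:
            seen.add(key)
            merged.append(example)
    return merged


manual = [GoldenExample("What is RAG?", "Retrieval-augmented generation ...", "manual")]
synthetic = [GoldenExample("what is rag", "...", "synthetic"),
             GoldenExample("How are chunks embedded?", "...", "synthetic")]
logs = ["How do I rotate my API key?", "What is RAG?", "Which regions are supported?"]
golden = build_golden_set(manual, synthetic, logs, n_production=3)
needs_labels = [ex.question for ex in golden if not ex.reference_answer]
```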

C. Tracing & Observability Internals: LangSmith, Langfuse, and OpenTelemetry integration
Step 5: Trace Spans (parent / child) --collect--> Metrics (latency, tokens, cost) --alert--> Alerts (quality / cost drift)
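LangSmith, Langfuse, and OpenTelemetry each ship their own SDKs for tracing. The hand-rolled tracer below is not any of their APIs. It only illustrates the parent/child model they share; in OpenTelemetry's Python API the corresponding entry point is `tracer.start_as_current_span`. A context variable tracks the active span so that nested blocks become children. Finished spans are then rolled up into per-request latency, token, and cost metrics. The price and the alert thresholds are hypothetical.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    name: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of a trace
    start: float
    end: float = 0.0
    attributes: Dict[str, object] = field(default_factory=dict)
    error: Optional[str] = None

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


COLLECTED: List[Span] = []  # stand-in for an exporter that ships finished spans to a backend
_current_span = contextvars.ContextVar("current_span", default=None)


@contextmanager
def span(name: str, **attributes):
    parent = _current_span.get()
    s = Span(name, uuid.uuid4().hex[:16], parent.span_id if parent else None,
             time.perf_counter(), attributes=dict(attributes))
    token = _current_span.set(s)  # spans opened inside this block see `s` as their parent
    try:
        yield s
    except Exception as exc:
        s.error = repr(exc)
        raise
    finally:
        s.end = time.perf_counter()
        _current_span.reset(token)
        COLLECTED.append(s)


PRICE_PER_1K_TOKENS = 0.002  # hypothetical blended USD price


def request_metrics(spans: List[Span]) -> Dict[str, float]:
    root = next(s for s in spans if s.parent_id is None)
    tokens = sum(s.attributes.get("prompt_tokens", 0) + s.attributes.get("completion_tokens", 0)
                 for s in spans)
    return {"latency_ms": root.duration_ms, "tokens": tokens, "cost_usd": tokens / 1000 * PRICE_PER_1K_TOKENS}


def drift_alerts(faithfulness_window: List[float], cost_window: List[float],
                 faithfulness_floor: float = 0.8, cost_ceiling: float = 0.01) -> List[str]:
    """Compare rolling means against hypothetical quality floors and cost ceilings."""
    alerts = []
    if sum(faithfulness_window) / len(faithfulness_window) < faithfulness_floor:
        alerts.append("quality drift")
    if sum(cost_window) / len(cost_window) > cost_ceiling:
        alerts.append("cost drift")
    return alerts


with span("rag_request", question="What is our refund window?"):
    with span("retrieve", top_k=4):
        time.sleep(0.005)
    with span("generate", prompt_tokens=850, completion_tokens=120):
        time.sleep(0.01)

request_summary = request_metrics(COLLECTED)
```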
Step 6: Debugging: trace waterfall, input/output inspection, error root cause
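Given exported spans, a trace waterfall is a shared timeline with one bar per span, indented by nesting depth. The sketch below renders one as text. It also applies a simple root-cause heuristic: an error that bubbles up marks every ancestor span, so the origin is the errored span with no errored children. `SpanRecord` and the sample trace are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpanRecord:
    name: str
    span_id: str
    parent_id: Optional[str]
    start_ms: float  # offset from the trace start
    duration_ms: float
    error: Optional[str] = None


def depth(span: SpanRecord, by_id: Dict[str, SpanRecord]) -> int:
    d = 0
    while span.parent_id is not None:
        span = by_id[span.parent_id]
        d += 1
    return d


def waterfall(spans: List[SpanRecord], width: int = 40) -> str:
    """Render each span as a bar on a shared timeline, indented by nesting depth."""
    by_id = {s.span_id: s for s in spans}
    t0 = min(s.start_ms for s in spans)
    total = max(s.start_ms + s.duration_ms for s in spans) - t0
    rows = []
    for s in sorted(spans, key=lambda sp: (sp.start_ms, depth(sp, by_id))):
        offset = int((s.start_ms - t0) / total * width)
        length = max(1, int(s.duration_ms / total * width))
        label = "  " * depth(s, by_id) + s.name
        bar = (" " * offset + "#" * length).ljust(width)
        flag = "  ERROR: " + s.error if s.error else ""
        rows.append(f"{label:<16}|{bar}| {s.duration_ms:5.0f} ms{flag}")
    return "\n".join(rows)


def root_cause(spans: List[SpanRecord]) -> List[SpanRecord]:
    """Errors usually propagate upward, so an origin is an errored span with no errored children."""
    errored = [s for s in spans if s.error]
    errored_parent_ids = {s.parent_id for s in errored}
    return [s for s in errored if s.span_id not in errored_parent_ids]


trace = [
    SpanRecord("rag_request", "a", None, 0, 900, error="TimeoutError"),
    SpanRecord("retrieve", "b", "a", 5, 120),
    SpanRecord("rerank", "c", "a", 130, 40),
    SpanRecord("generate", "d", "a", 175, 720, error="TimeoutError"),
    SpanRecord("llm_call", "e", "d", 180, 715, error="TimeoutError: upstream"),
]
print(waterfall(trace))
print([s.name for s in root_cause(trace)])  # ['llm_call']
```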

D. Feedback Collection & Analysis: building the feedback flywheel into your system
Step 7: Collect (thumbs up/down + implicit signals) --store--> Analyze (cluster failures) --act--> Improve (fix pipeline)
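Here is a sketch of the collect and analyze stages. It assumes each feedback event carries the `trace_id` of the request it rates. Treating a copied answer as positive and a regeneration as negative is an assumption to validate against your own data. Failures are grouped with greedy leader clustering on query word overlap so the sketch stays dependency-free; clustering query embeddings would be a natural upgrade. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class FeedbackEvent:
    trace_id: str                     # links feedback back to the trace of the rated request
    query: str
    thumbs_up: Optional[bool] = None  # explicit signal; None if the user did not vote
    copied_answer: bool = False       # implicit signal, assumed positive
    regenerated: bool = False         # implicit signal, assumed negative


def is_negative(event: FeedbackEvent) -> bool:
    if event.thumbs_up is not None:
        return not event.thumbs_up    # an explicit vote overrides implicit signals
    return event.regenerated and not event.copied_answer


def words(text: str) -> Set[str]:
    return set(text.lower().replace("?", "").split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_failures(events: List[FeedbackEvent], threshold: float = 0.3) -> List[List[FeedbackEvent]]:
    """Greedy leader clustering of negative events by query word overlap, largest cluster first."""
    clusters = []  # list of (leader word set, member events)
    for event in filter(is_negative, events):
        w = words(event.query)
        for leader, members in clusters:
            if jaccard(w, leader) >= threshold:
                members.append(event)
                break
        else:  # no existing cluster is close enough: start a new one
            clusters.append((w, [event]))
    return sorted((members for _, members in clusters), key=len, reverse=True)


events = [
    FeedbackEvent("t1", "How do I rotate API keys?", thumbs_up=False),
    FeedbackEvent("t2", "how to rotate my API keys", regenerated=True),
    FeedbackEvent("t3", "What is the refund policy?", thumbs_up=False),
    FeedbackEvent("t4", "What is the refund policy?", thumbs_up=True),
    FeedbackEvent("t5", "Enterprise plan pricing", copied_answer=True, regenerated=True),
]
for cluster in cluster_failures(events):
    print(len(cluster), [e.trace_id for e in cluster])
```

The largest cluster is the obvious candidate for the Improve step. Adding its queries to the golden test set keeps the fix covered by later regression runs.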
Step 8: A/B testing: shadow pipelines, traffic splitting, statistical significance
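The three A/B mechanisms can be sketched with the standard library:
- Users are bucketed by hashing a salted user ID, so each user stays in one arm.
- A shadow run sends the same query to a candidate pipeline without showing its answer.
- The two arms are compared on thumbs-up rate with a two-proportion z-test.

Function names and the salt are hypothetical.

```python
import hashlib
import math
from typing import Callable, List, Tuple


def assign_variant(user_id: str, treatment_share: float = 0.5, salt: str = "rag-exp-1") -> str:
    """Deterministic traffic split: hashing (salt, user) keeps each user in the same arm across requests."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"


def run_with_shadow(query: str, live: Callable[[str], str], shadow: Callable[[str], str], log: List[dict]) -> str:
    """Shadow pipeline: the candidate sees real traffic, but only the live answer reaches the user.
    In production the shadow call would run off the request path (e.g. as a background task)."""
    answer = live(query)
    log.append({"query": query, "live": answer, "shadow": shadow(query)})
    return answer


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> Tuple[float, float]:
    """Compare thumbs-up rates of two arms; returns (z, two-sided p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


z, p = two_proportion_z_test(500, 1000, 600, 1000)
print(f"z={z:.2f}, p={p:.2g}")
```

The z-test assumes the sample size was fixed in advance. Checking the p-value repeatedly and stopping at the first significant result inflates the false-positive rate.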

E. CI/CD for RAG Pipelines: automated evaluation gates and regression testing
Step 9: Code Change (PR / commit) --test--> Eval Gate (Ragas thresholds) --deploy--> Ship (if metrics pass)
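Here is a minimal sketch of the eval gate as a CI step. It assumes the job has already run Ragas over the golden set and produced one averaged score per metric. The gate fails on either an absolute threshold breach or a regression against the main-branch baseline. Every threshold here is a hypothetical placeholder, and `evaluate_gate` and `gate_exit_code` are illustrative names.

```python
from typing import Dict, List

# Hypothetical minimum scores; tune these on your own golden set.
THRESHOLDS = {"faithfulness": 0.85, "answer_relevancy": 0.80, "context_precision": 0.75, "context_recall": 0.75}
MAX_REGRESSION = 0.02  # allowed drop versus the main-branch baseline


def evaluate_gate(scores: Dict[str, float], baseline: Dict[str, float],
                  thresholds: Dict[str, float] = THRESHOLDS,
                  max_regression: float = MAX_REGRESSION) -> List[str]:
    """Return human-readable failures; an empty list means the change may ship."""
    failures = []
    for metric, floor in thresholds.items():
        score = scores.get(metric)
        if score is None:
            failures.append(f"{metric}: missing from eval run")
            continue
        if score < floor:
            failures.append(f"{metric}: {score:.3f} below threshold {floor:.2f}")
        previous = baseline.get(metric)
        if previous is not None and previous - score > max_regression:
            failures.append(f"{metric}: regressed {previous:.3f} -> {score:.3f}")
    return failures


def gate_exit_code(failures: List[str]) -> int:
    for failure in failures:
        print("EVAL GATE FAILED:", failure)
    return 1 if failures else 0  # a CI step would pass this to sys.exit() to block the merge


baseline = {"faithfulness": 0.94, "answer_relevancy": 0.84, "context_precision": 0.78, "context_recall": 0.80}
candidate = {"faithfulness": 0.91, "answer_relevancy": 0.83, "context_precision": 0.72, "context_recall": 0.80}
exit_code = gate_exit_code(evaluate_gate(candidate, baseline))
```

Checking both conditions catches two different problems. An absolute floor alone would let a strong baseline slowly erode, while a regression check alone would let an already weak pipeline ship.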

F. Cost Management & Scaling: token budgets, model routing, and infrastructure scaling
Step 10: Token Budget (per-request limits) --route--> Model Router (simple vs. complex queries) --scale--> Auto-Scale (load-based)
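Each of the three cost controls reduces to a small piece of arithmetic. The sketch below rests on these assumptions:
- Token counts use a rough 4-characters-per-token estimate instead of a real tokenizer.
- The router's keyword heuristic and its model names are placeholders.
- The scaling rule mirrors the ratio used by the Kubernetes HorizontalPodAutoscaler.

```python
import math
from typing import List, Sequence, Tuple


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English text); use the model's own tokenizer in practice.
    return max(1, math.ceil(len(text) / 4))


def fit_context(chunks: Sequence[str], budget_tokens: int) -> Tuple[List[str], int]:
    """Keep top-ranked chunks (assumed sorted by relevance) until the per-request budget would be exceeded."""
    kept, used = [], 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            break  # stop at the first chunk that does not fit, preserving strict rank order
        kept.append(chunk)
        used += cost
    return kept, used


COMPLEX_MARKERS = ("compare", "why", "explain", "trade-off", "step by step")


def route_model(query: str, n_chunks: int) -> str:
    """Hypothetical heuristic router: long, multi-source, or analytical queries go to the larger model."""
    q = query.lower()
    is_complex = estimate_tokens(query) > 40 or n_chunks > 6 or any(m in q for m in COMPLEX_MARKERS)
    return "large-model" if is_complex else "small-model"  # placeholder model names


def desired_replicas(current: int, current_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Same ratio rule as the Kubernetes HorizontalPodAutoscaler:
    desired = ceil(current * current_load / target_load), clamped to [min, max].
    (The real HPA also skips changes within a tolerance band, 10% by default.)"""
    desired = math.ceil(current * current_load / target_load)
    return max(min_replicas, min(max_replicas, desired))
```

For example, with 4 replicas each handling 90 requests per second against a target of 60, `desired_replicas(4, 90, 60)` scales out to 6.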