Ch 11: RAG in Production & Evaluation (Under the Hood)

Deployment internals, Ragas metrics, tracing, feedback systems, and CI/CD for RAG


A. Production Deployment Internals
API design, async patterns, and scaling

Flow: FastAPI (async endpoints) → queue → Ingestion (batch pipeline) → scale → Deploy (Docker / K8s)

Streaming: SSE endpoint, LLM token streaming, TTFT (time to first token) optimization
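The streaming items above can be sketched with the standard library alone. This is a minimal illustration, not a production endpoint: `fake_llm_tokens` is a hypothetical stand-in for a streaming LLM client, and the event names `token` and `done` are assumptions. The point is the SSE wire framing and where TTFT is measured.

```python
import asyncio
import time
from typing import AsyncIterator, List, Tuple


def sse_event(data: str, event: str = "message") -> str:
    """Frame one Server-Sent Event: an event line, data lines, then a blank line."""
    data_lines = "".join(f"data: {line}\n" for line in data.split("\n"))
    return f"event: {event}\n{data_lines}\n"


async def fake_llm_tokens(answer: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM client; yields one token at a time."""
    for token in answer.split(" "):
        await asyncio.sleep(0)  # hand control back to the loop, as a network read would
        yield token


async def stream_answer(answer: str) -> AsyncIterator[str]:
    """Forward each token as soon as it arrives, then signal completion."""
    async for token in fake_llm_tokens(answer):
        yield sse_event(token, event="token")
    yield sse_event("[DONE]", event="done")


async def collect_with_ttft(answer: str) -> Tuple[List[str], float]:
    """Consume the stream and record time to first token (TTFT) in seconds."""
    start = time.perf_counter()
    ttft = -1.0
    events: List[str] = []
    async for event in stream_answer(answer):
        if ttft < 0:
            ttft = time.perf_counter() - start  # first event seen by the client
        events.append(event)
    return events, ttft


events, ttft = asyncio.run(collect_with_ttft("RAG answers stream token by token"))
```

In a FastAPI app, a generator like `stream_answer` would typically be passed to `StreamingResponse` with `media_type="text/event-stream"`. TTFT improves when retrieval finishes early and the first token is forwarded without buffering.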

B. Ragas Evaluation Internals
How LLM-as-judge metrics work under the hood

Flow: Faithfulness (claim verification) → score → Relevancy (question match) → score → Context (precision + recall)

Building golden test sets: manual curation, synthetic generation, production sampling
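The scoring arithmetic behind the metrics in this section can be sketched as follows. The sketch assumes the judge LLM has already returned a boolean verdict per claim or per retrieved chunk; the judge calls themselves are omitted. Function names are illustrative and are not the Ragas API. The formulas follow the usual definitions: faithfulness as the share of answer claims supported by the context, context precision as the mean of precision@k over the ranks that hold a relevant chunk, and context recall as the share of reference claims found in the context.

```python
from typing import List


def faithfulness(claim_supported: List[bool]) -> float:
    """Fraction of answer claims the judge marked as supported by the context."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


def context_precision(chunk_relevant: List[bool]) -> float:
    """Mean of precision@k, taken at each rank k that holds a relevant chunk."""
    hits = 0
    total = 0.0
    for k, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / k  # precision@k at this relevant rank
    return total / hits if hits else 0.0


def context_recall(reference_claim_found: List[bool]) -> float:
    """Fraction of reference-answer claims attributable to the retrieved context."""
    if not reference_claim_found:
        return 0.0
    return sum(reference_claim_found) / len(reference_claim_found)


# Hypothetical verdicts for one golden test-set row.
row_faithfulness = faithfulness([True, True, False, True])
row_precision = context_precision([True, False, True])
row_recall = context_recall([True, False])
```

Ranking matters for context precision: the same two relevant chunks score higher at ranks 1 and 2 than at ranks 1 and 3.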

C. Tracing & Observability Internals
LangSmith, LangFuse, and OpenTelemetry integration

Flow: Trace Spans (parent / child) → collect → Metrics (latency, tokens, cost) → alert → Alerts (quality / cost drift)

Debugging: trace waterfall, input/output inspection, error root cause
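The parent/child span model can be illustrated with a toy in-memory tracer. This is a sketch only: the `Tracer` and `Span` classes here are hypothetical and do not mirror the LangSmith, LangFuse, or OpenTelemetry SDKs, which export spans to a backend rather than keeping them in a list. Attribute names such as `prompt_tokens` are assumptions.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    name: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: float = 0.0
    attributes: Dict[str, float] = field(default_factory=dict)

    @property
    def latency_ms(self) -> float:
        return (self.end - self.start) * 1000.0


class Tracer:
    """Toy tracer: a stack tracks the active span so children know their parent."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        parent_id = self._stack[-1].span_id if self._stack else None
        current = Span(name, uuid.uuid4().hex, parent_id, time.perf_counter())
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.spans.append(current)  # recorded in completion order


tracer = Tracer()
with tracer.span("rag_request") as root:
    with tracer.span("retrieve") as retrieve:
        retrieve.attributes["chunks"] = 4
    with tracer.span("generate") as generate:
        generate.attributes["prompt_tokens"] = 900
        generate.attributes["completion_tokens"] = 150

# Roll up token counts across the trace, as a metrics collector might.
total_tokens = sum(
    s.attributes.get("prompt_tokens", 0) + s.attributes.get("completion_tokens", 0)
    for s in tracer.spans
)
```

A trace waterfall is this same data sorted by `start` and indented by parent. Comparing each child's `latency_ms` with the root's shows which stage dominates a slow request.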

D. Feedback Collection & Analysis
Building the feedback flywheel into your system

Flow: Collect (thumbs + implicit) → store → Analyze (cluster failures) → act → Improve (fix pipeline)

A/B testing: shadow pipelines, traffic splitting, statistical significance
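Traffic splitting and a significance check on thumbs-up rates can be sketched as below. The sketch assumes a two-proportion z-test, which is one common choice and not necessarily the one a given team uses. The user IDs, the 50/50 split, and the feedback counts are made-up illustration values.

```python
import hashlib
import math
from typing import Tuple


def assign_variant(user_id: str, treatment_share: float = 0.5) -> str:
    """Deterministic split: the same user always lands in the same pipeline."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "B" if bucket < treatment_share else "A"


def two_proportion_z_test(
    success_a: int, n_a: int, success_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in success rates B - A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variance: the rates are indistinguishable
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: thumbs-up out of rated answers per pipeline.
z, p_value = two_proportion_z_test(success_a=700, n_a=1000, success_b=760, n_b=1000)
significant = p_value < 0.05
```

Hashing the user ID rather than drawing a random number per request keeps each user's experience consistent and makes the assignment reproducible during analysis.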

E. CI/CD for RAG Pipelines
Automated evaluation gates and regression testing

Flow: Code Change (PR / commit) → test → Eval Gate (Ragas thresholds) → deploy → Ship (if metrics pass)
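An evaluation gate of this kind can be sketched as a small check that a CI job runs after scoring the golden test set. The threshold values, the metric names, and the 0.02 regression tolerance are all hypothetical. The sketch assumes the scores arrive as a plain dictionary from an earlier evaluation step.

```python
from typing import Dict, List

# Hypothetical minimums; tune per project.
THRESHOLDS: Dict[str, float] = {
    "faithfulness": 0.85,
    "answer_relevancy": 0.80,
    "context_precision": 0.75,
}


def gate_failures(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """List every metric that is missing or below its minimum."""
    failures: List[str] = []
    for metric, minimum in thresholds.items():
        value = scores.get(metric)
        if value is None:
            failures.append(f"{metric}: missing score")
        elif value < minimum:
            failures.append(f"{metric}: {value:.2f} < {minimum:.2f}")
    return failures


def regressions(
    scores: Dict[str, float], baseline: Dict[str, float], tolerance: float = 0.02
) -> List[str]:
    """List metrics that dropped more than `tolerance` below the last release."""
    return [
        f"{metric}: {baseline[metric]:.2f} -> {scores[metric]:.2f}"
        for metric in baseline
        if metric in scores and baseline[metric] - scores[metric] > tolerance
    ]


def exit_code(scores: Dict[str, float], baseline: Dict[str, float]) -> int:
    """Non-zero exit blocks the deploy step in most CI systems."""
    problems = gate_failures(scores, THRESHOLDS) + regressions(scores, baseline)
    for line in problems:
        print("FAIL", line)
    return 1 if problems else 0


baseline = {"faithfulness": 0.90, "answer_relevancy": 0.86, "context_precision": 0.80}
candidate = {"faithfulness": 0.91, "answer_relevancy": 0.85, "context_precision": 0.81}
status = exit_code(candidate, baseline)
```

Checking against both absolute thresholds and the previous baseline catches two different failure modes: a pipeline that was never good enough, and one that passes the floor but quietly got worse.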

F. Cost Management & Scaling
Token budgets, model routing, and infrastructure scaling

Flow: Token Budget (per-request limits) → route → Model Router (simple vs complex) → scale → Auto-Scale (load-based)
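A per-request token budget and a simple-versus-complex router can be sketched as follows. Everything specific here is an assumption made for illustration: the whitespace token estimate (a real tokenizer would give different counts), the keyword heuristics, the cut-off values, and the model names `small-model` and `large-model`.

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Crude whitespace estimate; swap in the model's tokenizer for real counts."""
    return len(text.split())


def fit_to_budget(chunks: List[str], budget: int) -> List[str]:
    """Keep chunks in ranked order until the next one would exceed the budget."""
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept


def route_model(question: str, num_chunks: int) -> str:
    """Send short lookup-style questions to the cheaper model (heuristic only)."""
    complex_markers = ("compare", "why", "explain", "trade-off")
    lowered = question.lower()
    is_complex = (
        approx_tokens(question) > 25
        or num_chunks > 5
        or any(marker in lowered for marker in complex_markers)
    )
    return "large-model" if is_complex else "small-model"


ranked_chunks = ["a b c", "d e", "f g h i"]
context = fit_to_budget(ranked_chunks, budget=5)
simple_route = route_model("What is the refund window?", num_chunks=len(context))
complex_route = route_model("Compare plan A and plan B pricing", num_chunks=len(context))
```

Trimming from the bottom of the ranked list preserves the highest-scoring context. Keyword routing is easy to audit but brittle, so a small classifier is a common next step.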