Ch 8 — Observability & Debugging
High-Level Overview
8 Steps
How to observe, debug, and measure your AI agents in production.
Problem
Why Observability Matters
Agents are non-deterministic black boxes
1
psychology
Agent Runs
LLM calls, tool use, retrieval, branching
produces
visibility_off
Black Box
No visibility into why it chose that path
need
monitoring
Observability
Traces, metrics, logs, cost tracking
2
account_tree
Traces & Runs — the core primitive of LLM observability
Core Concept
Traces & Runs
Hierarchical recording of every step
account_tree
Trace
Root-level record of one agent invocation
contains
call_split
Runs (Spans)
chain, llm, tool, retriever — nested tree
captures
data_object
Run Data
inputs, outputs, tokens, latency, cost
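The trace/run hierarchy above can be sketched as a plain data structure. This is illustrative only: the class and field names below are ours, not LangSmith's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    run_type: str                 # "chain", "llm", "tool", or "retriever"
    inputs: dict
    outputs: dict = field(default_factory=dict)
    tokens: int = 0
    latency_ms: float = 0.0
    children: list["Run"] = field(default_factory=list)

# A trace is just the root-level run of one agent invocation.
trace = Run("chain", {"question": "What is LangSmith?"})
trace.children.append(
    Run("retriever", {"query": "LangSmith"}, {"docs": 3}, latency_ms=42.0))
trace.children.append(
    Run("llm", {"prompt": "..."}, {"text": "An observability platform."},
        tokens=210, latency_ms=850.0))

def total_tokens(run: Run) -> int:
    """Roll token usage up the tree, as a dashboard would."""
    return run.tokens + sum(total_tokens(c) for c in run.children)
```

Nesting is what makes this more useful than flat logs: a retriever run inside an LLM chain keeps its parent, so you can ask "which step of this invocation was slow or expensive?"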
Platform
LangSmith
LangChain's integrated observability platform
3
toggle_on
Enable Tracing
Set env vars — automatic for LangChain
sends to
cloud_upload
LangSmith API
Collects traces, runs, metadata
view in
dashboard
Studio UI
Inspect traces, debug, run experiments
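As a minimal sketch, enabling tracing is a matter of environment variables; no code changes are needed for LangChain apps. Variable names vary slightly by SDK version, so check the LangSmith docs for your release.

```shell
# Enable LangSmith tracing; LangChain code then traces automatically.
export LANGSMITH_TRACING=true            # older SDKs: LANGCHAIN_TRACING_V2=true
export LANGSMITH_API_KEY="<your-api-key>"
export LANGSMITH_PROJECT="my-agent"      # optional project name
```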
4
payments
Token & Cost Tracking — know what every agent run costs
Metrics
Token & Cost Tracking
Per-run and aggregated cost visibility
token
Token Counts
prompt_tokens, completion_tokens, total_tokens
priced
attach_money
Cost per Run
Automatic for known models, manual for custom
rolled up
bar_chart
Dashboards
Cost by model, user, feature, thread
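The cost roll-up that dashboards do automatically for known models amounts to a price table times token counts. The sketch below uses made-up model names and placeholder prices, not real rates:

```python
# USD per 1,000 tokens as (prompt_price, completion_price).
# Placeholder values for illustration only.
PRICES_PER_1K = {
    "model-a": (0.0005, 0.0015),
    "model-b": (0.0030, 0.0060),
}

def run_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Price one run from its token counts."""
    p_in, p_out = PRICES_PER_1K[model]
    return prompt_tokens / 1000 * p_in + completion_tokens / 1000 * p_out

# Aggregate across runs, as a cost dashboard would.
runs = [("model-a", 1200, 300), ("model-b", 800, 400)]
total = sum(run_cost(*r) for r in runs)
```

Custom or self-hosted models have no known price, which is why platforms let you supply this table manually.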
Quality
Evaluations & Scoring
Measuring agent quality systematically
5
dataset
Datasets
Curated input/expected-output pairs
run against
science
Experiments
Batch-run agent over dataset, collect outputs
scored by
rate_review
Evaluators
LLM-as-Judge, heuristic, human annotation
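The dataset-experiment-evaluator loop can be shown end to end with a stub agent and a heuristic evaluator. Everything here is a stand-in: a real experiment would call your agent and might use an LLM-as-Judge instead of exact match.

```python
# Dataset: curated input / expected-output pairs.
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def agent(question: str) -> str:
    """Stub agent standing in for a real LLM call."""
    return {"2 + 2": "4", "capital of France": "Lyon"}.get(question, "")

def exact_match(output: str, expected: str) -> float:
    """Heuristic evaluator: 1.0 on exact match, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0

# Experiment: batch-run the agent over the dataset and score each output.
scores = [exact_match(agent(ex["input"]), ex["expected"]) for ex in dataset]
accuracy = sum(scores) / len(scores)
```

Per-example scores matter as much as the aggregate: here the failing example (the wrong capital) tells you exactly which trace to open.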
6
lock_open
Langfuse — open-source observability alternative
Open Source
Langfuse
Self-hostable, MIT-licensed LLM observability
hub
Langfuse
Traces, spans, generations, cost, scores
via
code
@observe()
Python decorator — minimal code changes
or
settings_ethernet
OpenTelemetry
Standard OTEL backend via /api/public/otel
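To show what a decorator-based tracer does under the hood, here is a toy version of the pattern behind Langfuse's `@observe()`: wrap each function, record one span per call, and nest spans by call order. This is our own sketch, not Langfuse's implementation.

```python
import functools
import time

SPANS: list[dict] = []      # completed root spans ("traces")
_stack: list[dict] = []     # currently open spans

def observe_sketch(fn):
    """Toy tracing decorator: records a nested span per call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {"name": fn.__name__, "children": []}
        # Attach to the open parent span, or start a new trace.
        (_stack[-1]["children"] if _stack else SPANS).append(span)
        _stack.append(span)
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            span["latency_ms"] = (time.perf_counter() - start) * 1000
            _stack.pop()
    return wrapper

@observe_sketch
def retrieve(query):
    return ["doc1"]

@observe_sketch
def answer(query):
    docs = retrieve(query)            # nested call -> nested span
    return f"answer using {len(docs)} docs"

answer("hello")
```

The appeal of this style is the "minimal code changes" claim above: annotating functions is enough to recover the whole call tree, with no manual span management.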
Debug
Debugging Agent Failures
Practical strategies for finding and fixing issues
7
search
Inspect Trace
Find the failing run in the tree
check
input
Inputs / Outputs
Was the prompt right? Was the output wrong?
replay
replay
Replay & Fix
Edit prompt in Studio, re-run, compare
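The "inspect the trace" step usually means finding the deepest failing run in the tree, since that is where the root cause tends to live. A small depth-first search over an illustrative trace shape (dicts with `children` and an optional `error` field, our own convention):

```python
def find_failure(run: dict):
    """Return the deepest run in the tree that carries an error, else None."""
    for child in run.get("children", []):
        hit = find_failure(child)
        if hit:
            return hit
    return run if run.get("error") else None

trace = {
    "name": "agent", "error": "tool call failed",
    "children": [
        {"name": "llm", "children": []},
        {"name": "tool:search", "error": "timeout", "children": []},
    ],
}

failing = find_failure(trace)
```

Here the root run also reports an error, but the search surfaces the nested `tool:search` timeout, which is the run whose inputs and outputs you would inspect and replay first.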
Landscape
Observability Landscape
Comparing the major platforms
8
cloud
LangSmith
Managed, deep LangChain integration
vs
lock_open
Langfuse
Open-source, self-host, OTEL native
vs
local_fire_department
Phoenix / Arize
OTEL-native, evals, open-source core