What to Monitor
Task completion rate: What percentage of dispatched tasks complete successfully?
Time to completion: How long do tasks take? Are some stuck?
Cost per task: Token consumption and API costs per completed task.
Error rate by type: Which failure modes are most common?
Review pass rate: What percentage pass automated review on first attempt?
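As a rough illustration, the five metrics above can be computed from a batch of finished task records. This is a minimal sketch, not a prescribed schema: the TaskRecord type, its field names, and the summarize function are all hypothetical, and it assumes the harness already logs outcome, duration, cost, error type, and first-attempt review result for each task.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class TaskRecord:
    """One dispatched task. All field names here are hypothetical."""
    task_id: str
    succeeded: bool
    duration_s: float           # wall-clock time from dispatch to finish
    cost_usd: float             # token/API spend attributed to the task
    error_type: Optional[str]   # None when the task succeeded
    passed_first_review: bool   # passed automated review on first attempt


def summarize(records: list[TaskRecord]) -> dict:
    """Aggregate fleet-level metrics from a batch of finished tasks."""
    if not records:
        raise ValueError("no task records to summarize")
    completed = [r for r in records if r.succeeded]
    total_cost = sum(r.cost_usd for r in records)
    return {
        "completion_rate": len(completed) / len(records),
        "median_duration_s": median(r.duration_s for r in records),
        # A max far above the median hints at stuck tasks.
        "max_duration_s": max(r.duration_s for r in records),
        # Spend on failed tasks is included, since it is still paid for.
        "cost_per_completed_task": (
            total_cost / len(completed) if completed else None
        ),
        "errors_by_type": Counter(
            r.error_type for r in records if r.error_type
        ),
        "first_pass_review_rate": (
            sum(r.passed_first_review for r in completed) / len(completed)
            if completed else None
        ),
    }
```

Dividing total spend (including failed tasks) by completed tasks is one possible reading of "cost per completed task"; it makes wasted spend visible instead of hiding it.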
Alerting
Stuck agents: Alert when an agent hasn’t produced output in N minutes.
Cost spikes: Alert when a single task exceeds the cost budget (likely a doom loop).
Error rate spikes: Alert when the failure rate exceeds its recent baseline by more than a set margin (likely a harness or model issue).
Merge conflicts: Alert when parallel agents create conflicting changes.
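The four alert conditions above could be evaluated in a single periodic check. The sketch below is illustrative only: AgentStatus, check_alerts, and every threshold default (10 minutes, a 5 USD budget, 2x the baseline) are assumptions rather than recommended values, and overlapping file sets are used as a cheap stand-in for real merge-conflict detection.

```python
from dataclasses import dataclass


@dataclass
class AgentStatus:
    """Snapshot of one running agent. Field names are hypothetical."""
    agent_id: str
    seconds_since_output: float
    task_cost_usd: float
    touched_files: frozenset[str]


def check_alerts(
    agents: list[AgentStatus],
    recent_failure_rate: float,
    baseline_failure_rate: float,
    stuck_after_s: float = 600.0,   # the "N minutes" threshold (assumed 10 min)
    cost_budget_usd: float = 5.0,   # assumed per-task cost budget
    spike_factor: float = 2.0,      # assumed: alert at 2x the baseline
) -> list[str]:
    """Return alert labels for the current fleet snapshot."""
    alerts: list[str] = []
    for a in agents:
        if a.seconds_since_output > stuck_after_s:
            alerts.append(f"stuck:{a.agent_id}")
        if a.task_cost_usd > cost_budget_usd:
            alerts.append(f"cost:{a.agent_id}")
    if recent_failure_rate > spike_factor * baseline_failure_rate:
        alerts.append("error_rate_spike")
    # Two agents editing the same file is a proxy for a likely conflict.
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            if a.touched_files & b.touched_files:
                alerts.append(f"conflict:{a.agent_id}+{b.agent_id}")
    return alerts
```

In practice these rules would more likely live in an existing alerting system than in application code; the point is only that each condition reduces to a simple threshold over fleet-level data.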
Key insight: At scale, you’re not monitoring individual agents; you’re monitoring the fleet. The same observability principles that apply to microservices apply to agent fleets: health checks, metrics, alerting, and dashboards.