Ch 5 — Review Pipelines & Feedback Loops

Multi-agent review, self-verification, and preventing doom loops
High Level
Pipeline flow: Generate → Self-Check → Peer Review → Fix Loop → Human → Merge
Self-Verification Loops
The agent checks its own work before submitting
The Pattern
Before submitting code, the agent runs a pre-completion checklist: Does the code compile? Do tests pass? Does it match the architectural constraints? Are there any linter violations? This self-verification catches obvious mistakes before they enter the review pipeline, reducing noise for reviewers.
Implementation
Self-verification can be tool-based (the agent runs the linter and test suite) or prompt-based (the agent is instructed to review its own output against a checklist). Tool-based is more reliable but requires tool access. Prompt-based is simpler but the agent may miss issues it introduced.
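A minimal sketch of the tool-based variant, wrapping a linter and test runner in subprocesses. The `ruff` and `pytest` commands are assumptions; substitute your project's actual tools:

```python
import subprocess

# Each check is (name, command). These commands are placeholders for
# whatever linter and test runner your project actually uses.
CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("tests", ["pytest", "-q"]),
]

def run_self_check(checks=CHECKS, runner=subprocess.run):
    """Run each check; return the names of checks that failed."""
    failures = []
    for name, cmd in checks:
        result = runner(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(name)
    return failures

def ready_to_submit(checks=CHECKS, runner=subprocess.run):
    # The agent declares the task complete only when every check passes.
    return run_self_check(checks, runner) == []
```

The `runner` parameter is injected so the checklist logic can be exercised without real tools installed.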
Key insight: Self-verification is the cheapest review step. It catches 30–50% of issues before any external review. Always include a self-check step before the agent declares a task complete.
Multi-Agent Review
Using a second agent to review the first
How It Works
A reviewer agent (often a different model or the same model with a different system prompt) reviews the coding agent’s output. The reviewer checks for bugs, architectural violations, missing edge cases, and style issues. It provides feedback that the coding agent can use to improve its output.
Why a Separate Agent
A separate reviewer catches issues the coding agent is blind to. The coding agent has anchoring bias — it’s committed to its approach. A fresh reviewer sees the code without that bias. Using a different model or system prompt ensures genuinely independent review rather than the agent rubber-stamping its own work.
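A sketch of one review round under these assumptions: `call_model` stands in for whatever inference API you use, and both system prompts are illustrative:

```python
# Two roles, two system prompts: the reviewer is framed as independent
# so it does not rubber-stamp the coder's choices.
CODER_SYSTEM = "You are a coding agent. Implement the task exactly as specified."
REVIEWER_SYSTEM = (
    "You are a code reviewer. You did not write this code. "
    "List bugs, architectural violations, missing edge cases, and style issues."
)

def review_round(task, code, call_model):
    """One cycle: independent review, then a revision pass by the coder."""
    feedback = call_model(system=REVIEWER_SYSTEM,
                          user=f"Task:\n{task}\n\nCode under review:\n{code}")
    # Feed the reviewer's comments back to the coding agent.
    revised = call_model(system=CODER_SYSTEM,
                         user=f"Task:\n{task}\n\nYour code:\n{code}\n\n"
                              f"Reviewer feedback:\n{feedback}\n\nRevise the code.")
    return feedback, revised
```

Passing `call_model` in keeps the pattern independent of any particular provider SDK.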
Key insight: Multi-agent review is the AI equivalent of code review. Just as human developers catch each other’s mistakes, a reviewer agent catches the coding agent’s blind spots. The cost of a second inference call is far less than the cost of a bug in production.
The Reasoning Sandwich
High reasoning for planning, medium for implementation
The Pattern
The reasoning sandwich uses different reasoning levels for different phases: High reasoning (expensive, slow) for planning and architecture decisions. Medium reasoning (balanced) for implementation. High reasoning again for review and verification. This optimizes cost by using expensive reasoning only where it matters most.
Why It Works
Planning mistakes are expensive to fix — a wrong architectural decision cascades through the entire implementation. Implementation is more mechanical — following the plan. Review needs high reasoning to catch subtle issues. The sandwich pattern allocates reasoning budget where mistakes are most costly.
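One way to encode the sandwich is a simple phase-to-effort mapping; the effort labels mirror the reasoning-effort knob many model APIs expose, and the names here are illustrative:

```python
# Reasoning budget per phase: high where mistakes are most costly.
PHASE_EFFORT = {
    "plan": "high",        # architecture decisions cascade through everything
    "implement": "medium",  # mostly mechanical: follow the plan
    "review": "high",      # catch subtle issues before merge
}

def effort_for(phase):
    # Default unknown phases to high: overspending is safer than underspending.
    return PHASE_EFFORT.get(phase, "high")
```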
Key insight: Not all steps need the same reasoning level. Using o1-level reasoning for every line of code is wasteful. Using it only for planning and review captures most of the quality benefit at a fraction of the cost.
Doom Loops
When the fix-review cycle never converges
The Problem
A doom loop occurs when the coding agent and reviewer agent enter an infinite cycle: the coder fixes issue A, introducing issue B. The reviewer catches B. The coder fixes B, reintroducing A. Neither agent recognizes the cycle. Without detection, this loop runs until the token budget is exhausted.
Detection & Prevention
Maximum iterations: Cap fix-review cycles at 3–5 rounds. If the agent can’t resolve issues in 3 attempts, escalate to a human.

Diff tracking: Track what changes each iteration. If the same lines are being modified repeatedly, flag a loop.

Issue counting: If the number of issues isn’t decreasing across iterations, the agent is stuck.

Escalation: When a loop is detected, stop the cycle and present the current state to a human reviewer.
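The strategies above can be combined into a single detector. This sketch assumes each fix-review round logs its diff text and remaining issue count:

```python
import hashlib

MAX_ROUNDS = 5  # cap from the "maximum iterations" strategy

def should_escalate(history, max_rounds=MAX_ROUNDS):
    """history: list of (diff_text, issue_count) per fix-review round.
    Returns a reason string when the cycle should stop, else None."""
    if len(history) >= max_rounds:
        return "max rounds reached"
    seen = set()
    for diff, _ in history:
        digest = hashlib.sha256(diff.encode()).hexdigest()
        if digest in seen:  # the same change made twice: A/B ping-pong
            return "repeated diff"
        seen.add(digest)
    counts = [n for _, n in history]
    if len(counts) >= 3 and counts[-1] >= counts[-3]:
        return "issue count not decreasing"
    return None
```

When `should_escalate` returns a reason, the orchestrator halts the loop and hands the current state, plus the reason, to a human.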
Critical in AI: Doom loops are one of the most expensive failure modes in agent systems. A single undetected loop can consume thousands of dollars in API calls. Loop detection is not optional — it’s a cost safety mechanism.
CI/CD Integration
Connecting agent review to your existing pipeline
The Integration
Agent review pipelines should integrate with your existing CI/CD system, not replace it. The agent creates a PR, CI runs automated checks (linting, tests, structural tests), the reviewer agent adds comments, and the human reviewer sees both CI results and agent review comments in the same interface.
Workflow
// Agent-integrated CI/CD
1. Agent creates PR
2. CI runs: lint, test, structural
3. Reviewer agent comments on PR
4. If CI fails: agent auto-fixes
5. If reviewer flags issues: agent fixes
6. Human reviews final state
7. Human approves or requests changes
8. Merge on approval
Key insight: The best agent review pipelines are invisible to the human reviewer. By the time a human sees the PR, the agent has already fixed all the mechanical issues. The human focuses only on strategic decisions and subtle quality.
Feedback Quality
Making review feedback actionable for agents
Good vs Bad Feedback
Bad: “This code could be better.” (Vague, not actionable.)

Bad: “Consider refactoring.” (No specific direction.)

Good: “Function processOrder starting at line 45 is 60 lines long. Extract the validation logic (lines 52–70) into a separate validateOrder function.” (Specific, actionable, located.)
Structured Feedback
The reviewer agent should produce structured feedback with: the file and line number, the issue category (bug, style, architecture, performance), severity (must-fix, should-fix, nice-to-have), and a specific fix suggestion. This format makes it easy for the coding agent to act on and for humans to triage.
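A sketch of that structured record; the field names are illustrative, the categories and severities come from the text:

```python
from dataclasses import dataclass

SEVERITIES = ("must-fix", "should-fix", "nice-to-have")

@dataclass
class ReviewComment:
    file: str
    line: int
    category: str   # bug | style | architecture | performance
    severity: str   # must-fix | should-fix | nice-to-have
    suggestion: str  # the specific fix, not just the complaint

    def render(self):
        # One line a coding agent (or a triaging human) can act on directly.
        return f"[{self.severity}] {self.file}:{self.line} ({self.category}) {self.suggestion}"

def must_fix(comments):
    """Filter to blocking issues for the next fix iteration."""
    return [c for c in comments if c.severity == "must-fix"]
```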
Rule of thumb: If a human reviewer wouldn’t understand the feedback, the coding agent won’t either. Review feedback should be as specific as a bug report: what’s wrong, where it is, and how to fix it.
The Human Review Gate
When and how humans should intervene
What Humans Review
After automated and agent review, humans should focus on what machines can’t assess: Strategic decisions (is this the right approach?), business logic correctness (does this match the requirements?), security implications (does this introduce vulnerabilities?), and user experience (will users understand this?).
Review Efficiency
With a good agent review pipeline, human review time drops by 60–80%. The human no longer checks for style violations, missing tests, or architectural issues — those are caught earlier. The human reviewer becomes a strategic approver rather than a line-by-line checker.
Key insight: The goal is not to eliminate human review. It’s to make human review efficient by ensuring humans only see issues that require human judgment. Mechanical issues should never reach a human reviewer.
Measuring Pipeline Effectiveness
Metrics that tell you if your review pipeline is working
Key Metrics
First-pass approval rate: What percentage of agent PRs are approved without changes? Target: >70%.

Review cycle time: How long from PR creation to merge? Target: <2 hours for agent PRs.

Issues caught per stage: What percentage of issues are caught by self-check, agent review, CI, vs human review?

Doom loop rate: What percentage of PRs enter fix-review loops? Target: <5%.
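A sketch of computing these metrics from per-PR logs; the record fields are assumptions about what your pipeline stores:

```python
def pipeline_metrics(prs):
    """prs: list of dicts with keys 'approved_first_pass' (bool),
    'cycle_hours' (float), and 'doom_looped' (bool)."""
    n = len(prs)
    if n == 0:
        return None
    return {
        # Target: > 0.70
        "first_pass_approval_rate": sum(p["approved_first_pass"] for p in prs) / n,
        # Target: < 2 hours for agent PRs
        "median_cycle_hours": sorted(p["cycle_hours"] for p in prs)[n // 2],
        # Target: < 0.05
        "doom_loop_rate": sum(p["doom_looped"] for p in prs) / n,
    }
```

Issues caught per stage needs richer per-issue logging (which stage flagged each issue), so it is omitted from this sketch.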
Continuous Improvement
When human reviewers consistently catch the same type of issue, that’s a signal to add it to the automated pipeline. Every recurring human review comment should become a linter rule, structural test, or reviewer agent instruction. Over time, the pipeline catches more and humans catch less — that’s the goal.
Key insight: A review pipeline is a learning system. It should get better over time as you encode human review patterns into automated checks. If your first-pass approval rate isn’t improving quarter over quarter, your pipeline isn’t learning.