Ch 5 — Review Pipelines & Feedback Loops

Multi-agent review, self-verification, and preventing doom loops
High Level
Pipeline flow: Generate → Self-Check → Peer Review → Fix Loop → Human → Merge
Self-Verification Loops
The agent checks its own work before submitting
The Pattern
Before submitting code, the agent runs a pre-completion checklist: Does the code compile? Do tests pass? Does it match the architectural constraints? Are there any linter violations? This self-verification catches obvious mistakes before they enter the review pipeline, reducing noise for reviewers.
Implementation
Self-verification can be tool-based (the agent runs the linter and test suite) or prompt-based (the agent is instructed to review its own output against a checklist). Tool-based is more reliable but requires tool access. Prompt-based is simpler but the agent may miss issues it introduced.
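A minimal sketch of the tool-based variant, wrapping a linter and test runner in subprocesses. The `ruff` and `pytest` commands are assumptions; substitute your project's actual tools:

```python
import subprocess

# Each check is (name, command). These commands are placeholders for
# whatever linter and test runner your project actually uses.
CHECKS = [
    ("lint", ["ruff", "check", "."]),
    ("tests", ["pytest", "-q"]),
]

def run_self_check(checks=CHECKS, runner=subprocess.run):
    """Run each check; return the names of checks that failed."""
    failures = []
    for name, cmd in checks:
        result = runner(cmd, capture_output=True)
        if result.returncode != 0:
            failures.append(name)
    return failures

def ready_to_submit(checks=CHECKS, runner=subprocess.run):
    # The agent declares the task complete only when every check passes.
    return run_self_check(checks, runner) == []
```

The `runner` parameter is injected so the checklist logic can be exercised without real tools installed.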
Key insight: Self-verification is the cheapest review step. It catches 30–50% of issues before any external review. Always include a self-check step before the agent declares a task complete.
Multi-Agent Review
Using a second agent to review the first
How It Works
A reviewer agent (often a different model or the same model with a different system prompt) reviews the coding agent’s output. The reviewer checks for bugs, architectural violations, missing edge cases, and style issues. It provides feedback that the coding agent can use to improve its output.
Why a Separate Agent
A separate reviewer catches issues the coding agent is blind to. The coding agent has anchoring bias — it’s committed to its approach. A fresh reviewer sees the code without that bias. Using a different model or system prompt ensures genuinely independent review rather than the agent rubber-stamping its own work.
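A sketch of one review round under these assumptions: `call_model` stands in for whatever inference API you use, and both system prompts are illustrative:

```python
# Two roles, two system prompts: the reviewer is framed as independent
# so it does not rubber-stamp the coder's choices.
CODER_SYSTEM = "You are a coding agent. Implement the task exactly as specified."
REVIEWER_SYSTEM = (
    "You are a code reviewer. You did not write this code. "
    "List bugs, architectural violations, missing edge cases, and style issues."
)

def review_round(task, code, call_model):
    """One cycle: independent review, then a revision pass by the coder."""
    feedback = call_model(system=REVIEWER_SYSTEM,
                          user=f"Task:\n{task}\n\nCode under review:\n{code}")
    # Feed the reviewer's comments back to the coding agent.
    revised = call_model(system=CODER_SYSTEM,
                         user=f"Task:\n{task}\n\nYour code:\n{code}\n\n"
                              f"Reviewer feedback:\n{feedback}\n\nRevise the code.")
    return feedback, revised
```

Passing `call_model` in keeps the pattern independent of any particular provider SDK.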
Key insight: Multi-agent review is the AI equivalent of code review. Just as human developers catch each other’s mistakes, a reviewer agent catches the coding agent’s blind spots. The cost of a second inference call is far less than the cost of a bug in production.
The Reasoning Sandwich
High reasoning for planning, medium for implementation
The Pattern
The reasoning sandwich uses different reasoning levels for different phases: High reasoning (expensive, slow) for planning and architecture decisions. Medium reasoning (balanced) for implementation. High reasoning again for review and verification. This optimizes cost by using expensive reasoning only where it matters most.
Why It Works
Planning mistakes are expensive to fix — a wrong architectural decision cascades through the entire implementation. Implementation is more mechanical — following the plan. Review needs high reasoning to catch subtle issues. The sandwich pattern allocates reasoning budget where mistakes are most costly.
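One way to encode the sandwich is a simple phase-to-effort mapping; the effort labels mirror the reasoning-effort knob many model APIs expose, and the names here are illustrative:

```python
# Reasoning budget per phase: high where mistakes are most costly.
PHASE_EFFORT = {
    "plan": "high",        # architecture decisions cascade through everything
    "implement": "medium",  # mostly mechanical: follow the plan
    "review": "high",      # catch subtle issues before merge
}

def effort_for(phase):
    # Default unknown phases to high: overspending is safer than underspending.
    return PHASE_EFFORT.get(phase, "high")
```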
Key insight: Not all steps need the same reasoning level. Using o1-level reasoning for every line of code is wasteful. Using it only for planning and review captures most of the quality benefit at a fraction of the cost.
Doom Loops
When the fix-review cycle never converges
The Problem
A doom loop occurs when the coding agent and reviewer agent enter an infinite cycle: the coder fixes issue A, introducing issue B. The reviewer catches B. The coder fixes B, reintroducing A. Neither agent recognizes the cycle. Without detection, this loop runs until the token budget is exhausted.
Detection & Prevention
Maximum iterations: Cap fix-review cycles at 3–5 rounds. If the agent can’t resolve issues in 3 attempts, escalate to a human.

Diff tracking: Track what changes each iteration. If the same lines are being modified repeatedly, flag a loop.

Issue counting: If the number of issues isn’t decreasing across iterations, the agent is stuck.

Escalation: When a loop is detected, stop the cycle and present the current state to a human reviewer.
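The strategies above can be combined into a single detector. This sketch assumes each fix-review round logs its diff text and remaining issue count:

```python
import hashlib

MAX_ROUNDS = 5  # cap from the "maximum iterations" strategy

def should_escalate(history, max_rounds=MAX_ROUNDS):
    """history: list of (diff_text, issue_count) per fix-review round.
    Returns a reason string when the cycle should stop, else None."""
    if len(history) >= max_rounds:
        return "max rounds reached"
    seen = set()
    for diff, _ in history:
        digest = hashlib.sha256(diff.encode()).hexdigest()
        if digest in seen:  # the same change made twice: A/B ping-pong
            return "repeated diff"
        seen.add(digest)
    counts = [n for _, n in history]
    if len(counts) >= 3 and counts[-1] >= counts[-3]:
        return "issue count not decreasing"
    return None
```

When `should_escalate` returns a reason, the orchestrator halts the loop and hands the current state, plus the reason, to a human.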
Critical in AI: Doom loops are one of the most expensive failure modes in agent systems. A single undetected loop can consume thousands of dollars in API calls. Loop detection is not optional — it’s a cost safety mechanism.
CI/CD Integration
Connecting agent review to your existing pipeline
The Integration
Agent review pipelines should integrate with your existing CI/CD system, not replace it. The agent creates a PR, CI runs automated checks (linting, tests, structural tests), the reviewer agent adds comments, and the human reviewer sees both CI results and agent review comments in the same interface.
Workflow
// Agent-integrated CI/CD
1. Agent creates PR
2. CI runs: lint, test, structural
3. Reviewer agent comments on PR
4. If CI fails: agent auto-fixes
5. If reviewer flags issues: agent fixes
6. Human reviews final state
7. Human approves or requests changes
8. Merge on approval
Key insight: The best agent review pipelines are invisible to the human reviewer. By the time a human sees the PR, the agent has already fixed all the mechanical issues. The human focuses only on strategic decisions and subtle quality.
Feedback Quality
Making review feedback actionable for agents
Good vs Bad Feedback
Bad: “This code could be better.” (Vague, not actionable.)

Bad: “Consider refactoring.” (No specific direction.)

Good: “Function processOrder starting at line 45 is 60 lines long. Extract the validation logic (lines 52–70) into a separate validateOrder function.” (Specific, actionable, located.)
Structured Feedback
The reviewer agent should produce structured feedback with: the file and line number, the issue category (bug, style, architecture, performance), severity (must-fix, should-fix, nice-to-have), and a specific fix suggestion. This format makes it easy for the coding agent to act on and for humans to triage.
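A sketch of that structured record; the field names are illustrative, the categories and severities come from the text:

```python
from dataclasses import dataclass

SEVERITIES = ("must-fix", "should-fix", "nice-to-have")

@dataclass
class ReviewComment:
    file: str
    line: int
    category: str   # bug | style | architecture | performance
    severity: str   # must-fix | should-fix | nice-to-have
    suggestion: str  # the specific fix, not just the complaint

    def render(self):
        # One line a coding agent (or a triaging human) can act on directly.
        return f"[{self.severity}] {self.file}:{self.line} ({self.category}) {self.suggestion}"

def must_fix(comments):
    """Filter to blocking issues for the next fix iteration."""
    return [c for c in comments if c.severity == "must-fix"]
```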
Rule of thumb: If a human reviewer wouldn’t understand the feedback, the coding agent won’t either. Review feedback should be as specific as a bug report: what’s wrong, where it is, and how to fix it.
The Human Review Gate
When and how humans should intervene
What Humans Review
After automated and agent review, humans should focus on what machines can’t assess: Strategic decisions (is this the right approach?), business logic correctness (does this match the requirements?), security implications (does this introduce vulnerabilities?), and user experience (will users understand this?).
Review Efficiency
With a good agent review pipeline, human review time drops by 60–80%. The human no longer checks for style violations, missing tests, or architectural issues — those are caught earlier. The human reviewer becomes a strategic approver rather than a line-by-line checker.
Key insight: The goal is not to eliminate human review. It’s to make human review efficient by ensuring humans only see issues that require human judgment. Mechanical issues should never reach a human reviewer.
Measuring Pipeline Effectiveness
Metrics that tell you if your review pipeline is working
Key Metrics
First-pass approval rate: What percentage of agent PRs are approved without changes? Target: >70%.

Review cycle time: How long from PR creation to merge? Target: <2 hours for agent PRs.

Issues caught per stage: What percentage of issues are caught by self-check, agent review, CI, vs human review?

Doom loop rate: What percentage of PRs enter fix-review loops? Target: <5%.
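A sketch of computing these metrics from per-PR logs; the record fields are assumptions about what your pipeline stores:

```python
def pipeline_metrics(prs):
    """prs: list of dicts with keys 'approved_first_pass' (bool),
    'cycle_hours' (float), and 'doom_looped' (bool)."""
    n = len(prs)
    if n == 0:
        return None
    return {
        # Target: > 0.70
        "first_pass_approval_rate": sum(p["approved_first_pass"] for p in prs) / n,
        # Target: < 2 hours for agent PRs
        "median_cycle_hours": sorted(p["cycle_hours"] for p in prs)[n // 2],
        # Target: < 0.05
        "doom_loop_rate": sum(p["doom_looped"] for p in prs) / n,
    }
```

Issues caught per stage needs richer per-issue logging (which stage flagged each issue), so it is omitted from this sketch.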
Continuous Improvement
When human reviewers consistently catch the same type of issue, that’s a signal to add it to the automated pipeline. Every recurring human review comment should become a linter rule, structural test, or reviewer agent instruction. Over time, the pipeline catches more and humans catch less — that’s the goal.
Key insight: A review pipeline is a learning system. It should get better over time as you encode human review patterns into automated checks. If your first-pass approval rate isn’t improving quarter over quarter, your pipeline isn’t learning.