Ch 6 — The Agent Loop

Reason → Act → Observe → Repeat — how coding agents think and work
High Level
Prompt → Reason → Tool Call → Observe → Loop → Done
The Core Loop: Reason → Act → Observe
The simplest idea that makes agents powerful
What Is an Agent Loop?
An AI coding agent is just an LLM in a loop. It receives your request, reasons about what to do, acts by calling a tool (read a file, run a command, edit code), observes the result, and then reasons again. This cycle repeats until the task is done or the agent decides it needs your input.
The ReAct Pattern
This architecture is called ReAct (Reasoning + Acting). The key insight: interleaving reasoning with real-world observations produces dramatically better results than generating an entire plan upfront. Each observation informs the next reasoning step, allowing the agent to adapt to what it discovers.
The Loop in Pseudocode
while task_not_complete:
    // 1. REASON: LLM thinks about next step
    thought = model.generate(system_prompt + conversation + tool_results)

    // 2. ACT: LLM requests tool use
    tool_call = thought.extract_tool_call()
    if tool_call is None:
        break  // No more tools needed → done

    // 3. OBSERVE: Execute tool, get result
    result = execute(tool_call)
    conversation.append(result)
    // Loop back to REASON with new info
Key insight: The loop terminates when the model produces a response with no tool calls. That’s the only exit condition. The model itself decides when it’s done — there’s no external orchestrator counting steps or checking completion criteria.
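The pseudocode above can be sketched as runnable Python with a mocked model; `mock_model` and `execute` here are stand-ins for a real LLM client and tool runner, invented for illustration:

```python
def mock_model(conversation):
    """Stand-in for an LLM call: requests a tool until it has a result."""
    if not any("TOOL_RESULT" in m for m in conversation):
        return {"tool": "read_file", "args": {"path": "main.py"}}
    return {"tool": None, "text": "Task complete."}  # no tool call → done

def execute(tool_call):
    """Stand-in tool runner: pretend to read a file."""
    return f"TOOL_RESULT({tool_call['tool']}): contents of {tool_call['args']['path']}"

def agent_loop(user_request):
    conversation = [user_request]
    while True:
        thought = mock_model(conversation)   # 1. REASON
        if thought["tool"] is None:          # the only exit condition:
            return thought["text"]           #   a response with no tool call
        result = execute(thought)            # 2. ACT + 3. OBSERVE
        conversation.append(result)

print(agent_loop("fix the bug"))  # → Task complete.
```

Note that there is no step counter or completion checker anywhere: the loop ends only when the model stops asking for tools.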
The Toolbox: What Agents Can Do
Every action is a tool call — the model never executes directly
Core Tool Categories
// FILE OPERATIONS
Read        Read file contents (with line ranges)
Write       Create or overwrite a file
StrReplace  Find-and-replace within a file
Glob        Find files matching a pattern

// SEARCH
Grep        Regex search across files
Semantic    Find code by meaning, not text

// EXECUTION
Shell       Run terminal commands (npm, git, etc.)
Lint        Check files for errors

// EXTERNAL
WebSearch   Look up documentation
WebFetch    Read a URL’s content
MCP Tools   Connect to external services
The Model Requests, Your System Executes
The LLM never runs code directly. It outputs a structured tool request (tool name + arguments), your IDE or CLI intercepts it, decides whether to allow it (permission check), executes the tool, and sends the result back. The model only sees text in and text out.
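A tool request is just structured data that the host parses, permission-checks, and dispatches. A minimal sketch (the request shape and tool names are illustrative, not any specific vendor’s format):

```python
import json

def handle_tool_request(request_json, allow):
    """Parse a model's tool request, permission-check it, execute it,
    and return a plain-text result to send back to the model."""
    request = json.loads(request_json)
    name, args = request["tool"], request["arguments"]
    if not allow(name, args):                       # permission check
        return f"DENIED: {name}"
    tools = {"read_file": lambda a: f"<contents of {a['path']}>"}
    return tools[name](args)                        # execute; result is just text

# The model emitted this structured request; the system runs it:
req = '{"tool": "read_file", "arguments": {"path": "src/app.py"}}'
print(handle_tool_request(req, allow=lambda n, a: n == "read_file"))
# → <contents of src/app.py>
```

The model never touches the filesystem; it only ever sees the string that comes back.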
Parallel Tool Calls
Modern agents can request multiple tools in a single turn. If the agent needs to read three files and they’re independent, it requests all three simultaneously. The system executes them in parallel and returns all results at once. This cuts round-trip latency dramatically for discovery-heavy tasks.
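Fanning out independent requests can be sketched with a thread pool; the `read_file` stub stands in for real disk or network I/O:

```python
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    """Stub for a slow, independent tool call (disk or network I/O)."""
    return f"contents of {path}"

def run_parallel(paths):
    """Execute independent tool calls concurrently; results return in order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(read_file, paths))

results = run_parallel(["a.py", "b.py", "c.py"])
print(results)  # all three results come back in one batch
```

Three sequential round trips collapse into one, which is where the latency win comes from.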
Why it matters: Tools are the agent’s hands. The quality and design of the tool set directly determines what the agent can accomplish. A model with great reasoning but poor tools will underperform a slightly weaker model with well-designed tools.
Context Injection: What the Agent Sees Before You Speak
The invisible context that shapes every response
Silently Injected Context
Before your message reaches the model, the system silently prepends a rich context payload:

System prompt — personality, rules, capabilities
Open files — what you’re currently looking at
Recently viewed files — what you were working on
Git status — branch, staged changes, diffs
OS and workspace info — paths, shell, environment
Rules files — project-specific instructions (AGENTS.md, .cursorrules)
Linter errors — current problems in open files
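Assembling that payload is plain string concatenation before the first model call; a sketch with illustrative field names (real systems inject far more):

```python
def build_injected_context(workspace):
    """Prepend project state to the conversation before the user's message."""
    parts = [
        workspace["system_prompt"],
        "Open files: " + ", ".join(workspace["open_files"]),
        "Git branch: " + workspace["git_branch"],
        "Rules:\n" + workspace["rules_file"],
    ]
    return "\n\n".join(parts)

ctx = build_injected_context({
    "system_prompt": "You are a coding agent.",
    "open_files": ["src/app.py"],
    "git_branch": "feature/auth",
    "rules_file": "Run tests before committing.",
})
print(ctx)  # everything above arrives before the user's first word
```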
Why This Matters
This injected context is invisible to you but gives the agent awareness of your project without you explaining anything. When you say “fix the error,” the agent already knows which file has the error, what the error message is, and what branch you’re on.
The Token Cost
All this context consumes tokens from the model’s context window. A typical agent session starts with 3,000–8,000 tokens of injected context before you type a single word. This is why context engineering (Ch 7) is critical — every wasted token in the system prompt is a token the agent can’t use for reasoning about your actual task.
Practical tip: Keep your rules files concise. A 500-line AGENTS.md file consumes ~2,000 tokens on every single request. Under context pressure, agents may skip advisory rules entirely. Put only critical, non-obvious instructions in rules files.
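A rough way to see what a rules file costs on every request; the 4-characters-per-token ratio is a common heuristic for English text, not an exact tokenizer:

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

rules = "Always use TypeScript strict mode.\n" * 100  # a 100-line rules file
print(estimate_tokens(rules))  # → 875
```

Every one of those tokens is paid again on each turn of the loop, which is why trimming rules files compounds.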
Context Window Management: The Agent’s Working Memory
What happens when the conversation gets too long
The Accumulation Problem
Every turn in the agent loop — your message, the model’s reasoning, tool calls, tool results — gets appended to the conversation. A complex task might involve 20–50 tool calls, each adding hundreds or thousands of tokens. A single file read can add 3,000+ tokens. The context window fills up fast.
Automatic Compaction
When the conversation approaches the token limit, the system automatically summarizes older messages. Key decisions, file paths, and error messages are preserved. Exact tool outputs and intermediate reasoning are compressed. This happens transparently — the agent doesn’t know its earlier memories have been summarized.
What Gets Lost
Exact code from early file reads — replaced with summaries
Intermediate reasoning — only conclusions preserved
Failed attempts — compressed to “tried X, didn’t work”
Verbose tool outputs — trimmed to key information

This is why agents sometimes “forget” decisions made earlier in a long session or re-read files they already read.
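A naive compaction pass might squash the oldest messages once history exceeds a budget; the summarizer here is a trivial truncation, where real systems use the model itself to write the summary:

```python
def compact(conversation, limit_chars=120):
    """When history exceeds the budget, collapse the oldest half into a summary."""
    total = sum(len(m) for m in conversation)
    if total <= limit_chars:
        return conversation
    half = len(conversation) // 2
    summary = "SUMMARY: " + " | ".join(m[:20] for m in conversation[:half])
    return [summary] + conversation[half:]   # detail lost, gist kept

history = ["read main.py: " + "x" * 100, "edited main.py",
           "tests failed", "fixed import"]
print(compact(history))  # oldest two messages collapse into one summary entry
```

The exact file contents from the first read are gone after compaction, which is precisely why agents re-read files they have already seen.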
Context drift: Nearly 65% of enterprise AI agent failures are caused by context drift — not raw token exhaustion, but subtle misalignments that compound across multi-step workflows. The agent’s understanding gradually diverges from reality as summaries lose nuance. For complex tasks, shorter focused sessions beat marathon sessions.
Error Recovery: When Things Go Wrong
The fix-the-fix loop and how to escape it
The Cascading Error Problem
The most common agent failure mode: the agent makes an edit, it introduces a bug, the agent tries to fix the bug, the fix introduces a new bug, and the cycle continues. Each “fix” moves further from the original working state. After 3–4 iterations, the code is often in worse shape than when the agent started.
Why Agents Get Stuck
Sunk cost reasoning — the agent tries to salvage its approach instead of reverting
Lost context — the original working code was compacted away
Narrow focus — fixing the symptom without understanding the root cause
No test feedback — the agent doesn’t know if the fix actually works
Checkpoint & Revert
Modern agents implement a checkpoint system that snapshots file contents before each edit. When the agent spirals, you can revert to any previous checkpoint instantly — no git required. This is the single most important safety feature: the ability to undo everything the agent did and try a different approach.
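A checkpoint system can be as simple as snapshotting file contents before each edit; this in-memory sketch uses a dict as the workspace, where real tools persist snapshots to disk:

```python
class Checkpoints:
    """Snapshot file contents before each agent edit; revert to any point."""
    def __init__(self):
        self.snapshots = []                    # list of {path: contents} copies

    def save(self, files):
        self.snapshots.append(dict(files))     # copy current state

    def revert(self, files, index):
        files.clear()
        files.update(self.snapshots[index])    # restore a snapshot wholesale

files = {"app.py": "def main(): pass"}
cp = Checkpoints()
cp.save(files)                                 # checkpoint before the edit
files["app.py"] = "def main(): raise Bug"      # agent's bad edit
cp.revert(files, 0)                            # undo everything
print(files["app.py"])  # → def main(): pass
```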
When to Intervene
// Signs the agent is in a fix-the-fix loop:
1. Same file edited 3+ times in a row
2. Error count is increasing, not decreasing
3. Agent is undoing its own recent changes
4. New errors appear in files the agent didn’t touch
5. Agent says “let me try a different approach” for the second time

Action: Stop. Revert to checkpoint. Restate the problem with more context or a different strategy.
Key insight: The best developers using AI agents aren’t the ones who let the agent run longest. They’re the ones who intervene earliest when they spot a spiral. Knowing when to stop and revert is a core skill.
Permissions & Safety: The Trust Boundary
What agents can do without asking — and what requires your approval
The Permission Model
Agent tools are categorized by risk level:

Read-only (auto-approved) — reading files, searching code, listing directories
Write (may require approval) — editing files, creating files
Execute (usually requires approval) — running shell commands, installing packages
Network (requires approval) — web requests, API calls
Destructive (always requires approval) — deleting files, force-pushing git
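Mapping tools to risk tiers makes the approval decision a lookup. The tier names mirror the list above; the tool-to-tier assignment is illustrative:

```python
RISK = {
    "read_file": "read-only", "grep": "read-only",
    "write_file": "write", "shell": "execute",
    "web_fetch": "network", "delete_file": "destructive",
}
AUTO_APPROVED = {"read-only"}    # balanced mode would add "write" here

def needs_approval(tool):
    """Unknown tools are treated as destructive: always ask a human."""
    return RISK.get(tool, "destructive") not in AUTO_APPROVED

print(needs_approval("grep"))         # → False
print(needs_approval("delete_file"))  # → True
```

Defaulting unknown tools to the most restrictive tier is the safe direction to fail in.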
Sandboxing
Commands run inside a sandbox that restricts filesystem access, network access, and system calls. The agent can write to your workspace but not to system directories. Network access is limited to known package registries and version control providers. Broader access requires explicit permission.
The Trust Spectrum
// Trust levels in practice:

YOLO Mode      Auto-approve everything. Maximum speed.
               Risk: agent runs rm -rf or pushes to main.

Balanced Mode  Auto-approve reads + writes. Prompt for shell commands + network.
               Best for most development work.

Cautious Mode  Approve every action individually. Slowest, but safest for
               production code. Good for unfamiliar codebases.
Practical guidance: Start cautious with a new agent or unfamiliar codebase. As you build trust and understand the agent’s patterns, gradually increase autonomy. Most experienced developers settle on balanced mode — auto-approve file operations, review shell commands.
Planning Mode: Think Before You Code
Why the best agent sessions start with a plan
Plan Then Execute
The most effective pattern for complex tasks: ask the agent to plan first, code second. In planning mode, the agent reads the codebase, identifies affected files, outlines the approach, and presents it for your review. Only after you approve does it start making changes. This prevents the agent from charging down the wrong path.
What a Good Plan Looks Like
// Agent plan for “add user authentication”:

1. Create auth/middleware.ts       → JWT verification, session handling
2. Update routes/api.ts            → Wrap protected routes with auth middleware
3. Create auth/login.ts            → Login endpoint, password hashing
4. Update db/schema.ts             → Add users table with email, hash, role
5. Add tests in tests/auth.test.ts → Login, protected routes, token expiry

// Dependencies: bcrypt, jsonwebtoken
// Estimated: 5 files, ~200 lines
Why Planning Works
Catches wrong assumptions early — you can correct the approach before any code is written
Reduces fix-the-fix loops — the agent has a coherent strategy instead of improvising
Creates documentation — the plan itself serves as a record of what was done and why
Enables partial execution — you can approve steps 1–3 and handle step 4 yourself
When to Skip Planning
Not every task needs a plan. Skip it for:
• Single-file changes with clear scope
• Bug fixes where you already know the cause
• Boilerplate generation (tests, types, configs)
• Tasks you’ve done before and trust the agent with
Pro tip: Save good plans. When the agent produces a plan you approve, that plan becomes reusable documentation. Next time you need similar work, paste the plan structure and the agent will follow the same pattern.
The Agent Gets Better: Self-Improvement Loops
How agents learn from your corrections across sessions
The Stateless Problem
Agents are stateless between sessions. Every new conversation starts from zero. The agent doesn’t remember that you prefer tabs over spaces, that your project uses a specific testing framework, or that you corrected it three times yesterday about import paths. This is why the same mistakes repeat.
Rules Files as Persistent Memory
The solution: rules files (AGENTS.md, .cursorrules, CLAUDE.md) act as persistent memory injected at session start. When you notice the agent making the same mistake repeatedly, add a rule. These files accumulate your project’s conventions, preferences, and gotchas — becoming a living style guide the agent follows.
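A rules file is just plain instructions the agent reads at session start. A hypothetical AGENTS.md excerpt (these entries are invented examples, not recommended conventions):

```markdown
# AGENTS.md (example entries)

- Tests use vitest; run `npm test` after any change under src/.
- Import paths are absolute from `src/` (no `../../` chains).
- Never edit generated files in `dist/`.
- API errors must go through `handleApiError`, never a bare throw.
```

Each entry exists because a human once corrected the agent; the file is the correction made permanent.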
Automated Self-Reflection
Emerging tools analyze past session transcripts to identify moments of friction — where you said “no,” “not that,” or had to repeat instructions. They automatically propose additions to your rules files. In practice, this has reduced correction rates from 0.45 to 0.07 per session over daily iterations.
The Compound Effect
Over weeks, a well-maintained rules file transforms the agent experience. The agent knows your naming conventions, testing patterns, error handling style, and architectural preferences. It stops making the mistakes you’ve corrected. The investment in rules pays compound returns on every future session.
Key insight: The agent loop isn’t just Reason → Act → Observe within a session. There’s a meta-loop across sessions: Use → Correct → Encode Rule → Better Next Time. The developers who get the most from AI agents are the ones who invest in this outer loop.