Ch 6 — The Agent Loop

Reason → Act → Observe → Repeat — how coding agents think and work
High Level
Prompt → Reason → Tool Call → Observe → Loop → Done
The Core Loop: Reason → Act → Observe
The simplest idea that makes agents powerful
What Is an Agent Loop?
An AI coding agent is just an LLM in a loop. It receives your request, reasons about what to do, acts by calling a tool (read a file, run a command, edit code), observes the result, and then reasons again. This cycle repeats until the task is done or the agent decides it needs your input.
The ReAct Pattern
This architecture is called ReAct (Reasoning + Acting). The key insight: interleaving reasoning with real-world observations produces dramatically better results than generating an entire plan upfront. Each observation informs the next reasoning step, allowing the agent to adapt to what it discovers.
The Loop in Pseudocode
while task_not_complete:
    // 1. REASON: LLM thinks about next step
    thought = model.generate(system_prompt + conversation + tool_results)

    // 2. ACT: LLM requests tool use
    tool_call = thought.extract_tool_call()
    if tool_call is None:
        break  // No more tools needed → done

    // 3. OBSERVE: Execute tool, get result
    result = execute(tool_call)
    conversation.append(result)
    // Loop back to REASON with new info
Key insight: The loop terminates when the model produces a response with no tool calls. That’s the only exit condition. The model itself decides when it’s done — there’s no external orchestrator counting steps or checking completion criteria.
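The pseudocode above can be sketched as runnable Python with a mocked model; `mock_model` and `execute` here are stand-ins for a real LLM client and tool runner, invented for illustration:

```python
def mock_model(conversation):
    """Stand-in for an LLM call: requests a tool until it has a result."""
    if not any("TOOL_RESULT" in m for m in conversation):
        return {"tool": "read_file", "args": {"path": "main.py"}}
    return {"tool": None, "text": "Task complete."}  # no tool call → done

def execute(tool_call):
    """Stand-in tool runner: pretend to read a file."""
    return f"TOOL_RESULT({tool_call['tool']}): contents of {tool_call['args']['path']}"

def agent_loop(user_request):
    conversation = [user_request]
    while True:
        thought = mock_model(conversation)   # 1. REASON
        if thought["tool"] is None:          # the only exit condition:
            return thought["text"]           #   a response with no tool call
        result = execute(thought)            # 2. ACT + 3. OBSERVE
        conversation.append(result)

print(agent_loop("fix the bug"))  # → Task complete.
```

Note that there is no step counter or completion checker anywhere: the loop ends only when the model stops asking for tools.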
The Toolbox: What Agents Can Do
Every action is a tool call — the model never executes directly
Core Tool Categories
// FILE OPERATIONS
Read        Read file contents (with line ranges)
Write       Create or overwrite a file
StrReplace  Find-and-replace within a file
Glob        Find files matching a pattern

// SEARCH
Grep        Regex search across files
Semantic    Find code by meaning, not text

// EXECUTION
Shell       Run terminal commands (npm, git, etc.)
Lint        Check files for errors

// EXTERNAL
WebSearch   Look up documentation
WebFetch    Read a URL’s content
MCP Tools   Connect to external services
The Model Requests, Your System Executes
The LLM never runs code directly. It outputs a structured tool request (tool name + arguments), your IDE or CLI intercepts it, decides whether to allow it (permission check), executes the tool, and sends the result back. The model only sees text in and text out.
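A tool request is just structured data that the host parses, permission-checks, and dispatches. A minimal sketch (the request shape and tool names are illustrative, not any specific vendor’s format):

```python
import json

def handle_tool_request(request_json, allow):
    """Parse a model's tool request, permission-check it, execute it,
    and return a plain-text result to send back to the model."""
    request = json.loads(request_json)
    name, args = request["tool"], request["arguments"]
    if not allow(name, args):                       # permission check
        return f"DENIED: {name}"
    tools = {"read_file": lambda a: f"<contents of {a['path']}>"}
    return tools[name](args)                        # execute; result is just text

# The model emitted this structured request; the system runs it:
req = '{"tool": "read_file", "arguments": {"path": "src/app.py"}}'
print(handle_tool_request(req, allow=lambda n, a: n == "read_file"))
# → <contents of src/app.py>
```

The model never touches the filesystem; it only ever sees the string that comes back.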
Parallel Tool Calls
Modern agents can request multiple tools in a single turn. If the agent needs to read three files and they’re independent, it requests all three simultaneously. The system executes them in parallel and returns all results at once. This cuts round-trip latency dramatically for discovery-heavy tasks.
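Fanning out independent requests can be sketched with a thread pool; the `read_file` stub stands in for real disk or network I/O:

```python
from concurrent.futures import ThreadPoolExecutor

def read_file(path):
    """Stub for a slow, independent tool call (disk or network I/O)."""
    return f"contents of {path}"

def run_parallel(paths):
    """Execute independent tool calls concurrently; results return in order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(read_file, paths))

results = run_parallel(["a.py", "b.py", "c.py"])
print(results)  # all three results come back in one batch
```

Three sequential round trips collapse into one, which is where the latency win comes from.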
Why it matters: Tools are the agent’s hands. The quality and design of the tool set directly determines what the agent can accomplish. A model with great reasoning but poor tools will underperform a slightly weaker model with well-designed tools.
Context Injection: What the Agent Sees Before You Speak
The invisible context that shapes every response
Silently Injected Context
Before your message reaches the model, the system silently prepends a rich context payload:

System prompt — personality, rules, capabilities
Open files — what you’re currently looking at
Recently viewed files — what you were working on
Git status — branch, staged changes, diffs
OS and workspace info — paths, shell, environment
Rules files — project-specific instructions (AGENTS.md, .cursorrules)
Linter errors — current problems in open files
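Assembling that payload is plain string concatenation before the first model call; a sketch with illustrative field names (real systems inject far more):

```python
def build_injected_context(workspace):
    """Prepend project state to the conversation before the user's message."""
    parts = [
        workspace["system_prompt"],
        "Open files: " + ", ".join(workspace["open_files"]),
        "Git branch: " + workspace["git_branch"],
        "Rules:\n" + workspace["rules_file"],
    ]
    return "\n\n".join(parts)

ctx = build_injected_context({
    "system_prompt": "You are a coding agent.",
    "open_files": ["src/app.py"],
    "git_branch": "feature/auth",
    "rules_file": "Run tests before committing.",
})
print(ctx)  # everything above arrives before the user's first word
```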
Why This Matters
This injected context is invisible to you but gives the agent awareness of your project without you explaining anything. When you say “fix the error,” the agent already knows which file has the error, what the error message is, and what branch you’re on.
The Token Cost
All this context consumes tokens from the model’s context window. A typical agent session starts with 3,000–8,000 tokens of injected context before you type a single word. This is why context engineering (Ch 7) is critical — every wasted token in the system prompt is a token the agent can’t use for reasoning about your actual task.
Practical tip: Keep your rules files concise. A 500-line AGENTS.md file consumes ~2,000 tokens on every single request. Under context pressure, agents may skip advisory rules entirely. Put only critical, non-obvious instructions in rules files.
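A rough way to see what a rules file costs on every request; the 4-characters-per-token ratio is a common heuristic for English text, not an exact tokenizer:

```python
def estimate_tokens(text):
    """Rough token estimate: ~4 characters per token for English text."""
    return len(text) // 4

rules = "Always use TypeScript strict mode.\n" * 100  # a 100-line rules file
print(estimate_tokens(rules))  # → 875
```

Every one of those tokens is paid again on each turn of the loop, which is why trimming rules files compounds.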
Context Window Management: The Agent’s Working Memory
What happens when the conversation gets too long
The Accumulation Problem
Every turn in the agent loop — your message, the model’s reasoning, tool calls, tool results — gets appended to the conversation. A complex task might involve 20–50 tool calls, each adding hundreds or thousands of tokens. A single file read can add 3,000+ tokens. The context window fills up fast.
Automatic Compaction
When the conversation approaches the token limit, the system automatically summarizes older messages. Key decisions, file paths, and error messages are preserved. Exact tool outputs and intermediate reasoning are compressed. This happens transparently — the agent doesn’t know its earlier memories have been summarized.
What Gets Lost
Exact code from early file reads — replaced with summaries
Intermediate reasoning — only conclusions preserved
Failed attempts — compressed to “tried X, didn’t work”
Verbose tool outputs — trimmed to key information

This is why agents sometimes “forget” decisions made earlier in a long session or re-read files they already read.
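A naive compaction pass might squash the oldest messages once history exceeds a budget; the summarizer here is a trivial truncation, where real systems use the model itself to write the summary:

```python
def compact(conversation, limit_chars=120):
    """When history exceeds the budget, collapse the oldest half into a summary."""
    total = sum(len(m) for m in conversation)
    if total <= limit_chars:
        return conversation
    half = len(conversation) // 2
    summary = "SUMMARY: " + " | ".join(m[:20] for m in conversation[:half])
    return [summary] + conversation[half:]   # detail lost, gist kept

history = ["read main.py: " + "x" * 100, "edited main.py",
           "tests failed", "fixed import"]
print(compact(history))  # oldest two messages collapse into one summary entry
```

The exact file contents from the first read are gone after compaction, which is precisely why agents re-read files they have already seen.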
Context drift: Nearly 65% of enterprise AI agent failures are caused by context drift — not raw token exhaustion, but subtle misalignments that compound across multi-step workflows. The agent’s understanding gradually diverges from reality as summaries lose nuance. For complex tasks, shorter focused sessions beat marathon sessions.
Error Recovery: When Things Go Wrong
The fix-the-fix loop and how to escape it
The Cascading Error Problem
The most common agent failure mode: the agent makes an edit, it introduces a bug, the agent tries to fix the bug, the fix introduces a new bug, and the cycle continues. Each “fix” moves further from the original working state. After 3–4 iterations, the code is often in worse shape than when the agent started.
Why Agents Get Stuck
Sunk cost reasoning — the agent tries to salvage its approach instead of reverting
Lost context — the original working code was compacted away
Narrow focus — fixing the symptom without understanding the root cause
No test feedback — the agent doesn’t know if the fix actually works
Checkpoint & Revert
Modern agents implement a checkpoint system that snapshots file contents before each edit. When the agent spirals, you can revert to any previous checkpoint instantly — no git required. This is the single most important safety feature: the ability to undo everything the agent did and try a different approach.
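A checkpoint system can be as simple as snapshotting file contents before each edit; this in-memory sketch uses a dict as the workspace, where real tools persist snapshots to disk:

```python
class Checkpoints:
    """Snapshot file contents before each agent edit; revert to any point."""
    def __init__(self):
        self.snapshots = []                    # list of {path: contents} copies

    def save(self, files):
        self.snapshots.append(dict(files))     # copy current state

    def revert(self, files, index):
        files.clear()
        files.update(self.snapshots[index])    # restore a snapshot wholesale

files = {"app.py": "def main(): pass"}
cp = Checkpoints()
cp.save(files)                                 # checkpoint before the edit
files["app.py"] = "def main(): raise Bug"      # agent's bad edit
cp.revert(files, 0)                            # undo everything
print(files["app.py"])  # → def main(): pass
```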
When to Intervene
// Signs the agent is in a fix-the-fix loop:
1. Same file edited 3+ times in a row
2. Error count is increasing, not decreasing
3. Agent is undoing its own recent changes
4. New errors appear in files the agent didn’t touch
5. Agent says “let me try a different approach” for the second time

Action: Stop. Revert to checkpoint. Restate the problem with more context or a different strategy.
Key insight: The best developers using AI agents aren’t the ones who let the agent run longest. They’re the ones who intervene earliest when they spot a spiral. Knowing when to stop and revert is a core skill.
Permissions & Safety: The Trust Boundary
What agents can do without asking — and what requires your approval
The Permission Model
Agent tools are categorized by risk level:

Read-only (auto-approved) — reading files, searching code, listing directories
Write (may require approval) — editing files, creating files
Execute (usually requires approval) — running shell commands, installing packages
Network (requires approval) — web requests, API calls
Destructive (always requires approval) — deleting files, force-pushing git
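Mapping tools to risk tiers makes the approval decision a lookup. The tier names mirror the list above; the tool-to-tier assignment is illustrative:

```python
RISK = {
    "read_file": "read-only", "grep": "read-only",
    "write_file": "write", "shell": "execute",
    "web_fetch": "network", "delete_file": "destructive",
}
AUTO_APPROVED = {"read-only"}    # balanced mode would add "write" here

def needs_approval(tool):
    """Unknown tools are treated as destructive: always ask a human."""
    return RISK.get(tool, "destructive") not in AUTO_APPROVED

print(needs_approval("grep"))         # → False
print(needs_approval("delete_file"))  # → True
```

Defaulting unknown tools to the most restrictive tier is the safe direction to fail in.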
Sandboxing
Commands run inside a sandbox that restricts filesystem access, network access, and system calls. The agent can write to your workspace but not to system directories. Network access is limited to known package registries and version control providers. Broader access requires explicit permission.
The Trust Spectrum
// Trust levels in practice:

YOLO Mode      Auto-approve everything. Maximum speed.
               Risk: agent runs rm -rf or pushes to main.

Balanced Mode  Auto-approve reads + writes. Prompt for shell commands + network.
               Best for most development work.

Cautious Mode  Approve every action individually. Slowest, but safest for
               production code. Good for unfamiliar codebases.
Practical guidance: Start cautious with a new agent or unfamiliar codebase. As you build trust and understand the agent’s patterns, gradually increase autonomy. Most experienced developers settle on balanced mode — auto-approve file operations, review shell commands.
Planning Mode: Think Before You Code
Why the best agent sessions start with a plan
Plan Then Execute
The most effective pattern for complex tasks: ask the agent to plan first, code second. In planning mode, the agent reads the codebase, identifies affected files, outlines the approach, and presents it for your review. Only after you approve does it start making changes. This prevents the agent from charging down the wrong path.
What a Good Plan Looks Like
// Agent plan for “add user authentication”:

1. Create auth/middleware.ts       → JWT verification, session handling
2. Update routes/api.ts            → Wrap protected routes with auth middleware
3. Create auth/login.ts            → Login endpoint, password hashing
4. Update db/schema.ts             → Add users table with email, hash, role
5. Add tests in tests/auth.test.ts → Login, protected routes, token expiry

// Dependencies: bcrypt, jsonwebtoken
// Estimated: 5 files, ~200 lines
Why Planning Works
Catches wrong assumptions early — you can correct the approach before any code is written
Reduces fix-the-fix loops — the agent has a coherent strategy instead of improvising
Creates documentation — the plan itself serves as a record of what was done and why
Enables partial execution — you can approve steps 1–3 and handle step 4 yourself
When to Skip Planning
Not every task needs a plan. Skip it for:
• Single-file changes with clear scope
• Bug fixes where you already know the cause
• Boilerplate generation (tests, types, configs)
• Tasks you’ve done before and trust the agent with
Pro tip: Save good plans. When the agent produces a plan you approve, that plan becomes reusable documentation. Next time you need similar work, paste the plan structure and the agent will follow the same pattern.
The Agent Gets Better: Self-Improvement Loops
How agents learn from your corrections across sessions
The Stateless Problem
Agents are stateless between sessions. Every new conversation starts from zero. The agent doesn’t remember that you prefer tabs over spaces, that your project uses a specific testing framework, or that you corrected it three times yesterday about import paths. This is why the same mistakes repeat.
Rules Files as Persistent Memory
The solution: rules files (AGENTS.md, .cursorrules, CLAUDE.md) act as persistent memory injected at session start. When you notice the agent making the same mistake repeatedly, add a rule. These files accumulate your project’s conventions, preferences, and gotchas — becoming a living style guide the agent follows.
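A rules file is just plain instructions the agent reads at session start. A hypothetical AGENTS.md excerpt (these entries are invented examples, not recommended conventions):

```markdown
# AGENTS.md (example entries)

- Tests use vitest; run `npm test` after any change under src/.
- Import paths are absolute from `src/` (no `../../` chains).
- Never edit generated files in `dist/`.
- API errors must go through `handleApiError`, never a bare throw.
```

Each entry exists because a human once corrected the agent; the file is the correction made permanent.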
Automated Self-Reflection
Emerging tools analyze past session transcripts to identify moments of friction — where you said “no,” “not that,” or had to repeat instructions. They automatically propose additions to your rules files. In practice, this has reduced correction rates from 0.45 to 0.07 per session over daily iterations.
The Compound Effect
Over weeks, a well-maintained rules file transforms the agent experience. The agent knows your naming conventions, testing patterns, error handling style, and architectural preferences. It stops making the mistakes you’ve corrected. The investment in rules pays compound returns on every future session.
Key insight: The agent loop isn’t just Reason → Act → Observe within a session. There’s a meta-loop across sessions: Use → Correct → Encode Rule → Better Next Time. The developers who get the most from AI agents are the ones who invest in this outer loop.