Ch 6 — Memory & Entropy Management

Episodic memory, documentation consistency, and preventing codebase drift
High Level

Episodic → Few-Shot → Docs → Scan → Cleanup → Stable
Episodic Memory
Saving successful task completions for future reference
The Concept
Episodic memory records successful task completions as reusable examples. When the agent successfully creates an API endpoint, the full sequence (plan, implementation, tests, review) is saved. Next time a similar task arises, this episode is loaded as a few-shot example, dramatically improving first-attempt quality.
What to Save
The task description (what was requested).

The plan (how the agent approached it).

The implementation (key files changed).

Review feedback (what was corrected).

Final result (the approved version).

This gives future agents a complete picture of how to handle similar tasks.
Key insight: Episodic memory is the mechanism by which harnesses learn from experience. Without it, every task starts from zero. With it, the agent builds on past successes.
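The "what to save" list above can be sketched as a small record type plus a save routine. This is a minimal sketch assuming a JSON-lines file as the memory store; the `Episode` fields mirror the list, but all names here are illustrative, not a prescribed schema:

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path

@dataclass
class Episode:
    """One successful task completion, saved for future few-shot use."""
    task: str                 # what was requested
    plan: str                 # how the agent approached it
    files_changed: list[str] = field(default_factory=list)  # key files changed
    review_feedback: str = "" # what was corrected in review
    final_result: str = ""    # summary of the approved version

def save_episode(episode: Episode, store: Path) -> None:
    """Append the episode to a JSON-lines memory store."""
    with store.open("a") as f:
        f.write(json.dumps(asdict(episode)) + "\n")
```

A JSON-lines file is enough to start; the same record shape transfers directly to a database or vector store once retrieval needs grow.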
Few-Shot Learning from Memory
Using past successes as in-context examples
How It Works
When a new task arrives, the harness searches episodic memory for similar past tasks. The most relevant episode is loaded into context as a few-shot example. The agent sees: “Here’s how a similar task was completed successfully. Follow this pattern.” This is more effective than abstract instructions because it shows the agent a concrete, approved example.
Impact
Few-shot examples from episodic memory can improve first-attempt quality by 40–60% compared to instructions alone. The agent doesn’t just know the rules — it sees what a correct implementation looks like in your specific codebase. This is especially powerful for complex tasks with many implicit conventions.
Key insight: Instructions tell the agent what to do. Few-shot examples show the agent what “done right” looks like. Both are needed, but examples are more powerful for complex, convention-heavy tasks.
Codebase Entropy
How agent-generated code drifts over time
The Problem
Codebase entropy is the gradual degradation of code quality and consistency over time. With human developers, entropy accumulates slowly. With AI agents generating code at high velocity, entropy can accumulate 10× faster: slightly different naming conventions, subtly inconsistent patterns, and gradually diverging module structures.
How It Manifests
Naming drift: Some modules use getUserById, others use fetchUser, others use loadUserData.

Pattern drift: Error handling is done differently in each module.

Documentation drift: README says one thing, code does another.

Constraint drift: CLAUDE.md rules don’t match actual code patterns.
Critical at AI velocity: Entropy is invisible until it’s severe. Each individual deviation is small. But after 1,000 agent-generated PRs, the cumulative drift can make the codebase feel like it was written by 100 different developers with different conventions.
Constraint Violation Scanners
Automated detection of drift from standards
How They Work
Constraint violation scanners periodically scan the entire codebase for deviations from the constraint document. Unlike linters (which check individual files), scanners check cross-file consistency: Are all API endpoints following the same pattern? Are all error messages formatted consistently? Do all modules use the same dependency injection approach?
Implementation
Scanners can be rule-based (grep for patterns, count inconsistencies) or LLM-based (ask a model to review the codebase for consistency). Rule-based scanners are cheaper and more reliable for known patterns. LLM-based scanners can catch subtle inconsistencies that rules miss.
Key insight: Scanners are the immune system of the codebase. They detect entropy before it becomes severe. Run them weekly on a schedule, not just on new code. Existing code can drift too as conventions evolve.
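A rule-based scanner of the kind described above can be sketched in a few lines. The drift patterns reuse the naming-drift example from earlier in the chapter (`fetchUser`, `loadUserData` drifting from a `getUserById` convention); both the convention and the pattern list are illustrative:

```python
import re

# Hypothetical convention: user lookups are named getUserById.
# These patterns flag the drifted variants.
DRIFT_PATTERNS = [r"\bfetchUser\b", r"\bloadUserData\b"]

def scan_for_drift(files: dict[str, str]) -> list[tuple[str, str]]:
    """Scan a {path: source} mapping and report (path, pattern) violations.
    Unlike a per-file linter, this runs over the whole codebase at once,
    so it can answer cross-file questions like 'is naming consistent?'"""
    violations = []
    for path, source in files.items():
        for pattern in DRIFT_PATTERNS:
            if re.search(pattern, source):
                violations.append((path, pattern))
    return violations
```

Running this weekly over every file, not just changed ones, catches the "existing code drifts too" case the key insight warns about.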
Cleanup Agents
Automated entropy reduction
The Pattern
Cleanup agents are scheduled agents that run periodically to reduce entropy. They scan for inconsistencies, generate fix PRs, and submit them for review. Unlike coding agents (which add new code), cleanup agents normalize existing code to match current conventions.
What They Fix
Documentation consistency: Update READMEs and comments to match current code.

Naming normalization: Rename functions and variables to match current conventions.

Pattern enforcement: Refactor code to use the current standard pattern.

Dead code removal: Find and remove unused functions, imports, and files.
Key insight: Cleanup agents are the janitorial staff of the codebase. They’re not glamorous, but without them, entropy wins. Schedule them weekly. Review their PRs with the same rigor as feature PRs.
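The naming-normalization task from the list above can be sketched as a single cleanup pass. The rename map is illustrative (it continues the hypothetical `getUserById` convention); a real cleanup agent would open the resulting changes as a PR rather than writing them directly:

```python
import re

# Hypothetical rename map: drifted names -> current convention.
RENAMES = {"fetchUser": "getUserById", "loadUserData": "getUserById"}

def cleanup_pass(files: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Normalize drifted names; return (fixed files, change log).
    The change log becomes the body of the cleanup PR for human review."""
    fixed, log = {}, []
    for path, source in files.items():
        new_source = source
        for old, new in RENAMES.items():
            if re.search(rf"\b{old}\b", new_source):
                new_source = re.sub(rf"\b{old}\b", new, new_source)
                log.append(f"{path}: renamed {old} -> {new}")
        fixed[path] = new_source
    return fixed, log
```

Text-level renaming is only safe for illustration; a production cleanup agent would rename via the language's refactoring tooling so call sites and imports stay consistent.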
Documentation Consistency Agents
Keeping docs in sync with code
The Problem
Documentation drifts from code faster than any other artifact. API docs describe endpoints that no longer exist. READMEs reference deprecated patterns. The CLAUDE.md itself may contain rules that no longer match the codebase. Stale documentation is worse than no documentation because it actively misleads the agent.
The Solution
A documentation consistency agent compares documentation against the actual codebase and flags discrepancies. It can run after every significant PR merge to check whether the change invalidated any existing documentation. When it finds a discrepancy, it either auto-fixes or creates a ticket for human review.
Rule of thumb: If your CLAUDE.md hasn’t been updated in a month but your codebase has changed significantly, it’s probably stale. Documentation consistency agents prevent this drift automatically.
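One concrete form of the comparison described above: extract the function names the docs mention and diff them against the names the code actually defines. This sketch assumes docs reference functions in backticks like `` `name()` ``; that convention, and all names below, are assumptions for illustration:

```python
import ast
import re

def documented_functions(doc_text: str) -> set[str]:
    """Extract function names referenced in the docs as `name()`."""
    return set(re.findall(r"`(\w+)\(\)`", doc_text))

def defined_functions(source: str) -> set[str]:
    """Collect function names actually defined in the code."""
    tree = ast.parse(source)
    return {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}

def doc_discrepancies(doc_text: str, source: str) -> set[str]:
    """Names the docs reference but the code no longer defines --
    the 'endpoints that no longer exist' failure mode."""
    return documented_functions(doc_text) - defined_functions(source)
```

Run after each significant merge, a non-empty result either triggers an auto-fix PR or files a ticket, as the section describes.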
Error Pattern Tracking
Learning from agent mistakes across sessions
The System
Error pattern tracking logs every mistake the agent makes across all sessions: what the agent did wrong, what the correct action was, and what constraint or rule would have prevented it. Over time, this builds a failure taxonomy — a structured catalog of how agents fail in your specific codebase.
Using the Taxonomy
The failure taxonomy drives harness improvement: frequent failures become new lint rules or constraint document entries. Rare but severe failures become structural tests. Novel failures get investigated to understand whether they represent a new failure mode or a one-off. The taxonomy turns reactive firefighting into proactive prevention.
Key insight: Without error tracking, you fix the same types of mistakes repeatedly. With it, every mistake is an investment in prevention. The harness gets smarter with every failure it records.
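A minimal sketch of the failure log and the "frequent failures become new rules" step, assuming a flat in-memory log and a simple recurrence threshold (the field names and threshold are illustrative):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Failure:
    category: str        # taxonomy bucket, e.g. "naming", "error-handling"
    what_happened: str   # what the agent did wrong
    correct_action: str  # what it should have done
    missing_rule: str    # constraint that would have prevented it

def frequent_failures(log: list[Failure], threshold: int = 3) -> list[str]:
    """Categories recurring often enough to justify a new lint rule
    or constraint-document entry."""
    counts = Counter(f.category for f in log)
    return [cat for cat, n in counts.items() if n >= threshold]
```

Categories below the threshold stay in the log for investigation, matching the section's split between frequent, rare-but-severe, and novel failures.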
Entropy Budget
Managing the rate of codebase change
The Concept
Just as token budgets manage context window usage, an entropy budget manages the rate of codebase change. If agents are merging 100 PRs per week, the codebase is changing faster than humans can review for consistency. Setting limits on the rate of change — and scheduling cleanup proportionally — keeps entropy manageable.
Practical Guidelines
For every 10 feature PRs, schedule 1 cleanup PR. This 10:1 ratio keeps entropy from accumulating.

Weekly consistency scans catch drift before it compounds.

Monthly constraint document reviews ensure rules match reality.

Quarterly architecture audits verify the big picture is still coherent.
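The 10:1 guideline above reduces to a small budget check. A minimal sketch; the function name and the default ratio are taken directly from the guideline, everything else is illustrative:

```python
def cleanup_prs_owed(feature_prs: int, cleanup_prs: int, ratio: int = 10) -> int:
    """How many cleanup PRs to schedule to restore the target ratio
    (default: 1 cleanup PR per 10 feature PRs)."""
    required = feature_prs // ratio
    return max(0, required - cleanup_prs)
```

Wired into the weekly scan, this turns the ratio from a guideline into an enforced budget: the scheduler queues exactly the cleanup work the week's feature velocity incurred.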
Key insight: Entropy management is the long game of harness engineering. Individual tasks succeed or fail based on constraints and review. But the codebase’s long-term health depends on actively managing the cumulative effect of thousands of agent-generated changes.