Ch 6 — Memory & Entropy Management

Episodic memory, documentation consistency, and preventing codebase drift
High Level

Episodic → Few-Shot → Docs → Scan → Cleanup → Stable
Episodic Memory
Saving successful task completions for future reference
The Concept
Episodic memory records successful task completions as reusable examples. When the agent successfully creates an API endpoint, the full sequence (plan, implementation, tests, review) is saved. Next time a similar task arises, this episode is loaded as a few-shot example, dramatically improving first-attempt quality.
What to Save
The task description (what was requested).

The plan (how the agent approached it).

The implementation (key files changed).

Review feedback (what was corrected).

Final result (the approved version).

This gives future agents a complete picture of how to handle similar tasks.
Key insight: Episodic memory is the mechanism by which harnesses learn from experience. Without it, every task starts from zero. With it, the agent builds on past successes.
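The "what to save" list above can be sketched as a small record type plus a save routine. This is a minimal sketch assuming a JSON-lines file as the memory store; the `Episode` fields mirror the list, but all names here are illustrative, not a prescribed schema:

```python
import json
from dataclasses import dataclass, asdict, field
from pathlib import Path

@dataclass
class Episode:
    """One successful task completion, saved for future few-shot use."""
    task: str                 # what was requested
    plan: str                 # how the agent approached it
    files_changed: list[str] = field(default_factory=list)  # key files changed
    review_feedback: str = "" # what was corrected in review
    final_result: str = ""    # summary of the approved version

def save_episode(episode: Episode, store: Path) -> None:
    """Append the episode to a JSON-lines memory store."""
    with store.open("a") as f:
        f.write(json.dumps(asdict(episode)) + "\n")
```

A JSON-lines file is enough to start; the same record shape transfers directly to a database or vector store once retrieval needs grow.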
Few-Shot Learning from Memory
Using past successes as in-context examples
How It Works
When a new task arrives, the harness searches episodic memory for similar past tasks. The most relevant episode is loaded into context as a few-shot example. The agent sees: “Here’s how a similar task was completed successfully. Follow this pattern.” This is more effective than abstract instructions because it shows the agent a concrete, approved example.
Impact
Few-shot examples from episodic memory can improve first-attempt quality by 40–60% compared to instructions alone. The agent doesn’t just know the rules — it sees what a correct implementation looks like in your specific codebase. This is especially powerful for complex tasks with many implicit conventions.
Key insight: Instructions tell the agent what to do. Few-shot examples show the agent what “done right” looks like. Both are needed, but examples are more powerful for complex, convention-heavy tasks.
Codebase Entropy
How agent-generated code drifts over time
The Problem
Codebase entropy is the gradual degradation of code quality and consistency over time. With human developers, entropy accumulates slowly. With AI agents generating code at high velocity, entropy can accumulate 10× faster: slightly different naming conventions, subtly inconsistent patterns, and gradually diverging module structures.
How It Manifests
Naming drift: Some modules use getUserById, others use fetchUser, others use loadUserData.

Pattern drift: Error handling is done differently in each module.

Documentation drift: README says one thing, code does another.

Constraint drift: CLAUDE.md rules don’t match actual code patterns.
Critical at AI velocity: Entropy is invisible until it’s severe. Each individual deviation is small. But after 1,000 agent-generated PRs, the cumulative drift can make the codebase feel like it was written by 100 different developers with different conventions.
Constraint Violation Scanners
Automated detection of drift from standards
How They Work
Constraint violation scanners periodically scan the entire codebase for deviations from the constraint document. Unlike linters (which check individual files), scanners check cross-file consistency: Are all API endpoints following the same pattern? Are all error messages formatted consistently? Do all modules use the same dependency injection approach?
Implementation
Scanners can be rule-based (grep for patterns, count inconsistencies) or LLM-based (ask a model to review the codebase for consistency). Rule-based scanners are cheaper and more reliable for known patterns. LLM-based scanners can catch subtle inconsistencies that rules miss.
Key insight: Scanners are the immune system of the codebase. They detect entropy before it becomes severe. Run them weekly on a schedule, not just on new code. Existing code can drift too as conventions evolve.
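A rule-based scanner of the kind described above can be sketched in a few lines. The drift patterns reuse the naming-drift example from earlier in the chapter (`fetchUser`, `loadUserData` drifting from a `getUserById` convention); both the convention and the pattern list are illustrative:

```python
import re

# Hypothetical convention: user lookups are named getUserById.
# These patterns flag the drifted variants.
DRIFT_PATTERNS = [r"\bfetchUser\b", r"\bloadUserData\b"]

def scan_for_drift(files: dict[str, str]) -> list[tuple[str, str]]:
    """Scan a {path: source} mapping and report (path, pattern) violations.
    Unlike a per-file linter, this runs over the whole codebase at once,
    so it can answer cross-file questions like 'is naming consistent?'"""
    violations = []
    for path, source in files.items():
        for pattern in DRIFT_PATTERNS:
            if re.search(pattern, source):
                violations.append((path, pattern))
    return violations
```

Running this weekly over every file, not just changed ones, catches the "existing code drifts too" case the key insight warns about.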
Cleanup Agents
Automated entropy reduction
The Pattern
Cleanup agents are scheduled agents that run periodically to reduce entropy. They scan for inconsistencies, generate fix PRs, and submit them for review. Unlike coding agents (which add new code), cleanup agents normalize existing code to match current conventions.
What They Fix
Documentation consistency: Update READMEs and comments to match current code.

Naming normalization: Rename functions and variables to match current conventions.

Pattern enforcement: Refactor code to use the current standard pattern.

Dead code removal: Find and remove unused functions, imports, and files.
Key insight: Cleanup agents are the janitorial staff of the codebase. They’re not glamorous, but without them, entropy wins. Schedule them weekly. Review their PRs with the same rigor as feature PRs.
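The naming-normalization task from the list above can be sketched as a single cleanup pass. The rename map is illustrative (it continues the hypothetical `getUserById` convention); a real cleanup agent would open the resulting changes as a PR rather than writing them directly:

```python
import re

# Hypothetical rename map: drifted names -> current convention.
RENAMES = {"fetchUser": "getUserById", "loadUserData": "getUserById"}

def cleanup_pass(files: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Normalize drifted names; return (fixed files, change log).
    The change log becomes the body of the cleanup PR for human review."""
    fixed, log = {}, []
    for path, source in files.items():
        new_source = source
        for old, new in RENAMES.items():
            if re.search(rf"\b{old}\b", new_source):
                new_source = re.sub(rf"\b{old}\b", new, new_source)
                log.append(f"{path}: renamed {old} -> {new}")
        fixed[path] = new_source
    return fixed, log
```

Text-level renaming is only safe for illustration; a production cleanup agent would rename via the language's refactoring tooling so call sites and imports stay consistent.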
Documentation Consistency Agents
Keeping docs in sync with code
The Problem
Documentation drifts from code faster than any other artifact. API docs describe endpoints that no longer exist. READMEs reference deprecated patterns. The CLAUDE.md itself may contain rules that no longer match the codebase. Stale documentation is worse than no documentation because it actively misleads the agent.
The Solution
A documentation consistency agent compares documentation against the actual codebase and flags discrepancies. It can run after every significant PR merge to check whether the change invalidated any existing documentation. When it finds a discrepancy, it either auto-fixes or creates a ticket for human review.
Rule of thumb: If your CLAUDE.md hasn’t been updated in a month but your codebase has changed significantly, it’s probably stale. Documentation consistency agents prevent this drift automatically.
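One concrete form of the comparison described above: extract the function names the docs mention and diff them against the names the code actually defines. This sketch assumes docs reference functions in backticks like `` `name()` ``; that convention, and all names below, are assumptions for illustration:

```python
import ast
import re

def documented_functions(doc_text: str) -> set[str]:
    """Extract function names referenced in the docs as `name()`."""
    return set(re.findall(r"`(\w+)\(\)`", doc_text))

def defined_functions(source: str) -> set[str]:
    """Collect function names actually defined in the code."""
    tree = ast.parse(source)
    return {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}

def doc_discrepancies(doc_text: str, source: str) -> set[str]:
    """Names the docs reference but the code no longer defines --
    the 'endpoints that no longer exist' failure mode."""
    return documented_functions(doc_text) - defined_functions(source)
```

Run after each significant merge, a non-empty result either triggers an auto-fix PR or files a ticket, as the section describes.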
Error Pattern Tracking
Learning from agent mistakes across sessions
The System
Error pattern tracking logs every mistake the agent makes across all sessions: what the agent did wrong, what the correct action was, and what constraint or rule would have prevented it. Over time, this builds a failure taxonomy — a structured catalog of how agents fail in your specific codebase.
Using the Taxonomy
The failure taxonomy drives harness improvement: frequent failures become new lint rules or constraint document entries. Rare but severe failures become structural tests. Novel failures get investigated to understand whether they represent a new failure mode or a one-off. The taxonomy turns reactive firefighting into proactive prevention.
Key insight: Without error tracking, you fix the same types of mistakes repeatedly. With it, every mistake is an investment in prevention. The harness gets smarter with every failure it records.
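A minimal sketch of the failure log and the "frequent failures become new rules" step, assuming a flat in-memory log and a simple recurrence threshold (the field names and threshold are illustrative):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Failure:
    category: str        # taxonomy bucket, e.g. "naming", "error-handling"
    what_happened: str   # what the agent did wrong
    correct_action: str  # what it should have done
    missing_rule: str    # constraint that would have prevented it

def frequent_failures(log: list[Failure], threshold: int = 3) -> list[str]:
    """Categories recurring often enough to justify a new lint rule
    or constraint-document entry."""
    counts = Counter(f.category for f in log)
    return [cat for cat, n in counts.items() if n >= threshold]
```

Categories below the threshold stay in the log for investigation, matching the section's split between frequent, rare-but-severe, and novel failures.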
Entropy Budget
Managing the rate of codebase change
The Concept
Just as token budgets manage context window usage, an entropy budget manages the rate of codebase change. If agents are merging 100 PRs per week, the codebase is changing faster than humans can review for consistency. Setting limits on the rate of change — and scheduling cleanup proportionally — keeps entropy manageable.
Practical Guidelines
For every 10 feature PRs, schedule 1 cleanup PR. This 10:1 ratio keeps entropy from accumulating.

Weekly consistency scans catch drift before it compounds.

Monthly constraint document reviews ensure rules match reality.

Quarterly architecture audits verify the big picture is still coherent.
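The 10:1 guideline above reduces to a small budget check. A minimal sketch; the function name and the default ratio are taken directly from the guideline, everything else is illustrative:

```python
def cleanup_prs_owed(feature_prs: int, cleanup_prs: int, ratio: int = 10) -> int:
    """How many cleanup PRs to schedule to restore the target ratio
    (default: 1 cleanup PR per 10 feature PRs)."""
    required = feature_prs // ratio
    return max(0, required - cleanup_prs)
```

Wired into the weekly scan, this turns the ratio from a guideline into an enforced budget: the scheduler queues exactly the cleanup work the week's feature velocity incurred.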
Key insight: Entropy management is the long game of harness engineering. Individual tasks succeed or fail based on constraints and review. But the codebase’s long-term health depends on actively managing the cumulative effect of thousands of agent-generated changes.