Ch 4 — Architectural Constraints & Linting

Enforcing code structure with deterministic and LLM-based verification
High Level
[Flow diagram: Layers → Lint → LLM Audit → Test → Hook → Pass]
Dependency Layering
Enforcing which modules can import from which
The Pattern
Dependency layering defines a strict hierarchy of code layers: Types → Config → Repository → Service → UI. Each layer can only import from layers below it. The UI layer can import from Services, but Services cannot import from UI. This prevents circular dependencies and maintains separation of concerns.
Why Agents Need It
Without explicit layering rules, agents frequently create cross-layer imports. A service that imports a React component. A utility that imports from the database layer. These violations compile fine but create architectural debt that compounds over time. Agents don’t intuitively understand your architecture — they need explicit boundaries.
Key insight: Dependency layering is the most impactful architectural constraint. It prevents the most common class of agent-introduced architectural violations and can be enforced with simple import-path linting rules.
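The layering rule above can be reduced to a small import-path check. This is a minimal sketch, assuming files live under a src/&lt;layer&gt;/ directory convention and that import paths are written src/-rooted; the layer list and helper names are illustrative:

```javascript
// Layers ordered low → high; a file may import from its own layer or below.
const LAYERS = ["types", "config", "repository", "service", "ui"];

function layerOf(path) {
  // Assumed convention: src/<layer>/...
  const match = path.match(/^src\/([^/]+)\//);
  return match ? LAYERS.indexOf(match[1]) : -1;
}

function allowedToImport(fromPath, toPath) {
  const from = layerOf(fromPath);
  const to = layerOf(toPath);
  if (from === -1 || to === -1) return true; // outside the layered tree
  return to <= from; // only same layer or below
}
```

With this, UI importing a service passes, while a service importing UI fails — the exact asymmetry the section describes.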
Deterministic Linters
Rules that always produce the same result
What They Enforce
Deterministic linters (ESLint, Pylint, Clippy, etc.) enforce rules that are binary: pass or fail. No ambiguity. They catch import violations, naming conventions, forbidden patterns, and style rules. They run in CI and block merges when violated. The agent gets immediate, unambiguous feedback.
Agent-Specific Rules
// Custom ESLint rules for agents
no-cross-layer-import: error   // Services can't import from UI
no-direct-db-access: error     // Must use repository layer
require-error-handling: warn   // All async calls need try/catch
max-function-length: 30        // Agents tend to write long functions
Why Deterministic First
Deterministic linters are the cheapest and most reliable enforcement mechanism. Zero cost per run, instant feedback, no false positives when rules are well-written. They should be the first line of defense. Only use LLM-based auditing for rules that can’t be expressed as deterministic checks.
Key insight: Every constraint in your CLAUDE.md that can be expressed as a linter rule should be. Linters are cheaper, faster, and more reliable than hoping the agent reads and follows the constraint document.
LLM-Based Auditors
Using AI to check what linters can’t
When to Use
Some rules can’t be expressed as deterministic checks: “Error messages must be user-friendly”, “Variable names must be descriptive”, “Comments must explain why, not what.” These require judgment. LLM-based auditors use a second model to review the first model’s output against these subjective criteria.
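The subjective criteria above can be packaged into the prompt sent to the auditing model. This is a sketch of the prompt-building step only; the criteria wording is taken from this section, and how (and with which model) you actually make the inference call is left to your stack:

```javascript
// Build the audit prompt for a second-model review of a diff.
const AUDIT_CRITERIA = [
  "Error messages must be user-friendly",
  "Variable names must be descriptive",
  "Comments must explain why, not what",
];

function buildAuditPrompt(diff, criteria = AUDIT_CRITERIA) {
  const numbered = criteria.map((c, i) => `${i + 1}. ${c}`).join("\n");
  return [
    "You are auditing a code change against subjective quality criteria.",
    "For each criterion, answer PASS or FAIL with a one-line justification.",
    "",
    "Criteria:",
    numbered,
    "",
    "Diff:",
    diff,
  ].join("\n");
}
```

Keeping the criteria in one list makes it easy to version them alongside your lint config.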
The Tradeoff
LLM auditors are expensive (another inference call per review), non-deterministic (may give different results on the same code), and slower than linters. Use them sparingly for high-value checks that can’t be automated any other way. Reserve them for PR-level review, not per-file checking.
Key insight: The ideal enforcement stack is layered: deterministic linters catch 80% of issues instantly and at no cost. LLM auditors catch the remaining 20% that require judgment. Human review catches what both miss.
“Vibecoded Lints”
Linting rules designed specifically for agent mistakes
The Concept
“Vibecoded lints” are custom linting rules designed to catch mistakes that AI agents make but human developers rarely do. Agents have characteristic failure patterns: overly generic variable names, missing edge case handling, importing from the wrong layer, creating duplicate utility functions. These patterns are predictable and lintable.
Examples
No generic names: Ban variables named data, result, temp, value without qualification.

No duplicate utilities: Flag new utility functions that overlap with existing ones.

Required error context: Error messages must include the operation that failed and the input that caused it.

No orphan files: New files must be imported somewhere within 24 hours or flagged for review.
Critical in AI: Vibecoded lints are born from observing agent failures. Keep a log of every agent mistake that makes it past review. When you see a pattern, write a lint rule. Your lint suite becomes a codified history of agent failure modes.
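The first example above — banning unqualified generic names — is simple to sketch. This version scans source text with a regex for declarations; the banned-name list and the "qualification" rule (a generic word is fine inside a longer name like userData) are assumptions to tune against your own agent-failure log:

```javascript
// Vibecoded lint sketch: flag bare generic variable names.
const GENERIC_NAMES = new Set(["data", "result", "temp", "value"]);

function findGenericDeclarations(source) {
  const violations = [];
  const declPattern = /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)/g;
  let m;
  while ((m = declPattern.exec(source)) !== null) {
    if (GENERIC_NAMES.has(m[1])) violations.push(m[1]);
  }
  return violations;
}
```

A production version would use the AST rather than a regex, but the shape is the same: a small, targeted rule aimed at a pattern agents produce and humans rarely do.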
Structural Tests
Tests that verify architecture, not behavior
What They Test
Structural tests verify architectural invariants: dependency direction, module boundaries, naming conventions, file organization. They don’t test that code works correctly — they test that code is organized correctly. They run in CI alongside unit tests and block merges when architectural rules are violated.
Example
// Structural test: dependency direction
test("services don't import from UI", () => {
  const serviceFiles = glob("src/services/**");
  for (const file of serviceFiles) {
    const imports = getImports(file);
    const uiImports = imports.filter(
      (i) => i.startsWith("src/ui")
    );
    expect(uiImports).toHaveLength(0);
  }
});
Why They Matter
Structural tests catch violations that unit tests miss. A service that imports from the UI layer will pass all its unit tests — the code works. But the architectural violation creates coupling that makes the codebase harder to maintain. Structural tests enforce the rules that keep the codebase healthy long-term.
Key insight: Structural tests are the bridge between constraint documents and CI. They turn written rules into automated checks. Every architectural rule in your CLAUDE.md should have a corresponding structural test.
Pre-Commit Hooks
Catching violations before they enter the repository
The Pattern
Pre-commit hooks run linters, formatters, and quick structural checks before code is committed. For agent-generated code, they provide immediate feedback — the agent sees the violation and can fix it in the same session, rather than discovering it in CI minutes later.
What to Include
Fast checks only: Formatting, import sorting, basic linting. Keep hooks under 10 seconds.

Auto-fix when possible: Formatting and import sorting should auto-fix, not just report.

Skip slow checks: Full test suites and LLM audits belong in CI, not pre-commit.

Agent-aware: Some hooks can detect agent-generated code patterns and apply stricter rules.
Rule of thumb: Pre-commit hooks should be fast enough that the agent doesn’t notice them. If hooks take more than 10 seconds, the agent may timeout or retry, creating confusion. Keep them lean.
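One way to keep the budget honest is to decide mechanically which checks run pre-commit versus in CI, using the <10s rule above. The check names and durations here are made up; it's a simple in-order budget split, not a scheduler:

```javascript
// Partition checks into pre-commit vs CI under a time budget.
const CHECKS = [
  { name: "format", estSeconds: 1 },
  { name: "import-sort", estSeconds: 1 },
  { name: "basic-lint", estSeconds: 4 },
  { name: "unit-tests", estSeconds: 120 },
  { name: "llm-audit", estSeconds: 180 },
];

function splitChecks(checks, budgetSeconds = 10) {
  const preCommit = [];
  const ci = [];
  let used = 0;
  for (const check of checks) {
    if (used + check.estSeconds <= budgetSeconds) {
      preCommit.push(check.name);
      used += check.estSeconds;
    } else {
      ci.push(check.name);
    }
  }
  return { preCommit, ci };
}
```

Recording estimated durations next to each check also makes it obvious when a "fast" check has quietly grown slow enough to move to CI.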
The Enforcement Stack
Layering all verification mechanisms
The Full Stack
// Enforcement stack (fastest to slowest)
Layer 1: Pre-commit (<10s)
  Formatting, import sorting
  Basic lint rules
  Auto-fix what's possible
Layer 2: CI Linting (<2min)
  Full lint suite
  Structural tests
  Type checking
Layer 3: CI Tests (<10min)
  Unit tests
  Integration tests
  Coverage checks
Layer 4: LLM Audit (<5min)
  Subjective quality checks
  Architecture review
  Documentation review
Layer 5: Human Review (async)
  Final approval
  Strategic decisions
  Edge cases
The Principle
Each layer catches different types of issues. Earlier layers are cheaper and faster but less nuanced. Later layers are more expensive but catch subtler problems. The goal is to catch as much as possible in the early, cheap layers so the expensive layers only handle what they must.
Key insight: A well-designed enforcement stack means 90% of agent mistakes never reach human review. The human reviewer sees only the subtle, judgment-requiring issues that automated checks can’t handle.
Building Your Enforcement Stack
Where to start and how to grow
Getting Started
Week 1: Add dependency layering rules to your existing linter. Define which directories can import from which.

Week 2: Add 3–5 vibecoded lint rules based on your most common agent mistakes.

Month 1: Add structural tests for your top architectural invariants.

Month 2: Add LLM-based auditing for subjective quality checks on PRs.
Measuring Effectiveness
Track the violation rate per layer. If pre-commit catches 60% of violations, CI catches 30%, and only 10% reach human review, your stack is working well. If 50% still reach human review, your earlier layers need more rules. The goal is to shift violations left — catch them earlier, cheaper, faster.
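The per-layer tracking above is a one-line computation once you log where each violation was caught. The layer names and counts in this sketch are illustrative:

```javascript
// Per-layer catch rates from violation counts caught at each layer.
function catchRates(countsByLayer) {
  const total = Object.values(countsByLayer).reduce((a, b) => a + b, 0);
  const rates = {};
  for (const [layer, count] of Object.entries(countsByLayer)) {
    rates[layer] = total === 0 ? 0 : count / total;
  }
  return rates;
}
```

Watching these rates over time shows whether new rules are actually shifting violations left, or just adding noise.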
Key insight: Your enforcement stack is never done. Every agent mistake that reaches production is a missing rule somewhere in the stack. Add the rule, and that class of mistake never reaches production again. Over time, the stack becomes comprehensive.