Ch 4 — Architectural Constraints & Linting

Enforcing code structure with deterministic and LLM-based verification
High Level
[Flow diagram: Layers → Lint → LLM Audit → Test → Hook → Pass]
Dependency Layering
Enforcing which modules can import from which
The Pattern
Dependency layering defines a strict hierarchy of code layers: Types → Config → Repository → Service → UI. Each layer can only import from layers below it. The UI layer can import from Services, but Services cannot import from UI. This prevents circular dependencies and maintains separation of concerns.
Why Agents Need It
Without explicit layering rules, agents frequently create cross-layer imports. A service that imports a React component. A utility that imports from the database layer. These violations compile fine but create architectural debt that compounds over time. Agents don’t intuitively understand your architecture — they need explicit boundaries.
Key insight: Dependency layering is the most impactful architectural constraint. It prevents the most common class of agent-introduced architectural violations and can be enforced with simple import-path linting rules.
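The layering rule above can be reduced to a small import-path check. This is a minimal sketch, assuming files live under a src/&lt;layer&gt;/ directory convention and that import paths are written src/-rooted; the layer list and helper names are illustrative:

```javascript
// Layers ordered low → high; a file may import from its own layer or below.
const LAYERS = ["types", "config", "repository", "service", "ui"];

function layerOf(path) {
  // Assumed convention: src/<layer>/...
  const match = path.match(/^src\/([^/]+)\//);
  return match ? LAYERS.indexOf(match[1]) : -1;
}

function allowedToImport(fromPath, toPath) {
  const from = layerOf(fromPath);
  const to = layerOf(toPath);
  if (from === -1 || to === -1) return true; // outside the layered tree
  return to <= from; // only same layer or below
}
```

With this, UI importing a service passes, while a service importing UI fails — the exact asymmetry the section describes.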
Deterministic Linters
Rules that always produce the same result
What They Enforce
Deterministic linters (ESLint, Pylint, Clippy, etc.) enforce rules that are binary: pass or fail. No ambiguity. They catch import violations, naming conventions, forbidden patterns, and style rules. They run in CI and block merges when violated. The agent gets immediate, unambiguous feedback.
Agent-Specific Rules
// Custom ESLint rules for agents
no-cross-layer-import: error   // Services can't import from UI
no-direct-db-access: error     // Must use repository layer
require-error-handling: warn   // All async calls need try/catch
max-function-length: 30        // Agents tend to write long functions
Why Deterministic First
Deterministic linters are the cheapest and most reliable enforcement mechanism. Zero cost per run, instant feedback, no false positives when rules are well-written. They should be the first line of defense. Only use LLM-based auditing for rules that can’t be expressed as deterministic checks.
Key insight: Every constraint in your CLAUDE.md that can be expressed as a linter rule should be. Linters are cheaper, faster, and more reliable than hoping the agent reads and follows the constraint document.
LLM-Based Auditors
Using AI to check what linters can’t
When to Use
Some rules can’t be expressed as deterministic checks: “Error messages must be user-friendly”, “Variable names must be descriptive”, “Comments must explain why, not what.” These require judgment. LLM-based auditors use a second model to review the first model’s output against these subjective criteria.
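The subjective criteria above can be packaged into the prompt sent to the auditing model. This is a sketch of the prompt-building step only; the criteria wording is taken from this section, and how (and with which model) you actually make the inference call is left to your stack:

```javascript
// Build the audit prompt for a second-model review of a diff.
const AUDIT_CRITERIA = [
  "Error messages must be user-friendly",
  "Variable names must be descriptive",
  "Comments must explain why, not what",
];

function buildAuditPrompt(diff, criteria = AUDIT_CRITERIA) {
  const numbered = criteria.map((c, i) => `${i + 1}. ${c}`).join("\n");
  return [
    "You are auditing a code change against subjective quality criteria.",
    "For each criterion, answer PASS or FAIL with a one-line justification.",
    "",
    "Criteria:",
    numbered,
    "",
    "Diff:",
    diff,
  ].join("\n");
}
```

Keeping the criteria in one list makes it easy to version them alongside your lint config.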
The Tradeoff
LLM auditors are expensive (another inference call per review), non-deterministic (may give different results on the same code), and slower than linters. Use them sparingly for high-value checks that can’t be automated any other way. Reserve them for PR-level review, not per-file checking.
Key insight: The ideal enforcement stack is layered: deterministic linters catch 80% of issues instantly and at no cost. LLM auditors catch the remaining 20% that require judgment. Human review catches what both miss.
“Vibecoded Lints”
Linting rules designed specifically for agent mistakes
The Concept
“Vibecoded lints” are custom linting rules designed to catch mistakes that AI agents make but human developers rarely do. Agents have characteristic failure patterns: overly generic variable names, missing edge case handling, importing from the wrong layer, creating duplicate utility functions. These patterns are predictable and lintable.
Examples
No generic names: Ban variables named data, result, temp, value without qualification.

No duplicate utilities: Flag new utility functions that overlap with existing ones.

Required error context: Error messages must include the operation that failed and the input that caused it.

No orphan files: New files must be imported somewhere within 24 hours or flagged for review.
Critical in AI: Vibecoded lints are born from observing agent failures. Keep a log of every agent mistake that makes it past review. When you see a pattern, write a lint rule. Your lint suite becomes a codified history of agent failure modes.
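The first example above — banning unqualified generic names — is simple to sketch. This version scans source text with a regex for declarations; the banned-name list and the "qualification" rule (a generic word is fine inside a longer name like userData) are assumptions to tune against your own agent-failure log:

```javascript
// Vibecoded lint sketch: flag bare generic variable names.
const GENERIC_NAMES = new Set(["data", "result", "temp", "value"]);

function findGenericDeclarations(source) {
  const violations = [];
  const declPattern = /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)/g;
  let m;
  while ((m = declPattern.exec(source)) !== null) {
    if (GENERIC_NAMES.has(m[1])) violations.push(m[1]);
  }
  return violations;
}
```

A production version would use the AST rather than a regex, but the shape is the same: a small, targeted rule aimed at a pattern agents produce and humans rarely do.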
Structural Tests
Tests that verify architecture, not behavior
What They Test
Structural tests verify architectural invariants: dependency direction, module boundaries, naming conventions, file organization. They don’t test that code works correctly — they test that code is organized correctly. They run in CI alongside unit tests and block merges when architectural rules are violated.
Example
// Structural test: dependency direction
test("services don't import from UI", () => {
  const serviceFiles = glob("src/services/**");
  for (const file of serviceFiles) {
    const imports = getImports(file);
    const uiImports = imports.filter(
      (i) => i.startsWith("src/ui")
    );
    expect(uiImports).toHaveLength(0);
  }
});
Why They Matter
Structural tests catch violations that unit tests miss. A service that imports from the UI layer will pass all its unit tests — the code works. But the architectural violation creates coupling that makes the codebase harder to maintain. Structural tests enforce the rules that keep the codebase healthy long-term.
Key insight: Structural tests are the bridge between constraint documents and CI. They turn written rules into automated checks. Every architectural rule in your CLAUDE.md should have a corresponding structural test.
Pre-Commit Hooks
Catching violations before they enter the repository
The Pattern
Pre-commit hooks run linters, formatters, and quick structural checks before code is committed. For agent-generated code, they provide immediate feedback — the agent sees the violation and can fix it in the same session, rather than discovering it in CI minutes later.
What to Include
Fast checks only: Formatting, import sorting, basic linting. Keep hooks under 10 seconds.

Auto-fix when possible: Formatting and import sorting should auto-fix, not just report.

Skip slow checks: Full test suites and LLM audits belong in CI, not pre-commit.

Agent-aware: Some hooks can detect agent-generated code patterns and apply stricter rules.
Rule of thumb: Pre-commit hooks should be fast enough that the agent doesn’t notice them. If hooks take more than 10 seconds, the agent may timeout or retry, creating confusion. Keep them lean.
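One way to keep the budget honest is to decide mechanically which checks run pre-commit versus in CI, using the <10s rule above. The check names and durations here are made up; it's a simple in-order budget split, not a scheduler:

```javascript
// Partition checks into pre-commit vs CI under a time budget.
const CHECKS = [
  { name: "format", estSeconds: 1 },
  { name: "import-sort", estSeconds: 1 },
  { name: "basic-lint", estSeconds: 4 },
  { name: "unit-tests", estSeconds: 120 },
  { name: "llm-audit", estSeconds: 180 },
];

function splitChecks(checks, budgetSeconds = 10) {
  const preCommit = [];
  const ci = [];
  let used = 0;
  for (const check of checks) {
    if (used + check.estSeconds <= budgetSeconds) {
      preCommit.push(check.name);
      used += check.estSeconds;
    } else {
      ci.push(check.name);
    }
  }
  return { preCommit, ci };
}
```

Recording estimated durations next to each check also makes it obvious when a "fast" check has quietly grown slow enough to move to CI.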
The Enforcement Stack
Layering all verification mechanisms
The Full Stack
// Enforcement stack (fastest to slowest)
Layer 1: Pre-commit (<10s)
  Formatting, import sorting
  Basic lint rules
  Auto-fix what's possible
Layer 2: CI Linting (<2min)
  Full lint suite
  Structural tests
  Type checking
Layer 3: CI Tests (<10min)
  Unit tests
  Integration tests
  Coverage checks
Layer 4: LLM Audit (<5min)
  Subjective quality checks
  Architecture review
  Documentation review
Layer 5: Human Review (async)
  Final approval
  Strategic decisions
  Edge cases
The Principle
Each layer catches different types of issues. Earlier layers are cheaper and faster but less nuanced. Later layers are more expensive but catch subtler problems. The goal is to catch as much as possible in the early, cheap layers so the expensive layers only handle what they must.
Key insight: A well-designed enforcement stack means 90% of agent mistakes never reach human review. The human reviewer sees only the subtle, judgment-requiring issues that automated checks can’t handle.
Building Your Enforcement Stack
Where to start and how to grow
Getting Started
Week 1: Add dependency layering rules to your existing linter. Define which directories can import from which.

Week 2: Add 3–5 vibecoded lint rules based on your most common agent mistakes.

Month 1: Add structural tests for your top architectural invariants.

Month 2: Add LLM-based auditing for subjective quality checks on PRs.
Measuring Effectiveness
Track the violation rate per layer. If pre-commit catches 60% of violations, CI catches 30%, and only 10% reach human review, your stack is working well. If 50% still reach human review, your earlier layers need more rules. The goal is to shift violations left — catch them earlier, cheaper, faster.
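The per-layer tracking above is a one-line computation once you log where each violation was caught. The layer names and counts in this sketch are illustrative:

```javascript
// Per-layer catch rates from violation counts caught at each layer.
function catchRates(countsByLayer) {
  const total = Object.values(countsByLayer).reduce((a, b) => a + b, 0);
  const rates = {};
  for (const [layer, count] of Object.entries(countsByLayer)) {
    rates[layer] = total === 0 ? 0 : count / total;
  }
  return rates;
}
```

Watching these rates over time shows whether new rules are actually shifting violations left, or just adding noise.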
Key insight: Your enforcement stack is never done. Every agent mistake that reaches production is a missing rule somewhere in the stack. Add the rule, and that class of mistake never reaches production again. Over time, the stack becomes comprehensive.