Ch 11 — AI-Assisted Testing & Debugging

Generate tests, fix failures, and debug systematically — with AI in the loop
High Level
Generate → Run → Diagnose → Fix → Re-run → Green
AI Test Generation: The New Baseline
From zero tests to meaningful coverage in minutes
What AI Can Generate
AI excels at generating unit tests for pure functions, data transformations, and API endpoints. It reads the function signature, infers expected behavior, and produces tests covering the happy path, edge cases, and error conditions. For a typical utility function, AI generates 5–10 test cases in seconds — work that would take a developer 15–30 minutes.
Scope of Generation
Unit tests — individual functions and methods
Integration tests — API endpoints with mocked dependencies
Snapshot tests — component rendering output
Edge case tests — null inputs, empty arrays, boundary values
Error path tests — invalid inputs, network failures, timeouts
The Prompt That Works
// Effective test generation prompt:
“Write tests for @src/utils/validate.ts. Use vitest.
Follow the patterns in @tests/utils/format.test.ts.
Cover: valid inputs, empty strings, null/undefined,
boundary values, and error messages.
Run the tests after writing them.”

// Key elements:
// 1. Specific file to test
// 2. Test framework specified
// 3. Pattern to follow (exemplar)
// 4. Cases to cover (explicit)
// 5. Verification step (run them)
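To make the output of such a prompt concrete, here is a sketch of the kind of test suite a generation pass typically produces, for a hypothetical `isEmail` validator (the function name, regex, and cases are illustrative, not from any real validate.ts; bare assertions stand in for vitest `expect` calls):

```typescript
// Hypothetical validator under test.
function isEmail(input: unknown): boolean {
  return typeof input === "string" && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
}

// Happy path
console.assert(isEmail("a@b.co") === true);
// Empty string
console.assert(isEmail("") === false);
// null / undefined
console.assert(isEmail(null) === false);
console.assert(isEmail(undefined) === false);
// Boundary / malformed values
console.assert(isEmail("a@b") === false);
console.assert(isEmail("@b.co") === false);
```

Note how the cases map one-to-one onto the "Cover:" clause of the prompt — explicit case lists produce explicit tests.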
Key insight: AI-generated tests are a starting point, not a finished product. They cover the obvious cases well but often miss business logic edge cases, race conditions, and security-relevant scenarios. Always review and supplement.
The Code-Test-Fix Loop
The autonomous cycle that reduces debugging time by 6.5x
How It Works
The most powerful AI testing pattern: the agent writes code, runs the tests automatically, sees the failures, fixes the code, and re-runs — all in a single interaction. This tight loop means the agent gets immediate feedback on whether its code works, without waiting for you to manually run anything.
The Loop in Practice
1. Agent writes implementation code
2. Agent runs: npm test -- --watch
3. Tests fail (expected for new code)
4. Agent reads failure output
5. Agent fixes the implementation
6. Tests re-run automatically
7. Repeat until all green

// Typical: 2–4 iterations to green
// Complex: 5–8 iterations
// If >8: agent is likely stuck
Escalation Tiers
When the loop stalls, structured systems escalate:

Tier 1: Agent retries with a different approach
Tier 2: An independent “arbiter” model reviews the code for logic errors the tests don’t catch
Tier 3: The system flags the issue for human review

This prevents infinite fix-the-fix loops while maximizing autonomous resolution.
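The tier logic can be sketched as a small decision function. The iteration cutoffs below (8 for retries, 10 before human escalation) are illustrative assumptions based on the "if >8, the agent is likely stuck" heuristic, not values from any specific tool:

```typescript
type Action = "done" | "retry" | "arbiter-review" | "human-review";

// Decide what happens next in the code-test-fix loop.
// Cutoffs are assumptions to tune for your own setup.
function escalate(iteration: number, allGreen: boolean): Action {
  if (allGreen) return "done";
  if (iteration <= 8) return "retry"; // Tier 1: retry with a different approach
  if (iteration <= 10) return "arbiter-review"; // Tier 2: independent model review
  return "human-review"; // Tier 3: flag for a human
}

console.assert(escalate(3, true) === "done");
console.assert(escalate(5, false) === "retry");
console.assert(escalate(9, false) === "arbiter-review");
console.assert(escalate(12, false) === "human-review");
```

The point of encoding this explicitly is that the loop always terminates: every path ends in green tests or a human's queue.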
The 6.5x multiplier: Properly structured code-test-fix loops reduce manual debugging time by 6.5x. The key is having tests before the agent writes code (TDD) or immediately after. Tests turn vague “something’s wrong” into precise “line 42 returns null instead of [].”
AI-Assisted Debugging: Paste the Error
The fastest debugging technique is the simplest one
The Paste-and-Explain Pattern
The single most effective debugging technique with AI: paste the full error message and stack trace. The model reads the error, traces it to the source file and line, identifies the root cause, and suggests a fix. This works because error messages are highly structured — they contain exactly the information the model needs.
What to Include
Full error message — not just the first line
Stack trace — shows the call chain
What you expected — vs. what actually happened
Steps to reproduce — what triggers the bug
What you already tried — prevents duplicate suggestions
Where AI Debugging Excels
GREAT at:
Type errors and null reference exceptions
Import/module resolution failures
Configuration and environment issues
Syntax errors and typos
Known library error patterns
Stack trace interpretation

STRUGGLES with:
Race conditions and timing bugs
State management inconsistencies
Performance regressions (no profiler)
Visual/layout bugs (can’t see the UI)
Intermittent / flaky failures
Business logic errors (doesn’t know intent)
Pro tip: For visual bugs, describe what you see vs. what you expect. For intermittent bugs, provide the conditions under which it fails. The more context you give, the better the diagnosis. AI debugging is a conversation, not a magic wand.
TDD with AI: Tests First, Code Second
The most disciplined and most effective AI coding workflow
The TDD + AI Workflow
Step 1: You write the test (or describe it to the AI). The test defines what the code should do.
Step 2: The AI writes the implementation to make the test pass.
Step 3: The AI runs the test. If it fails, the AI fixes the implementation.
Step 4: You review the passing code for quality and correctness.
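The four steps, compressed into one file for a hypothetical `slugify` utility (an assumed example, not from the source; bare assertions stand in for vitest `expect` calls):

```typescript
// Step 1 (human): the assertions below define the contract first.
// Step 2 (AI): the implementation was written to satisfy them.
// Step 3: run the file; if an assertion throws, fix slugify and re-run.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs into a dash
    .replace(/^-+|-+$/g, ""); // strip stray leading/trailing dashes
}

console.assert(slugify("Hello, World!") === "hello-world");
console.assert(slugify("  Already   Spaced  ") === "already-spaced");
console.assert(slugify("") === "");
```

Because the assertions existed before the implementation, "correct" was never ambiguous: the AI's job was to turn red into green, nothing more.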
Why TDD + AI Is Powerful
Tests are the spec — the AI knows exactly what “correct” means
Immediate feedback — the code-test-fix loop runs automatically
Reduced hallucination — if the test passes, the tested behavior is verified
Built-in safety net — future changes are protected by the tests
Faster review — you review tests (intent) + passing code (implementation)
The Coverage Target
Aim for ~70% test coverage as a practical safety net. Chasing 100% yields diminishing returns — the last 30% tests trivial getters, configuration, and framework boilerplate. Focus coverage on business logic, data transformations, authentication/authorization, error handling, and API contracts.
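A coverage target is only useful if it is enforced. As a sketch, a vitest config can fail the run when coverage drops below the threshold — treat the exact option names as assumptions to verify against your vitest version:

```typescript
// Hypothetical vitest.config.ts enforcing the ~70% target.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      provider: "v8",
      // Fail the test run if any metric drops below 70%.
      thresholds: { lines: 70, branches: 70, functions: 70 },
    },
  },
});
```

With this in place, the code-test-fix loop itself keeps coverage from silently eroding.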
Key insight: TDD with AI inverts the traditional workflow. Instead of writing code and hoping it works, you define “works” first (the test) and let the AI figure out how. This is the most reliable way to get correct AI-generated code.
What AI-Generated Tests Get Wrong
The blind spots you must check manually
Testing the Implementation, Not the Behavior
AI tests often mirror the implementation too closely. If the function uses a for loop, the test checks for a for loop — instead of checking the output. These tests pass but break on any refactor. Fix: Review tests for behavioral assertions (what it returns) vs. implementation assertions (how it works).
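The difference is easiest to see side by side, using a hypothetical `dedupe` helper (an illustrative example, not from the source):

```typescript
// Function under test.
function dedupe<T>(items: T[]): T[] {
  return [...new Set(items)];
}

// Behavioral assertion: checks WHAT is returned. Survives any refactor
// that preserves the contract (e.g. rewriting dedupe with a filter loop).
console.assert(JSON.stringify(dedupe([1, 1, 2, 3, 3])) === JSON.stringify([1, 2, 3]));
console.assert(dedupe<number>([]).length === 0);

// The anti-pattern (shown only as a comment): asserting HOW it works,
// e.g. spying that a Set was constructed. That test passes today and
// breaks on the first behavior-preserving refactor.
```

When reviewing generated tests, delete or rewrite anything that would fail under a refactor that keeps the outputs identical.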
Missing Security Tests
AI rarely generates tests for: SQL injection, XSS, authentication bypass, authorization escalation, rate limiting, or input sanitization. It tests the happy path and obvious errors but not adversarial inputs. Fix: Explicitly request security-focused tests as a separate pass.
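What a security-focused pass looks like, sketched against a minimal, hypothetical `escapeHtml` sanitizer (the sanitizer is deliberately simple — the adversarial cases are the point, not the implementation):

```typescript
// Minimal sanitizer sketch; a real project would use a vetted library.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Adversarial inputs AI rarely generates unprompted:
console.assert(!escapeHtml('<script>alert(1)</script>').includes("<script>"));
console.assert(!escapeHtml('"><img onerror=x>').includes('">'));
console.assert(escapeHtml("a & b") === "a &amp; b");
```

Requesting these as "a separate security pass" works better than hoping they appear in the initial generation.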
Common Blind Spots
AI tests often miss:
× Concurrent access / race conditions
× Database transaction rollbacks
× Timezone and locale edge cases
× Large dataset performance
× Network timeout handling
× Partial failure scenarios
× State cleanup between tests
× Environment-specific behavior

AI tests usually cover well:
✓ Input validation (types, ranges)
✓ Null / undefined / empty handling
✓ Return value shape and types
✓ Basic error throwing
✓ Happy path scenarios
The danger: AI-generated tests create a false sense of security. “All tests pass” doesn’t mean “all behavior is correct.” The tests only verify what they test. If the critical edge case isn’t tested, it isn’t protected.
Multi-Layer Verification
Tests alone aren’t enough — build a verification stack
The Verification Stack
Layer 1: Type System
TypeScript strict mode catches type errors at compile time. Free and instant.

Layer 2: Linting
ESLint catches patterns known to cause bugs. Security-focused rules catch common vulns.

Layer 3: Unit Tests
AI-generated + human-supplemented. Cover business logic and edge cases.

Layer 4: Integration Tests
Test API contracts and data flow. Catch issues between components.

Layer 5: AI Review
Run code through a different AI model for a “second opinion” on logic and security.

Layer 6: Human Review
Final check for intent, architecture, and business logic correctness.
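The stack's fail-fast ordering can be sketched as a tiny runner: each layer executes only if every earlier, cheaper layer passed. The layers here are stubs — a real version would shell out to tsc, ESLint, and vitest:

```typescript
type Check = { name: string; run: () => boolean };

// Run layers in order, stopping at the first failure (cheap checks first
// means expensive ones never run on code that already fails).
function runStack(checks: Check[]): string[] {
  const passed: string[] = [];
  for (const check of checks) {
    if (!check.run()) return passed; // stop at the first failing layer
    passed.push(check.name);
  }
  return passed;
}

// Stubbed layers for illustration only.
const passedLayers = runStack([
  { name: "types", run: () => true },
  { name: "lint", run: () => true },
  { name: "unit", run: () => false },
  { name: "ai-review", run: () => true },
]);
// passedLayers is ["types", "lint"]: the stack stopped at the unit layer.
```

In practice the same ordering lives in a CI pipeline or a pre-commit hook rather than application code; the shape is what matters.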
Multi-Model Review
A powerful emerging pattern: generate code with one model, review with another. Different models have different blind spots. Code that passes Model A’s review might be flagged by Model B. This catches subtle issues that a single model consistently misses.
The Cost-Benefit
Each layer catches different classes of bugs. Types catch ~30% of issues for free. Linting catches another ~15%. Tests catch ~35%. AI review catches ~10%. Human review catches the final ~10% — but that last 10% includes the most dangerous bugs (security, business logic, architectural violations).
The principle: No single verification layer is sufficient. The combination of automated checks and human judgment catches what each alone would miss. Build the stack once, run it on every change.
Debugging Workflows: Systematic Approaches
When paste-the-error isn’t enough
The Bisect Pattern
When you don’t know which change introduced a bug: ask the AI to help you git bisect. The agent checks out commits, runs the test, and binary-searches for the commit that broke things. What would take 30 minutes of manual checkout-and-test takes 2 minutes with an agent.
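Why bisect is so fast: it is a binary search over commit history, so finding the breaking commit among N commits takes about log2(N) test runs instead of N. A sketch of the search itself, where `isBad` stands in for "check out commit i and run the failing test":

```typescript
// Find the index of the first "bad" commit, assuming commits 0..n-1
// are good up to some point and bad from there on (bisect's invariant).
function firstBadCommit(n: number, isBad: (i: number) => boolean): number {
  let lo = 0;
  let hi = n - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (isBad(mid)) hi = mid; // breakage is at mid or earlier
    else lo = mid + 1; // breakage is after mid
  }
  return lo;
}

// 100 commits, bug introduced at commit 42: found in ~7 checks, not 42.
console.assert(firstBadCommit(100, (i) => i >= 42) === 42);
```

The agent's contribution is automating each `isBad` probe — checkout, build, run the test, read the result — which is exactly the tedious part.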
The Rubber Duck Pattern
Explain the bug to the AI as if it knows nothing. Describe the expected behavior, the actual behavior, and your mental model of how the code works. Often, the act of explaining reveals the bug — and if it doesn’t, the AI’s questions will probe the assumptions you haven’t questioned.
The Isolation Pattern
Ask the AI to create a minimal reproduction: the smallest possible code that exhibits the bug. Strip away everything unrelated. This isolates the root cause from the noise of the full application. AI is excellent at this because it can rapidly generate simplified versions of complex code.
The Log Injection Pattern
Ask the AI to add strategic console.log / debug statements at key decision points in the code. Run the code, paste the log output back to the AI. The agent reads the execution trace and identifies where the actual behavior diverges from the expected behavior.
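What "strategic" placement means in practice: one log per decision point, each capturing the value that decision depends on. A sketch with a hypothetical discount function (names and logic are illustrative):

```typescript
// Instrumented at each decision point; run it, then paste the
// [debug] lines back to the AI as the execution trace.
function applyDiscount(total: number, code: string): number {
  console.log(`[debug] input: total=${total} code=${code}`);
  const rate = code === "SAVE10" ? 0.1 : 0;
  console.log(`[debug] resolved rate=${rate}`);
  const discounted = total * (1 - rate);
  console.log(`[debug] output: ${discounted}`);
  return discounted;
}

applyDiscount(100, "SAVE10");
```

If the trace shows `rate=0` where you expected `0.1`, the divergence point — and therefore the bug — is the code comparison, not the arithmetic below it.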
When to stop: If you’ve spent 15 minutes debugging with AI and aren’t making progress, switch strategies. Try a different model, ask a colleague, or use traditional debugging tools (breakpoints, profilers). AI debugging is fast for common issues but can waste time on unusual ones.
The Testing & Debugging Playbook
A practical reference for every AI coding session
For New Code
1. Write tests first (TDD) or immediately after
2. Ask AI to generate tests with explicit cases
3. Supplement with security-focused tests
4. Run the code-test-fix loop
5. Review tests for behavioral assertions
6. Target ~70% coverage on business logic
For Bug Fixes
1. Paste the full error + stack trace
2. Describe expected vs. actual behavior
3. Include what you already tried
4. Ask AI to write a failing test first
5. Fix the code to make the test pass
6. The test prevents future regression
For Existing Code Without Tests
1. Ask AI to generate tests for current behavior
2. Run them — they should all pass
3. If any fail, the test is wrong (not the code)
4. Fix the tests until they match reality
5. Now you have a safety net for refactoring
6. Add edge case tests manually
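These are characterization tests: they pin down what the code does today, quirks included, before any refactor. A sketch against a hypothetical legacy `formatPrice` function (illustrative, not from the source):

```typescript
// Legacy function with no tests.
function formatPrice(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

// Assertions describe CURRENT behavior; if one fails, fix the
// assertion to match reality, not the code.
console.assert(formatPrice(1999) === "$19.99");
console.assert(formatPrice(0) === "$0.00");
// Odd-looking, but it IS what the code does today, so record it as-is:
console.assert(formatPrice(-50) === "$-0.50");
```

Once these pass, any refactor that breaks one is a genuine behavior change you chose to make, not an accident.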
Key insight: AI transforms testing from a chore into a conversation. The barrier to writing tests drops dramatically. But the responsibility for test quality — ensuring the right things are tested — remains firmly with you. AI writes the tests; you decide what to test.