Ch 11 — AI-Assisted Testing & Debugging

Generate tests, fix failures, and debug systematically — with AI in the loop
High Level
Generate → Run → Diagnose → Fix → Re-run → Green
AI Test Generation: The New Baseline
From zero tests to meaningful coverage in minutes
What AI Can Generate
AI excels at generating unit tests for pure functions, data transformations, and API endpoints. It reads the function signature, infers expected behavior, and produces tests covering the happy path, edge cases, and error conditions. For a typical utility function, AI generates 5–10 test cases in seconds — work that would take a developer 15–30 minutes.
Scope of Generation
Unit tests — individual functions and methods
Integration tests — API endpoints with mocked dependencies
Snapshot tests — component rendering output
Edge case tests — null inputs, empty arrays, boundary values
Error path tests — invalid inputs, network failures, timeouts
The Prompt That Works
// Effective test generation prompt:
“Write tests for @src/utils/validate.ts. Use vitest.
Follow the patterns in @tests/utils/format.test.ts.
Cover: valid inputs, empty strings, null/undefined,
boundary values, and error messages.
Run the tests after writing them.”

// Key elements:
// 1. Specific file to test
// 2. Test framework specified
// 3. Pattern to follow (exemplar)
// 4. Cases to cover (explicit)
// 5. Verification step (run them)
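To make the output of such a prompt concrete, here is a sketch of the kind of test suite a generation pass typically produces, for a hypothetical `isEmail` validator (the function name, regex, and cases are illustrative, not from any real validate.ts; bare assertions stand in for vitest `expect` calls):

```typescript
// Hypothetical validator under test.
function isEmail(input: unknown): boolean {
  return typeof input === "string" && /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
}

// Happy path
console.assert(isEmail("a@b.co") === true);
// Empty string
console.assert(isEmail("") === false);
// null / undefined
console.assert(isEmail(null) === false);
console.assert(isEmail(undefined) === false);
// Boundary / malformed values
console.assert(isEmail("a@b") === false);
console.assert(isEmail("@b.co") === false);
```

Note how the cases map one-to-one onto the "Cover:" clause of the prompt — explicit case lists produce explicit tests.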
Key insight: AI-generated tests are a starting point, not a finished product. They cover the obvious cases well but often miss business logic edge cases, race conditions, and security-relevant scenarios. Always review and supplement.
The Code-Test-Fix Loop
The autonomous cycle that reduces debugging time by 6.5x
How It Works
The most powerful AI testing pattern: the agent writes code, runs the tests automatically, sees the failures, fixes the code, and re-runs — all in a single interaction. This tight loop means the agent gets immediate feedback on whether its code works, without waiting for you to manually run anything.
The Loop in Practice
1. Agent writes implementation code
2. Agent runs: npm test -- --watch
3. Tests fail (expected for new code)
4. Agent reads failure output
5. Agent fixes the implementation
6. Tests re-run automatically
7. Repeat until all green

// Typical: 2–4 iterations to green
// Complex: 5–8 iterations
// If >8: agent is likely stuck
Escalation Tiers
When the loop stalls, structured systems escalate:

Tier 1: Agent retries with a different approach
Tier 2: An independent “arbiter” model reviews the code for logic errors the tests don’t catch
Tier 3: The system flags the issue for human review

This prevents infinite fix-the-fix loops while maximizing autonomous resolution.
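The tier logic can be sketched as a small decision function. The iteration cutoffs below (8 for retries, 10 before human escalation) are illustrative assumptions based on the "if >8, the agent is likely stuck" heuristic, not values from any specific tool:

```typescript
type Action = "done" | "retry" | "arbiter-review" | "human-review";

// Decide what happens next in the code-test-fix loop.
// Cutoffs are assumptions to tune for your own setup.
function escalate(iteration: number, allGreen: boolean): Action {
  if (allGreen) return "done";
  if (iteration <= 8) return "retry"; // Tier 1: retry with a different approach
  if (iteration <= 10) return "arbiter-review"; // Tier 2: independent model review
  return "human-review"; // Tier 3: flag for a human
}

console.assert(escalate(3, true) === "done");
console.assert(escalate(5, false) === "retry");
console.assert(escalate(9, false) === "arbiter-review");
console.assert(escalate(12, false) === "human-review");
```

The point of encoding this explicitly is that the loop always terminates: every path ends in green tests or a human's queue.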
The 6.5x multiplier: Properly structured code-test-fix loops reduce manual debugging time by 6.5x. The key is having tests before the agent writes code (TDD) or immediately after. Tests turn vague “something’s wrong” into precise “line 42 returns null instead of [].”
AI-Assisted Debugging: Paste the Error
The fastest debugging technique is the simplest one
The Paste-and-Explain Pattern
The single most effective debugging technique with AI: paste the full error message and stack trace. The model reads the error, traces it to the source file and line, identifies the root cause, and suggests a fix. This works because error messages are highly structured — they contain exactly the information the model needs.
What to Include
Full error message — not just the first line
Stack trace — shows the call chain
What you expected — vs. what actually happened
Steps to reproduce — what triggers the bug
What you already tried — prevents duplicate suggestions
Where AI Debugging Excels
GREAT at:
Type errors and null reference exceptions
Import/module resolution failures
Configuration and environment issues
Syntax errors and typos
Known library error patterns
Stack trace interpretation

STRUGGLES with:
Race conditions and timing bugs
State management inconsistencies
Performance regressions (no profiler)
Visual/layout bugs (can’t see the UI)
Intermittent / flaky failures
Business logic errors (doesn’t know intent)
Pro tip: For visual bugs, describe what you see vs. what you expect. For intermittent bugs, provide the conditions under which it fails. The more context you give, the better the diagnosis. AI debugging is a conversation, not a magic wand.
TDD with AI: Tests First, Code Second
The most disciplined and most effective AI coding workflow
The TDD + AI Workflow
Step 1: You write the test (or describe it to the AI). The test defines what the code should do.
Step 2: The AI writes the implementation to make the test pass.
Step 3: The AI runs the test. If it fails, the AI fixes the implementation.
Step 4: You review the passing code for quality and correctness.
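The four steps, compressed into one file for a hypothetical `slugify` utility (an assumed example, not from the source; bare assertions stand in for vitest `expect` calls):

```typescript
// Step 1 (human): the assertions below define the contract first.
// Step 2 (AI): the implementation was written to satisfy them.
// Step 3: run the file; if an assertion throws, fix slugify and re-run.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs into a dash
    .replace(/^-+|-+$/g, ""); // strip stray leading/trailing dashes
}

console.assert(slugify("Hello, World!") === "hello-world");
console.assert(slugify("  Already   Spaced  ") === "already-spaced");
console.assert(slugify("") === "");
```

Because the assertions existed before the implementation, "correct" was never ambiguous: the AI's job was to turn red into green, nothing more.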
Why TDD + AI Is Powerful
Tests are the spec — the AI knows exactly what “correct” means
Immediate feedback — the code-test-fix loop runs automatically
Reduced hallucination — if the test passes, the tested behavior is verified
Built-in safety net — future changes are protected by the tests
Faster review — you review tests (intent) + passing code (implementation)
The Coverage Target
Aim for ~70% test coverage as a practical safety net. Chasing 100% yields diminishing returns — the last 30% tests trivial getters, configuration, and framework boilerplate. Focus coverage on business logic, data transformations, authentication/authorization, error handling, and API contracts.
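A coverage target is only useful if it is enforced. As a sketch, a vitest config can fail the run when coverage drops below the threshold — treat the exact option names as assumptions to verify against your vitest version:

```typescript
// Hypothetical vitest.config.ts enforcing the ~70% target.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      provider: "v8",
      // Fail the test run if any metric drops below 70%.
      thresholds: { lines: 70, branches: 70, functions: 70 },
    },
  },
});
```

With this in place, the code-test-fix loop itself keeps coverage from silently eroding.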
Key insight: TDD with AI inverts the traditional workflow. Instead of writing code and hoping it works, you define “works” first (the test) and let the AI figure out how. This is the most reliable way to get correct AI-generated code.
What AI-Generated Tests Get Wrong
The blind spots you must check manually
Testing the Implementation, Not the Behavior
AI tests often mirror the implementation too closely. If the function uses a for loop, the test checks for a for loop — instead of checking the output. These tests pass but break on any refactor. Fix: Review tests for behavioral assertions (what it returns) vs. implementation assertions (how it works).
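The difference is easiest to see side by side, using a hypothetical `dedupe` helper (an illustrative example, not from the source):

```typescript
// Function under test.
function dedupe<T>(items: T[]): T[] {
  return [...new Set(items)];
}

// Behavioral assertion: checks WHAT is returned. Survives any refactor
// that preserves the contract (e.g. rewriting dedupe with a filter loop).
console.assert(JSON.stringify(dedupe([1, 1, 2, 3, 3])) === JSON.stringify([1, 2, 3]));
console.assert(dedupe<number>([]).length === 0);

// The anti-pattern (shown only as a comment): asserting HOW it works,
// e.g. spying that a Set was constructed. That test passes today and
// breaks on the first behavior-preserving refactor.
```

When reviewing generated tests, delete or rewrite anything that would fail under a refactor that keeps the outputs identical.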
Missing Security Tests
AI rarely generates tests for: SQL injection, XSS, authentication bypass, authorization escalation, rate limiting, or input sanitization. It tests the happy path and obvious errors but not adversarial inputs. Fix: Explicitly request security-focused tests as a separate pass.
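What a security-focused pass looks like, sketched against a minimal, hypothetical `escapeHtml` sanitizer (the sanitizer is deliberately simple — the adversarial cases are the point, not the implementation):

```typescript
// Minimal sanitizer sketch; a real project would use a vetted library.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Adversarial inputs AI rarely generates unprompted:
console.assert(!escapeHtml('<script>alert(1)</script>').includes("<script>"));
console.assert(!escapeHtml('"><img onerror=x>').includes('">'));
console.assert(escapeHtml("a & b") === "a &amp; b");
```

Requesting these as "a separate security pass" works better than hoping they appear in the initial generation.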
Common Blind Spots
AI tests often miss:
× Concurrent access / race conditions
× Database transaction rollbacks
× Timezone and locale edge cases
× Large dataset performance
× Network timeout handling
× Partial failure scenarios
× State cleanup between tests
× Environment-specific behavior

AI tests usually cover well:
✓ Input validation (types, ranges)
✓ Null / undefined / empty handling
✓ Return value shape and types
✓ Basic error throwing
✓ Happy path scenarios
The danger: AI-generated tests create a false sense of security. “All tests pass” doesn’t mean “all behavior is correct.” The tests only verify what they test. If the critical edge case isn’t tested, it isn’t protected.
Multi-Layer Verification
Tests alone aren’t enough — build a verification stack
The Verification Stack
Layer 1: Type System
TypeScript strict mode catches type errors at compile time. Free and instant.

Layer 2: Linting
ESLint catches patterns known to cause bugs. Security-focused rules catch common vulns.

Layer 3: Unit Tests
AI-generated + human-supplemented. Cover business logic and edge cases.

Layer 4: Integration Tests
Test API contracts and data flow. Catch issues between components.

Layer 5: AI Review
Run code through a different AI model for a “second opinion” on logic and security.

Layer 6: Human Review
Final check for intent, architecture, and business logic correctness.
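The stack's fail-fast ordering can be sketched as a tiny runner: each layer executes only if every earlier, cheaper layer passed. The layers here are stubs — a real version would shell out to tsc, ESLint, and vitest:

```typescript
type Check = { name: string; run: () => boolean };

// Run layers in order, stopping at the first failure (cheap checks first
// means expensive ones never run on code that already fails).
function runStack(checks: Check[]): string[] {
  const passed: string[] = [];
  for (const check of checks) {
    if (!check.run()) return passed; // stop at the first failing layer
    passed.push(check.name);
  }
  return passed;
}

// Stubbed layers for illustration only.
const passedLayers = runStack([
  { name: "types", run: () => true },
  { name: "lint", run: () => true },
  { name: "unit", run: () => false },
  { name: "ai-review", run: () => true },
]);
// passedLayers is ["types", "lint"]: the stack stopped at the unit layer.
```

In practice the same ordering lives in a CI pipeline or a pre-commit hook rather than application code; the shape is what matters.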
Multi-Model Review
A powerful emerging pattern: generate code with one model, review with another. Different models have different blind spots. Code that passes Model A’s review might be flagged by Model B. This catches subtle issues that a single model consistently misses.
The Cost-Benefit
Each layer catches different classes of bugs. Types catch ~30% of issues for free. Linting catches another ~15%. Tests catch ~35%. AI review catches ~10%. Human review catches the final ~10% — but that last 10% includes the most dangerous bugs (security, business logic, architectural violations).
The principle: No single verification layer is sufficient. The combination of automated checks and human judgment catches what each alone would miss. Build the stack once, run it on every change.
Debugging Workflows: Systematic Approaches
When paste-the-error isn’t enough
The Bisect Pattern
When you don’t know which change introduced a bug: ask the AI to help you git bisect. The agent checks out commits, runs the test, and binary-searches for the commit that broke things. What would take 30 minutes of manual checkout-and-test takes 2 minutes with an agent.
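Why bisect is so fast: it is a binary search over commit history, so finding the breaking commit among N commits takes about log2(N) test runs instead of N. A sketch of the search itself, where `isBad` stands in for "check out commit i and run the failing test":

```typescript
// Find the index of the first "bad" commit, assuming commits 0..n-1
// are good up to some point and bad from there on (bisect's invariant).
function firstBadCommit(n: number, isBad: (i: number) => boolean): number {
  let lo = 0;
  let hi = n - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (isBad(mid)) hi = mid; // breakage is at mid or earlier
    else lo = mid + 1; // breakage is after mid
  }
  return lo;
}

// 100 commits, bug introduced at commit 42: found in ~7 checks, not 42.
console.assert(firstBadCommit(100, (i) => i >= 42) === 42);
```

The agent's contribution is automating each `isBad` probe — checkout, build, run the test, read the result — which is exactly the tedious part.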
The Rubber Duck Pattern
Explain the bug to the AI as if it knows nothing. Describe the expected behavior, the actual behavior, and your mental model of how the code works. Often, the act of explaining reveals the bug — and if it doesn’t, the AI’s questions will probe the assumptions you haven’t questioned.
The Isolation Pattern
Ask the AI to create a minimal reproduction: the smallest possible code that exhibits the bug. Strip away everything unrelated. This isolates the root cause from the noise of the full application. AI is excellent at this because it can rapidly generate simplified versions of complex code.
The Log Injection Pattern
Ask the AI to add strategic console.log / debug statements at key decision points in the code. Run the code, paste the log output back to the AI. The agent reads the execution trace and identifies where the actual behavior diverges from the expected behavior.
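What "strategic" placement means in practice: one log per decision point, each capturing the value that decision depends on. A sketch with a hypothetical discount function (names and logic are illustrative):

```typescript
// Instrumented at each decision point; run it, then paste the
// [debug] lines back to the AI as the execution trace.
function applyDiscount(total: number, code: string): number {
  console.log(`[debug] input: total=${total} code=${code}`);
  const rate = code === "SAVE10" ? 0.1 : 0;
  console.log(`[debug] resolved rate=${rate}`);
  const discounted = total * (1 - rate);
  console.log(`[debug] output: ${discounted}`);
  return discounted;
}

applyDiscount(100, "SAVE10");
```

If the trace shows `rate=0` where you expected `0.1`, the divergence point — and therefore the bug — is the code comparison, not the arithmetic below it.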
When to stop: If you’ve spent 15 minutes debugging with AI and aren’t making progress, switch strategies. Try a different model, ask a colleague, or use traditional debugging tools (breakpoints, profilers). AI debugging is fast for common issues but can waste time on unusual ones.
The Testing & Debugging Playbook
A practical reference for every AI coding session
For New Code
1. Write tests first (TDD) or immediately after
2. Ask AI to generate tests with explicit cases
3. Supplement with security-focused tests
4. Run the code-test-fix loop
5. Review tests for behavioral assertions
6. Target ~70% coverage on business logic
For Bug Fixes
1. Paste the full error + stack trace
2. Describe expected vs. actual behavior
3. Include what you already tried
4. Ask AI to write a failing test first
5. Fix the code to make the test pass
6. The test prevents future regression
For Existing Code Without Tests
1. Ask AI to generate tests for current behavior
2. Run them — they should all pass
3. If any fail, the test is wrong (not the code)
4. Fix the tests until they match reality
5. Now you have a safety net for refactoring
6. Add edge case tests manually
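These are characterization tests: they pin down what the code does today, quirks included, before any refactor. A sketch against a hypothetical legacy `formatPrice` function (illustrative, not from the source):

```typescript
// Legacy function with no tests.
function formatPrice(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

// Assertions describe CURRENT behavior; if one fails, fix the
// assertion to match reality, not the code.
console.assert(formatPrice(1999) === "$19.99");
console.assert(formatPrice(0) === "$0.00");
// Odd-looking, but it IS what the code does today, so record it as-is:
console.assert(formatPrice(-50) === "$-0.50");
```

Once these pass, any refactor that breaks one is a genuine behavior change you chose to make, not an accident.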
Key insight: AI transforms testing from a chore into a conversation. The barrier to writing tests drops dramatically. But the responsibility for test quality — ensuring the right things are tested — remains firmly with you. AI writes the tests; you decide what to test.