Ch 9 — Prompting for Code

Code generation, debugging, refactoring, and test writing — why “write a function” fails and what to do instead
Code Prompting Is Different
Precision matters more, ambiguity costs more, and context is everything
Why Code Is Harder to Prompt For
When you prompt for prose, a “pretty good” answer is usually fine. When you prompt for code, a “pretty good” answer might have a subtle bug that takes hours to find. Code has zero tolerance for ambiguity:

1. Syntax must be perfect: One missing bracket = broken code
2. Edge cases matter: Empty input, null values, unicode, concurrency
3. Context is invisible: The model can’t see your codebase, your database schema, your framework version, or your coding conventions
4. “Works” ≠ “Correct”: Code can run without errors but produce wrong results
The Code Prompting Mindset
Think of the model as a very fast junior developer who:

• Knows every language and framework (from training data)
• Has never seen your codebase
• Will do exactly what you ask (not what you mean)
• Won’t ask clarifying questions
• Defaults to the most common pattern, not necessarily the best one

Your job is to write prompts that a smart-but-context-blind developer could follow perfectly.
Key insight: The #1 mistake in code prompting is under-specifying. “Write a function to validate emails” has 50 possible interpretations. “Write a Python function validate_email(email: str) -> bool that checks RFC 5322 format and rejects disposable domains” has one. Be the spec, not the wish.
The Vague Code Prompt
“Write a function to validate emails” — what you get vs what you need
Superficial Prompt
Prompt: Write a function to validate email addresses.
What You Get
import re

def validate_email(email):
    pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
    return bool(re.match(pattern, email))
What’s Wrong
1. No type hints: email could be anything
2. Naive regex: Rejects valid addresses like user+tag@gmail.com (the \w character class doesn't match +)
3. No length check: RFC 5321 limits emails to 254 chars
4. No unicode handling: Fails on internationalized domains
5. No edge cases: What about None, empty string, whitespace?
6. No tests: How do you know it works?
7. No docstring: What does “valid” mean in this context?
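Two of these failures take seconds to demonstrate with the naive function exactly as generated above:

```python
import re

def validate_email(email):
    pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
    return bool(re.match(pattern, email))

# A valid Gmail-style address is rejected: \w doesn't match '+'
print(validate_email("user+tag@gmail.com"))  # False

# None doesn't return False; it crashes:
# validate_email(None)  # TypeError: expected string or bytes-like object
```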
The Model’s Perspective
The model gave you exactly what you asked for: a function that validates emails. It chose the most common pattern from its training data (a simple regex). It had no way to know you needed:

• Production-quality validation
• Specific edge case handling
• Type safety
• Tests
• A particular validation standard

The model isn’t being lazy — your prompt was ambiguous, so it made 7 assumptions, and most of them were wrong for your use case.
The pattern: Vague code prompts produce code that “looks right” but fails in production. The model optimizes for the most common interpretation, not your specific requirements. Every unspecified detail is a coin flip.
The Spec Prompt: Production-Quality Code Generation
Treat your prompt like a function specification — signature, behavior, edge cases, tests
Deliberate Prompt
Write a Python function with this spec:

Function: validate_email(email: str) -> bool
Python: 3.11+
Dependencies: standard library only

Requirements:
1. Check RFC 5322 format (use email.utils or a robust regex, not a naive one)
2. Reject emails longer than 254 characters
3. Reject disposable email domains: mailinator.com, guerrillamail.com, tempmail.com, throwaway.email
4. Handle edge cases: None input, empty string, whitespace-only, missing @
5. Strip leading/trailing whitespace before validation

Return: True if valid, False otherwise
Include: Google-style docstring
Include: 8 test cases covering:
- Valid standard email
- Valid email with + tag
- Valid subdomain email
- Disposable domain (should fail)
- Too long (should fail)
- None input (should fail)
- Empty string (should fail)
- Missing @ (should fail)
What You Get Now
The model produces a function with:

• Type hints on input and return
• Robust validation using email.utils.parseaddr or a proper regex
• Length check before regex (fast fail)
• Disposable domain blocklist as a set for O(1) lookup
• Whitespace stripping before validation
• None/empty handling with early returns
• Google-style docstring explaining behavior
• 8 test cases using assert or pytest
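As a concrete reference point, here is a sketch of the kind of function the spec prompt elicits. The regex and blocklist below are illustrative, not any model's exact output:

```python
import re

# Blocklist from the spec; a set gives O(1) membership checks.
DISPOSABLE_DOMAINS = {
    "mailinator.com", "guerrillamail.com",
    "tempmail.com", "throwaway.email",
}

# More permissive than the naive pattern: allows + tags and subdomains.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def validate_email(email):
    """Validate an email address per the spec.

    Args:
        email: Candidate address. Non-string input is rejected.

    Returns:
        True if the address is well-formed, at most 254 characters,
        and not from a disposable domain; False otherwise.
    """
    if not isinstance(email, str):
        return False  # handles None without raising
    email = email.strip()
    if not email or len(email) > 254 or "@" not in email:
        return False  # fast fail before the regex
    if not EMAIL_RE.match(email):
        return False
    domain = email.rsplit("@", 1)[1].lower()
    return domain not in DISPOSABLE_DOMAINS
```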
Vague Prompt: 5-line regex function. No types, no edge cases, no tests. Rejects valid emails, accepts invalid ones. Needs rewriting.
Spec Prompt: 30+ line production function. Type hints, docstring, edge case handling, disposable domain check, 8 test cases. Ready for code review.
Key insight: The spec prompt template for code: function signature + language/version + dependencies + requirements (numbered) + edge cases + return behavior + tests. This is the same information you’d put in a Jira ticket for a developer. If you wouldn’t assign a task without a spec, don’t prompt without one.
Debugging: Context Is Everything
Pasting just the error message vs providing the full debugging context
Superficial Debug Prompt
Prompt: “I’m getting a 500 error on my Flask API. What’s wrong?”

Output: Generic checklist of 8 possible causes. None specific to your code. You’re no closer to a fix.
Deliberate Debug Prompt
Debug this Flask API error. Diagnose the root cause, then provide the fix.

Endpoint: POST /api/orders
Framework: Flask 3.0 + SQLAlchemy 2.0
Database: PostgreSQL 16

Error traceback:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation)
insert or update on table "order_items" violates foreign key constraint "order_items_product_id_fkey"
DETAIL: Key (product_id)=(847) is not present in table "products".

Route handler:
@app.route('/api/orders', methods=['POST'])
def create_order():
    data = request.get_json()
    order = Order(customer_id=data['cid'])
    db.session.add(order)
    db.session.flush()
    for item in data['items']:
        oi = OrderItem(
            order_id=order.id,
            product_id=item['product_id'],
            quantity=item['qty'])
        db.session.add(oi)
    db.session.commit()
    return jsonify(order.to_dict()), 201

Request payload that triggers the error:
{"cid": 12, "items": [{"product_id": 847, "qty": 2}]}
Model Output (With Context)
Root cause: product_id 847 doesn't exist in the products table. The route handler doesn't validate that product IDs exist before inserting order items.

The fix (validate before insert):

@app.route('/api/orders', methods=['POST'])
def create_order():
    data = request.get_json()
    # Validate all product IDs exist
    product_ids = [i['product_id'] for i in data['items']]
    existing = db.session.query(Product.id).filter(
        Product.id.in_(product_ids)).all()
    existing_ids = {p.id for p in existing}
    missing = set(product_ids) - existing_ids
    if missing:
        return jsonify({
            "error": "Invalid product IDs",
            "missing": list(missing)
        }), 400
    # ... rest of the handler
Key insight: The debug prompt formula: error message + relevant code + request/input that triggers it + framework/DB versions. The model can’t see your codebase — you need to bring the relevant context to it. More context = more precise diagnosis.
Refactoring: Tell the Model What to Improve
Don’t say “refactor this” — specify what’s wrong and what “better” looks like
Vague Refactor
Prompt: “Refactor this code to be better.”

Model renames variables, adds comments, maybe extracts a function. Surface-level changes that don’t address real issues.
Specific Refactor
Prompt: “Refactor this function. Specific issues: (1) It has 3 nested for-loops making it O(n³), (2) It mutates the input list, (3) Variable names are single letters. Target: O(n log n) using a dict for lookups, no mutation, descriptive names.”
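To make the contrast concrete, here is a hypothetical before/after in the spirit of that prompt (simplified to two nested loops; the function and variable names are invented for illustration):

```python
# Before: quadratic nested loops, single-letter names, mutates its input
def f(a, b):
    r = []
    for x in a:
        for y in b:
            if x == y and x not in r:
                r.append(x)
    a.clear()  # the surprise mutation the refactor prompt calls out
    return r

# After: set membership is O(1), so one pass over each list suffices;
# no mutation, descriptive names, same return behavior as before
def common_elements(first, second):
    candidates = set(second)
    result = []
    for item in first:
        if item in candidates:
            result.append(item)
            candidates.discard(item)  # keep output unique, like the original
    return result
```

Note that the "after" version keeps the public behavior identical (same elements, same order, duplicates removed), which is exactly the kind of constraint the prompt should state explicitly.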
The Refactoring Prompt Template
Refactor this [language] code.

Current issues:
1. [specific problem — performance, readability, coupling, duplication, etc.]
2. [specific problem]
3. [specific problem]

Constraints:
- Must maintain the same public API (function signature, return type)
- Must pass these existing tests: [list]
- Target complexity: [O(n), O(n log n), etc.]
- Use [pattern/library] for [specific part]

Don't change:
- [things that should stay the same]

[paste the code]
Key insight: “Refactor this” is like saying “make this better” — it’s too vague to be actionable. Specify the problems (what’s wrong), the constraints (what must stay the same), and the target (what “better” looks like). The model needs a clear definition of done.
Test Writing: The Highest-ROI Code Prompt
LLMs are surprisingly good at writing tests — if you tell them what to test
Why Tests Are the Sweet Spot
Test writing is arguably the best use of LLMs for code because:

1. Tests are repetitive: Similar structure, different inputs/outputs. Perfect for pattern completion.
2. Edge cases are enumerable: You can list them and the model generates the code.
3. Low risk: A wrong test fails loudly. A wrong function fails silently.
4. High leverage: Tests you wouldn’t have written catch bugs you wouldn’t have found.
Test Prompt Template
Write pytest tests for this function:

[paste the function]

Test framework: pytest
Style: AAA (Arrange-Act-Assert)
Naming: test_[function]_[scenario]_[expected]

Cover these scenarios:
1. Happy path with typical input
2. Empty input
3. None/null input
4. Boundary values (0, max int, etc.)
5. Invalid types
6. Concurrent access (if applicable)
7. Large input (performance sanity check)

For each test:
- Clear docstring explaining what's tested
- Explicit assertion messages
- No test interdependencies
Example Output (Partial)
import pytest
from app.validators import validate_email

class TestValidateEmail:
    def test_valid_standard_email(self):
        """Standard email passes validation."""
        assert validate_email("user@example.com")

    def test_valid_email_with_plus_tag(self):
        """Gmail-style + tags are valid."""
        assert validate_email("user+newsletter@gmail.com")

    def test_none_input_returns_false(self):
        """None input should not raise."""
        assert not validate_email(None)

    def test_disposable_domain_rejected(self):
        """Disposable emails are rejected."""
        assert not validate_email("test@mailinator.com")

    def test_exceeds_max_length_rejected(self):
        """Emails > 254 chars are invalid."""
        long_email = "a" * 246 + "@test.com"  # 255 chars, one over the limit
        assert not validate_email(long_email)
Key insight: The model writes tests you wouldn’t bother writing — like the 254-character boundary test or the None input test. These “boring” tests are exactly the ones that catch production bugs. Let the model handle the tedium; you focus on the tricky edge cases it might miss.
The Context Package: What to Include
The model can’t see your codebase — pack the right context into every prompt
The Context Checklist
For code generation:
□ Language and version (Python 3.11)
□ Framework and version (Flask 3.0)
□ Function signature with types
□ Dependencies allowed/forbidden
□ Coding conventions (naming, style)
□ Related types/interfaces

For debugging:
□ Full error message / traceback
□ The code that triggers the error
□ The input that causes the failure
□ Database schema (if DB-related)
□ Framework/library versions
□ What you've already tried

For refactoring:
□ The current code
□ What's wrong with it (specific issues)
□ What "better" looks like (target)
□ Constraints (API compatibility, etc.)
□ Existing tests that must still pass
What NOT to Include
Don’t dump your entire file. The model gets confused by irrelevant code. Include only:

• The function/class being discussed
• Imports it depends on
• Types/interfaces it references
• The specific error or test that fails

Think of it as a minimal reproducible example — the same thing you’d post on Stack Overflow.
The “Invisible Context” Problem
The most common code prompting failure is assuming the model knows things it doesn’t:

• Your database schema
• Your custom types and interfaces
• Your environment variables
• Your deployment configuration
• Your team’s coding conventions

If the model’s output doesn’t match your codebase, the fix is almost always: add more context to the prompt.
Key insight: The quality of code output is directly proportional to the quality of context you provide. Think of every code prompt as having two parts: the task (what to do) and the context package (everything the model needs to do it right). Most people nail the task and skip the context.
The Code Prompting Checklist
Before you hit send on any code prompt, verify these
Pre-Send Checklist
□ Specified language and version: "Python 3.11", not just "Python"
□ Included function signature: name, parameters, types, return type
□ Listed edge cases explicitly: None, empty, boundary values, errors
□ Specified dependencies: "standard library only" or "can use X"
□ Asked for tests: always; the model writes them for free
□ Provided relevant context: schema, related code, framework version
□ Defined "done": what does correct output look like?
Post-Receive Checklist
□ Read the code before using it: never copy-paste without reading
□ Run the tests: if they pass, the code probably works; if they fail, you found a bug early
□ Check edge cases manually: the model might have missed one
□ Verify no hallucinated APIs: the model sometimes invents function names or uses deprecated APIs
□ Check security: SQL injection, XSS, hardcoded secrets
□ Verify it fits your codebase: naming conventions, error handling patterns, logging style
Key insight: Code prompting is a collaboration, not a delegation. The model writes the first draft; you review, test, and integrate. The best workflow: prompt with a spec → get code + tests → run tests → review for edge cases → integrate. This is faster than writing from scratch but safer than blind copy-paste.