Ch 9 — Prompting for Code

Code generation, debugging, refactoring, and test writing — why “write a function” fails and what to do instead
Code Prompting Is Different
Precision matters more, ambiguity costs more, and context is everything
Why Code Is Harder to Prompt For
When you prompt for prose, a “pretty good” answer is usually fine. When you prompt for code, a “pretty good” answer might have a subtle bug that takes hours to find. Code has zero tolerance for ambiguity:

1. Syntax must be perfect: One missing bracket = broken code
2. Edge cases matter: Empty input, null values, unicode, concurrency
3. Context is invisible: The model can’t see your codebase, your database schema, your framework version, or your coding conventions
4. “Works” ≠ “Correct”: Code can run without errors but produce wrong results
The Code Prompting Mindset
Think of the model as a very fast junior developer who:

• Knows every language and framework (from training data)
• Has never seen your codebase
• Will do exactly what you ask (not what you mean)
• Won’t ask clarifying questions
• Defaults to the most common pattern, not necessarily the best one

Your job is to write prompts that a smart-but-context-blind developer could follow perfectly.
Key insight: The #1 mistake in code prompting is under-specifying. “Write a function to validate emails” has 50 possible interpretations. “Write a Python function validate_email(email: str) -> bool that checks RFC 5322 format and rejects disposable domains” has one. Be the spec, not the wish.
The Vague Code Prompt
“Write a function to validate emails” — what you get vs what you need
Superficial Prompt
Prompt: Write a function to validate email addresses.
What You Get
import re

def validate_email(email):
    pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
    return bool(re.match(pattern, email))
What’s Wrong
1. No type hints: email could be anything
2. Naive regex: Rejects valid addresses like user+tag@gmail.com (the \w character class doesn't match +)
3. No length check: RFC 5321 limits emails to 254 chars
4. No unicode handling: Fails on internationalized domains
5. No edge cases: What about None, empty string, whitespace?
6. No tests: How do you know it works?
7. No docstring: What does “valid” mean in this context?
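Two of these failures take seconds to demonstrate with the naive function exactly as generated above:

```python
import re

def validate_email(email):
    pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
    return bool(re.match(pattern, email))

# A valid Gmail-style address is rejected: \w doesn't match '+'
print(validate_email("user+tag@gmail.com"))  # False

# None doesn't return False; it crashes:
# validate_email(None)  # TypeError: expected string or bytes-like object
```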
The Model’s Perspective
The model gave you exactly what you asked for: a function that validates emails. It chose the most common pattern from its training data (a simple regex). It had no way to know you needed:

• Production-quality validation
• Specific edge case handling
• Type safety
• Tests
• A particular validation standard

The model isn’t being lazy — your prompt was ambiguous, so it made 7 assumptions, and most of them were wrong for your use case.
The pattern: Vague code prompts produce code that “looks right” but fails in production. The model optimizes for the most common interpretation, not your specific requirements. Every unspecified detail is a coin flip.
The Spec Prompt: Production-Quality Code Generation
Treat your prompt like a function specification — signature, behavior, edge cases, tests
Deliberate Prompt
Write a Python function with this spec:

Function: validate_email(email: str) -> bool
Python: 3.11+
Dependencies: standard library only

Requirements:
1. Check RFC 5322 format (use email.utils or a robust regex, not a naive one)
2. Reject emails longer than 254 characters
3. Reject disposable email domains: mailinator.com, guerrillamail.com, tempmail.com, throwaway.email
4. Handle edge cases: None input, empty string, whitespace-only, missing @
5. Strip leading/trailing whitespace before validation

Return: True if valid, False otherwise
Include: Google-style docstring
Include: 8 test cases covering:
- Valid standard email
- Valid email with + tag
- Valid subdomain email
- Disposable domain (should fail)
- Too long (should fail)
- None input (should fail)
- Empty string (should fail)
- Missing @ (should fail)
What You Get Now
The model produces a function with:

• Type hints on input and return
• Robust validation using email.utils.parseaddr or a proper regex
• Length check before regex (fast fail)
• Disposable domain blocklist as a set for O(1) lookup
• Whitespace stripping before validation
• None/empty handling with early returns
• Google-style docstring explaining behavior
• 8 test cases using assert or pytest
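As a concrete reference point, here is a sketch of the kind of function the spec prompt elicits. The regex and blocklist below are illustrative, not any model's exact output:

```python
import re

# Blocklist from the spec; a set gives O(1) membership checks.
DISPOSABLE_DOMAINS = {
    "mailinator.com", "guerrillamail.com",
    "tempmail.com", "throwaway.email",
}

# More permissive than the naive pattern: allows + tags and subdomains.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def validate_email(email):
    """Validate an email address per the spec.

    Args:
        email: Candidate address. Non-string input is rejected.

    Returns:
        True if the address is well-formed, at most 254 characters,
        and not from a disposable domain; False otherwise.
    """
    if not isinstance(email, str):
        return False  # handles None without raising
    email = email.strip()
    if not email or len(email) > 254 or "@" not in email:
        return False  # fast fail before the regex
    if not EMAIL_RE.match(email):
        return False
    domain = email.rsplit("@", 1)[1].lower()
    return domain not in DISPOSABLE_DOMAINS
```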
Vague Prompt: 5-line regex function. No types, no edge cases, no tests. Rejects valid emails, accepts invalid ones. Needs rewriting.
Spec Prompt: 30+ line production function. Type hints, docstring, edge case handling, disposable domain check, 8 test cases. Ready for code review.
Key insight: The spec prompt template for code: function signature + language/version + dependencies + requirements (numbered) + edge cases + return behavior + tests. This is the same information you’d put in a Jira ticket for a developer. If you wouldn’t assign a task without a spec, don’t prompt without one.
Debugging: Context Is Everything
Pasting just the error message vs providing the full debugging context
Superficial Debug Prompt
Prompt: “I’m getting a 500 error on my Flask API. What’s wrong?”

Output: Generic checklist of 8 possible causes. None specific to your code. You’re no closer to a fix.
Deliberate Debug Prompt
Debug this Flask API error. Diagnose the root cause, then provide the fix.

Endpoint: POST /api/orders
Framework: Flask 3.0 + SQLAlchemy 2.0
Database: PostgreSQL 16

Error traceback:
sqlalchemy.exc.IntegrityError: (psycopg2.errors.ForeignKeyViolation)
insert or update on table "order_items" violates foreign key constraint "order_items_product_id_fkey"
DETAIL: Key (product_id)=(847) is not present in table "products".

Route handler:
@app.route('/api/orders', methods=['POST'])
def create_order():
    data = request.get_json()
    order = Order(customer_id=data['cid'])
    db.session.add(order)
    db.session.flush()
    for item in data['items']:
        oi = OrderItem(
            order_id=order.id,
            product_id=item['product_id'],
            quantity=item['qty'])
        db.session.add(oi)
    db.session.commit()
    return jsonify(order.to_dict()), 201

Request payload that triggers the error:
{"cid": 12, "items": [{"product_id": 847, "qty": 2}]}
Model Output (With Context)
Root cause: product_id 847 doesn't exist in the products table. The route handler doesn't validate that product IDs exist before inserting order items.

The fix (validate before insert):

@app.route('/api/orders', methods=['POST'])
def create_order():
    data = request.get_json()
    # Validate all product IDs exist
    product_ids = [i['product_id'] for i in data['items']]
    existing = db.session.query(Product.id).filter(
        Product.id.in_(product_ids)).all()
    existing_ids = {p.id for p in existing}
    missing = set(product_ids) - existing_ids
    if missing:
        return jsonify({
            "error": "Invalid product IDs",
            "missing": list(missing)
        }), 400
    # ... rest of the handler
Key insight: The debug prompt formula: error message + relevant code + request/input that triggers it + framework/DB versions. The model can’t see your codebase — you need to bring the relevant context to it. More context = more precise diagnosis.
Refactoring: Tell the Model What to Improve
Don’t say “refactor this” — specify what’s wrong and what “better” looks like
Vague Refactor
Prompt: “Refactor this code to be better.”

Model renames variables, adds comments, maybe extracts a function. Surface-level changes that don’t address real issues.
Specific Refactor
Prompt: “Refactor this function. Specific issues: (1) It has 3 nested for-loops making it O(n³), (2) It mutates the input list, (3) Variable names are single letters. Target: O(n log n) using a dict for lookups, no mutation, descriptive names.”
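To make the contrast concrete, here is a hypothetical before/after in the spirit of that prompt (simplified to two nested loops; the function and variable names are invented for illustration):

```python
# Before: quadratic nested loops, single-letter names, mutates its input
def f(a, b):
    r = []
    for x in a:
        for y in b:
            if x == y and x not in r:
                r.append(x)
    a.clear()  # the surprise mutation the refactor prompt calls out
    return r

# After: set membership is O(1), so one pass over each list suffices;
# no mutation, descriptive names, same return behavior as before
def common_elements(first, second):
    candidates = set(second)
    result = []
    for item in first:
        if item in candidates:
            result.append(item)
            candidates.discard(item)  # keep output unique, like the original
    return result
```

Note that the "after" version keeps the public behavior identical (same elements, same order, duplicates removed), which is exactly the kind of constraint the prompt should state explicitly.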
The Refactoring Prompt Template
Refactor this [language] code.

Current issues:
1. [specific problem — performance, readability, coupling, duplication, etc.]
2. [specific problem]
3. [specific problem]

Constraints:
- Must maintain the same public API (function signature, return type)
- Must pass these existing tests: [list]
- Target complexity: [O(n), O(n log n), etc.]
- Use [pattern/library] for [specific part]

Don't change:
- [things that should stay the same]

[paste the code]
Key insight: “Refactor this” is like saying “make this better” — it’s too vague to be actionable. Specify the problems (what’s wrong), the constraints (what must stay the same), and the target (what “better” looks like). The model needs a clear definition of done.
Test Writing: The Highest-ROI Code Prompt
LLMs are surprisingly good at writing tests — if you tell them what to test
Why Tests Are the Sweet Spot
Test writing is arguably the best use of LLMs for code because:

1. Tests are repetitive: Similar structure, different inputs/outputs. Perfect for pattern completion.
2. Edge cases are enumerable: You can list them and the model generates the code.
3. Low risk: A wrong test fails loudly. A wrong function fails silently.
4. High leverage: Tests you wouldn’t have written catch bugs you wouldn’t have found.
Test Prompt Template
Write pytest tests for this function:

[paste the function]

Test framework: pytest
Style: AAA (Arrange-Act-Assert)
Naming: test_[function]_[scenario]_[expected]

Cover these scenarios:
1. Happy path with typical input
2. Empty input
3. None/null input
4. Boundary values (0, max int, etc.)
5. Invalid types
6. Concurrent access (if applicable)
7. Large input (performance sanity check)

For each test:
- Clear docstring explaining what's tested
- Explicit assertion messages
- No test interdependencies
Example Output (Partial)
import pytest
from app.validators import validate_email

class TestValidateEmail:
    def test_valid_standard_email(self):
        """Standard email passes validation."""
        assert validate_email("user@example.com")

    def test_valid_email_with_plus_tag(self):
        """Gmail-style + tags are valid."""
        assert validate_email("user+newsletter@gmail.com")

    def test_none_input_returns_false(self):
        """None input should not raise."""
        assert not validate_email(None)

    def test_disposable_domain_rejected(self):
        """Disposable emails are rejected."""
        assert not validate_email("test@mailinator.com")

    def test_exceeds_max_length_rejected(self):
        """Emails > 254 chars are invalid."""
        long_email = "a" * 246 + "@test.com"  # 255 chars, one over the limit
        assert not validate_email(long_email)
Key insight: The model writes tests you wouldn’t bother writing — like the 254-character boundary test or the None input test. These “boring” tests are exactly the ones that catch production bugs. Let the model handle the tedium; you focus on the tricky edge cases it might miss.
The Context Package: What to Include
The model can’t see your codebase — pack the right context into every prompt
The Context Checklist
For code generation:
□ Language and version (Python 3.11)
□ Framework and version (Flask 3.0)
□ Function signature with types
□ Dependencies allowed/forbidden
□ Coding conventions (naming, style)
□ Related types/interfaces

For debugging:
□ Full error message / traceback
□ The code that triggers the error
□ The input that causes the failure
□ Database schema (if DB-related)
□ Framework/library versions
□ What you've already tried

For refactoring:
□ The current code
□ What's wrong with it (specific issues)
□ What "better" looks like (target)
□ Constraints (API compatibility, etc.)
□ Existing tests that must still pass
What NOT to Include
Don’t dump your entire file. The model gets confused by irrelevant code. Include only:

• The function/class being discussed
• Imports it depends on
• Types/interfaces it references
• The specific error or test that fails

Think of it as a minimal reproducible example — the same thing you’d post on Stack Overflow.
The “Invisible Context” Problem
The most common code prompting failure is assuming the model knows things it doesn’t:

• Your database schema
• Your custom types and interfaces
• Your environment variables
• Your deployment configuration
• Your team’s coding conventions

If the model’s output doesn’t match your codebase, the fix is almost always: add more context to the prompt.
Key insight: The quality of code output is directly proportional to the quality of context you provide. Think of every code prompt as having two parts: the task (what to do) and the context package (everything the model needs to do it right). Most people nail the task and skip the context.
The Code Prompting Checklist
Before you hit send on any code prompt, verify these
Pre-Send Checklist
□ Specified language and version: "Python 3.11", not just "Python"
□ Included function signature: name, parameters, types, return type
□ Listed edge cases explicitly: None, empty, boundary values, errors
□ Specified dependencies: "standard library only" or "can use X"
□ Asked for tests: always; the model writes them for free
□ Provided relevant context: schema, related code, framework version
□ Defined "done": what does correct output look like?
Post-Receive Checklist
□ Read the code before using it: never copy-paste without reading
□ Run the tests: if they pass, the code probably works; if they fail, you found a bug early
□ Check edge cases manually: the model might have missed one
□ Verify no hallucinated APIs: the model sometimes invents function names or uses deprecated APIs
□ Check security: SQL injection, XSS, hardcoded secrets
□ Verify it fits your codebase: naming conventions, error handling patterns, logging style
Key insight: Code prompting is a collaboration, not a delegation. The model writes the first draft; you review, test, and integrate. The best workflow: prompt with a spec → get code + tests → run tests → review for edge cases → integrate. This is faster than writing from scratch but safer than blind copy-paste.