When a Prompt Fails
1. Reproduce
Run the exact same input 3 times.
Is the failure consistent or random?
Consistent → likely a prompt bug
Random → likely a temperature/sampling issue
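A minimal sketch of the reproduce step, assuming you already have some way to call the model. The names run_prompt and passes are hypothetical stand-ins (not part of any library): one sends the input and returns the output, the other decides whether an output is acceptable.

```python
from typing import Callable


def classify_failure(
    run_prompt: Callable[[str], str],  # hypothetical: sends the input, returns the model output
    passes: Callable[[str], bool],     # hypothetical: True if the output is acceptable
    user_input: str,
    runs: int = 3,
) -> str:
    """Run the exact same input several times and label the failure pattern."""
    results = [passes(run_prompt(user_input)) for _ in range(runs)]
    failures = results.count(False)
    if failures == 0:
        return "not reproduced"
    if failures == runs:
        return "consistent"  # points toward a prompt bug
    return "random"          # points toward temperature/sampling


# Stand-in model that always returns the same bad output (no real API call).
def fake_model(text: str) -> str:
    return "not json"


label = classify_failure(fake_model, lambda out: out.startswith("{"), "Summarize this")
print(label)  # consistent
```

Three runs is a small sample, so treat the label as a hint about where to look first, not a verdict.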
2. Diagnose (5-point checklist)
□ Instruction clear?
□ Enough context?
□ Format specified?
□ Conflicting constraints?
□ Task too complex?
3. Isolate
Remove parts of the prompt until you find the minimum that reproduces the failure. Often it's one ambiguous sentence.
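One way to sketch the isolate step is a greedy reduction: drop one part of the prompt at a time and keep the removal only if the failure still reproduces. The function still_fails is a hypothetical callback; in practice it would call the model with the candidate prompt and check the output. Here it is replaced by a stand-in that "fails" when two conflicting instructions are both present.

```python
from typing import Callable, List


def minimize_prompt(parts: List[str], still_fails: Callable[[str], bool]) -> List[str]:
    """Greedily drop prompt parts while the failure still reproduces."""
    kept = list(parts)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if candidate and still_fails("\n".join(candidate)):
            kept = candidate  # this part was not needed to trigger the failure
        else:
            i += 1            # this part is needed; keep it and move on
    return kept


parts = [
    "You are a helpful assistant.",
    "Answer briefly.",
    "Answer in full detail.",
    "Use plain text.",
]


# Stand-in for a real model check: the failure appears only when both
# conflicting length instructions are in the prompt.
def conflicting(prompt: str) -> bool:
    return "briefly" in prompt and "full detail" in prompt


minimal = minimize_prompt(parts, conflicting)
print(minimal)  # ['Answer briefly.', 'Answer in full detail.']
```

If the failure is random and not consistent, still_fails would need to run each candidate several times before deciding, which this sketch does not do.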
4. Fix
Make the smallest change that fixes the failure. Don't rewrite the whole prompt: you'll introduce new bugs.
5. Verify
Run the fix against:
- The failing input (should pass now)
- 10 previously passing inputs (should still pass, no regression)
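The verify step can be sketched as a small regression check. Both run_fixed_prompt and passes are hypothetical callables you would supply yourself; the stand-ins below only imitate a model so the example runs without any API.

```python
from typing import Callable, Dict, List


def verify_fix(
    run_fixed_prompt: Callable[[str], str],  # hypothetical: runs the fixed prompt on an input
    passes: Callable[[str, str], bool],      # hypothetical: (input, output) -> acceptable?
    failing_input: str,
    regression_inputs: List[str],
) -> Dict[str, object]:
    """Check the original failure is fixed and previously passing inputs still pass."""
    fixed = passes(failing_input, run_fixed_prompt(failing_input))
    regressions = [
        x for x in regression_inputs if not passes(x, run_fixed_prompt(x))
    ]
    return {"fixed": fixed, "regressions": regressions, "ok": fixed and not regressions}


# Stand-ins: the "model" uppercases its input, and the check expects exactly that.
def fake_fixed_model(text: str) -> str:
    return text.strip().upper()


def is_upper(inp: str, out: str) -> bool:
    return out == inp.strip().upper()


report = verify_fix(
    fake_fixed_model,
    is_upper,
    failing_input=" hello ",
    regression_inputs=[f"case {i}" for i in range(10)],
)
print(report["ok"], report["regressions"])
```

Returning the list of regressed inputs, not just a pass/fail flag, makes it easier to see which earlier behavior the fix broke.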
Quick Reference
Wrong format? → Add explicit format instructions with an example
Too long/short? → Add a word or sentence count constraint
Hallucinating? → Add “Only use information from the provided context”
Inconsistent? → Lower temperature, add few-shot examples
Wrong tool called? → Improve tool descriptions (Ch 12)
Loses context? → Add summaries (Ch 11)
Works sometimes? → Run 10x, find the pattern in failures
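For the "works sometimes" case, a rough sketch of running the same input 10 times and tallying which checks fail. The checks dictionary, its names, and run_prompt are all hypothetical; the stand-in model alternates between a clean output and a chatty one so the example runs offline.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Dict


def failure_pattern(
    run_prompt: Callable[[str], str],         # hypothetical: sends the input, returns the output
    checks: Dict[str, Callable[[str], bool]], # hypothetical: check name -> True if output is fine
    user_input: str,
    runs: int = 10,
) -> Counter:
    """Run the same input repeatedly and count how often each check fails."""
    counts: Counter = Counter()
    for _ in range(runs):
        output = run_prompt(user_input)
        for name, check in checks.items():
            if not check(output):
                counts[name] += 1
    return counts


# Stand-in model: alternates between a clean answer and one wrapped in chatter.
_outputs = cycle([
    '{"ok": true}',
    'Sure! Here is the JSON: {"ok": true} Hope this helps!',
])

checks = {
    "starts_with_brace": lambda out: out.startswith("{"),
    "under_40_chars": lambda out: len(out) < 40,
}

pattern = failure_pattern(lambda _: next(_outputs), checks, "Return JSON")
print(pattern.most_common())
```

Checks that fail together, as the two do here, usually share one cause (in this stand-in, extra text around the JSON), which narrows down which instruction to tighten.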
Key insight: Prompt debugging is a learnable skill. The 5-step framework (reproduce, diagnose, isolate, fix, verify) works for every failure mode. The key discipline is making the smallest fix and verifying it doesn’t break existing behavior. Prompt engineering is iterative, not inspirational.