Per-Tool Checklist
□ Clear description
What it does, what it returns
□ When to use
Specific triggers ("when a customer asks about order status")
□ When NOT to use
Explicit exclusions ("not for refunds")
□ Parameter descriptions
Format, examples, edge-case handling
□ Missing parameter behavior
"Ask the user" not "guess"
□ Prerequisites
"Call X before calling this tool"
□ Safety guardrails
Confirmation for writes, limits, escalation triggers
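As a sketch of how the per-tool items fit together, here is a hypothetical `get_order_status` definition plus a crude linter that flags checklist items a definition seems to be missing. The `name` / `description` / `input_schema` layout mirrors common JSON-Schema-based tool formats but is an assumption; adapt it to your provider. The tool names (including the prerequisite `lookup_customer`), the order-ID format, the keyword phrasings, and the `writes` flag are all illustrative.

```python
from typing import Any

# Hypothetical read-only tool; every name and field value is illustrative.
GET_ORDER_STATUS: dict[str, Any] = {
    "name": "get_order_status",
    "description": (
        "Looks up one order and returns its status, carrier, and estimated "
        "delivery date. "
        "Use when the customer asks about order status, shipping, or delivery. "
        "Do not use for refunds, cancellations, or address changes. "
        "Call lookup_customer before calling this tool. "
        "If the order ID is missing, ask the user for it; never guess one."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": (
                    "Order ID: 'ORD-' followed by 8 digits, e.g. ORD-12345678. "
                    "Trim surrounding spaces; if the user gives a different "
                    "format, ask them to check it."
                ),
            },
        },
        "required": ["order_id"],
    },
}

# Crude keyword checks, one per checklist item. The phrasings are a team
# convention assumed for this sketch, not a standard; they catch omissions,
# not bad wording, so a human review is still needed.
CHECKS = {
    "clear description": lambda d: "returns" in d,
    "when to use": lambda d: "use when" in d,
    "when NOT to use": lambda d: "do not use" in d,
    "missing parameter behavior": lambda d: "ask the user" in d,
    "prerequisites": lambda d: "before calling" in d,
}


def lint_tool(tool: dict[str, Any]) -> list[str]:
    """Return the per-tool checklist items a definition appears to be missing."""
    desc = tool.get("description", "").lower()
    problems = [item for item, ok in CHECKS.items() if not ok(desc)]
    for name, spec in tool.get("input_schema", {}).get("properties", {}).items():
        if not spec.get("description"):
            problems.append(f"parameter description: {name}")
    # "writes" is a hypothetical local flag for tools with side effects;
    # strip it before sending the definition to a model API.
    if tool.get("writes") and "confirm" not in desc:
        problems.append("safety guardrails")
    return problems


print(lint_tool(GET_ORDER_STATUS))  # → []
```

Running a check like this over every tool definition in CI keeps the checklist from silently eroding as new tools are added.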
System-Level Checklist
□ System prompt with global rules
When to use tools vs answer directly
□ Tool disambiguation
No two tools with overlapping triggers
□ Workflow orchestration
Clear call order for multi-tool flows
□ Backend validation
Never trust model arguments blindly
□ Error handling
What should the model do when a tool call fails? (Retry? Tell the user? Escalate?)
□ Testing
Test each tool with correct use, wrong-tool cases, missing params, and edge cases
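The backend-validation and error-handling items can be sketched as a thin wrapper that re-validates model-supplied arguments before any side effect and maps each failure to an explicit next step for the model. Everything here is hypothetical: the refund tool, the `ORD-` format, the 100.00 limit, the exception classes, and the `ask_user` / `escalate` labels are assumptions for illustration, not any framework's API.

```python
import math
import re
from dataclasses import dataclass
from typing import Any, Callable

ORDER_ID = re.compile(r"ORD-\d{8}")  # hypothetical order-ID format
MAX_REFUND = 100.00                  # hypothetical per-call refund limit


@dataclass
class ToolResult:
    ok: bool
    content: str
    next_step: str = ""  # set on failure: "ask_user" or "escalate"


class ToolInputError(ValueError):
    """Arguments failed backend validation; the model should ask the user."""


class LimitExceeded(Exception):
    """Request exceeds a policy limit; a human must handle it."""


class TransientError(RuntimeError):
    """Temporary backend failure (timeout, rate limit) that is safe to retry."""


def validate_refund(args: dict[str, Any]) -> None:
    """Re-check model-supplied arguments server-side; the schema is only a hint."""
    order_id = args.get("order_id")
    if not isinstance(order_id, str) or not ORDER_ID.fullmatch(order_id.strip()):
        raise ToolInputError("order_id is missing or not of the form ORD-12345678")
    amount = args.get("amount")
    if (isinstance(amount, bool) or not isinstance(amount, (int, float))
            or not math.isfinite(amount) or amount <= 0):
        raise ToolInputError("amount is missing or not a positive number")
    if amount > MAX_REFUND:
        raise LimitExceeded(f"refunds over {MAX_REFUND:.2f} require a human agent")


def run_tool(handler: Callable[[dict[str, Any]], str],
             validator: Callable[[dict[str, Any]], None],
             args: dict[str, Any], retries: int = 2) -> ToolResult:
    """Validate, call, and map every failure to an explicit next step."""
    try:
        validator(args)  # reject bad arguments before any side effect
    except ToolInputError as e:
        return ToolResult(False, str(e), "ask_user")
    except LimitExceeded as e:
        return ToolResult(False, str(e), "escalate")
    last_error = ""
    for _ in range(retries + 1):  # first attempt plus `retries` retries
        try:
            return ToolResult(True, handler(args))
        except TransientError as e:
            last_error = str(e)  # transient: try again
        except Exception as e:  # unknown failure: do not retry blindly
            return ToolResult(False, f"unexpected error: {e}", "escalate")
    return ToolResult(False, f"gave up after {retries + 1} attempts: {last_error}",
                      "escalate")


if __name__ == "__main__":
    def refund(a: dict[str, Any]) -> str:
        return f"refunded {a['amount']:.2f} on {a['order_id']}"

    good = {"order_id": "ORD-12345678", "amount": 20}
    assert run_tool(refund, validate_refund, good).ok  # correct use
    assert run_tool(refund, validate_refund, {"amount": 20}).next_step == "ask_user"
    assert run_tool(refund, validate_refund, {**good, "amount": 500}).next_step == "escalate"

    attempts: list[dict[str, Any]] = []

    def flaky(a: dict[str, Any]) -> str:
        attempts.append(a)
        if len(attempts) < 2:
            raise TransientError("timeout")  # first call fails
        return "refunded"

    assert run_tool(flaky, validate_refund, good).ok and len(attempts) == 2
```

Returning a structured `next_step` instead of a bare exception string gives the model one unambiguous instruction per failure mode. The `__main__` asserts cover correct use, a missing parameter, an over-limit edge case, and a flaky backend; wrong-tool cases need the model in the loop and are not covered by a backend test like this.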
Key insight: Tool use is where prompt engineering becomes agent engineering. The same principles apply (be specific, give examples, set constraints), but the stakes are higher because tools have real-world side effects. A bad text response is annoying; a bad tool call can issue a wrong refund, delete data, or send an email that cannot be unsent. Invest in tool descriptions as you would in production code.