Ch 12 — Tool Use & Function Calling

Prompting models to use tools, APIs, and structured actions — tool descriptions ARE prompts
How Function Calling Works
You define the tools — the model decides when to call them and with what arguments
The Flow
1. You define tools: Name, description, parameters (as JSON Schema)
2. User sends a message: “What’s the status of order #12345?”
3. Model decides: “I should call lookup_order with order_id: "12345"”
4. Your code executes: Calls the actual API/database
5. Result fed back: The tool result is added to the conversation
6. Model responds: Uses the tool result to answer the user

The model never executes anything — it only decides which tool to call and with what arguments. Your code does the actual execution.
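The six steps above can be sketched as a small loop. The model call is mocked here (a real implementation would go through a provider SDK); the dispatch-and-feed-back logic in `handle_tool_call` is the part your code owns. All tool names and data are illustrative.

```python
import json

# Step 4 of the flow: your code owns the actual execution.
def lookup_order(order_id: str) -> dict:
    # Stand-in for a real database/API call (hypothetical data).
    return {"order_id": order_id, "status": "shipped"}

TOOL_REGISTRY = {"lookup_order": lookup_order}

def handle_tool_call(tool_call: dict) -> dict:
    """Execute the tool the model chose and package the result
    as a message to feed back into the conversation (step 5)."""
    fn = TOOL_REGISTRY[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    result = fn(**args)
    return {"role": "tool", "name": tool_call["name"],
            "content": json.dumps(result)}

# Step 3, mocked: the model only *decides* — it emits a name plus JSON args.
model_decision = {"name": "lookup_order",
                  "arguments": '{"order_id": "ORD-12345"}'}
message = handle_tool_call(model_decision)
print(message["content"])  # the tool result the model reads in step 6
```

The registry makes the separation concrete: the model's output is just data, and nothing runs until your code looks it up and calls it.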
Tool Definition (OpenAI Format)
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Look up an order by its ID. Returns order status, items, and shipping info.",
        "parameters": {
            "type": "object",
            "required": ["order_id"],
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID, e.g. 'ORD-12345'"
                }
            }
        }
    }
}]
Key insight: Tool descriptions ARE prompts. The description field is the most important part of a tool definition — it’s what the model reads to decide whether and how to use the tool. Every principle from this course (be specific, give examples, set constraints) applies to tool descriptions.
Vague Tool Descriptions: Wrong Tool, Wrong Parameters
When descriptions are ambiguous, the model guesses — and guesses wrong
Vague Tool Definitions
❌ BAD — vague descriptions

{
    "name": "search",
    "description": "Search for things",
    "parameters": {"query": {"type": "string"}}
}

{
    "name": "get_data",
    "description": "Get data from the system",
    "parameters": {"id": {"type": "string"}}
}
What Goes Wrong
User: “Find order #12345”

Model’s dilemma: Should I call search with query “order #12345”? Or get_data with id “12345”? Both descriptions are vague enough to match.

Result: The model picks one randomly, or calls search with the full sentence as the query (wrong), or calls get_data with “order #12345” including the hash (wrong format).
Common Failures
1. Wrong tool selected: Model can’t distinguish between similar tools
2. Wrong parameter format: Passes “12345” vs “ORD-12345” vs “order #12345”
3. Missing required context: Model doesn’t know it needs to extract the ID from the user’s message
4. Unnecessary tool calls: Model calls a tool when it could answer directly
The pattern: Vague tool descriptions cause the same problems as vague prompts: the model guesses, and guesses are often wrong. The fix is the same too: be specific, give examples, set constraints.
Writing Great Tool Descriptions
The same prompt engineering principles, applied to tool definitions
Good Tool Definition
{
    "name": "lookup_order",
    "description": "Look up a customer order by its order ID. Returns the order status, line items, shipping address, and tracking number. Use this when the customer asks about an order status, delivery, or items in their order. Do NOT use for refund requests — use refund_order instead.",
    "parameters": {
        "type": "object",
        "required": ["order_id"],
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID in format ORD-XXXXX (e.g., 'ORD-12345'). Extract from the customer's message. If they say just a number like '12345', prepend 'ORD-'."
            }
        }
    }
}
The 5 Elements of a Great Tool Description
1. WHAT it does: "Look up a customer order by its ID. Returns status, items, shipping, tracking."
2. WHEN to use it: "Use when customer asks about order status, delivery, or items."
3. WHEN NOT to use it: "Do NOT use for refund requests."
4. PARAMETER format: "Order ID in format ORD-XXXXX. If they say '12345', prepend 'ORD-'."
5. EXAMPLE: "e.g., 'ORD-12345'"
Key insight: Tool descriptions need the same specificity as any other prompt. What it does + when to use + when NOT to use + parameter format with examples. The “when NOT to use” is especially important when you have multiple similar tools.
Domain Example: Customer Support Agent
Three tools that work together — lookup, refund, and escalate
The Tool Set
Tool 1: lookup_order
"Look up order by ID. Returns status, items, shipping. Use for: status checks, delivery questions, item inquiries."

Tool 2: refund_order
"Process a refund for an order. ALWAYS call lookup_order first to verify the order exists and is eligible. Use for: refund requests, cancellation requests. Requires: order_id AND reason."

Tool 3: escalate_to_human
"Transfer to a human agent. Use ONLY when: (1) customer explicitly asks for a human, (2) issue involves data loss or security, (3) customer is angry after 2+ failed resolution attempts. Do NOT escalate for routine questions."
Conversation Flow
User: "I want a refund for order 12345" Model thinks: Refund request → need to look up the order first (per refund_order description) → then process refund. Model calls: lookup_order("ORD-12345") Result: {status: "delivered", items: [...], refund_eligible: true} Model calls: refund_order("ORD-12345", "Customer requested refund") Result: {refund_id: "REF-789", amount: "$49.99", eta: "3-5 days"} Model responds: "I've processed your refund of $49.99 for order ORD-12345. Refund ID: REF-789. You should see it in your account within 3-5 business days. Is there anything else?"
Key insight: The tool descriptions orchestrate the agent’s behavior. “ALWAYS call lookup_order first” in the refund tool description creates a two-step workflow without any application code. The model follows the instructions in the descriptions like a protocol.
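The two-step workflow can be simulated with mock tools. Here the refund handler refuses to run unless a lookup has already happened, which enforces in code the same "ALWAYS call lookup_order first" rule the description states for the model. All IDs and amounts are illustrative.

```python
looked_up: set = set()  # order IDs the agent has already looked up

def lookup_order(order_id: str) -> dict:
    """Mock lookup; records that this order has been verified."""
    looked_up.add(order_id)
    return {"status": "delivered", "refund_eligible": True}

def refund_order(order_id: str, reason: str) -> dict:
    # Backend enforcement of the "lookup first" rule from the
    # tool description — never rely on the model alone.
    if order_id not in looked_up:
        raise RuntimeError("lookup_order must be called first")
    return {"refund_id": "REF-789", "amount": "$49.99"}

# The model's two calls, in the order its descriptions dictate:
lookup_order("ORD-12345")
result = refund_order("ORD-12345", "Customer requested refund")
```

If the model ever skips the lookup step, the exception surfaces the protocol violation instead of silently issuing a refund.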
“When to Call” Guidance
Preventing unnecessary tool calls and ensuring the right tool for the right situation
The Problem
Without “when to call” guidance, models tend to:

1. Over-call: Use a tool for every question, even when the model knows the answer
2. Under-call: Answer from training data when they should use a tool (e.g., making up order statuses)
3. Mis-call: Use the wrong tool for the situation
System Prompt Guidance
System prompt for the support agent:

You have access to 3 tools. Follow these rules:
1. ALWAYS use tools for customer-specific data (orders, account info, refunds). Never guess or make up this data.
2. NEVER use tools for general questions ("What's your return policy?" — answer from your knowledge, don't call a tool).
3. Call lookup_order BEFORE refund_order to verify the order exists.
4. Only escalate when the criteria in the escalate_to_human description are met. Exhaust all other options first.
Parameter Validation in Descriptions
# Tell the model how to handle missing or ambiguous parameters
"order_id": {
    "description": "Order ID in format ORD-XXXXX. If the customer doesn't provide an order ID, ASK for it before calling this tool. Do not guess or make up an order ID."
}
Handling Missing Parameters
Without the “ASK for it” instruction, the model might:

• Call the tool with a made-up ID
• Call the tool with an empty string
• Try to extract an ID from unrelated numbers in the conversation

The parameter description tells the model: if you don’t have what you need, ask the user instead of guessing.
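Code-side validation backs up the parameter description: normalize what the model passes, and raise when nothing usable arrived so the agent can ask the user instead of guessing. A sketch, assuming the ORD-XXXXX format from the examples above:

```python
import re

def normalize_order_id(raw: str) -> str:
    """Normalize a model-supplied order ID to ORD-XXXXX, or raise
    so the agent can ask the user instead of guessing."""
    raw = raw.strip()
    if re.fullmatch(r"ORD-\d{5}", raw):
        return raw                     # already canonical
    digits = re.sub(r"\D", "", raw)    # "order #12345" -> "12345"
    if len(digits) == 5:
        return f"ORD-{digits}"         # prepend prefix per the description
    raise ValueError(f"cannot normalize order id: {raw!r}")

print(normalize_order_id("order #12345"))  # ORD-12345
```

This catches all three failure modes listed above: made-up IDs won't match, empty strings raise, and unrelated numbers of the wrong length raise too.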
Key insight: Tool call guidance lives in two places: the system prompt (global rules) and the tool descriptions (per-tool rules). Use the system prompt for cross-tool rules (“always look up before refunding”) and tool descriptions for tool-specific rules (“ask for order ID if not provided”).
Safety: Preventing Dangerous Tool Calls
When the model has access to write operations, guardrails are critical
The Risk
Read-only tools (lookup, search) are low risk. Write tools (refund, delete, update) are high risk. A model that can call refund_order can be tricked into issuing unauthorized refunds through prompt injection or misunderstanding.
Safety Patterns
Pattern 1: Confirmation before writes
"Before calling refund_order, ALWAYS confirm with the customer: 'I'll process a refund of $X for order Y. Shall I go ahead?' Only call the tool after they confirm."

Pattern 2: Amount limits
"refund_order can only process refunds up to $500. For amounts over $500, use escalate_to_human instead."

Pattern 3: Rate limiting in description
"Maximum 1 refund per conversation. If the customer requests multiple refunds, escalate to human."
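Pattern 1 can also be enforced in code as a two-phase commit: the write is staged on the first call and only executes after an explicit confirmation step. A minimal sketch with hypothetical function names:

```python
pending = None  # the staged (unconfirmed) write, if any

def request_refund(order_id: str, amount: float) -> str:
    """Stage a refund; nothing executes until the user confirms."""
    global pending
    pending = {"order_id": order_id, "amount": amount}
    return (f"I'll process a refund of ${amount} for order {order_id}. "
            "Shall I go ahead?")

def confirm() -> dict:
    """Execute the staged write after explicit user confirmation."""
    global pending
    if pending is None:
        raise RuntimeError("nothing to confirm")
    staged, pending = pending, None
    return {"refunded": staged}  # stand-in for the real refund call
```

Even if the model skips the prompt-level confirmation rule, the staged write cannot execute without the second call.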
Defense in Depth
Don’t rely solely on the model following instructions. Layer defenses:

Layer 1 (Prompt): Tool descriptions with safety rules
Layer 2 (Code): Validate tool call arguments before execution
Layer 3 (Backend): Business logic checks (is this order refund-eligible?)
Layer 4 (Monitoring): Alert on unusual patterns (10 refunds in 1 hour)
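Layer 2 might look like this: check the model's arguments against hard business limits before anything executes. The $500 cap comes from Pattern 2 above; the field names are illustrative.

```python
MAX_REFUND = 500.00  # hard cap, mirrors the limit in the tool description

def validate_refund_args(args: dict) -> list:
    """Return a list of violations; execute the refund only if empty."""
    errors = []
    if not args.get("order_id"):
        errors.append("missing order_id")
    amount = args.get("amount", 0)
    if not isinstance(amount, (int, float)) or amount <= 0:
        errors.append("amount must be a positive number")
    elif amount > MAX_REFUND:
        errors.append(f"amount {amount} exceeds limit {MAX_REFUND}")
    return errors
```

The validator never trusts the prompt: even if the description's "$500" rule is ignored or injected away, the oversized call is rejected in code.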
Key insight: The model is the decision-maker, but your code is the executor. Never trust the model’s tool calls blindly. Validate every argument, enforce business rules in your backend, and add confirmation steps for destructive operations. The prompt is the first line of defense, not the only one.
Tool Chaining: Multi-Step Workflows
When one tool call isn’t enough — orchestrating sequences of tool calls
Sequential Chaining
The model can call multiple tools in sequence, using the result of one to inform the next. The “ALWAYS call lookup_order before refund_order” pattern is a simple chain.

More complex example: a travel booking agent that needs to (1) search flights, (2) check seat availability, (3) book the seat, (4) send confirmation email — four tools in sequence.
Parallel Calling
# Modern APIs support parallel tool calls:
# the model can call multiple tools at once.

User: "Compare prices for flights from SFO to NYC and SFO to LAX for next week"

Model calls (parallel):
    search_flights(from="SFO", to="NYC", date="2025-03-20")
    search_flights(from="SFO", to="LAX", date="2025-03-20")

# Both results come back; the model compares and responds.
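When the model returns several calls in one turn, your code can execute them concurrently and return all results together. A sketch with a mocked search_flights (routes and prices are made up):

```python
from concurrent.futures import ThreadPoolExecutor

def search_flights(origin: str, dest: str, date: str) -> dict:
    # Stand-in for a real flight-search API (illustrative prices).
    fares = {"NYC": 320, "LAX": 99}
    return {"route": f"{origin}-{dest}", "price": fares[dest]}

# Two tool calls from a single model turn:
calls = [("SFO", "NYC", "2025-03-20"), ("SFO", "LAX", "2025-03-20")]

# Execute both concurrently; order of results matches order of calls.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: search_flights(*c), calls))

# Each result goes back to the model as its own tool message.
print(results)
```

Threads make sense here because tool execution is typically I/O-bound (network calls), so parallel calls cut wall-clock latency roughly to that of the slowest tool.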
Orchestration in Tool Descriptions
Tool: book_flight
"Book a flight. Prerequisites:
1. Must have called search_flights first
2. Must have called check_availability for the specific flight_id
3. Must have confirmed with the user: flight details, price, and seat
Call order: search → check → confirm with user → book. Never skip steps."
When Chaining Gets Complex
For workflows with 4+ tools and branching logic, consider:

1. Explicit workflow in system prompt: Describe the full workflow as a numbered sequence
2. State tracking: Include current workflow state in the system prompt
3. Agentic frameworks: Use LangChain, CrewAI, or similar frameworks that manage tool orchestration programmatically
Key insight: Tool chaining is where prompt engineering meets system design. Simple chains (2–3 tools) work well with description-level orchestration. Complex chains (4+ tools with branching) need framework-level orchestration. Know when to graduate from prompts to code.
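State tracking (point 2 above) can be as simple as a list of completed steps rendered into the system prompt each turn. A sketch for the flight-booking flow, with illustrative step names:

```python
WORKFLOW = ["search", "check", "confirm", "book"]  # fixed call order

def next_allowed_step(completed: list) -> str:
    """Return the first step in the sequence not yet completed."""
    for step in WORKFLOW:
        if step not in completed:
            return step
    return "done"

def state_prompt(completed: list) -> str:
    """Fragment appended to the system prompt on every turn so the
    model always knows where it is in the workflow."""
    return (f"Workflow state: completed={completed}. "
            f"Next allowed step: {next_allowed_step(completed)}. "
            "Do not skip steps.")
```

Your code updates `completed` after each successful tool call, so the model re-reads an accurate state every turn instead of inferring it from a long conversation history.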
The Tool Description Checklist
Before deploying any tool-using agent, verify these
Per-Tool Checklist
□ Clear description: what it does, what it returns
□ When to use: specific triggers ("when customer asks about order status")
□ When NOT to use: explicit exclusions ("not for refunds")
□ Parameter descriptions: format, examples, edge case handling
□ Missing parameter behavior: "ask the user", not "guess"
□ Prerequisites: "call X before calling this tool"
□ Safety guardrails: confirmation for writes, limits, escalation triggers
System-Level Checklist
□ System prompt with global rules: when to use tools vs. answer directly
□ Tool disambiguation: no two tools with overlapping triggers
□ Workflow orchestration: clear call order for multi-tool flows
□ Backend validation: never trust model arguments blindly
□ Error handling: what should the model do when a tool call fails? (Retry? Tell user? Escalate?)
□ Testing: test each tool with correct use, wrong tool, missing params, edge cases
Key insight: Tool use is where prompt engineering becomes agent engineering. The same principles apply — be specific, give examples, set constraints — but the stakes are higher because tools have real-world side effects. A bad text response is annoying; a bad tool call can refund money, delete data, or send emails. Invest in tool descriptions like you invest in production code.