The Retry Problem
Network calls fail. APIs time out. Systems go down for maintenance. An enterprise agent must handle all of these gracefully. The critical requirement is idempotency: if the agent retries an operation, the end state should be the same as if the operation had succeeded exactly once. This matters most when a call times out, because the agent cannot tell whether the write failed or whether it succeeded and only the response was lost. Creating a duplicate invoice, sending a duplicate email, or double-booking a resource because the agent retried a timed-out call is a production incident. Every tool call the agent makes must be designed so that retrying is safe. This means using idempotency keys, checking for existing records before creating new ones, and designing write operations as upserts rather than inserts.
Idempotency Patterns
Non-idempotent (dangerous):
POST /invoices {amount: 4250}
Timeout → retry → duplicate invoice
Idempotent (safe):
PUT /invoices/INV-4401 {amount: 4250}
Timeout → retry → same invoice
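The contrast above can be sketched with a small in-memory simulation. This is only an illustration under stated assumptions: InvoiceStore, LostResponse, and call_with_one_retry are hypothetical names, not part of any real invoicing API, and the "timeout" is modeled as a response lost after the write was already committed.

```python
class LostResponse(Exception):
    """Simulates a timeout that occurs after the server committed the write."""


class InvoiceStore:
    """Hypothetical in-memory stand-in for an invoicing API."""

    def __init__(self):
        self.invoices = {}  # invoice id -> payload
        self._next_id = 1

    def post(self, payload, drop_response=False):
        # POST semantics: the server assigns a new id on every call (insert).
        invoice_id = f"AUTO-{self._next_id}"
        self._next_id += 1
        self.invoices[invoice_id] = dict(payload)
        if drop_response:
            raise LostResponse("write committed, response never arrived")
        return invoice_id

    def put(self, invoice_id, payload, drop_response=False):
        # PUT semantics: the caller names the resource, so repeating the
        # call overwrites the same record (upsert).
        self.invoices[invoice_id] = dict(payload)
        if drop_response:
            raise LostResponse("write committed, response never arrived")
        return invoice_id


def call_with_one_retry(operation):
    """Run operation(drop_response): the first attempt loses its response,
    the retry goes through."""
    try:
        return operation(True)
    except LostResponse:
        return operation(False)


post_store = InvoiceStore()
call_with_one_retry(lambda drop: post_store.post({"amount": 4250}, drop_response=drop))

put_store = InvoiceStore()
call_with_one_retry(
    lambda drop: put_store.put("INV-4401", {"amount": 4250}, drop_response=drop)
)

print(len(post_store.invoices))  # 2: the retry created a duplicate invoice
print(len(put_store.invoices))   # 1: the retry rewrote the same invoice
```

The retry logic is identical in both cases; only the write semantics differ, which is why the safety has to be designed into the tool call rather than into the retry loop.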
Patterns:
Idempotency keys on every write
Check-before-create logic
Upsert instead of insert
Deduplication at middleware layer
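For writes that cannot be expressed as an upsert, such as sending an email, the first and last patterns can be combined. The following is a minimal sketch, assuming a hypothetical DedupMiddleware that stores results in a plain dictionary. A production version would likely need persistent storage, an atomic check-and-store step, and key expiry, none of which are shown here.

```python
import uuid


class DedupMiddleware:
    """Hypothetical middleware that remembers the result for each idempotency key."""

    def __init__(self, handler):
        self._handler = handler  # the underlying, non-idempotent write
        self._results = {}       # idempotency key -> stored result

    def handle(self, idempotency_key, payload):
        # A repeated key replays the stored result instead of writing again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._handler(payload)
        self._results[idempotency_key] = result
        return result


sent_emails = []


def send_email(payload):
    # Stand-in for a real side effect: every call "sends" one more email.
    sent_emails.append(payload)
    return {"status": "sent", "message_number": len(sent_emails)}


def flaky_call(middleware, key, payload, lose_response):
    # The write is processed first; the response may then be lost in transit.
    result = middleware.handle(key, payload)
    if lose_response:
        raise TimeoutError("response lost after the write was processed")
    return result


def send_with_retries(middleware, payload, max_attempts=3, lost_responses=2):
    # One key per logical operation, created before the first attempt
    # and reused unchanged on every retry.
    key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return flaky_call(
                middleware, key, payload, lose_response=attempt < lost_responses
            )
        except TimeoutError:
            continue
    raise RuntimeError("all attempts timed out")


middleware = DedupMiddleware(send_email)
result = send_with_retries(
    middleware, {"to": "ap@example.com", "subject": "Invoice INV-4401"}
)
print(len(sent_emails))  # 1: three attempts, one email
```

The detail that matters is where the key is generated: once per logical operation, outside the retry loop. A key created inside the loop would differ on each attempt and the middleware would treat every retry as a new write.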
Key insight: In traditional software, a retry bug creates a support ticket. In an AI agent, a retry bug has real-world consequences: duplicate payments, duplicate orders, duplicate communications. Design every tool call to be safely retryable.