Cloud API Costs (2025 Pricing)
GPT-4o
Input: $2.50 / 1M tokens
Output: $10.00 / 1M tokens
GPT-4o-mini
Input: $0.15 / 1M tokens
Output: $0.60 / 1M tokens
Claude 3.5 Sonnet
Input: $3.00 / 1M tokens
Output: $15.00 / 1M tokens
Claude 3.5 Haiku
Input: $0.80 / 1M tokens
Output: $4.00 / 1M tokens
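Because each price is quoted per 1M tokens, a monthly bill is (input tokens ÷ 1M × input price) + (output tokens ÷ 1M × output price). The minimal sketch below applies that formula to the prices listed above. The function name `monthly_api_cost` and the 100M-input / 20M-output workload are hypothetical illustrations, not figures from any provider.

```python
# Per-1M-token prices (USD) copied from the list above: (input, output).
# Provider pricing changes; verify against current pricing pages.
PRICES_PER_MILLION = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
}


def monthly_api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one month's traffic for a single model.

    Prices are per 1M tokens, so each token count is divided by 1,000,000
    before being multiplied by its price.
    """
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


if __name__ == "__main__":
    # Hypothetical workload: 100M input tokens and 20M output tokens per month.
    for name in PRICES_PER_MILLION:
        cost = monthly_api_cost(name, 100_000_000, 20_000_000)
        print(f"{name:<18} ${cost:,.2f}/month")
```

Output tokens cost 4-5x more than input tokens for every model listed, so the input/output ratio of a workload matters as much as its total volume.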
Local Model Costs
Hardware (one-time):
MacBook M2 Pro (16GB): ~$2,000
Gaming PC + RTX 4090: ~$2,500
Mac Studio M2 Ultra: ~$4,000
Running cost:
Electricity: ~$5-15/month
Maintenance: ~$0 (excluding your own setup time)
Per-token cost: $0.00 (no usage fees; electricity is the only running cost)
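The electricity figure depends on how hard and how long the machine runs. A rough estimate is average watts × hours per day × days ÷ 1000 to get kWh, then kWh × your local rate. The sketch below assumes a hypothetical 300 W average draw, 8 hours a day, and $0.15/kWh. Substitute your own hardware's measured draw and your utility's rate, since none of these inputs come from the figures above.

```python
def monthly_electricity_cost(avg_watts: float, hours_per_day: float,
                             price_per_kwh: float, days: int = 30) -> float:
    """Estimate monthly electricity cost in USD.

    Energy in kWh = watts * hours / 1000; cost = kWh * price per kWh.
    """
    kwh = avg_watts * hours_per_day * days / 1000
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Hypothetical inputs: 300 W average draw, 8 h/day, $0.15/kWh.
    cost = monthly_electricity_cost(300, 8, 0.15)
    print(f"~${cost:.2f}/month")
```

Under these assumptions the result (72 kWh, about $10.80) falls inside the $5-15/month range above. A machine that idles most of the day will land near the low end.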
Break-even:
At $500/month cloud spend → a ~$2,000-$2,500 local machine pays for itself in 4-5 months (roughly $490/month saved after electricity)
Key insight: If you’re spending more than $200/month on API calls for tasks that a 7B–9B model can handle (classification, extraction, summarization, simple chat), local deployment pays for itself in roughly a year, and in a few months once spend approaches $500/month. Beyond electricity, the marginal cost of each additional request is essentially zero.
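The break-even arithmetic is hardware cost ÷ (monthly cloud spend − monthly electricity). The sketch below applies it to the hardware prices and spend levels above. It assumes the local model fully replaces the cloud calls. It also uses $15/month, the high end of the electricity estimate, which gives the most conservative payback. The function name `break_even_months` is a hypothetical illustration.

```python
import math


def break_even_months(hardware_cost: float, monthly_cloud_spend: float,
                      monthly_electricity: float) -> float:
    """Months until a one-time hardware purchase is repaid by avoided API spend.

    Assumes local inference fully replaces the cloud spend, so the monthly
    saving is the cloud bill minus the added electricity. Returns math.inf
    when running locally saves nothing.
    """
    monthly_saving = monthly_cloud_spend - monthly_electricity
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # Hardware prices from the list above; $15/month electricity (upper estimate).
    for hardware in (2_000, 2_500, 4_000):
        for spend in (200, 500):
            months = break_even_months(hardware, spend, 15)
            print(f"${hardware:,} hardware at ${spend}/month: {months:.1f} months")
```

With these inputs, the $2,000 and $2,500 machines break even in roughly 4-5 months at $500/month and roughly 11-14 months at $200/month. The $4,000 Mac Studio takes about twice as long in each case. If only part of the workload moves to a local model, `monthly_cloud_spend` should be just the portion actually replaced.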