The Concept
Model routing sends each request to the cheapest model that can handle it well. A classifier (itself a cheap model) evaluates incoming requests and routes simple tasks to budget models and complex tasks to premium models. Research shows 62% of agent tasks can route to budget models with zero quality loss.
// Routing rules example
Classification/extraction → GPT-4o mini ($0.15/M)
Simple Q&A, formatting → GPT-5 Nano ($0.05/M)
Code generation, analysis → GPT-4.1 ($2.00/M)
Complex reasoning → Claude Opus ($5.00/M)
Hard math/science → o3 ($2.00/M)
Implementation
The simplest router is a keyword/heuristic classifier that checks request length, presence of code blocks, or explicit complexity markers. More sophisticated routers use a small LLM (GPT-4o mini at $0.15/M) to classify the request before routing. The router cost is negligible compared to the savings from routing 60%+ of traffic to budget models.
Key insight: Model routing is the single highest-leverage optimization because it addresses the 3,000x price range across models. Even a crude router that correctly classifies 70% of requests saves 40–50% of total spend.