Ch 5 — Context Routing

Directing queries to the right context source before anything enters the window
High Level
Query → Classify → Route → Source → Load → Respond
The Multi-Source Problem
Why loading all knowledge bases for every query fails
The Scenario
A multi-domain agent has access to multiple knowledge bases, tool sets, and instruction sets. A billing question doesn’t need the onboarding knowledge base. A technical support query doesn’t need the refund policy. Loading all of them for every query wastes context and degrades accuracy by diluting the model’s attention with irrelevant information.
The Cost of No Routing
Without routing, every query carries the full weight of every knowledge domain. For an enterprise support agent with 5 domains, each with 10,000 tokens of context, that’s 50,000 tokens loaded before the model even sees the question. Most of it is irrelevant noise competing for attention.
Critical in AI: Context routing is the gatekeeper pattern. It decides what enters the context window before the main model starts reasoning. Getting routing wrong means the model either lacks critical context (under-routing) or drowns in irrelevant context (over-routing).
Rule-Based Routing
Fast and predictable, but rigid
How It Works
Rule-based routing uses keyword matching or pattern detection to classify queries. If the query contains “refund,” “return,” or “money back,” route to the refund knowledge base. If it contains “password,” “login,” or “account access,” route to technical support.
Example
// Rule-based router
function route(query) {
  const q = query.toLowerCase();
  if (q.match(/refund|return|money back/)) return "refund_kb";
  if (q.match(/password|login|account/)) return "tech_support_kb";
  if (q.match(/invoice|charge|billing/)) return "billing_kb";
  return "general_kb"; // fallback
}
Strengths & Weaknesses
Strengths: Near-instant latency (no LLM call), completely predictable, easy to debug, zero cost.

Weaknesses: Rigid — misses anything outside expected patterns. “I was charged twice and want my money back” matches both billing and refund rules. Requires manual updates for new domains. Cannot handle nuanced or ambiguous queries.
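The overlap problem can be made visible rather than silent. The sketch below extends the chapter's rule set to collect every matching rule instead of returning the first hit; `matchAll` is an illustrative helper, not a standard API.

```javascript
// Sketch: detect when a query matches more than one rule, instead of
// silently taking the first hit. Rule names and patterns mirror the
// chapter's rule-based router example.
const rules = [
  { domain: "refund_kb",       pattern: /refund|return|money back/ },
  { domain: "tech_support_kb", pattern: /password|login|account/ },
  { domain: "billing_kb",      pattern: /invoice|charge|billing/ },
];

function matchAll(query) {
  const q = query.toLowerCase();
  return rules.filter(r => r.pattern.test(q)).map(r => r.domain);
}

const hits = matchAll("I was charged twice and want my money back");
console.log(hits); // ["refund_kb", "billing_kb"] — ambiguous: escalate or multi-route
```

When `matchAll` returns more than one domain, the system knows the query is ambiguous and can apply one of the mitigation strategies discussed later in the chapter instead of guessing.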
Key insight: Rule-based routing is the right starting point for most teams. Even simple keyword rules cut context bloat significantly before you invest in more sophisticated approaches. Start here, upgrade when you hit accuracy limits.
LLM-Based Routing
Using the model itself to classify and route
How It Works
LLM-based routing uses a model (often a smaller, faster one) to classify the query and select the appropriate context source. The router model receives the query plus a list of available knowledge domains with descriptions, and returns the best match. It understands nuance, handles ambiguity, and adapts to new patterns automatically.
Example
// LLM-based router prompt
system: "Classify the user query into one of these domains:
  - billing: charges, invoices, plans
  - refunds: returns, money back
  - tech_support: login, bugs, errors
  - onboarding: setup, getting started
  Return only the domain name."

user: "I was charged twice and want my money back"

→ "refunds" // Understands intent
Strengths & Weaknesses
Strengths: Understands nuance and intent, handles ambiguous queries, adapts to new patterns without code changes, can route to multiple domains when appropriate.

Weaknesses: Adds an inference call before the main task (latency + cost), can hallucinate routing decisions, harder to debug when it misroutes, requires a fallback strategy.
Key insight: LLM routing adds latency and cost upfront but saves far more downstream by loading only relevant context. The net effect is usually positive — a small router call that prevents 30K tokens of irrelevant context from entering the main model’s window.
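The control flow around an LLM router matters as much as the prompt: the model's answer must be validated before it is trusted. In this minimal sketch, `classifyWithModel` is a stand-in for a real call to a small, fast model; it is stubbed with a fixed answer so the flow is runnable.

```javascript
// Sketch of an LLM router with output validation and a fallback.
const DOMAINS = ["billing", "refunds", "tech_support", "onboarding"];

async function classifyWithModel(query) {
  // Stub: a real implementation would send the routing prompt to a model.
  return "refunds";
}

async function routeWithLLM(query) {
  const answer = (await classifyWithModel(query)).trim().toLowerCase();
  // Validate the model's output — it can hallucinate a domain name.
  return DOMAINS.includes(answer) ? answer : "general"; // fallback domain
}

routeWithLLM("I was charged twice and want my money back")
  .then(domain => console.log(domain)); // "refunds"
```

The validation step is the important part: any string the classifier returns that is not in the allow-list falls through to the general domain rather than loading a nonexistent knowledge base.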
Hierarchical Routing
A lead agent triages to specialized sub-agents
The Pattern
Hierarchical routing uses a lead agent to triage queries to specialized sub-agents, each with its own focused context window. The lead agent carries minimal context (just enough to classify), while sub-agents carry deep domain-specific context. This is the multi-agent version of context routing.
Architecture
// Hierarchical routing
Lead Agent (lightweight context)
├─→ Billing Agent
│     └─ billing KB, billing tools
├─→ Refund Agent
│     └─ refund KB, payment tools
├─→ Tech Support Agent
│     └─ tech KB, diagnostic tools
└─→ Onboarding Agent
      └─ setup KB, config tools
When to Use
Hierarchical routing makes sense when domains are deeply different — different tools, different knowledge bases, different behavioral patterns. If domains share most of their context and differ only in instructions, progressive disclosure (Ch 3) with skill-based identity management is simpler and cheaper.
Why it matters: Hierarchical routing trades orchestration complexity for context efficiency. Each sub-agent has a clean, focused context window. The cost is managing inter-agent communication and ensuring the lead agent routes correctly.
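The dispatch step can be sketched concretely: the lead agent holds only a routing table, and each sub-agent declares its own knowledge base and tools. All names below are illustrative.

```javascript
// Sketch: a lead agent with minimal context maps a classified domain
// to a sub-agent; only that sub-agent's KB and tools are loaded.
const subAgents = {
  billing:      { kb: "billing_kb", tools: ["billing"] },
  refunds:      { kb: "refund_kb",  tools: ["payments"] },
  tech_support: { kb: "tech_kb",    tools: ["diagnostics"] },
  onboarding:   { kb: "setup_kb",   tools: ["config"] },
};

function dispatch(domain, query) {
  const agent = subAgents[domain];
  if (!agent) throw new Error(`no sub-agent for domain: ${domain}`);
  // Only this agent's context enters the main inference call.
  return { query, context: agent.kb, tools: agent.tools };
}

console.log(dispatch("billing", "Why was I charged twice?").context); // "billing_kb"
```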
Hybrid Routing
Combining methods for production reliability
The Production Pattern
Most production systems use hybrid routing that combines multiple methods. A typical pattern: rule-based routing first (fast, cheap, handles obvious cases), then LLM-based routing for queries that don’t match any rule (handles nuance), with a fallback to a default agent or human escalation when confidence is low.
Hybrid Flow
// Hybrid routing pipeline
1. Rule-based check (instant)
   → Match? Route directly.
   → No match? Continue.
2. LLM classifier (100–200 ms)
   → High confidence? Route.
   → Low confidence? Continue.
3. Fallback
   → Default agent, or
   → Human escalation
Why Hybrid Wins
Hybrid routing optimizes for the common case: 70–80% of queries match simple rules and are routed instantly at zero cost. The remaining 20–30% get the more expensive LLM classification. This gives you the accuracy of LLM routing at a fraction of the cost.
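The three stages compose naturally in code. In this sketch the LLM classifier is stubbed with a fixed answer and confidence score so the flow is runnable; the 0.8 threshold is an illustrative choice, not a recommendation.

```javascript
// Sketch of the three-stage hybrid pipeline: rules, then LLM, then fallback.
function ruleRoute(query) {
  const q = query.toLowerCase();
  if (/refund|return|money back/.test(q)) return "refunds";
  if (/invoice|charge|billing/.test(q)) return "billing";
  return null; // no rule matched — fall through to the LLM
}

async function llmRoute(query) {
  // Stand-in for a small-model classification call.
  return { domain: "tech_support", confidence: 0.55 };
}

async function hybridRoute(query) {
  const ruled = ruleRoute(query);                       // 1. rules: instant, free
  if (ruled) return ruled;
  const { domain, confidence } = await llmRoute(query); // 2. LLM classifier
  if (confidence >= 0.8) return domain;
  return "escalate_to_human";                           // 3. low-confidence fallback
}

hybridRoute("My invoice looks wrong").then(d => console.log(d)); // "billing"
```

Note how the common case exits at stage 1 without any model call, matching the 70–80% figure above.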
Rule of thumb: Start with rule-based routing for your top 10 query patterns. Add LLM routing when you see misroutes on ambiguous queries. Add hierarchical routing only when domains need completely different tool sets and knowledge bases.
Downstream Savings
How routing multiplies the value of other techniques
The Multiplier Effect
Routing’s savings come downstream, not from the routing step itself. By loading only relevant context, you reduce tokens for the main inference call. This compounds with compression: if you route to a 10K-token knowledge base instead of loading all 50K tokens, and then compress that 10K to 5K, you’ve achieved a 10× reduction from the unoptimized baseline.
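The compounding arithmetic from the paragraph above, written out:

```javascript
// Compounding savings: routing first, then compression.
const baseline = 50_000;   // all five domains loaded, no routing
const routed = 10_000;     // only the relevant domain's KB
const compressed = 5_000;  // that KB after 2x compression (Ch 4)

console.log(baseline / compressed); // 10 — a 10x reduction vs. the unoptimized baseline
```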
Multi-Agent Efficiency
In multi-agent systems, routing prevents the context duplication problem. Without routing, every agent carries every piece of context. With routing, each agent carries only what it needs. For a system with 5 agents and 5 domains, routing can reduce total context consumption by 80% (each agent carries 1/5 instead of 5/5).
Key insight: Routing is the highest-leverage optimization for multi-domain systems. It doesn’t just save tokens — it improves accuracy by ensuring the model’s attention is focused on relevant information rather than diluted across irrelevant domains.
Routing Failures
What happens when routing goes wrong
Misrouting
When the router sends a query to the wrong domain, the model answers with the wrong knowledge base. A billing question routed to tech support gets a technically correct but contextually wrong answer. The user experience is worse than if no routing existed, because the model is confidently wrong.
Cross-Domain Queries
Some queries genuinely span multiple domains: “I was charged for a feature that doesn’t work.” This is both billing and tech support. Simple routers force a single choice; sophisticated ones can load context from multiple domains, but this partially defeats the purpose of routing.
Mitigation Strategies
Confidence thresholds: Only route when the classifier is confident; fall back to a broader context for ambiguous queries.

Multi-route: Allow routing to 2 domains when the query spans boundaries, accepting the extra context cost.

Human escalation: Route to a human when confidence is below threshold — better than a confidently wrong answer.

Feedback loops: Log routing decisions and outcomes to identify systematic misroutes and improve rules over time.
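The four strategies above can be combined into one decision function. This sketch assumes the classifier returns candidate domains with confidence scores, sorted descending; the 0.7 threshold, the two-domain cap, and the log shape are all illustrative.

```javascript
// Sketch: confidence thresholds, multi-route, human escalation, and a
// decision log for the feedback loop, combined in one router.
const routingLog = [];

function decide(candidates /* [{ domain, confidence }], sorted desc */) {
  const confident = candidates.filter(c => c.confidence >= 0.7);
  let decision;
  if (confident.length === 0) {
    decision = { action: "escalate_to_human" };            // better than confidently wrong
  } else if (confident.length === 1) {
    decision = { action: "route", domains: [confident[0].domain] };
  } else {
    // Cross-domain query: accept the extra context cost of two domains.
    decision = { action: "multi_route", domains: confident.slice(0, 2).map(c => c.domain) };
  }
  routingLog.push({ candidates, decision }); // feedback loop: audit misroutes later
  return decision;
}

console.log(decide([{ domain: "billing", confidence: 0.9 },
                    { domain: "tech_support", confidence: 0.8 }]));
// { action: "multi_route", domains: ["billing", "tech_support"] }
```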
Key insight: LLM routing can hallucinate routing decisions. A fallback to a human or default agent is not optional — it’s a required safety net for any production routing system.
Routing in the Layered Architecture
How routing fits with progressive disclosure and compression
The Full Stack
In a production context engineering system, routing works alongside the other patterns: Progressive disclosure (Ch 3) defines what can enter the window. Routing (this chapter) selects which domain’s context to load. Compression (Ch 4) shrinks what’s loaded. Retrieval (Ch 6) fetches specific documents within the routed domain. Each layer addresses a different failure mode.
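The layering can be sketched as a pipeline, with each stage a placeholder for the technique covered in its chapter; the stage outputs here are dummy values, not real implementations.

```javascript
// Sketch: the four layers composed as a pipeline of placeholder stages.
const disclose = (q) => ({ query: q, skills: ["support"] });   // Ch 3: what CAN enter
const route    = (s) => ({ ...s, domain: "billing" });         // Ch 5: which domain
const compress = (s) => ({ ...s, contextTokens: 5_000 });      // Ch 4: shrink it
const retrieve = (s) => ({ ...s, docs: ["invoice_faq"] });     // Ch 6: fetch within domain

const context = retrieve(compress(route(disclose("Why was I charged twice?"))));
console.log(context.domain); // "billing"
```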
Routing vs. Progressive Disclosure
These two patterns are complementary, not competing. Progressive disclosure manages instruction loading within a single agent. Routing manages which knowledge base and tool set the agent accesses. In practice, you use both: progressive disclosure for the agent’s own capabilities, and routing for the external context it consumes.
Key insight: If your agents serve multiple domains, add routing. Even keyword-based rules cut context bloat before you invest in LLM-based classification. The ROI is immediate and compounds with every other optimization in the stack.