Ch 5 — Context Routing

Directing queries to the right context source before anything enters the window
High Level
Query → Classify → Route → Source → Load → Respond
The Multi-Source Problem
Why loading all knowledge bases for every query fails
The Scenario
A multi-domain agent has access to multiple knowledge bases, tool sets, and instruction sets. A billing question doesn’t need the onboarding knowledge base. A technical support query doesn’t need the refund policy. Loading all of them for every query wastes context and degrades accuracy by diluting the model’s attention with irrelevant information.
The Cost of No Routing
Without routing, every query carries the full weight of every knowledge domain. For an enterprise support agent with 5 domains, each with 10,000 tokens of context, that’s 50,000 tokens loaded before the model even sees the question. Most of it is irrelevant noise competing for attention.
Critical in AI: Context routing is the gatekeeper pattern. It decides what enters the context window before the main model starts reasoning. Getting routing wrong means the model either lacks critical context (under-routing) or drowns in irrelevant context (over-routing).
Rule-Based Routing
Fast and predictable, but rigid
How It Works
Rule-based routing uses keyword matching or pattern detection to classify queries. If the query contains “refund,” “return,” or “money back,” route to the refund knowledge base. If it contains “password,” “login,” or “account access,” route to technical support.
Example
// Rule-based router
function route(query) {
  const q = query.toLowerCase();
  if (q.match(/refund|return|money back/)) return "refund_kb";
  if (q.match(/password|login|account/)) return "tech_support_kb";
  if (q.match(/invoice|charge|billing/)) return "billing_kb";
  return "general_kb"; // fallback
}
Strengths & Weaknesses
Strengths: Near-instant latency (no LLM call), completely predictable, easy to debug, zero cost.

Weaknesses: Rigid — misses anything outside expected patterns. “I was charged twice and want my money back” matches both billing and refund rules. Requires manual updates for new domains. Cannot handle nuanced or ambiguous queries.
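The overlap problem can be made visible rather than silent. The sketch below extends the chapter's rule set to collect every matching rule instead of returning the first hit; `matchAll` is an illustrative helper, not a standard API.

```javascript
// Sketch: detect when a query matches more than one rule, instead of
// silently taking the first hit. Rule names and patterns mirror the
// chapter's rule-based router example.
const rules = [
  { domain: "refund_kb",       pattern: /refund|return|money back/ },
  { domain: "tech_support_kb", pattern: /password|login|account/ },
  { domain: "billing_kb",      pattern: /invoice|charge|billing/ },
];

function matchAll(query) {
  const q = query.toLowerCase();
  return rules.filter(r => r.pattern.test(q)).map(r => r.domain);
}

const hits = matchAll("I was charged twice and want my money back");
console.log(hits); // ["refund_kb", "billing_kb"] — ambiguous: escalate or multi-route
```

When `matchAll` returns more than one domain, the system knows the query is ambiguous and can apply one of the mitigation strategies discussed later in the chapter instead of guessing.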
Key insight: Rule-based routing is the right starting point for most teams. Even simple keyword rules cut context bloat significantly before you invest in more sophisticated approaches. Start here, upgrade when you hit accuracy limits.
LLM-Based Routing
Using the model itself to classify and route
How It Works
LLM-based routing uses a model (often a smaller, faster one) to classify the query and select the appropriate context source. The router model receives the query plus a list of available knowledge domains with descriptions, and returns the best match. It understands nuance, handles ambiguity, and adapts to new patterns automatically.
Example
// LLM-based router prompt
system: "Classify the user query into one of these domains:
  - billing: charges, invoices, plans
  - refunds: returns, money back
  - tech_support: login, bugs, errors
  - onboarding: setup, getting started
  Return only the domain name."

user: "I was charged twice and want my money back"

→ "refunds" // Understands intent
Strengths & Weaknesses
Strengths: Understands nuance and intent, handles ambiguous queries, adapts to new patterns without code changes, can route to multiple domains when appropriate.

Weaknesses: Adds an inference call before the main task (latency + cost), can hallucinate routing decisions, harder to debug when it misroutes, requires a fallback strategy.
Key insight: LLM routing adds latency and cost upfront but saves far more downstream by loading only relevant context. The net effect is usually positive — a small router call that prevents 30K tokens of irrelevant context from entering the main model’s window.
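The control flow around an LLM router matters as much as the prompt: the model's answer must be validated before it is trusted. In this minimal sketch, `classifyWithModel` is a stand-in for a real call to a small, fast model; it is stubbed with a fixed answer so the flow is runnable.

```javascript
// Sketch of an LLM router with output validation and a fallback.
const DOMAINS = ["billing", "refunds", "tech_support", "onboarding"];

async function classifyWithModel(query) {
  // Stub: a real implementation would send the routing prompt to a model.
  return "refunds";
}

async function routeWithLLM(query) {
  const answer = (await classifyWithModel(query)).trim().toLowerCase();
  // Validate the model's output — it can hallucinate a domain name.
  return DOMAINS.includes(answer) ? answer : "general"; // fallback domain
}

routeWithLLM("I was charged twice and want my money back")
  .then(domain => console.log(domain)); // "refunds"
```

The validation step is the important part: any string the classifier returns that is not in the allow-list falls through to the general domain rather than loading a nonexistent knowledge base.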
Hierarchical Routing
A lead agent triages to specialized sub-agents
The Pattern
Hierarchical routing uses a lead agent to triage queries to specialized sub-agents, each with its own focused context window. The lead agent carries minimal context (just enough to classify), while sub-agents carry deep domain-specific context. This is the multi-agent version of context routing.
Architecture
// Hierarchical routing
Lead Agent (lightweight context)
├─→ Billing Agent
│     └─ billing KB, billing tools
├─→ Refund Agent
│     └─ refund KB, payment tools
├─→ Tech Support Agent
│     └─ tech KB, diagnostic tools
└─→ Onboarding Agent
      └─ setup KB, config tools
When to Use
Hierarchical routing makes sense when domains are deeply different — different tools, different knowledge bases, different behavioral patterns. If domains share most of their context and differ only in instructions, progressive disclosure (Ch 3) with skill-based identity management is simpler and cheaper.
Why it matters: Hierarchical routing trades orchestration complexity for context efficiency. Each sub-agent has a clean, focused context window. The cost is managing inter-agent communication and ensuring the lead agent routes correctly.
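The dispatch step can be sketched concretely: the lead agent holds only a routing table, and each sub-agent declares its own knowledge base and tools. All names below are illustrative.

```javascript
// Sketch: a lead agent with minimal context maps a classified domain
// to a sub-agent; only that sub-agent's KB and tools are loaded.
const subAgents = {
  billing:      { kb: "billing_kb", tools: ["billing"] },
  refunds:      { kb: "refund_kb",  tools: ["payments"] },
  tech_support: { kb: "tech_kb",    tools: ["diagnostics"] },
  onboarding:   { kb: "setup_kb",   tools: ["config"] },
};

function dispatch(domain, query) {
  const agent = subAgents[domain];
  if (!agent) throw new Error(`no sub-agent for domain: ${domain}`);
  // Only this agent's context enters the main inference call.
  return { query, context: agent.kb, tools: agent.tools };
}

console.log(dispatch("billing", "Why was I charged twice?").context); // "billing_kb"
```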
Hybrid Routing
Combining methods for production reliability
The Production Pattern
Most production systems use hybrid routing that combines multiple methods. A typical pattern: rule-based routing first (fast, cheap, handles obvious cases), then LLM-based routing for queries that don’t match any rule (handles nuance), with a fallback to a default agent or human escalation when confidence is low.
Hybrid Flow
// Hybrid routing pipeline
1. Rule-based check (instant)
   → Match? Route directly.
   → No match? Continue.
2. LLM classifier (100–200 ms)
   → High confidence? Route.
   → Low confidence? Continue.
3. Fallback
   → Default agent, or
   → Human escalation
Why Hybrid Wins
Hybrid routing optimizes for the common case: 70–80% of queries match simple rules and are routed instantly at zero cost. The remaining 20–30% get the more expensive LLM classification. This gives you the accuracy of LLM routing at a fraction of the cost.
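The three stages compose naturally in code. In this sketch the LLM classifier is stubbed with a fixed answer and confidence score so the flow is runnable; the 0.8 threshold is an illustrative choice, not a recommendation.

```javascript
// Sketch of the three-stage hybrid pipeline: rules, then LLM, then fallback.
function ruleRoute(query) {
  const q = query.toLowerCase();
  if (/refund|return|money back/.test(q)) return "refunds";
  if (/invoice|charge|billing/.test(q)) return "billing";
  return null; // no rule matched — fall through to the LLM
}

async function llmRoute(query) {
  // Stand-in for a small-model classification call.
  return { domain: "tech_support", confidence: 0.55 };
}

async function hybridRoute(query) {
  const ruled = ruleRoute(query);                       // 1. rules: instant, free
  if (ruled) return ruled;
  const { domain, confidence } = await llmRoute(query); // 2. LLM classifier
  if (confidence >= 0.8) return domain;
  return "escalate_to_human";                           // 3. low-confidence fallback
}

hybridRoute("My invoice looks wrong").then(d => console.log(d)); // "billing"
```

Note how the common case exits at stage 1 without any model call, matching the 70–80% figure above.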
Rule of thumb: Start with rule-based routing for your top 10 query patterns. Add LLM routing when you see misroutes on ambiguous queries. Add hierarchical routing only when domains need completely different tool sets and knowledge bases.
Downstream Savings
How routing multiplies the value of other techniques
The Multiplier Effect
Routing’s savings come downstream, not from the routing step itself. By loading only relevant context, you reduce tokens for the main inference call. This compounds with compression: if you route to a 10K-token knowledge base instead of loading all 50K tokens, and then compress that 10K to 5K, you’ve achieved a 10× reduction from the unoptimized baseline.
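The compounding arithmetic from the paragraph above, written out:

```javascript
// Compounding savings: routing first, then compression.
const baseline = 50_000;   // all five domains loaded, no routing
const routed = 10_000;     // only the relevant domain's KB
const compressed = 5_000;  // that KB after 2x compression (Ch 4)

console.log(baseline / compressed); // 10 — a 10x reduction vs. the unoptimized baseline
```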
Multi-Agent Efficiency
In multi-agent systems, routing prevents the context duplication problem. Without routing, every agent carries every piece of context. With routing, each agent carries only what it needs. For a system with 5 agents and 5 domains, routing can reduce total context consumption by 80% (each agent carries 1/5 instead of 5/5).
Key insight: Routing is the highest-leverage optimization for multi-domain systems. It doesn’t just save tokens — it improves accuracy by ensuring the model’s attention is focused on relevant information rather than diluted across irrelevant domains.
Routing Failures
What happens when routing goes wrong
Misrouting
When the router sends a query to the wrong domain, the model answers with the wrong knowledge base. A billing question routed to tech support gets a technically correct but contextually wrong answer. The user experience is worse than if no routing existed, because the model is confidently wrong.
Cross-Domain Queries
Some queries genuinely span multiple domains: “I was charged for a feature that doesn’t work.” This is both billing and tech support. Simple routers force a single choice; sophisticated ones can load context from multiple domains, but this partially defeats the purpose of routing.
Mitigation Strategies
Confidence thresholds: Only route when the classifier is confident; fall back to a broader context for ambiguous queries.

Multi-route: Allow routing to 2 domains when the query spans boundaries, accepting the extra context cost.

Human escalation: Route to a human when confidence is below threshold — better than a confidently wrong answer.

Feedback loops: Log routing decisions and outcomes to identify systematic misroutes and improve rules over time.
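The four strategies above can be combined into one decision function. This sketch assumes the classifier returns candidate domains with confidence scores, sorted descending; the 0.7 threshold, the two-domain cap, and the log shape are all illustrative.

```javascript
// Sketch: confidence thresholds, multi-route, human escalation, and a
// decision log for the feedback loop, combined in one router.
const routingLog = [];

function decide(candidates /* [{ domain, confidence }], sorted desc */) {
  const confident = candidates.filter(c => c.confidence >= 0.7);
  let decision;
  if (confident.length === 0) {
    decision = { action: "escalate_to_human" };            // better than confidently wrong
  } else if (confident.length === 1) {
    decision = { action: "route", domains: [confident[0].domain] };
  } else {
    // Cross-domain query: accept the extra context cost of two domains.
    decision = { action: "multi_route", domains: confident.slice(0, 2).map(c => c.domain) };
  }
  routingLog.push({ candidates, decision }); // feedback loop: audit misroutes later
  return decision;
}

console.log(decide([{ domain: "billing", confidence: 0.9 },
                    { domain: "tech_support", confidence: 0.8 }]));
// { action: "multi_route", domains: ["billing", "tech_support"] }
```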
Key insight: LLM routing can hallucinate routing decisions. A fallback to a human or default agent is not optional — it’s a required safety net for any production routing system.
Routing in the Layered Architecture
How routing fits with progressive disclosure and compression
The Full Stack
In a production context engineering system, routing works alongside the other patterns: Progressive disclosure (Ch 3) defines what can enter the window. Routing (this chapter) selects which domain’s context to load. Compression (Ch 4) shrinks what’s loaded. Retrieval (Ch 6) fetches specific documents within the routed domain. Each layer addresses a different failure mode.
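The layering can be sketched as a pipeline, with each stage a placeholder for the technique covered in its chapter; the stage outputs here are dummy values, not real implementations.

```javascript
// Sketch: the four layers composed as a pipeline of placeholder stages.
const disclose = (q) => ({ query: q, skills: ["support"] });   // Ch 3: what CAN enter
const route    = (s) => ({ ...s, domain: "billing" });         // Ch 5: which domain
const compress = (s) => ({ ...s, contextTokens: 5_000 });      // Ch 4: shrink it
const retrieve = (s) => ({ ...s, docs: ["invoice_faq"] });     // Ch 6: fetch within domain

const context = retrieve(compress(route(disclose("Why was I charged twice?"))));
console.log(context.domain); // "billing"
```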
Routing vs. Progressive Disclosure
These two patterns are complementary, not competing. Progressive disclosure manages instruction loading within a single agent. Routing manages which knowledge base and tool set the agent accesses. In practice, you use both: progressive disclosure for the agent’s own capabilities, and routing for the external context it consumes.
Key insight: If your agents serve multiple domains, add routing. Even keyword-based rules cut context bloat before you invest in LLM-based classification. The ROI is immediate and compounds with every other optimization in the stack.