7
“Each tool schema costs 500+ tokens. With 90 tools, that’s 45K tokens before the user says anything.”
- MCP (Model Context Protocol) is the emerging standard for tool integration. Tool schemas consume significant context — 20–40% of the window in production systems.
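- KV-cache invalidation: Manus discovered that dynamically changing the available tools invalidates the cache, causing massive latency spikes. Tool definitions typically sit near the start of the context, so any change forces everything after them to be recomputed. Keep tool sets stable.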
- Progressive tool disclosure — loading tool schemas only when relevant — is the primary mitigation for tool token bloat.
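The arithmetic in the quote and the idea of progressive tool disclosure can be sketched as below. This is a minimal illustration, not a reference implementation: the `TOOLS` registry, the keyword-overlap scoring, and the function names are all hypothetical, and a real system would likely use a better relevance signal. To stay consistent with the advice on stable tool sets, a selection like this would probably run once, before the conversation starts, rather than on every turn.

```python
import json

# Hypothetical registry: each tool has trigger keywords and a JSON schema.
TOOLS = {
    "search_flights": {
        "keywords": {"flight", "flights", "airport"},
        "schema": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
            },
        },
    },
    "send_email": {
        "keywords": {"email", "mail", "send"},
        "schema": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string"},
                "body": {"type": "string"},
            },
        },
    },
    "query_database": {
        "keywords": {"sql", "database", "query"},
        "schema": {
            "type": "object",
            "properties": {"statement": {"type": "string"}},
        },
    },
}


def upfront_cost(num_tools: int, tokens_per_schema: int) -> int:
    """Tokens spent on tool schemas if every schema is loaded up front."""
    return num_tools * tokens_per_schema


def select_tools(query: str, tools: dict, limit: int = 2) -> list:
    """Return up to `limit` tool names whose keywords overlap the query."""
    words = set(query.lower().split())
    scored = []
    for name, spec in tools.items():
        overlap = len(words & spec["keywords"])
        if overlap > 0:
            # Negative overlap so that sorting puts the best match first;
            # ties are broken alphabetically by name.
            scored.append((-overlap, name))
    scored.sort()
    return [name for _, name in scored[:limit]]


def disclosed_schemas(query: str, tools: dict, limit: int = 2) -> dict:
    """Load full schemas only for the tools judged relevant to the query."""
    return {name: tools[name]["schema"] for name in select_tools(query, tools, limit)}


if __name__ == "__main__":
    # The figure from the quote: 90 tools at 500 tokens each.
    print(upfront_cost(90, 500))
    print(json.dumps(disclosed_schemas("find flights to Paris", TOOLS), sort_keys=True))
```

Only the `search_flights` schema is disclosed for the sample query; the other two schemas never enter the context.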
8
“Treat your context window like a financial budget — every token has a cost and a return.”
- Target 40–60% context utilization with a 30–40% safety margin. KV-cache hit rate is the single most important production metric.
- Prompt caching (stable prefix reuse) can achieve 90% cost savings on the cached portion. Deterministic serialization maximizes cache hits.
- The layered architecture combines all patterns: disclosure → tools → routing → retrieval → compression → budgeting. Real-world case studies show 73–87% cost reductions.
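The budgeting and caching points above can be made concrete with a small sketch. It assumes the stable prefix is a system prompt plus tool schemas serialized as JSON; the function names are hypothetical, the 60% ceiling and the 90% discount are the figures quoted above, and actual cache pricing and behavior depend on the provider.

```python
import hashlib
import json


def serialize_prefix(system_prompt: str, tool_schemas: dict) -> str:
    """Serialize the stable prefix deterministically.

    Sorted keys and fixed separators mean that two logically equal
    inputs always produce the same string, whatever their key order.
    """
    return json.dumps(
        {"system": system_prompt, "tools": tool_schemas},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )


def prefix_fingerprint(prefix: str) -> str:
    """Hash of the prefix, useful for checking that it has not drifted."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


def within_budget(used_tokens: int, window_tokens: int, max_utilization_pct: int = 60) -> bool:
    """True if usage stays at or below the target share of the window."""
    return used_tokens * 100 <= window_tokens * max_utilization_pct


def effective_input_tokens(cached_tokens: int, fresh_tokens: int, discount_pct: int = 90) -> float:
    """Cost in full-price token equivalents, with a discount on the cached part."""
    return cached_tokens * (100 - discount_pct) / 100 + fresh_tokens


if __name__ == "__main__":
    tools_a = {"b": {"y": 1, "x": 2}, "a": {}}
    tools_b = {"a": {}, "b": {"x": 2, "y": 1}}  # same content, different key order
    same = prefix_fingerprint(serialize_prefix("sys", tools_a)) == prefix_fingerprint(
        serialize_prefix("sys", tools_b)
    )
    print("prefix stable across key order:", same)
    print("within budget:", within_budget(50_000, 100_000))
    print("effective tokens:", effective_input_tokens(8_000, 2_000))
```

Comparing fingerprints between requests is one cheap way to notice when a timestamp, a reordered key, or a changed tool list has silently broken the cached prefix.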
Bottom line: Context engineering is not a single technique but a layered system. Start with progressive disclosure (cheapest), add routing and compression, optimize retrieval, manage tools carefully, and budget every token. The gains compound across layers.