Ch 7: Tool & Capability Management

build

The Tool Schema Problem

Why tool definitions are the hidden context cost

The Hidden Cost

Agents need tools to interact with the world: APIs, databases, file systems, search engines. Each tool requires a JSON schema definition that the model reads to understand what the tool does and how to call it. A single complex schema (nested objects, enums, parameter descriptions) can consume 500+ tokens.

The Scale Problem

// Tool schema token cost Simple tool: ~100 tokens Complex tool: ~500 tokens MCP server: ~5-15 tools // Connect a few MCP servers: 6 servers × 15 tools × 500 tokens = 45,000 tokens // That's 45K tokens BEFORE any // user interaction begins

Critical in AI: OpenAI recommends fewer than 20 tools per agent, with accuracy degrading past 10. Yet connecting a few MCP servers can easily reach 90+ tool definitions. This is the tool management crisis.

hub

MCP: The Standard Protocol

Model Context Protocol as the universal connector

What MCP Is

MCP (Model Context Protocol) has become the standard for connecting agents to external tools. Originally released by Anthropic in November 2024, it is now governed by the Agentic AI Foundation under the Linux Foundation. MCP provides a standardized interface for tool discovery, invocation, and result handling.

What MCP Solves (and Doesn’t)

MCP solves the connection problem — a universal protocol means tools work across different agent platforms without custom integrations.

MCP does NOT solve the context cost problem — each connected server still adds its tool schemas to the context window. The standardization of the interface doesn’t reduce the token cost of the schemas themselves.

Key insight: MCP is to AI agents what HTTP is to web browsers — a universal protocol for connecting to services. But just as loading too many web pages slows a browser, connecting too many MCP servers bloats the context window.

memory

The KV-Cache Invalidation Problem

Manus’s critical discovery about dynamic tool changes

The Finding

Manus discovered a critical constraint: avoid dynamically adding or removing tools mid-iteration. Tool definitions sit near the front of the context (in the stable prefix zone). Any change to tool definitions invalidates the KV-cache for all subsequent tokens, forcing the model to recompute attention from scratch.

The Impact

KV-cache invalidation means significant latency spikes every time tools change. In a long-running agent task, dynamically adding a tool mid-conversation can add seconds of recomputation. The practical rule: define your tool set at the start of the session and keep it stable. If you need different tools for different phases, use separate agent sessions.

Key insight: This single finding changed how production agent systems manage their tool registries. Tools are no longer treated as dynamic capabilities that can be swapped in and out — they’re treated as part of the stable infrastructure that must remain constant for performance.

description

Description Quality

The model selects tools based on descriptions written for humans

The Problem

The model selects tools based on their descriptions, but most MCP server authors write descriptions for humans, not models. Too vague and the model picks the wrong tool. Too verbose and you waste context on a single schema. The quality of tool descriptions directly determines tool selection accuracy.

Good vs Bad Descriptions

// Bad: vague, human-oriented "description": "Search for stuff" // Bad: too verbose "description": "This tool allows you to search our comprehensive database of customer records including..." // 200 tokens of description // Good: precise, model-oriented "description": "Search customer records by name, email, or account ID. Returns top 10 matches with contact info and account status."

Open Problems

Tool overlap: Two different MCP servers might offer similar capabilities (two search tools, two file readers). Without deduplication or preference logic, the model picks arbitrarily.

No versioning: When an MCP server updates its tool schemas, the agent has no way to know. Stale descriptions in cache cause silent failures.

No quality standard: MCP standardizes the interface but not description quality, schema conventions, or documentation depth.

Rule of thumb: Write tool descriptions as if you’re explaining the tool to a new engineer who needs to decide when to use it. Include: what it does, what inputs it needs, what it returns, and when to use it vs. alternatives.

shield

Security Surface

Every tool is an attack vector

The Attack Surface

Each connected MCP server is an attack surface. Tool outputs can contain prompt injection attempts — malicious instructions embedded in data that the model processes as part of its context. The more tools available, the larger the exposure. A compromised or malicious MCP server can inject instructions that override the agent’s system prompt.

Mitigation

Principle of least privilege: Give agents access only to the tools they need for the current task, not every tool available.

Output sanitization: Filter tool outputs for known injection patterns before injecting into context.

Sandboxing: Run MCP servers in isolated environments with limited permissions.

Audit logging: Log every tool invocation and its output for security review.

Critical in AI: Security surface scales linearly with tool count. An agent with 90 tools has 9× the attack surface of an agent with 10. Tool management is not just a context optimization — it’s a security requirement.

filter_list

Progressive Tool Disclosure

Applying the progressive disclosure pattern to tools

The Pattern

Just as Agent Skills use progressive disclosure for instructions (Ch 3), tools can use the same pattern. At startup, the agent sees only tool categories and brief descriptions. When a category is relevant, the full tool schemas load. This prevents 50K tokens of tool schemas from consuming the context before any work begins.

Dynamic Namespacing

OpenAI’s approach uses dynamic namespacing — agents discover relevant tools on demand instead of being overwhelmed with hundreds of options. The agent starts with a small set of core tools and can request additional tools from specific namespaces when the task requires them. This is the tool equivalent of lazy loading in web development.

Key insight: Progressive tool disclosure converts a fixed upfront cost (all tool schemas loaded) into a variable cost (only relevant tools loaded). Combined with stable tool sets per session (to preserve KV-cache), this is the current best practice for tool management.

analytics

Measuring Tool Cost

Auditing the token impact of your tool registry

The Audit

The first step in tool management is measuring the problem. Count how many tokens your tool schemas consume before any user interaction. This number is usually higher than expected. For many production systems, tool schemas are the single largest fixed cost in the context window — larger than the system prompt, larger than few-shot examples.

Audit Checklist

// Tool registry audit 1. Count total tools across all MCP servers 2. Measure tokens per tool schema 3. Calculate total tool token cost 4. Compare to context window size 5. Identify overlapping tools 6. Identify rarely-used tools 7. Measure tool selection accuracy 8. Check for stale/outdated schemas

Optimization Strategies

Schema compression: Remove verbose descriptions, simplify parameter names, eliminate optional parameters the agent never uses.

Tool deduplication: When two MCP servers offer similar tools, choose one and remove the other.

Tiered loading: Core tools always loaded; specialized tools loaded on demand.

Schema caching: Use prompt caching to avoid paying for tool schemas on every request.

Key insight: If your agents connect to multiple MCP servers, audit the token cost. Most teams are shocked to discover that tool schemas consume 20–40% of their context window before any actual work begins.

trending_up

The Future of Tool Management

Where the field is heading

Unsolved Problems

Description quality standardization: MCP standardizes the interface but not the quality of tool descriptions. A community standard for model-oriented descriptions would dramatically improve tool selection accuracy.

Schema versioning: When MCP servers update their tools, agents need to know. There’s no versioning protocol for tool contracts yet.

Cross-server deduplication: Automatically detecting and resolving overlapping tools across MCP servers remains manual.

Emerging Solutions

Tool registries: Centralized catalogs of MCP servers with quality ratings and compatibility information.

Automatic schema optimization: LLMs that rewrite tool descriptions for better model comprehension.

Tool recommendation: Systems that suggest which tools to load based on the current task, similar to how IDEs suggest imports.

Key insight: Tool management is the youngest and least mature area of context engineering. MCP solved the connection problem in 2024; the context cost problem, description quality problem, and security problem are still actively being worked on in 2026.

Ch 7 — Tool & Capability Management