Coding Agents often show an extreme token imbalance in real-world runs: input tokens can reach the millions while output tokens stay in the thousands. Reading project source files accounts for part of this context growth; another major cause is mounting a large number of tools, especially MCP tools, which creates two problems:
- Very low tool execution rate: many tools are never actually invoked, yet their descriptions continue to occupy tokens and create serious waste.
- Schemas and descriptions consume the context window: tool descriptions and JSON Schemas can be large, making it easy to hit the context limit and forcing the system to start context compression early in task execution.
Core Optimization Strategy
1. Tool Lazy Loading (On-Demand Loading)
Inspired by the on-demand loading mechanism used by Skills, tool lazy loading is introduced together with lifecycle management to significantly reduce unnecessary tool transmission.
Loading strategy: instead of keyword retrieval or RAG search, tool loading follows the same progressive disclosure idea used by Skills:
- Provide the model with only a tool description summary of no more than 50 characters;
- Let the model autonomously decide and select which tools need to be loaded.
Summary sources:
- Internal tools: short summaries are configured directly when tools are registered;
- MCP tools: tool descriptions from each MCP Server are automatically summarized.
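The loading strategy above can be sketched as a small registry. This is a minimal illustration, not the actual implementation: `Tool`, `ToolRegistry`, `make_summary`, and the shape of the returned definitions are all hypothetical names, and plain truncation stands in for whatever summarization step is really applied to MCP tool descriptions.

```python
from dataclasses import dataclass

MAX_SUMMARY_CHARS = 50  # limit stated in the text above


@dataclass
class Tool:
    name: str
    description: str   # full description, sent only after the tool is loaded
    schema: dict       # full JSON Schema, sent only after the tool is loaded
    summary: str = ""  # short catalog entry; empty means "derive it"


def make_summary(description: str, limit: int = MAX_SUMMARY_CHARS) -> str:
    """Stand-in summarizer: plain truncation instead of a real summarization step."""
    text = " ".join(description.split())
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}
        self.loaded: set[str] = set()

    def register(self, tool: Tool) -> None:
        # Internal tools arrive with a summary; MCP tools get one derived here.
        if not tool.summary:
            tool.summary = make_summary(tool.description)
        if len(tool.summary) > MAX_SUMMARY_CHARS:
            raise ValueError(f"summary for {tool.name!r} exceeds {MAX_SUMMARY_CHARS} characters")
        self._tools[tool.name] = tool

    def catalog(self) -> str:
        """One line per not-yet-loaded tool: all the model sees by default."""
        return "\n".join(
            f"- {t.name}: {t.summary}"
            for t in self._tools.values()
            if t.name not in self.loaded
        )

    def load(self, names: list[str]) -> list[dict]:
        """Handle the model's load request; return full definitions for known names."""
        definitions = []
        for name in names:
            tool = self._tools.get(name)
            if tool is None:
                continue  # unknown names are skipped rather than raising
            self.loaded.add(name)
            definitions.append(
                {"name": tool.name, "description": tool.description, "parameters": tool.schema}
            )
        return definitions
```

In this sketch the model only ever receives `catalog()` up front and calls `load([...])` with the names it has chosen; no retrieval step ranks tools on its behalf.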
2. Tool Lifecycle Management
Loaded tools are automatically unloaded at the end of a single conversation, with their lifecycle strictly limited to one conversation. If they are needed later, they must be loaded again.
Design rationale: if the lifecycle is expanded to the session level, later Coding Agent requests will accumulate more and more tools as conversation turns increase, continuously squeezing the context window. Therefore, unloading tools at the end of each conversation is the more reasonable choice.
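One way to picture the conversation-scoped lifecycle is a context manager that clears the loaded set when the conversation ends. This is only a sketch under assumptions: `LoadedTools` and `conversation` are hypothetical helpers, and the real boundary of a "conversation" is whatever the agent runtime defines.

```python
from contextlib import contextmanager
from typing import Iterator


class LoadedTools:
    """Tracks which lazily loaded tools are currently sent with each request."""

    def __init__(self) -> None:
        self._loaded: set[str] = set()

    def load(self, name: str) -> None:
        self._loaded.add(name)

    def active(self) -> set[str]:
        return set(self._loaded)

    def unload_all(self) -> None:
        self._loaded.clear()


@contextmanager
def conversation(tools: LoadedTools) -> Iterator[LoadedTools]:
    """Scope loaded tools to one conversation (hypothetical helper)."""
    try:
        yield tools
    finally:
        # Runs on normal completion and on errors alike.
        tools.unload_all()


tools = LoadedTools()
with conversation(tools) as t:
    t.load("git_diff")
    during = t.active()   # the tool is available inside the conversation
after = tools.active()    # and gone once the conversation has ended
```

Because nothing survives the `with` block, a later conversation starts from an empty loaded set and has to request `git_diff` again if it needs it.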
3. Automatic Unloading
Tool lazy loading and lifecycle management have already been implemented in v1.6.0, but one issue remains: if a single conversation is extremely long and executes hundreds or even thousands of tool calls, a large number of loaded tools can still accumulate in the later stages of the conversation, even with conversation-level lifecycle management. In practice, tools usually have a very low reuse rate after their first call.
To address this, an automatic unloading mechanism is introduced: during the conversation, the Coding Agent dynamically identifies and unloads tools that are no longer needed for the current task, solving the tool accumulation problem in long conversation scenarios.
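A possible shape for mid-conversation unloading is sketched below. It assumes the agent is given an unload operation it can call itself, plus a usage hint to help it judge which tools are no longer needed; `ToolSession`, `usage_hint`, and `unload` are hypothetical names, and the text does not specify how the real mechanism identifies unneeded tools.

```python
class ToolSession:
    """Loaded tools for one conversation, with per-tool call counts."""

    def __init__(self) -> None:
        self.calls_since_load: dict[str, int] = {}

    def load(self, name: str) -> None:
        # Loading an already loaded tool keeps its existing count.
        self.calls_since_load.setdefault(name, 0)

    def record_call(self, name: str) -> None:
        if name in self.calls_since_load:
            self.calls_since_load[name] += 1

    def usage_hint(self) -> str:
        """Compact text the agent could be shown when deciding what to unload."""
        return ", ".join(
            f"{name} (calls: {count})" for name, count in self.calls_since_load.items()
        )

    def unload(self, names: list[str]) -> list[str]:
        """Drop the tools the agent named; return the ones actually removed."""
        removed = []
        for name in names:
            if self.calls_since_load.pop(name, None) is not None:
                removed.append(name)
        return removed
```

The decision stays with the agent in this sketch: the runtime only reports usage and applies the unload request, so a tool that has served its purpose stops occupying context for the rest of a long conversation.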
Supporting Optimizations
- Parallel loading: like Skill lazy loading, tool loading needs the model to take part in the selection. Issuing a separate request just to load tools would itself consume a large number of tokens, so the current implementation tries to run tool loading/unloading in parallel with other tool executions to reduce the extra overhead.
- Whitelist mechanism: high-frequency foundational tools such as `todo` and `task` are added directly to the whitelist and provided by default, so the Coding Agent does not have to load them repeatedly.
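The two supporting optimizations can be illustrated together. The sketch below assumes the model may emit a `load_tools` call alongside ordinary tool calls in one response, and that the runtime executes them concurrently; `execute_turn`, `run_tool`, `load_tools`, and `reject` are hypothetical names, and the tool bodies are placeholders.

```python
import asyncio

# Whitelisted tools are always available and never need loading.
WHITELIST = frozenset({"todo", "task"})


async def run_tool(name: str, args: dict) -> str:
    await asyncio.sleep(0)  # placeholder for real tool work
    return f"{name} done"


async def load_tools(loaded: set, names: list) -> str:
    await asyncio.sleep(0)  # placeholder for fetching full definitions
    loaded.update(names)
    return f"loaded: {', '.join(sorted(names))}"


async def reject(name: str) -> str:
    return f"error: {name} is not loaded"


async def execute_turn(calls: list, loaded: set) -> list:
    """Run every call from one model response concurrently.

    `calls` is a list of (tool_name, args) pairs. Availability is checked
    before anything runs, so a tool loaded in this turn is usable only
    from the next turn onward.
    """
    tasks = []
    for name, args in calls:
        if name == "load_tools":
            tasks.append(load_tools(loaded, args["names"]))
        elif name in WHITELIST or name in loaded:
            tasks.append(run_tool(name, args))
        else:
            tasks.append(reject(name))
    # gather returns results in the same order as the calls.
    return await asyncio.gather(*tasks)


loaded_tools: set = set()
results = asyncio.run(
    execute_turn(
        [("todo", {}), ("load_tools", {"names": ["git_diff"]}), ("git_diff", {})],
        loaded_tools,
    )
)
```

Here the load request rides along with the `todo` call instead of costing a request of its own, which is the overhead the parallel-loading item aims to avoid.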
Future Improvements
- Multi-layer progressive disclosure: progressive disclosure currently operates only at the level of the tool itself, which does not suit Skill and Subagent tools well. At this stage, Skill tools are included directly in the whitelist, while Subagents receive no special handling. In the future, these tools should be given more precise summary descriptions instead of simply being whitelisted.
- Dynamic whitelist: lazy-loading decisions should be based on the number of tokens a tool occupies. If a tool's description and schema occupy very few tokens, the tool can be added directly to the whitelist. Lazy loading does avoid repeatedly sending a tool that is never called, but that saving is not worthwhile if loading the tool costs even one separate loading request.
- Subagent & Agent Team: tool lazy loading is currently disabled outright for Subagents, but some Subagents may be given the full set of tools, so lazy loading should be enabled for them as well.
- Skill lifecycle: Skills should also be included in lifecycle management to avoid occupying the context window for too long. This has not yet been implemented.
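The dynamic whitelist idea from the list above could look roughly like this. It is a sketch of a feature the text describes as future work: the four-characters-per-token estimate, the `threshold_tokens` value, and the function names are assumptions chosen for illustration, not measured or implemented values.

```python
import json


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumed): about four characters per token."""
    return max(1, len(text) // 4)


def partition_tools(tools: dict, threshold_tokens: int = 60) -> tuple:
    """Split tools into (whitelist, lazy) by the size of their full definition.

    `tools` maps a tool name to its full definition (description and schema).
    Small definitions are cheaper to send every time than to load on demand.
    """
    whitelist, lazy = [], []
    for name, definition in tools.items():
        tokens = estimate_tokens(json.dumps(definition))
        if tokens <= threshold_tokens:
            whitelist.append(name)
        else:
            lazy.append(name)
    return whitelist, lazy


example_tools = {
    "todo_list": {"description": "Show the todo list", "parameters": {}},
    "big_tool": {"description": "x" * 1000, "parameters": {"type": "object"}},
}
always_on, on_demand = partition_tools(example_tools)
```

A real threshold would need to be derived from the actual cost of a loading request, which the text identifies as the deciding factor.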