# Limits and Token Management
## Limit knobs
- `maxToolCalls` – total tool executions allowed across the entire invocation. Once reached, additional tool calls are skipped and a finalize message is injected.
- `maxParallelTools` – maximum concurrent tool executions per agent turn (default 1). Adjust to balance throughput against rate limits.
- `maxToken` – estimated token threshold for the next agent turn. Exceeding it triggers the summarization node before the model call.
- `contextTokenLimit` – desired size of the live transcript after summarization (used as a target, not a hard cap).
- `summaryTokenLimit` – target length for each generated summary chunk (defaults to a generous value if omitted).
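Taken together, the knobs above might be configured like this. A hedged sketch: only the knob names come from this document; the shape of the options object and the specific values are illustrative assumptions.

```typescript
// Hypothetical limits configuration; only the property names are documented.
const limits = {
  maxToolCalls: 20,         // hard budget of tool executions per invocation
  maxParallelTools: 2,      // concurrent tool executions per agent turn (default 1)
  maxToken: 24_000,         // estimated-token threshold that triggers summarization
  contextTokenLimit: 8_000, // target size of the live transcript after compaction
  summaryTokenLimit: 1_000, // target length of each generated summary chunk
};
```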
## Tool limit finalize
When the assistant proposes tool calls but `toolCallCount >= maxToolCalls`, the tools node:
- Emits `tool_call` events with `phase: "skipped"` for the overflow calls.
- Appends tool response messages noting the skip.
- Invokes `toolLimitFinalize`, which injects a system message instructing the model to answer directly.
On the next agent turn, the model sees the finalize notice and must produce a direct assistant response without more tool calls.
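The skip-and-finalize behaviour can be sketched as follows. This is a minimal illustration under assumed message and state shapes (`Message`, `AgentState`, and `handleToolLimit` are hypothetical names, not part of the documented API):

```typescript
// Simplified message and state shapes, assumed for illustration only.
type ToolCall = { id: string; name: string };
type Message = { role: "assistant" | "tool" | "system"; content: string; toolCallId?: string };

interface AgentState {
  toolCallCount: number;
  maxToolCalls: number;
  messages: Message[];
}

// When the tool budget is exhausted, skip the overflow calls and
// inject a system message telling the model to answer directly.
function handleToolLimit(state: AgentState, proposed: ToolCall[]): AgentState {
  if (state.toolCallCount < state.maxToolCalls) return state; // budget remains, nothing to do

  // One tool response per skipped call, noting the skip.
  const skipped: Message[] = proposed.map((call) => ({
    role: "tool",
    toolCallId: call.id,
    content: `Tool call skipped: maxToolCalls (${state.maxToolCalls}) reached.`,
  }));

  // The finalize notice the model sees on its next turn.
  const finalize: Message = {
    role: "system",
    content: "Tool budget exhausted. Answer the user directly without further tool calls.",
  };

  return { ...state, messages: [...state.messages, ...skipped, finalize] };
}
```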
## Summarization flow
Summarization is enabled by default for smart agents. It activates when:
`estimatedTokens(messages) > limits.maxToken`

Steps:
- Chunk the transcript while keeping tool call/response pairs together.
- Summarize each chunk using the configured model.
- Merge partial summaries iteratively to respect `summaryTokenLimit`.
- Replace tool responses with `SUMMARIZED executionId:'...'` markers.
- Move original tool outputs to `toolHistoryArchived`.
- Add a synthetic assistant/tool pair labelled `context_summarize` containing the merged summary.
- Emit a `summarization` event and reset `toolHistory` for future runs.
Disable summarization entirely via `summarization: false`. When disabled, `maxToken` is ignored.
## Token heuristics
`countApproxTokens(text)` estimates tokens as `Math.ceil(text.length / 4)`. It avoids provider-specific encoders and keeps the runtime dependency-free. If you need precise counts, pre-truncate content or swap in your own estimation before calling `invoke`.
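The heuristic is small enough to state in full, directly from the formula above:

```typescript
// ~4 characters per token: cheap, encoder-free, and deliberately approximate.
function countApproxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
```

Because it rounds up, even a one-character string counts as a token, which keeps the estimate conservative relative to real tokenizers.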
## Tips
- Return concise tool payloads to minimize summarization churn. Keep raw content accessible via IDs or `get_tool_response`.
- Increase `summaryTokenLimit` if summaries feel too lossy, but note that larger summaries consume more budget.
- For conversations with user-provided long context, consider pre-summarizing or chunking before passing it to the agent.
- Monitor `summarization` events to visualize how often compaction occurs and whether limits need tuning.