Limits and Token Management

Limit knobs

  • maxToolCalls – total tool executions allowed across the entire invocation. Once the limit is reached, further tool calls are skipped and a finalize message is injected.
  • maxParallelTools – maximum concurrent tool executions per agent turn (default 1). Adjust to balance throughput vs. rate limits.
  • maxToken – estimated token threshold for the next agent turn. Exceeding this triggers the summarization node before the model call.
  • contextTokenLimit – desired size of the live transcript after summarization (used as a target, not a hard cap).
  • summaryTokenLimit – target length for each generated summary chunk (defaults to a generous value if omitted).
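
For orientation, the sketch below shows where these knobs might sit in an agent configuration. The createSmartAgent factory and the exact option shape are assumptions for illustration, not the documented API.

  // A sketch only: createSmartAgent and the option shape are assumptions,
  // not the documented API.
  declare function createSmartAgent(options: {
    limits: Record<string, number>;
    summarization?: boolean;
  }): unknown;

  const agent = createSmartAgent({
    limits: {
      maxToolCalls: 10,          // total tool executions per invocation
      maxParallelTools: 2,       // concurrent tool executions per agent turn (default 1)
      maxToken: 60_000,          // summarize before the model call past this estimate
      contextTokenLimit: 16_000, // target transcript size after summarization
      summaryTokenLimit: 2_000,  // target length per summary chunk
    },
    // summarization: false,     // disable compaction; maxToken is then ignored
  });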

Tool limit finalize

When the assistant proposes tool calls but toolCallCount >= maxToolCalls, the tools node:

  1. Emits tool_call events with phase: "skipped" for the overflow calls.
  2. Appends tool response messages noting the skip.
  3. Invokes toolLimitFinalize, which injects a system message instructing the model to answer directly.

On the next agent turn, the model sees the finalize notice and must produce a direct assistant response without making further tool calls.
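
The guard can be pictured as below. This is an illustrative sketch: the state shape and the emit, executeTool, and toolLimitFinalize helpers are assumptions, not the library's real internals.

  type ToolCall = { id: string; name: string };
  type AgentState = {
    toolCallCount: number;
    limits: { maxToolCalls: number };
    messages: { role: string; tool_call_id?: string; content: string }[];
  };
  declare function emit(event: { type: string; phase?: string; id?: string }): void;
  declare function executeTool(state: AgentState, call: ToolCall): void;
  declare function toolLimitFinalize(state: AgentState): void;

  function runToolsNode(state: AgentState, calls: ToolCall[]): void {
    let skipped = false;
    for (const call of calls) {
      if (state.toolCallCount >= state.limits.maxToolCalls) {
        emit({ type: "tool_call", phase: "skipped", id: call.id }); // step 1
        state.messages.push({                                      // step 2
          role: "tool",
          tool_call_id: call.id,
          content: "Skipped: maxToolCalls reached.",
        });
        skipped = true;
        continue;
      }
      state.toolCallCount += 1;
      executeTool(state, call);
    }
    if (skipped) toolLimitFinalize(state); // step 3: tell the model to answer directly
  }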

Summarization flow

Summarization is enabled by default for smart agents. It activates when:

estimatedTokens(messages) > limits.maxToken

Steps:

  1. Chunk the transcript while keeping tool call/response pairs together.
  2. Summarize each chunk using the configured model.
  3. Merge partial summaries iteratively to respect summaryTokenLimit.
  4. Replace tool responses with SUMMARIZED executionId:'...' markers.
  5. Move original tool outputs to toolHistoryArchived.
  6. Add a synthetic assistant/tool pair labelled context_summarize containing the merged summary.
  7. Emit a summarization event and reset toolHistory for future runs.
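
Put together, the pass looks roughly like the sketch below. The helper names mirror the numbered steps above; none of them are the library's real internals.

  type Msg = { role: string; executionId?: string; content: string };
  type State = {
    messages: Msg[];
    toolHistory: Msg[];
    toolHistoryArchived: Msg[];
    limits: { summaryTokenLimit: number };
  };
  declare function chunkMessages(messages: Msg[]): Msg[][];
  declare function summarizeChunk(chunk: Msg[]): Promise<string>;
  declare function mergeSummaries(parts: string[], limit: number): Promise<string>;
  declare function appendSummaryPair(state: State, summary: string): void;
  declare function emit(event: { type: string }): void;

  async function summarizeTranscript(state: State): Promise<void> {
    const chunks = chunkMessages(state.messages);                          // step 1
    const parts = await Promise.all(chunks.map((c) => summarizeChunk(c))); // step 2
    const merged = await mergeSummaries(parts, state.limits.summaryTokenLimit); // step 3
    for (const msg of state.messages) {
      if (msg.role === "tool" && msg.executionId) {
        state.toolHistoryArchived.push({ ...msg });                  // step 5
        msg.content = `SUMMARIZED executionId:'${msg.executionId}'`; // step 4
      }
    }
    appendSummaryPair(state, merged); // step 6: synthetic context_summarize pair
    emit({ type: "summarization" }); // step 7
    state.toolHistory = [];
  }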

Disable summarization entirely via summarization: false. When disabled, maxToken is ignored.

Token heuristics

countApproxTokens(text) estimates tokens using Math.ceil(text.length / 4). It avoids provider-specific encoders and keeps the runtime dependency-free. If you need precise counts, pre-truncate content or swap in your own estimation before calling invoke.
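
In code, the documented heuristic is simply:

  // Roughly four characters per token; no provider-specific encoder needed.
  function countApproxTokens(text: string): number {
    return Math.ceil(text.length / 4);
  }

  countApproxTokens("How many tokens is this?"); // 24 chars -> 6 estimated tokens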

Tips

  • Return concise tool payloads to minimize summarization churn. Keep raw content accessible via IDs or get_tool_response.
  • Increase summaryTokenLimit if summaries feel too lossy, but note that larger summaries consume more budget.
  • For conversations that include long user-provided context, consider pre-summarizing or chunking it before passing it to the agent.
  • Monitor summarization events to track how often compaction occurs and whether limits need tuning (see the sketch after this list).
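
As one way to follow the last tip, the sketch below counts summarization events. The onEvent subscription shown here is hypothetical; the actual hook may be named differently.

  // Hypothetical onEvent hook used to count compaction passes.
  declare const agent: { onEvent(handler: (event: { type: string }) => void): void };

  let compactions = 0;
  agent.onEvent((event) => {
    if (event.type === "summarization") {
      compactions += 1; // each pass means the transcript crossed maxToken
      console.log(`summarization pass #${compactions}`);
    }
  });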

Copyright © 2025 Cognipeer