Tool-Heavy / Long-Running Agents

This guide covers the practical knobs that matter when an agent runs for tens of turns, dispatches many tool calls, and has to do it cheaply enough to put into production.

If you are building a coding-style agent, a research agent, or any workflow that loops through dozens of tool calls, start here.

1. Cost: enable prompt caching

Long agents resend the same system prompt and tool catalog on every turn. Without caching, every turn pays full input price for that unchanged prefix again, and the waste compounds over tens of turns.

import { createProvider } from "@cognipeer/agent-sdk";

const provider = createProvider({
  provider: "anthropic",
  apiKey: process.env.ANTHROPIC_API_KEY!,
  prompt_caching: { enabled: true },
});

This places cache breakpoints on the stable prefix [system + tools]. Anthropic charges roughly 10% of normal input pricing for cache hits; Bedrock applies the same model via cachePoint blocks.

OpenAI / Azure cache automatically once the prompt exceeds the provider's threshold — no breakpoint configuration needed.

Verify the cache is working by inspecting response.usage.cachedInputTokens and cachedWriteTokens, surfaced uniformly across providers.
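To get a feel for the savings, the sketch below compares the input cost of the stable prefix with and without cache hits. The 10% cache-read rate comes from the paragraph above; the base price of $3 per million input tokens, the prefix size, the turn count, and the helper name `prefixCost` are illustrative assumptions, and the one-time cache-write surcharge is ignored for simplicity.

```typescript
// Hypothetical helper: estimates what the stable [system + tools] prefix
// costs across a run. Prices and sizes are assumptions, not SDK values.
interface PrefixCostInput {
  prefixTokens: number;       // tokens in the stable prefix
  turns: number;              // model calls in the run
  inputPricePerToken: number; // USD per input token
  cacheHitMultiplier: number; // roughly 0.1 for Anthropic cache reads
}

function prefixCost(input: PrefixCostInput, cached: boolean): number {
  const { prefixTokens, turns, inputPricePerToken, cacheHitMultiplier } = input;
  if (turns <= 0) return 0;
  if (!cached) return prefixTokens * turns * inputPricePerToken;
  // First turn pays full price (write surcharge ignored); later turns hit the cache.
  const firstTurn = prefixTokens * inputPricePerToken;
  const laterTurns =
    prefixTokens * (turns - 1) * inputPricePerToken * cacheHitMultiplier;
  return firstTurn + laterTurns;
}

const scenario: PrefixCostInput = {
  prefixTokens: 4_000,
  turns: 30,
  inputPricePerToken: 3e-6,
  cacheHitMultiplier: 0.1,
};

const uncached = prefixCost(scenario, false); // about $0.36
const withCache = prefixCost(scenario, true); // about $0.047
console.log({ uncached, withCache });
```

Under these assumptions the prefix costs roughly an eighth as much with caching, and the gap widens as the run gets longer.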

2. Throughput: parallel tool execution

When the model issues multiple tool calls in a single turn (e.g. "read these 5 files"), serial execution wastes wall-clock time and burns the user's patience.

const agent = createSmartAgent({
  model,
  tools,
  runtimeProfile: "deep",            // this profile already sets maxParallelTools to 3
  limits: { maxParallelTools: 4 },   // raise further if your tools are I/O bound
});

Approval-required tools still run sequentially so the first pending approval can short-circuit the turn. Tool results are returned in the order the calls were issued, which keeps the strict tool-call / tool-result pairing required by Bedrock and Anthropic intact.
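For intuition, bounded parallelism with preserved result order can be sketched in a few lines. This is not the runtime's implementation; `mapWithLimit` is a hypothetical helper that shows one common way to cap in-flight work while writing each result back to the index of its input.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight.
// Each result is stored at the index of its input, so output order
// matches input order regardless of which call finishes first.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claimed synchronously, so no two lanes share an index
      results[index] = await worker(items[index] as T, index);
    }
  }

  const lanes = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return results;
}
```

A call such as `mapWithLimit(toolCalls, 4, runTool)` (with `runTool` being whatever executes one call) would then behave like the `maxParallelTools: 4` setting above for I/O-bound tools.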

3. Tool catalog size: use skills for progressive disclosure

When an agent has dozens or hundreds of possible tools, the tool catalog itself becomes part of the cost and quality problem. Put optional or domain-specific bundles behind skills so the model sees a cheap header first and binds concrete tools only when needed.

import { createSmartAgent, type Skill } from "@cognipeer/agent-sdk";

const jiraSkill: Skill = {
  key: "mcp:jira",
  title: "Jira",
  header: "search and update Jira issues when project work involves tickets",
  prompt: "Prefer read-only tools first. Use write tools only when the user asks to change Jira.",
  listToolIndex: () => jiraToolHeaders,
  bindTools: (names) => bindJiraTools(names),
  rankToolHeaders: (query, headers) => rankJiraTools(query, headers),
  defaultBindNames: ["jira_search_issues", "jira_get_issue"],
};

const agent = createSmartAgent({
  model,
  skills: [jiraSkill],
  skillPolicy: {
    maxOpenSkills: 2,
    maxBoundToolsPerSkill: 8,
    maxBoundToolsTotal: 24,
    modelTier: "large",
  },
});

Small skills bind all tools on open_skill. Large skills return a ranked tool index and require bind_skill_tools for the selected subset. See Skills & Progressive Disclosure for the full contract.
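The snippet above leaves `rankJiraTools` undefined. As a rough illustration of what a `rankToolHeaders` callback might do, here is a simple keyword-overlap ranker. The `ToolHeader` shape (a name plus a description) and the function names are assumptions made for this sketch; check the SDK's actual types before relying on them.

```typescript
// Assumed header shape; the SDK's real type may differ.
interface ToolHeader {
  name: string;
  description: string;
}

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Scores each header by how many distinct query terms appear in its name
// or description, then sorts by score, highest first. Ties keep input order.
function rankByKeywordOverlap(query: string, headers: ToolHeader[]): ToolHeader[] {
  const terms = Array.from(new Set(tokenize(query)));
  return headers
    .map((header) => {
      const words = new Set(tokenize(`${header.name} ${header.description}`));
      const score = terms.filter((term) => words.has(term)).length;
      return { header, score };
    })
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.header);
}
```

A production ranker would more likely use embeddings or BM25, but even this crude overlap score is enough to surface `jira_add_comment` for a query like "add comment".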

4. Idempotent reads: opt into tool caching

If the agent re-reads the same file, fetches the same URL, or looks up the same id within a single invoke, the cache pays for itself:

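import * as fs from "node:fs";
import { z } from "zod";
import { createTool } from "@cognipeer/agent-sdk";

const fetchFile = createTool({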
  name: "fetch_file",
  description: "Read a static file by path",
  schema: z.object({ path: z.string() }),
  func: async ({ path }) => fs.promises.readFile(path, "utf8"),
  cache: { keyFn: (a) => a.path }, // dedupe by path; ignore irrelevant args
});

Cached hits are recorded with fromCache: true on state.toolHistory so you can audit them.

Do NOT enable cache for non-deterministic tools (live API state, time-sensitive lookups). The agent will not see fresh results.
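Conceptually, a keyed tool cache is memoization scoped to one run. The sketch below is an illustration of that idea, not the SDK's code: `withToolCache` is a hypothetical wrapper, and the `fromCache` flag simply mirrors the field mentioned above.

```typescript
// Wraps an async tool function with a cache keyed by `keyFn(args)`.
// Create one wrapper per invoke so entries never leak across runs.
function withToolCache<A, R>(
  func: (args: A) => Promise<R>,
  keyFn: (args: A) => string,
): (args: A) => Promise<{ result: R; fromCache: boolean }> {
  // Promises are cached, so concurrent identical calls share one execution.
  const cache = new Map<string, Promise<R>>();
  return async (args: A) => {
    const key = keyFn(args);
    const hit = cache.get(key);
    if (hit) return { result: await hit, fromCache: true };
    const pending = func(args);
    cache.set(key, pending);
    try {
      return { result: await pending, fromCache: false };
    } catch (err) {
      cache.delete(key); // never cache failures
      throw err;
    }
  };
}
```

Because the key is derived only from `path` in the example above, two calls that differ in an irrelevant argument still count as the same read.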

5. Resilience: retry policies

External APIs flake. Declare retry directly on the tool:

const search = createTool({
  name: "search",
  description: "Search API",
  schema: z.object({ q: z.string() }),
  func: async ({ q }) => client.search(q),
  retry: {
    maxRetries: 3,
    backoffMs: 250,                              // doubles each attempt
    shouldRetry: (err) => !`${err}`.includes("UNAUTHORIZED"),
    circuitBreakerThreshold: 5,                  // open after N consecutive failures
  },
});

Provider-level retries (429 / 5xx) are already automatic — see Native Providers.
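To make the retry semantics concrete, here is a minimal sketch of exponential backoff with a `shouldRetry` predicate. It assumes that `maxRetries: 3` means up to three retries after the first attempt and that the delay doubles from `backoffMs`; the helper names are hypothetical, and the circuit breaker is left out to keep the example short.

```typescript
interface RetryPolicy {
  maxRetries: number;
  backoffMs: number;
  shouldRetry?: (err: unknown) => boolean;
}

// Delay before retry number `retry` (1-based): backoffMs, 2x, 4x, ...
function backoffDelay(backoffMs: number, retry: number): number {
  return backoffMs * 2 ** (retry - 1);
}

async function callWithRetry<T>(
  fn: () => Promise<T>,
  policy: RetryPolicy,
  // Injectable so tests can skip real waiting.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = policy.shouldRetry ? policy.shouldRetry(err) : true;
      // Give up on non-retryable errors or once the retry budget is spent.
      if (!retryable || attempt >= policy.maxRetries) throw err;
      await sleep(backoffDelay(policy.backoffMs, attempt + 1));
    }
  }
}
```

With `backoffMs: 250` the waits are 250 ms, 500 ms, and 1000 ms, so a tool that keeps failing adds under two seconds of delay before the error surfaces.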

6. Safety: budget enforcement

Always set hard ceilings on production agents. They cost almost nothing to declare and save real money when things go wrong.

const agent = createSmartAgent({
  model,
  tools,
  limits: {
    maxToolCalls: 25,
    maxParallelTools: 4,
    maxContextTokens: 40_000,
    maxTotalOutputTokens: 80_000,
    maxWallClockMs: 5 * 60_000,
    maxCostUsd: 1.00,
  },
  costEstimator: ({ modelName, inputTokens, outputTokens, cachedInputTokens }) => {
    // Plug in your real pricing table here.
    if (modelName?.startsWith("claude-")) {
      return (inputTokens - (cachedInputTokens || 0)) * 3e-6
        + (cachedInputTokens || 0) * 0.3e-6
        + outputTokens * 15e-6;
    }
    return 0;
  },
});

A breach emits a metadata event whose limitBreached field (a string) identifies the limit that tripped, then exits the loop cleanly. result.content and result.state still contain whatever the agent managed to produce.
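The check itself is simple to picture. The sketch below shows one way a loop could compare accumulated usage against the ceilings after each turn; the `RunUsage` shape, the function name, and the choice of `>=` as the breach condition are assumptions for illustration rather than the runtime's actual logic.

```typescript
interface Limits {
  maxToolCalls?: number;
  maxTotalOutputTokens?: number;
  maxWallClockMs?: number;
  maxCostUsd?: number;
}

// Hypothetical running totals a loop might keep for one invoke.
interface RunUsage {
  toolCalls: number;
  totalOutputTokens: number;
  elapsedMs: number;
  costUsd: number;
}

// Returns the name of the first configured limit that usage has reached,
// or null when the run is still within budget.
function findBreachedLimit(limits: Limits, usage: RunUsage): keyof Limits | null {
  const checks: Array<[keyof Limits, number]> = [
    ["maxToolCalls", usage.toolCalls],
    ["maxTotalOutputTokens", usage.totalOutputTokens],
    ["maxWallClockMs", usage.elapsedMs],
    ["maxCostUsd", usage.costUsd],
  ];
  for (const [name, value] of checks) {
    const ceiling = limits[name];
    if (ceiling !== undefined && value >= ceiling) return name;
  }
  return null;
}
```

Unset limits are skipped, so omitting a ceiling means that dimension is unbounded, which is exactly why production agents should set all of them.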

7. Context survival: summarization

Tool-heavy agents accumulate massive payloads. Configure summarization so the model can keep thinking when context pressure builds:

const agent = createSmartAgent({
  model,
  tools,
  summarization: {
    summaryTriggerTokens: 30_000,
    summaryMode: "incremental",
  },
  toolResponses: {
    defaultPolicy: "summarize_archive",
    toolResponseRetentionByTool: {
      // Critical tool whose recent results must stay verbatim
      read_skills: "keep_full",
    },
  },
});

When context is compacted, the raw payload remains recoverable via get_tool_response — the runtime injects that tool automatically once a recovery marker appears in the transcript.
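As a mental model for the archive-and-recover behavior, consider the sketch below. It is not the runtime's algorithm: the message shape, the marker text, and the four-characters-per-token estimate are all assumptions chosen to keep the example small.

```typescript
interface ToolMessage {
  id: string;
  toolName: string;
  content: string;
}

// Rough heuristic (about 4 characters per token); real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Replaces the oldest tool payloads with short markers until the estimated
// total fits under `triggerTokens`. Raw payloads move into `archive`, keyed
// by message id, so they can be fetched again later.
function compactToolMessages(
  messages: ToolMessage[],
  triggerTokens: number,
  keepFull: ReadonlySet<string>,
  archive: Map<string, string>,
): ToolMessage[] {
  let total = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  return messages.map((message) => {
    if (total <= triggerTokens || keepFull.has(message.toolName)) return message;
    const marker = `[archived tool response id=${message.id}]`;
    if (marker.length >= message.content.length) return message; // nothing to gain
    archive.set(message.id, message.content);
    total -= estimateTokens(message.content) - estimateTokens(marker);
    return { ...message, content: marker };
  });
}
```

A recovery tool in this model is just a lookup: given the id from the marker, return `archive.get(id)`.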

8. Reflection budget

If you enable reasoning.reflection to give the agent a post-tool think-pause, cap it so a 50-turn run does not produce 50 extra model calls:

const agent = createSmartAgent({
  model,
  tools,
  reasoning: {
    enabled: true,
    level: "medium",
    reflection: {
      cadence: "after_tool",
      maxPerRun: 5,        // hard cap per invoke (run-scoped)
      everyNTurns: 3,      // reflect every 3 tool turns
    },
  },
});

Cadence options

Cadence	Fires
`off`	Never.
`every_turn`	Every loop turn.
`after_tool`	After any turn that ran tools.
`on_branch`	After a tool turn whose tool-name set differs from the previous tool turn (a genuine strategy change).
`initial_then_after_tool`	Once up-front as a planning note, then like `after_tool`. Default for `level: "medium"` / `"high"`.

level: "minimal" selects native effort: "minimal" and disables reflection; "low" defaults to the on_branch cadence.
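The cadence rules and the throttles can be read as two small predicates. The sketch below is one interpretation of the descriptions above, not the runtime's code: it assumes turn 0 is the "up-front" turn, that `everyNTurns` counts tool turns, and that `maxPerRun` is checked before the interval.

```typescript
type Cadence =
  | "off"
  | "every_turn"
  | "after_tool"
  | "on_branch"
  | "initial_then_after_tool";

interface TurnInfo {
  turn: number;                // 0-based loop turn
  toolNames: string[];         // tools run this turn (empty if none)
  previousToolNames: string[]; // tools run on the previous tool turn
}

function sameSet(a: string[], b: string[]): boolean {
  const left = Array.from(new Set(a)).sort();
  const right = Array.from(new Set(b)).sort();
  return left.length === right.length && left.every((name, i) => name === right[i]);
}

// Does the cadence alone want a reflection on this turn?
function cadenceFires(cadence: Cadence, info: TurnInfo): boolean {
  const ranTools = info.toolNames.length > 0;
  switch (cadence) {
    case "off": return false;
    case "every_turn": return true;
    case "after_tool": return ranTools;
    case "on_branch": return ranTools && !sameSet(info.toolNames, info.previousToolNames);
    case "initial_then_after_tool": return info.turn === 0 || ranTools;
  }
}

// Throttles applied on top of the cadence decision (assumed semantics).
function passesThrottles(
  toolTurnsSoFar: number,   // 1-based count of tool turns, including this one
  reflectionsSoFar: number,
  everyNTurns: number,
  maxPerRun: number,
): boolean {
  if (reflectionsSoFar >= maxPerRun) return false;
  return toolTurnsSoFar % everyNTurns === 0;
}
```

Under this reading, a 50-turn run with `everyNTurns: 3` and `maxPerRun: 5` reflects on tool turns 3, 6, 9, 12, and 15, then stops.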

Hooks and routing

Reflection exposes lifecycle hooks and an optional destination for the note:

reflection: {
  cadence: "after_tool",
  // Override the cadence decision entirely (throttles still apply on top).
  shouldReflect: ({ turn, ranToolsThisTurn }) => ranToolsThisTurn && turn % 2 === 0,
  // Customize the prompt body sent to the model.
  buildPrompt: ({ defaultPrompt, maxChars }) => `${defaultPrompt}\nKeep it under ${maxChars} chars.`,
  // Side-effect hook after each reflection record is produced.
  onReflection: (record) => myTimeline.push(record),
  // Route the note: "memory" (low-confidence MemoryFact), "plan" (plan.lastReflection), or "none".
  feedTo: "memory",
}

Near-identical consecutive reflections are suppressed automatically so the prompt does not accumulate restated insights.
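The source does not say how "near-identical" is measured. One plausible approach, shown here purely as an illustration, is Jaccard similarity over word sets with a cutoff; the 0.8 threshold and the function names are assumptions.

```typescript
function wordSet(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/\W+/)
      .filter((word) => word.length > 0),
  );
}

// Jaccard similarity: size of the intersection over size of the union, in [0, 1].
function jaccard(a: string, b: string): number {
  const left = wordSet(a);
  const right = wordSet(b);
  if (left.size === 0 && right.size === 0) return 1;
  let shared = 0;
  left.forEach((word) => {
    if (right.has(word)) shared++;
  });
  return shared / (left.size + right.size - shared);
}

// True when `next` restates `previous` closely enough to be dropped.
function isNearDuplicate(
  previous: string | undefined,
  next: string,
  threshold = 0.8,
): boolean {
  return previous !== undefined && jaccard(previous, next) >= threshold;
}
```

Word-set overlap ignores punctuation and word order, so a reflection that merely rephrases the previous one with the same vocabulary is treated as a repeat.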

9. Delegation: dispatch work to sub-agents safely

For multi-agent workflows, runtime delegation policy is enforced — not just documented in the prompt:

const codingAgent = createSmartAgent({ /* ... */ });
const reviewerAgent = createSmartAgent({ /* ... */ });

const parent = createSmartAgent({
  model,
  tools: [
    codingAgent.asTool({ toolName: "delegate_coding", description: "..." }),
    reviewerAgent.asTool({ toolName: "delegate_review", description: "..." }),
  ],
  customProfile: {
    extends: "research",
    delegation: {
      mode: "automatic",
      maxDelegationDepth: 2,
      maxChildCalls: 6,
      childContextPolicy: "scoped",
    },
  },
});

Refused delegations (depth/budget exceeded, mode: "off") return a structured error to the parent model instead of silently looping.
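A policy gate of this kind can be sketched as a pure function that either allows the call or returns an error object the parent model can read. The error codes, the depth convention (0 for the top-level parent), and the function name below are hypothetical; only the policy fields come from the configuration above.

```typescript
interface DelegationPolicy {
  mode: "off" | "automatic"; // only the two modes mentioned in this guide
  maxDelegationDepth: number;
  maxChildCalls: number;
}

type DelegationDecision =
  | { allowed: true }
  | { allowed: false; error: { code: string; message: string } };

function checkDelegation(
  policy: DelegationPolicy,
  currentDepth: number,    // 0 for the top-level parent
  childCallsSoFar: number,
): DelegationDecision {
  const refuse = (code: string, message: string): DelegationDecision => ({
    allowed: false,
    error: { code, message },
  });
  if (policy.mode === "off") {
    return refuse("DELEGATION_DISABLED", "Delegation is turned off for this agent.");
  }
  if (currentDepth >= policy.maxDelegationDepth) {
    return refuse("DEPTH_EXCEEDED", `Depth limit of ${policy.maxDelegationDepth} reached.`);
  }
  if (childCallsSoFar >= policy.maxChildCalls) {
    return refuse("CHILD_BUDGET_EXCEEDED", `Child-call budget of ${policy.maxChildCalls} used up.`);
  }
  return { allowed: true };
}
```

Returning the refusal as data rather than throwing lets the parent model see why the call failed and pick a different strategy.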

10. Concurrency-safe agents

The runtime now creates per-invoke plan / todo / tool-history references, so the same agent instance can serve many concurrent users without state crosstalk:

// Safe in a server context
await Promise.all([
  agent.invoke({ messages: [...] }),
  agent.invoke({ messages: [...] }),
  agent.invoke({ messages: [...] }),
]);
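The underlying pattern is to allocate mutable state inside each invoke rather than on the agent object. The toy agent below is a hypothetical stand-in, not the SDK's runtime, but it shows why concurrent runs cannot see each other's history when state is created per call.

```typescript
interface RunState {
  toolHistory: string[];
}

// Toy agent: state is created inside invoke(), so concurrent calls never share it.
function createToyAgent() {
  return {
    async invoke(input: { user: string; steps: number }): Promise<RunState> {
      const state: RunState = { toolHistory: [] }; // per-invoke, not per-agent
      for (let step = 1; step <= input.steps; step++) {
        // Yield so that other concurrent invokes interleave with this one.
        await new Promise<void>((resolve) => setTimeout(resolve, 1));
        state.toolHistory.push(`${input.user}:step-${step}`);
      }
      return state;
    },
  };
}
```

Had `toolHistory` lived on the object returned by `createToyAgent`, three interleaved runs would have written into one shared array.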

11. Observability

Always enable tracing for long-running agents. The structured event stream and trace session are how you debug "why did this run cost $4?":

const agent = createSmartAgent({
  model,
  tools,
  tracing: { enabled: true, logData: false }, // metrics only, no payload capture
});

Watch for these events:

summarization — how often the runtime had to compact context
tool_call with phase: "error" — flaky tools or broken contracts
metadata.limitBreached — budget tripped
reflection — how much reflection cost you paid
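Once events are flowing, a small reducer turns them into a per-run health summary. The event shapes below are assumptions that loosely mirror the names in the list above (only `phase: "error"` and `limitBreached` come from this guide); adapt them to the real event types.

```typescript
// Assumed event shapes, for illustration only.
type AgentEvent =
  | { type: "summarization" }
  | { type: "tool_call"; phase: "start" | "success" | "error"; toolName: string }
  | { type: "metadata"; limitBreached?: string }
  | { type: "reflection" };

interface RunHealth {
  summarizations: number;
  toolErrors: Record<string, number>; // error count per tool name
  limitsBreached: string[];
  reflections: number;
}

function summarizeEvents(events: AgentEvent[]): RunHealth {
  const health: RunHealth = {
    summarizations: 0,
    toolErrors: {},
    limitsBreached: [],
    reflections: 0,
  };
  for (const event of events) {
    switch (event.type) {
      case "summarization":
        health.summarizations++;
        break;
      case "tool_call":
        if (event.phase === "error") {
          health.toolErrors[event.toolName] = (health.toolErrors[event.toolName] ?? 0) + 1;
        }
        break;
      case "metadata":
        if (event.limitBreached) health.limitsBreached.push(event.limitBreached);
        break;
      case "reflection":
        health.reflections++;
        break;
    }
  }
  return health;
}
```

Logging this summary at the end of every run gives a quick first answer to cost questions before you open the full trace.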

Recommended starting point for tool-heavy agents

const agent = createSmartAgent({
  model: fromNativeProvider(
    createProvider({
      provider: "anthropic",
      apiKey: process.env.ANTHROPIC_API_KEY!,
      prompt_caching: { enabled: true },
      retry: { maxRetries: 4 },
    }),
    { model: "claude-sonnet-4-20250514", maxTokens: 8192 },
  ),
  tools,
  runtimeProfile: "deep",
  planning: { mode: "todo", replanPolicy: "on_failure" },
  reasoning: {
    enabled: true,
    level: "medium",
    reflection: { cadence: "after_tool", maxPerRun: 5, everyNTurns: 3 },
  },
  limits: {
    maxToolCalls: 25,
    maxTotalOutputTokens: 80_000,
    maxWallClockMs: 5 * 60_000,
    maxCostUsd: 1.00,
  },
  costEstimator: myCostTable,
  tracing: { enabled: true },
});

This is the configuration shape that survives real production load.
