claude-code: Multi-lane Workflow agents re-create shared prompt cache per lane (no cross-sibling sharing), causing excessive token usage

Summary

When the Workflow tool fans out parallel subagents (parallel() / multiple sibling agent() lanes), each lane re-creates (cache_creation) the entire shared prompt prefix from scratch instead of reading it from cache (cache_read). Sibling lanes share only a small (~10k-token) fixed harness shell; all shared content beyond that -- identical instructions/criteria in the user message and identical system-prompt content -- is paid at the 1.25x cache-creation rate once per lane. An N-lane fan-out over a shared instruction set therefore pays ~N cache creations for content that could cost 1 creation + (N-1) reads.

Environment

  • Claude Code 2.1.158, Windows 11, native dynamic Workflow tool.

What I observed

Measured per-agent cache_creation_input_tokens vs cache_read_input_tokens from .../subagents/workflows/<wf>/agent-*.jsonl. Tried two structures to get a shared prefix to cache once and be read by siblings:

A -- shared text at the front of each lane's agent() prompt + a pre-warm pass, then fan out. ~14k-token identical prefix:

  • prewarm: cache_create ~= 39.7k
  • each of 3 lanes: cache_create ~= 38.7k, cache_read ~= 9.7k

-> shared block cache-created 4x total; no prefix sharing beyond the ~10k shell.

B -- shared content in the system prompt (built-in general-purpose agent), pre-warm then fan out:

  • prewarm: cache_create ~= 18.9k
  • each of 3 lanes: cache_create ~= 17.6k, cache_read ~= 10k

-> even system-prompt content is re-created per sibling.

Neither user-message nor system-prompt content is shared across sibling lanes; only a ~10k-token fixed shell is. In a real multi-file audit workflow, each lane re-created ~200-250k tokens, much of it identical shared criteria.
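The per-agent numbers above can be pulled out of a transcript with a small helper. This is a minimal sketch, not the tool's own reporting: the function name `summarizeCacheUsage` is hypothetical, and it assumes each JSONL line is a JSON object carrying the usage counters at `message.usage` (falling back to a top-level `usage`). Adjust the lookup if your files are shaped differently.

```javascript
// Hypothetical helper: summarize cache usage for one agent-*.jsonl transcript.
// ASSUMPTION: usage counters live at record.message.usage or record.usage.
function summarizeCacheUsage(jsonlText) {
  const total = { cacheCreation: 0, cacheRead: 0 };
  let first = null; // usage of the first request in the lane

  for (const line of jsonlText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let record;
    try {
      record = JSON.parse(trimmed);
    } catch {
      continue; // skip partial or non-JSON lines
    }

    const usage = record?.message?.usage ?? record?.usage;
    if (!usage) continue;

    const counts = {
      cacheCreation: usage.cache_creation_input_tokens ?? 0,
      cacheRead: usage.cache_read_input_tokens ?? 0,
    };
    if (first === null) first = counts;
    total.cacheCreation += counts.cacheCreation;
    total.cacheRead += counts.cacheRead;
  }
  return { first, total };
}
```

The `first` entry is the one that matters for cross-sibling sharing; `total` also includes within-lane reads from later turns. In Node, the input text can be obtained with `fs.readFileSync(path, "utf8")`.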

Why it matters

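The Anthropic prompt cache is cross-request (keyed on prefix + org, ~5-min TTL), so lanes that send identical prefixes could share one entry: the first request creates it and the rest read it at 0.1x. A cache entry only becomes readable once the first response has begun, so the pre-warm pass in both tests above should have made the entry available to the lanes. Because sharing does not happen, a fan-out pays ~N cache creations (1.25x each) for content that could cost ~1 creation plus N-1 reads (0.1x each) -- a large, avoidable usage multiplier that grows linearly with lane count.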
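To make the multiplier concrete, here is a small sketch of the arithmetic using only the 1.25x and 0.1x rates quoted above. The function name `fanOutCost` is hypothetical, and costs are expressed in base-input-token equivalents rather than currency.

```javascript
// Rates taken from the text: cache creation 1.25x, cache read 0.1x.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Hypothetical helper: compare what a fan-out pays today with the ideal case.
// sharedTokens: size of the identical prefix; requests: pre-warm plus lanes.
function fanOutCost(sharedTokens, requests) {
  // Observed: every request re-creates the shared prefix.
  const observed = requests * sharedTokens * CACHE_WRITE_MULTIPLIER;
  // Ideal: one creation, then every other request reads from cache.
  const ideal =
    sharedTokens * CACHE_WRITE_MULTIPLIER +
    (requests - 1) * sharedTokens * CACHE_READ_MULTIPLIER;
  return { observed, ideal, ratio: observed / ideal };
}

// Structure A: ~14k shared tokens, 1 pre-warm + 3 lanes.
const structureA = fanOutCost(14000, 4);
console.log(structureA);
```

For structure A this works out to roughly 70k token-equivalents observed against about 21.7k ideal, a bit over 3x. As the request count grows, the ratio approaches 1.25 / 0.1 = 12.5.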

Expected behavior

Sibling subagents with byte-identical prompt prefixes (system prompt and/or leading user-message content) should share the cached prefix -- first lane creates, subsequent lanes read. At minimum the static portion of the subagent system prompt should be a cacheable prefix reused across sibling lanes.

Likely cause (guess)

Either no cache_control breakpoint spans the shared content, or per-agent dynamic content sits early in the prefix and diverges it before any breakpoint. Within-lane multi-turn caching works fine (high cache_read within a lane); the gap is specifically cross-sibling.
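One way to test the second hypothesis, if the fully rendered prompts of two lanes can be captured, is to measure how far they agree before diverging. The sketch below is only a character-level proxy (the cache matches on the tokenized prefix, not on characters), and `commonPrefixLength` is a hypothetical helper name.

```javascript
// Hypothetical helper: number of leading UTF-16 code units two strings share.
function commonPrefixLength(a, b) {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

// Per-agent content placed early cuts the shared prefix short...
const early = commonPrefixLength("agent-1\nSHARED", "agent-2\nSHARED");
// ...while the same content placed late leaves the shared block intact.
const late = commonPrefixLength("SHARED\nagent-1", "SHARED\nagent-2");
console.log({ early, late });
```

If the common prefix of two real lane prompts ends at about the size of the ~10k shell, that would point to per-agent content sitting ahead of the shared block rather than to a missing breakpoint.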

Repro

  1. In a workflow script, build a ~14k-token constant SHARED.
  2. await agent(SHARED + " reply DONE", {}) (pre-warm).
  3. await parallel([0,1,2].map(i => () => agent(SHARED + " task " + i, {}))).
  4. Inspect subagents/workflows/<wf>/agent-*.jsonl usage: each lane shows ~len(SHARED) cache_creation rather than cache_read.
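Steps 1-3 can be sketched as a single script. The signatures of `agent(prompt, options)` and `parallel(thunks)` are inferred from the steps above and are assumptions about the Workflow tool; they are passed in as parameters here so the sketch does not depend on globals. `buildShared` and `runRepro` are hypothetical names, and the ~4 characters per token figure is a rough heuristic, not a tokenizer.

```javascript
// Step 1: build a constant block of roughly targetTokens tokens.
// ASSUMPTION: ~4 characters per token; the real count will differ somewhat.
function buildShared(targetTokens = 14000) {
  const sentence = "Criterion: flag any function that lacks input validation. ";
  const repeats = Math.ceil((targetTokens * 4) / sentence.length);
  return sentence.repeat(repeats);
}

// ASSUMPTION: agent(prompt, options) resolves when the subagent finishes, and
// parallel(thunks) runs an array of zero-argument functions concurrently.
async function runRepro({ agent, parallel }) {
  const SHARED = buildShared();

  // Step 2: pre-warm so the shared prefix has a chance to be cached.
  await agent(SHARED + " reply DONE", {});

  // Step 3: fan out three sibling lanes that share the same leading text.
  return parallel([0, 1, 2].map((i) => () => agent(SHARED + " task " + i, {})));
}
```

Inside a workflow script this would be called as `await runRepro({ agent, parallel })`, after which step 4 applies unchanged.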

Suggested direction

Let workflow authors (or the runtime) mark a shared prefix as cacheable across sibling lanes -- hoist a declared shared-context block into a cache-controlled prefix identical across lanes, or expose a breakpoint after a shared system-prompt/context segment so concurrent siblings reuse it.
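At the API level, the kind of request this suggestion points toward might look like the sketch below. It uses the public Messages API shape (a `system` array of text blocks, with `cache_control: { type: "ephemeral" }` marking a breakpoint); it is not a description of how Claude Code currently assembles subagent requests. `buildLaneRequest` is a hypothetical helper, and the model ID is left to the caller.

```javascript
// Hypothetical helper: build one Messages API request body per lane.
// The shared block is byte-identical across lanes and ends with a
// cache_control breakpoint; only the user message varies per lane.
function buildLaneRequest(model, sharedContext, laneTask) {
  return {
    model, // caller supplies the model ID
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: sharedContext,
        cache_control: { type: "ephemeral" }, // breakpoint after the shared prefix
      },
    ],
    messages: [{ role: "user", content: laneTask }],
  };
}
```

With this layout, everything up to the breakpoint is the same for every lane, so later lanes would be eligible to read the entry the first lane created. That assumes the tool definitions, which precede the system prompt in the cached prefix, are also identical, and that the shared block meets the API's minimum cacheable length.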
