claude-code: [FEATURE] Cache read tokens from parallel subagent dispatch accumulate without upper bound

GitHub stats

anthropics/claude-code#46421 (fetched 2026-04-11 06:20:44)
Comments: 1
Participants: 2
Timeline events: 4 (labeled ×3, commented ×1)
Reactions: 0


Preflight Checklist

  • I have searched existing issues and this has not been reported yet
  • This is a single bug report
  • I am using the latest version of Claude Code

What's Wrong?

When dispatching parallel subagents (via the Agent tool with allowParallel: true), each subagent session re-reads the full conversation history and parent context from cache. This causes cache_read tokens to accumulate rapidly and without upper bound across parallel subagent dispatches.

Reproduction

  1. Spawn 3 parallel subagents simultaneously (each running a long task)
  2. Observe that each subagent independently re-reads the same conversation history from cache
  3. Cache read tokens accumulate multiplicatively: 3 agents × N context tokens each = 3N cache reads
  4. A 90-minute parallel subagent session can burn ~15M cache_read tokens with no user-visible benefit
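
The arithmetic in steps 3 and 4 can be made concrete with a small sketch. It assumes a simple model that the issue does not spell out: every subagent re-reads the same cached prefix once per API turn. The function name, the context sizes, and the turn count are hypothetical values chosen only to show how a total on the order of the reported ~15M could arise.

```python
def total_cache_read(num_agents: int, context_tokens: int, turns_per_agent: int = 1) -> int:
    """Cache-read tokens if every agent re-reads the same prefix on every turn.

    Assumed model, not a measurement of Claude Code's actual behavior.
    """
    if min(num_agents, context_tokens, turns_per_agent) < 0:
        raise ValueError("inputs must be non-negative")
    return num_agents * context_tokens * turns_per_agent


# Step 3 of the reproduction: 3 agents x N context tokens = 3N.
N = 40_000  # hypothetical context size
print(total_cache_read(3, N))  # 120000

# Hypothetical long session: 3 agents, 50k-token prefix, 100 turns each.
print(total_cache_read(3, 50_000, 100))  # 15000000
```

Under this model the total grows linearly with both the number of subagents and the number of turns, which is why nothing bounds it unless a cap is applied.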

Observed Symptom

Subagent 1: cache_read = X tokens
Subagent 2: cache_read = X tokens  
Subagent 3: cache_read = X tokens
Total: 3X cache_read tokens billed to parent session

The parent session is billed for cache reads that were done independently by each subagent — but subagents are supposed to be isolated. They should not be inflating the parent session's cache read token count.

Expected Behavior

Cache read tokens should either:

  1. Not bill to parent session — subagent cache reads are subagent overhead, not parent overhead
  2. Deduplicate across subagents — if multiple subagents read the same context, count it once
  3. Have a clear cap — set a maximum cache read budget per subagent dispatch to prevent runaway accumulation
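
To compare the three options, here is a minimal accounting sketch. Everything in it is hypothetical: the `CacheRead` record, the `context_key` field used to recognize a shared prefix, and the function names are illustrations, not part of Claude Code or of any billing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheRead:
    subagent_id: str
    context_key: str  # hypothetical identifier of the cached prefix that was read
    tokens: int


def billed_observed(reads: list[CacheRead]) -> int:
    """Observed behavior: every subagent read is added to the parent total."""
    return sum(r.tokens for r in reads)


def billed_isolated(reads: list[CacheRead]) -> int:
    """Option 1: subagent reads are not billed to the parent at all."""
    return 0


def billed_deduplicated(reads: list[CacheRead]) -> int:
    """Option 2: reads of the same context count once (largest read wins)."""
    largest: dict[str, int] = {}
    for r in reads:
        largest[r.context_key] = max(largest.get(r.context_key, 0), r.tokens)
    return sum(largest.values())


def billed_capped(reads: list[CacheRead], cap_per_subagent: int) -> int:
    """Option 3: each subagent contributes at most cap_per_subagent tokens."""
    per_agent: dict[str, int] = {}
    for r in reads:
        per_agent[r.subagent_id] = per_agent.get(r.subagent_id, 0) + r.tokens
    return sum(min(total, cap_per_subagent) for total in per_agent.values())


# Three subagents each read the same 1,000-token parent context.
reads = [CacheRead(f"agent-{i}", "parent-ctx", 1_000) for i in range(3)]
print(billed_observed(reads), billed_deduplicated(reads), billed_capped(reads, 500))
```

With three identical reads of X tokens, the observed policy reports 3X, deduplication reports X, and a cap of X/2 per subagent reports 1.5X.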

Technical Notes

  • This is distinct from issue #45958 (subagent notification stall) — this is specifically about cache_read token accumulation inflating the parent session's usage metrics
  • The same pattern affects CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 parallel dispatch workflows
  • Not limited to long-running sessions; even short parallel tasks trigger multiplicative cache reads

Suggested Labels

area:agents, area:cost, enhancement

extent analysis

TL;DR

Implement a mechanism to deduplicate cache reads across subagents or set a clear cap on cache read tokens per subagent dispatch to prevent accumulation.

Guidance

  • Investigate modifying the Agent tool to implement a cache read deduplication mechanism, ensuring that each unique cache read is only counted once across all subagents.
  • Consider introducing a cacheReadBudget parameter for subagent dispatches, allowing users to set a maximum cache read token limit per subagent to prevent runaway accumulation.
  • Review the billing logic for cache reads to determine if it's possible to exclude subagent cache reads from the parent session's usage metrics.
  • Explore the impact of CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 on cache read token accumulation and determine if a similar solution can be applied to this workflow.

Example

No code example is provided due to the lack of specific implementation details in the issue.
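
Although the issue gives no implementation details, the cacheReadBudget idea from the guidance can still be illustrated in isolation. The sketch below is an assumption about how such a budget might be enforced per subagent dispatch; the class, method, and exception names are hypothetical and do not correspond to any existing Claude Code parameter or API.

```python
class CacheReadBudgetExceeded(RuntimeError):
    """Raised when a subagent would read more cached tokens than its budget allows."""


class CacheReadBudget:
    """Hypothetical per-dispatch budget for cache_read tokens."""

    def __init__(self, limit_tokens: int) -> None:
        if limit_tokens < 0:
            raise ValueError("limit_tokens must be non-negative")
        self.limit_tokens = limit_tokens
        self.used_tokens = 0

    @property
    def remaining(self) -> int:
        return self.limit_tokens - self.used_tokens

    def charge(self, tokens: int) -> None:
        """Record a cache read, refusing it if it would exceed the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used_tokens + tokens > self.limit_tokens:
            raise CacheReadBudgetExceeded(
                f"read of {tokens} tokens exceeds remaining budget of {self.remaining}"
            )
        self.used_tokens += tokens


budget = CacheReadBudget(limit_tokens=100_000)
budget.charge(60_000)    # first read fits within the budget
print(budget.remaining)  # 40000
```

Raising an exception is only one possible policy; a real design might instead truncate the forwarded context or warn the user once the limit is reached.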

Notes

The solution may require significant changes to the underlying architecture of the Agent tool and the cache read billing logic. It's essential to consider the potential performance implications of implementing a deduplication mechanism or cache read budgeting system.

Recommendation

Either deduplicate cache reads across subagents or cap cache_read tokens per subagent dispatch. Both approaches would stop the accumulation and make the parent session's usage metrics reflect its own consumption more accurately.
