claude-code - 💡(How to fix) Fix v2.1.128 caching regression in parallel-team workloads (10x token cost increase) [2 comments, 3 participants]

GitHub stats
anthropics/claude-code#56293 (fetched 2026-05-06 06:31:58)
Comments: 2 · Participants: 3 · Timeline events: 10 · Reactions: 0
Timeline (top): labeled ×5, commented ×2, cross-referenced ×1, mentioned ×1


Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads

Severity: High (10x token cost increase for councils/teams)
Affected Version: v2.1.128 (and v2.1.126)
Works in: v2.1.121
User: @oksanantonova
Date: 2026-05-05

Summary

The v2.1.128 changelog claims "Fixed sub-agent progress summaries missing the prompt cache (~3x cache_creation reduction)". However, empirical analysis of session transcripts shows this fix did not land for parallel-team workloads and actually made caching 10x worse compared to v2.1.121.

Evidence

Like-For-Like Comparison (Same Workload Pattern)

v2.1.121 - SendMessage-heavy sub-agent (agent-a63a0e37df5eaa2da)

  • Turns: 121
  • SendMessage calls: 4
  • Cache miss share: 4%
  • Avg cache_creation/turn: 5,648 tokens
  • Pattern: Cache builds monotonically, only 2 collapses across entire session
  • Status: ✅ Healthy

v2.1.128 - SendMessage-heavy sub-agent (agent-ab587fd4e60ffc856)

  • Turns: 52
  • SendMessage calls: 12
  • Cache miss share: 40%
  • Avg cache_creation/turn: 26,000 tokens
  • Pattern: Cache collapses every 2-4 turns, oscillating between cold (cache_creation ≈ 25K) and warm (cache_creation ≈ 3K)
  • Status: ❌ Severe regression

Regression magnitude: 10x worsening in miss share for identical workload type
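
As a quick check on the two agents above, the following sketch recomputes the ratios from the reported figures. It suggests that the 10x figure describes miss share (40% vs 4%), while average cache_creation per turn differs by roughly 4.6x. The dictionary layout and the `ratio` helper are illustrative only.

```python
# Figures copied from the two like-for-like agents reported above.
v121 = {"miss_share": 0.04, "cache_creation_per_turn": 5_648}
v128 = {"miss_share": 0.40, "cache_creation_per_turn": 26_000}


def ratio(new: float, old: float) -> float:
    """How many times larger `new` is than `old`, rounded to one decimal."""
    return round(new / old, 1)


miss_ratio = ratio(v128["miss_share"], v121["miss_share"])
token_ratio = ratio(v128["cache_creation_per_turn"], v121["cache_creation_per_turn"])
print(f"miss share: {miss_ratio}x, cache_creation/turn: {token_ratio}x")
```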

Per-Version Historical Data

Analysis of 4,367 sub-agent transcript files across versions v2.1.74 through v2.1.128:

| Version | Avg cache_creation/turn | Workload | Status |
|---|---|---|---|
| v2.1.121 | 5,534 tokens | councils/teams | baseline |
| v2.1.126 | 8,433 tokens | councils/teams | +52% regression |
| v2.1.128 | 22,713 tokens | councils/teams | +310% regression (4.1x baseline) |
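
The relative changes can be recomputed from the per-turn averages in the table. In this minimal sketch, the v2.1.128 average comes out at about 4.1x the v2.1.121 baseline, which corresponds to an increase of roughly 310% over it. The variable names are illustrative.

```python
# Average cache_creation tokens per turn, as listed in the table above.
per_version = {"v2.1.121": 5_534, "v2.1.126": 8_433, "v2.1.128": 22_713}
baseline = per_version["v2.1.121"]


def pct_change(value: int, base: int) -> int:
    """Percentage increase of `value` over `base`, rounded to a whole percent."""
    return round((value - base) / base * 100)


changes = {version: pct_change(tokens, baseline) for version, tokens in per_version.items()}
multiple_v128 = round(per_version["v2.1.128"] / baseline, 1)
print(changes, multiple_v128)
```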

Low-parallelism v2.1.128 workloads show normal cache behavior (3-4% miss share), which indicates that the regression is specific to parallel-team fan-out patterns.

Root Cause Analysis

The regression correlates with inbound SendMessage deliveries (teammate-to-teammate messages). Each inbound message appears to:

  1. Break the cache_control prefix (checkpoint)
  2. Force regeneration of all downstream content (cache_creation spike)
  3. Continue until next cache rebuild

Likely cause: v2.1.126/v2.1.128 added cache_control to sub-agent progress summaries but not to inbound teammate message injections. Teammate messages likely include non-stable fields (timestamp, message-id, routing metadata) that shift the cache prefix hash.
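
To illustrate the hypothesis, the sketch below shows why volatile fields matter for a prefix-based cache. Hashing the serialized blocks is only a stand-in for exact prefix matching, and `render_injection` and its metadata fields are hypothetical. None of this is taken from Claude Code's implementation, whose injection format is not known from this report.

```python
import hashlib
import json
from typing import Optional


def prefix_hash(blocks: list) -> str:
    """Hash a list of content blocks; a stand-in for exact prefix matching."""
    payload = json.dumps(blocks, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def render_injection(sender: str, text: str, metadata: Optional[dict] = None) -> dict:
    """Render a hypothetical teammate-message injection as one text block."""
    rendered = f"[from {sender}] {text}"
    if metadata:
        # Volatile fields end up inside the cached prefix.
        rendered += " " + json.dumps(metadata, sort_keys=True)
    return {"type": "text", "text": rendered}


system = {"type": "text", "text": "You are the critic agent."}

# Same logical message rendered twice without volatile fields.
stable_a = [system, render_injection("validator", "Findings attached.")]
stable_b = [system, render_injection("validator", "Findings attached.")]

# Same logical message, but each rendering carries a fresh timestamp and id.
volatile_a = [system, render_injection(
    "validator", "Findings attached.",
    {"timestamp": "2026-05-05T10:00:01Z", "message_id": "m-1"})]
volatile_b = [system, render_injection(
    "validator", "Findings attached.",
    {"timestamp": "2026-05-05T10:00:02Z", "message_id": "m-2"})]

stable_same = prefix_hash(stable_a) == prefix_hash(stable_b)
volatile_same = prefix_hash(volatile_a) == prefix_hash(volatile_b)
print(stable_same, volatile_same)
```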

Supporting evidence:

  • v2.1.121 agents with 4 SendMessage calls: 2 cache collapses total
  • v2.1.128 agents with 12 SendMessage calls: 25 cache collapses out of 52 turns
  • Collapse timing correlates with inbound message delivery timestamps (within 1-2 seconds)
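
The report does not define a "cache collapse" precisely. One possible working definition, used here purely as an assumption, is a turn whose cache_read drops below half of the previous turn's value. The sketch below counts such turns. The two token series are synthetic and only illustrate the monotonic and oscillating patterns described above.

```python
def count_collapses(cache_reads: list, drop_fraction: float = 0.5) -> int:
    """Count turns where cache_read falls below `drop_fraction` of the prior turn.

    The threshold is an assumed definition, not one taken from the report.
    """
    collapses = 0
    for previous, current in zip(cache_reads, cache_reads[1:]):
        if previous > 0 and current < previous * drop_fraction:
            collapses += 1
    return collapses


# Synthetic per-turn cache_read values (not real transcript data).
monotonic = [0, 20_000, 24_000, 27_000, 31_000]
oscillating = [0, 25_000, 3_000, 26_000, 2_000, 28_000]

monotonic_collapses = count_collapses(monotonic)
oscillating_collapses = count_collapses(oscillating)
print(monotonic_collapses, oscillating_collapses)
```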

User Impact

A typical observation-council spawning 5 parallel agents:

  • v2.1.121: ~280K tokens cold-start (5 agents × 56K initial), scales linearly with reuse
  • v2.1.128: ~900K+ tokens cold-start due to repeated cache invalidations, no scaling benefit

This explains why users running councils on v2.1.128 rapidly exhaust token budgets on subscription plans.
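
The cold-start arithmetic above can be spelled out as follows. The 900K figure is treated as a lower bound because the report gives it as "900K+". The variable names are illustrative.

```python
agents = 5
initial_tokens_per_agent = 56_000

# v2.1.121: each agent pays its initial context once.
cold_start_v121 = agents * initial_tokens_per_agent

# v2.1.128: reported lower bound for the same council.
cold_start_v128_min = 900_000

overhead = round(cold_start_v128_min / cold_start_v121, 1)
print(cold_start_v121, overhead)
```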

Reproduction Steps

  1. Spawn an observation-council with 5 parallel agents
  2. Inspect session transcripts at ~/.claude/projects/[SESSION_ID]/subagents/*.jsonl
  3. Extract cache_creation_input_tokens and cache_read_input_tokens from API calls
  4. Calculate per-turn cache miss share: 1 - (cache_read / (cache_read + cache_creation))
  5. Compare to same workload in v2.1.121

Expected result: v2.1.128 shows 35-40% miss share; v2.1.121 shows 3-5% miss share for identical workload.
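
Steps 3 and 4 can be sketched as a small script. It assumes each transcript line is a JSON object whose `message.usage` field carries `cache_read_input_tokens` and `cache_creation_input_tokens`; the exact transcript schema is an assumption and may differ between versions. Lines that do not match are skipped. The function returns the aggregate miss share over a whole transcript rather than a per-turn series.

```python
import json
from pathlib import Path
from typing import Iterable, Optional


def usage_from_line(line: str) -> Optional[dict]:
    """Return the usage dict from one transcript line, or None if absent."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    message = record.get("message")
    if not isinstance(message, dict):
        return None
    usage = message.get("usage")
    return usage if isinstance(usage, dict) else None


def miss_share(lines: Iterable[str]) -> Optional[float]:
    """Compute 1 - cache_read / (cache_read + cache_creation) over all lines."""
    read = created = 0
    for line in lines:
        usage = usage_from_line(line)
        if usage is None:
            continue
        read += usage.get("cache_read_input_tokens") or 0
        created += usage.get("cache_creation_input_tokens") or 0
    total = read + created
    return None if total == 0 else 1 - read / total


def miss_share_for_file(path: str) -> Optional[float]:
    """Apply miss_share to one .jsonl transcript file."""
    with Path(path).open(encoding="utf-8") as handle:
        return miss_share(handle)
```

To cover a session, call `miss_share_for_file` on each `.jsonl` file in the subagents directory from step 2 and compare the results across versions.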

Workaround

Until this is fixed, avoid parallel-team workloads (councils, observation-council, planning-council) that trigger high SendMessage volume. Use sequential task-based sub-agents instead, which maintain healthy cache behavior at 4% miss rate.

Requested Fix

Ensure cache_control headers are applied to inbound teammate messages in addition to progress summaries. Specifically:

  • Tag teammate message injections with cache_control before insertion
  • Use stable content hashes (exclude timestamp/message-id from cache key)
  • Validate cache miss share drops below 5% for parallel-team workloads in testing
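
A minimal sketch of the first two bullets, under stated assumptions: in the Anthropic Messages API, `cache_control` is a field on a content block (for example `{"type": "ephemeral"}`) rather than an HTTP header. The `teammate_message_block` helper and its parameters are hypothetical and do not come from Claude Code. The idea is to render only stable fields into the cached text and keep volatile metadata out of it.

```python
from typing import Optional


def teammate_message_block(
    sender: str,
    text: str,
    received_at: Optional[str] = None,
    message_id: Optional[str] = None,
) -> dict:
    """Build a content block for an inbound teammate message (hypothetical helper).

    `received_at` and `message_id` are accepted but deliberately not rendered,
    so two deliveries of the same content produce identical blocks.
    """
    return {
        "type": "text",
        "text": f"[teammate:{sender}] {text}",
        # Marks a cache breakpoint at the end of this block.
        "cache_control": {"type": "ephemeral"},
    }


first = teammate_message_block(
    "critic", "Plan looks sound.", received_at="2026-05-05T10:00:01Z", message_id="m-1")
second = teammate_message_block(
    "critic", "Plan looks sound.", received_at="2026-05-05T10:00:09Z", message_id="m-2")
print(first == second)
```

The API allows only a limited number of cache breakpoints per request, so a real implementation would probably move the marker to the most recent stable block instead of tagging every message.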

Files for Investigation

  • Changelog source: github.com/anthropics/claude-code main branch, versions 2.1.121, 2.1.126, 2.1.128

Investigation conducted: 2026-05-05 via SRE observation-council with 5 agents (investigator, critic, validator, reviewer, historian) analyzing 4,367 historical transcripts and performing like-for-like version comparison.

extent analysis

TL;DR

Apply cache_control headers to inbound teammate messages to fix the caching regression in parallel-team workloads.

Guidance

  • Review the changelog for versions v2.1.126 and v2.1.128 to understand the introduction of cache_control in sub-agent progress summaries.
  • Investigate the implementation of cache_control in teammate message injections to identify the missing application of cache control headers.
  • Test the fix by applying cache_control headers to inbound teammate messages and validating the cache miss share drops below 5% for parallel-team workloads.
  • Consider using stable content hashes (excluding timestamp and message-id from cache key) to prevent cache prefix hash shifts.

Example

No code snippet is provided as the issue does not include specific code references.

Notes

The fix may require modifications to the teammate message injection mechanism to include cache_control headers. Additionally, the use of stable content hashes may require changes to the cache key generation logic.

Recommendation

Apply the workaround by avoiding parallel-team workloads until the fix is implemented, and then apply the fix by ensuring cache_control headers are applied to inbound teammate messages. This is recommended because the workaround only mitigates the issue, while the fix addresses the root cause of the caching regression.
