claude-code - 💡(How to fix) Fix v2.1.128 caching regression in parallel-team workloads (10x token cost increase) [2 comments, 3 participants]

claude-code2026-05-05 13:08:19

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#56293•Fetched 2026-05-06 06:31:58

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

labeled ×5commented ×2cross-referenced ×1mentioned ×1

The v2.1.128 changelog claims "Fixed sub-agent progress summaries missing the prompt cache (~3x cache_creation reduction)". However, empirical analysis of session transcripts shows this fix did not land for parallel-team workloads and actually made caching 10x worse compared to v2.1.121.

Root Cause

The regression correlates with inbound SendMessage deliveries (teammate-to-teammate messages). Each inbound message appears to:

Break the cache_control prefix (checkpoint)
Force regeneration of all downstream content (cache_creation spike)
Continue until next cache rebuild

Likely cause: v2.1.126/v2.1.128 added cache_control to sub-agent progress summaries but not to inbound teammate message injections. Teammate messages likely include non-stable fields (timestamp, message-id, routing metadata) that shift the cache prefix hash.

Supporting evidence:

v2.1.121 agents with 4 SendMessage calls: 2 cache collapses total
v2.1.128 agents with 12 SendMessage calls: 25 cache collapses out of 52 turns
Collapse timing correlates with inbound message delivery timestamps (within 1-2 seconds)

Fix Action

Workaround

Until this is fixed, avoid parallel-team workloads (councils, observation-council, planning-council) that trigger high SendMessage volume. Use sequential task-based sub-agents instead, which maintain healthy cache behavior at 4% miss rate.

RAW_BUFFERClick to expand / collapse

Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads

Severity: High (10x token cost increase for councils/teams)
Affected Version: v2.1.128 (and v2.1.126)
Works in: v2.1.121
User: @oksanantonova
Date: 2026-05-05

Summary

Evidence

Like-For-Like Comparison (Same Workload Pattern)

v2.1.121 - SendMessage-heavy sub-agent (agent-a63a0e37df5eaa2da)

Turns: 121
SendMessage calls: 4
Cache miss share: 4%
Avg cache_creation/turn: 5,648 tokens
Pattern: Cache builds monotonically, only 2 collapses across entire session
Status: ✅ Healthy

v2.1.128 - SendMessage-heavy sub-agent (agent-ab587fd4e60ffc856)

Turns: 52
SendMessage calls: 12
Cache miss share: 40%
Avg cache_creation/turn: 26,000 tokens
Pattern: Cache collapses every 2-4 turns, oscillates between cold (cc~~25K) and warm (cc~~3K)
Status: ❌ Severe regression

Regression magnitude: 10x worsening in miss share for identical workload type

Per-Version Historical Data

Analysis of 4,367 sub-agent transcript files across versions v2.1.74 through v2.1.128:

Version	Avg cache_creation/turn	Workload	Status
v2.1.121	5,534 tokens	councils/teams	baseline
v2.1.126	8,433 tokens	councils/teams	+52% regression
v2.1.128	22,713 tokens	councils/teams	+410% regression

Low-parallelism v2.1.128 workloads show normal cache behavior (3-4% miss), confirming regression is specific to parallel-team fan-out patterns.

Root Cause Analysis

The regression correlates with inbound SendMessage deliveries (teammate-to-teammate messages). Each inbound message appears to:

Break the cache_control prefix (checkpoint)
Force regeneration of all downstream content (cache_creation spike)
Continue until next cache rebuild

Supporting evidence:

v2.1.121 agents with 4 SendMessage calls: 2 cache collapses total
v2.1.128 agents with 12 SendMessage calls: 25 cache collapses out of 52 turns
Collapse timing correlates with inbound message delivery timestamps (within 1-2 seconds)

User Impact

A typical observation-council spawning 5 parallel agents:

v2.1.121: ~280K tokens cold-start (5 agents × 56K initial), scales linearly with reuse
v2.1.128: ~900K+ tokens cold-start due to repeated cache invalidations, no scaling benefit

This explains why users running councils on v2.1.128 rapidly exhaust token budgets on subscription plans.

Reproduction Steps

Spawn an observation-council with 5 parallel agents
Inspect session transcripts at ~/.claude/projects/[SESSION_ID]/subagents/*.jsonl
Extract cache_creation_input_tokens and cache_read_input_tokens from API calls
Calculate per-turn cache miss share: 1 - (cache_read / (cache_read + cache_creation))
Compare to same workload in v2.1.121

Expected result: v2.1.128 shows 35-40% miss share; v2.1.121 shows 3-5% miss share for identical workload.

Workaround

Requested Fix

Ensure cache_control headers are applied to inbound teammate messages in addition to progress summaries. Specifically:

Tag teammate message injections with cache_control before insertion
Use stable content hashes (exclude timestamp/message-id from cache key)
Validate cache miss share drops below 5% for parallel-team workloads in testing

Files for Investigation

Changelog source: github.com/anthropics/claude-code main branch, versions 2.1.121, 2.1.126, 2.1.128

Investigation conducted: 2026-05-05 via SRE observation-council with 5 agents (investigator, critic, validator, reviewer, historian) analyzing 4,367 historical transcripts and performing like-for-like version comparison.

extent analysis

TL;DR

Apply cache_control headers to inbound teammate messages to fix the caching regression in parallel-team workloads.

Guidance

Review the changelog for versions v2.1.126 and v2.1.128 to understand the introduction of cache_control in sub-agent progress summaries.
Investigate the implementation of cache_control in teammate message injections to identify the missing application of cache control headers.
Test the fix by applying cache_control headers to inbound teammate messages and validating the cache miss share drops below 5% for parallel-team workloads.
Consider using stable content hashes (excluding timestamp and message-id from cache key) to prevent cache prefix hash shifts.

Example

No code snippet is provided as the issue does not include specific code references.

Notes

The fix may require modifications to the teammate message injection mechanism to include cache_control headers. Additionally, the use of stable content hashes may require changes to the cache key generation logic.

Recommendation

Apply the workaround by avoiding parallel-team workloads until the fix is implemented, and then apply the fix by ensuring cache_control headers are applied to inbound teammate messages. This is recommended because the workaround only mitigates the issue, while the fix addresses the root cause of the caching regression.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #batch processing #GPU compatibility #latency issue #model loading

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix v2.1.128 caching regression in parallel-team workloads (10x token cost increase) [2 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Workaround

Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads

Summary

Evidence

Like-For-Like Comparison (Same Workload Pattern)

Per-Version Historical Data

Root Cause Analysis

User Impact

Reproduction Steps

Workaround

Requested Fix

Files for Investigation

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix v2.1.128 caching regression in parallel-team workloads (10x token cost increase) [2 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Workaround

Claude Code v2.1.128 Caching Regression in Parallel-Team Workloads

Summary

Evidence

Like-For-Like Comparison (Same Workload Pattern)

Per-Version Historical Data

Root Cause Analysis

User Impact

Reproduction Steps

Workaround

Requested Fix

Files for Investigation

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING