claude-code - 💡(How to fix) Fix [Bug] Prompt cache unexpectedly collapses mid-session on Opus 4.x [3 comments, 3 participants]

JarettHolmes · 2026-04-29T06:45:43Z

[claude-code] Bug Description Cache bug 34629 — recurrence with measurement Date: 2026-04-28 Plan: Max 20x Severity: 50% of 5h Opus quota burned in ~30 min on… ## Fix / Workaround ## What I built locally as a workaround **Bug Description** Cache bug #34629 — recurrence with measurement Date: 2026-04-28 Plan: Max 20x Severity: 50% of 5h Opus quota burned in ~30 min on what felt like normal interactive work across three sessions. Second occurrence today, similar shape this morning. Summary **Anthropic stated this was fixed. It's not.** Cache cache_read collapses from a stable 130K–185K-token cached prefix down to 0 or ~8.6K tokens between consecutive turns, mid-session, with no observable change in prompt structure on my side. The next turn pays full cache_creation (~130K–185K tokens) to rebuild what should still have been hot. This recurs multiple times in the same session, on Opus 4.x. I quantified this same pattern on 2026-04-01 against bug #34629 — today's transcripts match the same signature exactly. Today's measurement Two affected sessions caught in the same 60-min window: 1. (Opus 4.7 1M) - 155 turns, 6 cache-collapse events - Healthy comparison (same model, more turns, more user work): 170 turns, $36.82, 0 anomalies - Affected session: 155 turns, $66.77, 6 anomalies - Cost inflation attributable to bug: ~$30 (~80% over expected) turn 80: cache_read 132,264 -> 0 (paid cw=137,368) turn 84: cache_read 137,593 -> 8,661 (paid cw=144,967) turn 97: cache_read 147,609 -> 8,661 (paid cw=149,xxx) turn 101: cache_read 153,628 -> 8,661 (paid cw=153,xxx) turn 104: cache_read 155,148 -> 8,661 (paid cw=155,xxx) turn 140: cache_read 185,554 -> 8,661 (paid cw=185,xxx) Notice: 137,368 cw paid at turn 80 (cache_read=0), then a near-identical cache built immediately, then 4 collapses at the same fallback floor of 8,661. The 8,661 is suspiciously stable — looks like only the first cache breakpoint survives and everything above it gets dropped. 2. (Opus 4.7 1M) - 108 turns, 1 cache-collapse event - turn 28: cache_read 123,306 -> 0 (paid cw=129,058) - A second turn 22 seconds later paid an identical 129,058 cw with cache_read=0 again. Two full rebuilds of an identical 129K cache in 22 seconds. **What it does NOT correlate with:** - Sessions are on the 1M-context tier (`claude-opus-4-7[1m]`), but stayed under the 200K threshold throughout — bug occurred outside the doubled-pricing zone, so it is not a 1M-tier-pricing artifact - Not idle expiration — drops happen turn-over-turn, sub-minute spacing in some cases - Not session length — punk-brain 28-turn session hit it; another 170-turn session did not - Not specific tool calls — drops appear after various Bash/Edit/Read sequences with no obvious common trigger - Not specific to one project / repo / hooks setup — affects multiple projects with different settings.json This matches the prior characterization (3.7% session degradation rate, event-triggered, non-deterministic). ## Cost shape Across all sessions on the bad window (4 sessions, ~80 min), I burned ~14M Opus-equivalent tokens. By comparison the same kind of work on a clean window normally runs roughly half that for me. The two affected sessions account for the excess. ## Reproduction notes I cannot reproduce on demand. Detection signature is reliable: - `cache_read` value drops by >=50% turn-over-turn - previous `cache_read` was >= ~50K (so cache was substantively warm) - new `cache_read` is < ~20K (so collapse, not partial invalidation) - usually multiple drops within the same session ## What I built locally as a workaround A PostToolUse hook that scans the live transcript every 3 tool calls, applies the rule above, and emits a system-reminder telling the agent to stop and ask me before continuing. It also writes a save-state file with the last user message and last assistant action so I can resume cleanly after `/clear`. This catches the bug at the second drop event instead of letting it bleed for tens of turns. This shouldn't have to exist. Asking that this be looked at again. ## What would help most 1. Confirmation whether bug #34629 is still tracked / has any open investigation. 2. If there's a way to opt into instrumented telemetry that captures the cache-key hash that gets dropped, I'd happily run it for a week and submit logs. 3. For Anthropic to stop randomly resetting the 7 day quota. Anthropic arbitrarily reset it earlier, again. ## Reference data Local tooling (built 2026-04-01 against earlier instances of this bug): - `tools/cache-health.py` — post-hoc per-session analysis - `tools/cache-monitor.py` — repo-wide token excess accounting - `activity/logs/cache-health-log.jsonl` — accumulated per-session metrics

Bug Description Cache bug #34629 — recurrence with measurement

Date: 2026-04-28
Plan: Max 20x
Severity: 50% of 5h Opus quota burned in ~30 min on what felt like normal interactive work across three sessions. Second occurrence today, similar shape this morning.

Summary

Anthropic stated this was fixed. It's not.

Cache cache_read collapses from a stable 130K–185K-token cached prefix down to 0 or ~8.6K tokens between consecutive turns, mid-session, with no observable change in prompt structure on my side. The next turn pays full cache_creation (~130K–185K tokens) to
rebuild what should still have been hot. This recurs multiple times in the same session, on Opus 4.x. I quantified this same pattern on 2026-04-01 against bug #34629 — today's transcripts match the same signature exactly.

Today's measurement

Two affected sessions caught in the same 60-min window:

(Opus 4.7 1M)

155 turns, 6 cache-collapse events
Healthy comparison (same model, more turns, more user work): 170 turns, $36.82, 0
anomalies
Affected session: 155 turns, $66.77, 6 anomalies
Cost inflation attributable to bug: ~$30 (~80% over expected)

turn 80: cache_read 132,264 -> 0 (paid cw=137,368)
turn 84: cache_read 137,593 -> 8,661 (paid cw=144,967) turn 97: cache_read 147,609 -> 8,661 (paid cw=149,xxx)
turn 101: cache_read 153,628 -> 8,661 (paid cw=153,xxx)
turn 104: cache_read 155,148 -> 8,661 (paid cw=155,xxx)
turn 140: cache_read 185,554 -> 8,661 (paid cw=185,xxx)

Notice: 137,368 cw paid at turn 80 (cache_read=0), then a near-identical cache built
immediately, then 4 collapses at the same fallback floor of 8,661. The 8,661 is
suspiciously stable — looks like only the first cache breakpoint survives and everything
above it gets dropped.

(Opus 4.7 1M)

108 turns, 1 cache-collapse event
turn 28: cache_read 123,306 -> 0 (paid cw=129,058)
A second turn 22 seconds later paid an identical 129,058 cw with cache_read=0 again. Two full rebuilds of an identical 129K cache in 22 seconds.

What it does NOT correlate with:

Sessions are on the 1M-context tier (claude-opus-4-7[1m]), but stayed under the 200K threshold throughout — bug occurred outside the doubled-pricing zone, so it is not a 1M-tier-pricing artifact
Not idle expiration — drops happen turn-over-turn, sub-minute spacing in some cases
Not session length — punk-brain 28-turn session hit it; another 170-turn session did not
Not specific tool calls — drops appear after various Bash/Edit/Read sequences with no obvious common trigger
Not specific to one project / repo / hooks setup — affects multiple projects with different settings.json

This matches the prior characterization (3.7% session degradation rate, event-triggered, non-deterministic).

Cost shape

Across all sessions on the bad window (4 sessions, ~80 min), I burned ~14M Opus-equivalent tokens. By comparison the same kind of work on a clean window normally runs roughly half that for me. The two affected sessions account for the excess.

Reproduction notes

I cannot reproduce on demand. Detection signature is reliable:

cache_read value drops by >=50% turn-over-turn
previous cache_read was >= ~50K (so cache was substantively warm)
new cache_read is < ~20K (so collapse, not partial invalidation)
usually multiple drops within the same session

What I built locally as a workaround

A PostToolUse hook that scans the live transcript every 3 tool calls, applies the rule above, and emits a system-reminder telling the agent to stop and ask me before continuing. It also writes a save-state file with the last user message and last assistant action so I can resume cleanly after /clear. This catches the bug at the second drop event instead of letting it bleed for tens of turns.

This shouldn't have to exist. Asking that this be looked at again.

What would help most

Confirmation whether bug #34629 is still tracked / has any open investigation.
If there's a way to opt into instrumented telemetry that captures the cache-key hash that gets dropped, I'd happily run it for a week and submit logs.
For Anthropic to stop randomly resetting the 7 day quota. Anthropic arbitrarily reset it earlier, again.

Reference data

Local tooling (built 2026-04-01 against earlier instances of this bug):

tools/cache-health.py — post-hoc per-session analysis
tools/cache-monitor.py — repo-wide token excess accounting
activity/logs/cache-health-log.jsonl — accumulated per-session metrics

extent analysis

TL;DR

The most likely fix or workaround for the cache bug is to implement a mechanism to detect and prevent cache collapses, such as the PostToolUse hook created by the user, until the root cause is identified and addressed.

Guidance

Investigate the cache collapse issue by analyzing the cache_read values and identifying patterns or triggers that lead to the collapse.
Consider implementing a detection mechanism, such as the one described in the reproduction notes, to catch cache collapses and prevent further issues.
Review the tools/cache-health.py and tools/cache-monitor.py scripts to understand how they analyze cache health and token excess, and see if they can be used to gather more information about the issue.
If possible, opt into instrumented telemetry to capture the cache-key hash that gets dropped, as suggested by the user, to help identify the root cause.

Example

No code example is provided as it is not explicitly supported by the issue, but the user's PostToolUse hook can be used as a reference for detecting cache collapses:

# PostToolUse hook to detect cache collapses
def detect_cache_collapse(cache_read_values):
    if len(cache_read_values) < 2:
        return False
    previous_cache_read = cache_read_values[-2]
    current_cache_read = cache_read_values[-1]
    if previous_cache_read >= 50000 and current_cache_read < 20000 and current_cache_read / previous_cache_read < 0.5:
        return True
    return False

Notes

The issue lacks information about the underlying cause of the cache collapse, and the provided workaround is a detection mechanism rather than a fix. Further investigation is needed to identify the root cause and develop a permanent solution.

Recommendation

Apply the workaround by implementing a detection mechanism, such as the PostToolUse hook, to catch cache collapses and prevent further issues, until the root cause is identified and addressed. This will help

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [Bug] Prompt cache unexpectedly collapses mid-session on Opus 4.x [3 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fix / Workaround

What I built locally as a workaround

Cost shape

Reproduction notes

What I built locally as a workaround

What would help most

Reference data

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [Bug] Prompt cache unexpectedly collapses mid-session on Opus 4.x [3 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fix / Workaround

What I built locally as a workaround

Cost shape

Reproduction notes

What I built locally as a workaround

What would help most

Reference data

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING