claude-code: Opus 4.6 cache_read tokens counted at full rate despite cache hits on Max 20x

Bug Description

On the Max 20x plan with Opus 4.6, cache_read_input_tokens appear to be counted at full rate against the weekly quota despite cache hits. Small workloads with high cache hit ratios still consume a disproportionate share of the weekly quota, which suggests cache_read tokens are not being discounted at the documented reduced rate.

Update: Even after closing large-context sessions, the weekly quota still increased by 1 percentage point with minimal cache_read (~220k tokens over 2 requests), which is further evidence that cache_read is counted at full rate.

Environment

  • Plan: Max 20x
  • Model: claude-opus-4-6
  • Platform: macOS
  • Claude Code version: Latest

Data Evidence

Initial Session Analysis (2026-05-17, before closing large-context sessions)

Three sessions with minimal actual work but high cache_read:

Session ID | Cache Read / Request | Requests | Total Cache Read | Cache Creation | Output
Session A | ~600k | 7 | 4,207,945 | 6,690 | 1,689
Session B | ~100k | 19 | 2,072,200 | - | -
Session C | ~105k | 7 | 737,857 | - | -

Total: 33 requests, 7,018,002 cache_read tokens, 6,690 cache_creation tokens, 1,689 output tokens
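As a quick consistency check, the totals above can be recomputed from the per-session figures. This is a minimal sketch in Python; the dictionary layout and variable names are illustrative, and only the numbers come from the report.

```python
# Per-session request counts and cache_read totals as reported in this issue.
sessions = {
    "Session A": {"requests": 7, "total_cache_read": 4_207_945},
    "Session B": {"requests": 19, "total_cache_read": 2_072_200},
    "Session C": {"requests": 7, "total_cache_read": 737_857},
}

total_requests = sum(s["requests"] for s in sessions.values())
total_cache_read = sum(s["total_cache_read"] for s in sessions.values())

# Average cache_read per request, to compare with the approximate
# "Cache Read / Request" figures (~600k, ~100k, ~105k).
avg_per_request = {
    name: s["total_cache_read"] / s["requests"] for name, s in sessions.items()
}

print(total_requests)    # 33
print(total_cache_read)  # 7018002
```

The recomputed sums match the reported totals of 33 requests and 7,018,002 cache_read tokens.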

After Closing Large-Context Sessions (2026-05-17)

After closing sessions with large context, weekly quota still increased by +1% with minimal usage:

  • Requests: 2
  • Cache read: 220,390 tokens
  • Cache creation: 301 tokens
  • Input: 6 tokens
  • Output: 17 tokens
  • Weekly quota change: 14% → 15% (+1%)

Analysis: If cache_read were counted at the reduced rate (1/10), the 220,390 cache_read tokens would be equivalent to ~22k tokens. A +1% weekly quota increase on Max 20x cannot be explained by ~22k equivalent tokens, which indicates cache_read is being counted at full rate.
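The comparison can be made explicit with a small calculation. The sketch below is illustrative only: the function name `equivalent_tokens` is hypothetical, and it assumes every field other than cache reads counts 1:1, which is not something the quota system is known to do.

```python
def equivalent_tokens(usage, cache_read_weight):
    """Weight cache reads by cache_read_weight.

    Assumption: all other fields count 1:1. The real quota weighting
    is not published, so this is only a rough comparison.
    """
    total = (
        usage["cache_read_input_tokens"] * cache_read_weight
        + usage["cache_creation_input_tokens"]
        + usage["input_tokens"]
        + usage["output_tokens"]
    )
    return round(total)

# Usage reported after closing the large-context sessions.
after_closing = {
    "cache_read_input_tokens": 220_390,
    "cache_creation_input_tokens": 301,
    "input_tokens": 6,
    "output_tokens": 17,
}

discounted = equivalent_tokens(after_closing, 0.1)  # cache reads at 1/10
full_rate = equivalent_tokens(after_closing, 1.0)   # cache reads at full rate

print(discounted, full_rate)
```

Under these assumptions the two hypotheses differ by roughly a factor of ten (about 22k versus about 221k equivalent tokens) for the same two requests.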

Cache Hit Verification

From daemon logs:

[cache:ryugu] read=602082 write=0 hit=100%
[cache:ryugu] read=114299 write=0 hit=100%
[cache:ryugu] read=105961 write=0 hit=100%
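Log lines in this shape can be checked mechanically. The following is a minimal sketch that parses them with a regular expression; the pattern is inferred from the three lines shown here and may not cover everything the daemon emits, and `parse_cache_line` is a hypothetical helper name.

```python
import re

# Pattern inferred from the sample lines; an assumption, not a documented format.
LOG_LINE = re.compile(
    r"\[cache:(?P<name>[^\]]+)\]\s+"
    r"read=(?P<read>\d+)\s+"
    r"write=(?P<write>\d+)\s+"
    r"hit=(?P<hit>\d+(?:\.\d+)?)%"
)

def parse_cache_line(line):
    """Return the parsed fields of one cache log line, or None if it does not match."""
    m = LOG_LINE.search(line)
    if m is None:
        return None
    return {
        "name": m["name"],
        "read": int(m["read"]),
        "write": int(m["write"]),
        "hit_pct": float(m["hit"]),
    }

lines = [
    "[cache:ryugu] read=602082 write=0 hit=100%",
    "[cache:ryugu] read=114299 write=0 hit=100%",
    "[cache:ryugu] read=105961 write=0 hit=100%",
]
parsed = [parse_cache_line(line) for line in lines]

# No writes in any line means no cache rebuild happened in these requests.
no_rebuild = all(p["write"] == 0 for p in parsed)
total_read = sum(p["read"] for p in parsed)
```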

From transcript usage (example):

cache_read_input_tokens: 602,082
cache_creation_input_tokens: 162
input_tokens: 3
output_tokens: 6
hit ratio ≈ 99.97%

Conclusion: These are cache hits (99.97% hit ratio), not cache misses. No cache rebuild occurred.
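For reference, the 99.97% figure can be reproduced from the transcript usage fields. The report does not define the hit ratio, so this sketch assumes it means cache reads divided by all prompt-side tokens (cache read + cache creation + uncached input).

```python
def cache_hit_ratio(usage):
    """Share of prompt-side tokens served from cache.

    Assumed definition: read / (read + creation + input).
    """
    read = usage["cache_read_input_tokens"]
    prompt_total = (
        read
        + usage["cache_creation_input_tokens"]
        + usage["input_tokens"]
    )
    return read / prompt_total if prompt_total else 0.0

# Transcript usage example from this issue.
usage = {
    "cache_read_input_tokens": 602_082,
    "cache_creation_input_tokens": 162,
    "input_tokens": 3,
    "output_tokens": 6,
}

hit_pct = round(cache_hit_ratio(usage) * 100, 2)
print(hit_pct)  # 99.97
```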

Weekly Quota Anomaly Timeline

  • 08:38:39: Claude 7d = 13%
  • 09:08:41: Claude 7d = 14% (+1%)
    • Between: 1 request with 105,851 cache_read tokens
    • If cache_read at reduced rate (1/10): ~10,585 equivalent tokens
    • Problem: +1% cannot be explained by ~10k equivalent tokens
  • After closing large-context sessions: Claude 7d = 15% (+1%)
    • Over: 2 requests with 220,390 cache_read tokens
    • If cache_read at reduced rate (1/10): ~22k equivalent tokens
    • Problem: +1% cannot be explained by ~22k equivalent tokens
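One way to read the timeline is to ask what total weekly budget each interval would imply under each hypothesis. The sketch below is a back-of-the-envelope calculation, not a model of the actual quota system. It assumes the displayed +1% corresponds to exactly one percentage point (the integer display cannot guarantee this) and that cache reads were the only meaningful usage in each interval. The helper name `implied_weekly_budget` is hypothetical.

```python
# Intervals from the timeline: quota change in percentage points and cache_read tokens.
intervals = [
    {"label": "08:38:39 -> 09:08:41", "quota_delta_pct": 1, "cache_read": 105_851},
    {"label": "after closing large-context sessions", "quota_delta_pct": 1, "cache_read": 220_390},
]

def implied_weekly_budget(cache_read, quota_delta_pct, cache_read_weight):
    """Weighted tokens that 100% of the weekly quota would represent,
    if this interval were representative (an assumption)."""
    per_point = cache_read * cache_read_weight / quota_delta_pct
    return per_point * 100

for iv in intervals:
    full = implied_weekly_budget(iv["cache_read"], iv["quota_delta_pct"], 1.0)
    reduced = implied_weekly_budget(iv["cache_read"], iv["quota_delta_pct"], 0.1)
    print(f'{iv["label"]}: full-rate ~{full:,.0f}, 1/10-rate ~{reduced:,.0f}')
```

With these inputs, the 1/10 hypothesis implies a whole-week budget of only about 1 to 2 million weighted tokens, which is the implausibility the timeline points at.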

Steps to Reproduce

  1. Use Claude Code with Opus 4.6 on Max 20x plan
  2. Run sessions with high cache hit ratios (typical in long-lived sessions)
  3. Monitor cache_read tokens vs actual work (input + output)
  4. Observe weekly quota consumption disproportionate to actual token usage
  5. Even after closing large-context sessions, observe weekly quota still increases with minimal cache_read
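For step 3, a simple metric is the number of cache_read tokens per token of actual work. This is a minimal sketch over per-request usage records; the field names follow the transcript usage shown in this issue, while the function name and the list-of-dicts layout are illustrative assumptions.

```python
def cache_read_per_work_token(records):
    """Ratio of cached context re-read to tokens of new work (input + output)."""
    cache_read = sum(r["cache_read_input_tokens"] for r in records)
    work = sum(r["input_tokens"] + r["output_tokens"] for r in records)
    return cache_read / work if work else float("inf")

# The single transcript usage record quoted in this issue.
records = [
    {
        "cache_read_input_tokens": 602_082,
        "cache_creation_input_tokens": 162,
        "input_tokens": 3,
        "output_tokens": 6,
    },
]

ratio = cache_read_per_work_token(records)
print(f"{ratio:,.0f} cache_read tokens per work token")
```

A very large ratio alongside a rising weekly quota percentage is the pattern described in step 4.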

Expected Behavior

  • cache_read_input_tokens should count at reduced rate (documented ~1/10 of input tokens) against weekly quota
  • Cache hits should not significantly consume weekly quota
  • Weekly quota should increase proportionally to actual productive work (input + output tokens)
  • Closing large-context sessions should stop disproportionate quota consumption

Actual Behavior

  • cache_read_input_tokens appear to count at full rate against weekly quota
  • Small usage with high cache hit ratios consumes disproportionate weekly quota
  • Weekly quota increases despite minimal actual work
  • Even after closing large-context sessions, weekly quota still increases with minimal cache_read (~220k tokens causing +1%)

Related Issues

  • #45756 - Pro Max 5x Quota Exhausted in 1.5 Hours (cache_read counts at full rate on Opus 4.6)
  • #24147 - Cache read tokens consume 99.93% of usage quota (architectural issue)
  • #57699 - Weekly limit depletes disproportionately (May 6 cutover accounting drift)
  • #52135 - Max (20x) weekly limit depletes disproportionately

Additional Context

This is not a cache miss issue - verification shows 99.97% hit ratios. The problem is cache_read token accounting against weekly quota limits. The pattern matches #45756 (Opus 4.6 cache_read full rate) and suggests the weekly quota accounting plane may apply different weighting than documented for subscription plans.

The new data point (closing large-context sessions still results in +1% with ~220k cache_read) further confirms that cache_read tokens are being counted at full rate against weekly quota, regardless of session context size.
