claude-code: [Bug] Opus 4.7 metering: cache_read_input_tokens consuming 5-hour bucket at input-token rate (Max $100)

GitHub stats

  • Issue: anthropics/claude-code#49302
  • Fetched: 2026-04-17 08:45:07
  • Comments: 2
  • Participants: 1
  • Timeline events: 9
  • Reactions: 1
  • Timeline (top): labeled ×4, commented ×2, cross-referenced ×2, subscribed ×1


Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

After Opus 4.7 rolled out, my Claude Code 5-hour usage bucket is being consumed ~7x faster than with Opus 4.6 on identical workloads. The data strongly suggests cache_read_input_tokens are being metered against the 5-hour bucket at full input-token weight instead of the documented reduced rate.

Ratio observed on the same account, back-to-back days:

  • Opus 4.6 (Apr 15): ~190M total tokens in a 5h window → never hit the limit
  • Opus 4.7 (Apr 16): ~30M total tokens in under 2h → 100% of 5h bucket consumed
  • Effective cost-per-token is ~7x higher on 4.7 for workloads dominated by cache reads.
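
The ratio in the bullets above can be reproduced with a few lines of arithmetic. This is only a sketch using the rounded figures quoted here; because the Opus 4.6 window never reached the limit, the result is a lower bound on the per-token difference, not an exact multiplier.

```python
# Rounded figures from the bullets above, in millions of tokens.
opus_46_tokens_m = 190.0  # full 5h window, limit NOT reached
opus_47_tokens_m = 30.0   # under 2h, limit reached

# The 4.6 window never exhausted the bucket, so the true bucket size is at
# least 190M "4.6 tokens". The ratio is therefore a lower bound.
lower_bound_ratio = opus_46_tokens_m / opus_47_tokens_m

print(f"Each 4.7 token consumed at least ~{lower_bound_ratio:.1f}x more bucket")
```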

I already opened a support ticket with Anthropic. Support confirmed this behavior is "inconsistent with documented behavior" and referred me here.

What Should Happen?

Per Anthropic's documented pricing, cache_read_input_tokens should be charged at a heavily reduced rate (~10% of input tokens). This was clearly the case with Opus 4.6 — 186M cache-read tokens in a 5h window did not hit the limit.

Opus 4.7 should apply the same reduced rate. Metering behavior should not regress between model versions.

Error Messages/Logs

No runtime errors — this is a metering/billing anomaly visible in the Claude Code usage panel.

5-hour limit panel shows 100% used, resets in 3h, after only ~60 assistant responses in Opus 4.7.

Steps to Reproduce

  1. On Max $100 plan with default Claude Code config, work a full session with Opus 4.6 → observe that a workload of ~700+ assistant responses and ~180M+ cache_read_input_tokens fits comfortably inside a 5h window.
  2. Switch to Opus 4.7 (via /model claude-opus-4-7[1m] or by starting a new session after the 4.7 rollout).
  3. Send a handful of normal requests (~60 assistant responses) with a typical cached context (~400k cached tokens per request).
  4. Observe the 5h usage panel jump to 100% in under 2 hours — dramatically faster than with 4.6 on the same workload.

Data from my session (JSONL usage objects aggregated by hour, UTC)

Opus 4.6 — no limit hit:

Hour (UTC)      Msgs   Total    cache_read
23:00 Apr 15    151    16.3M    15.5M
00:00 Apr 16    246    48.2M    47.4M
01:00 Apr 16    166    49.0M    48.7M
02:00 Apr 16    131    46.1M    44.7M
03:00 Apr 16    75     29.8M    29.7M
Total (5h)      769    ~190M    ~186M

Opus 4.7 — 100% hit:

Hour (UTC)      Msgs   Total    cache_read
15:00 Apr 16    6      2.5M     1.5M
16:00 Apr 16    54     27.3M    23.7M
Partial         60     ~29.8M   ~25.2M

Session JSONL with per-message usage available on request: ~/.claude/projects/-Users-lck-Documents-Unna-Towers-Interfaz/19adaea6-....jsonl
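
An hourly breakdown like the tables above could be produced with a short script along the following lines. This is a minimal sketch, not the script used for this report: it assumes each JSONL record carries an ISO 8601 `timestamp` and a `message.usage` object with the standard usage fields (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `output_tokens`). The function name `aggregate_by_hour` is hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Usage fields summed into the "Total" column (assumed record layout).
USAGE_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def aggregate_by_hour(lines):
    """Sum usage per UTC hour from an iterable of JSONL lines."""
    buckets = defaultdict(lambda: {"msgs": 0, "total": 0, "cache_read": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        # Skip records without usage data (e.g. user turns) or a timestamp.
        if not usage or "timestamp" not in record:
            continue
        ts = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
        hour = ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:00")
        bucket = buckets[hour]
        bucket["msgs"] += 1
        bucket["total"] += sum(usage.get(f, 0) or 0 for f in USAGE_FIELDS)
        bucket["cache_read"] += usage.get("cache_read_input_tokens", 0) or 0
    return dict(buckets)
```

To run it against a session file, open the JSONL and pass the file object directly, since it iterates line by line: `aggregate_by_hour(open(path, encoding="utf-8"))`.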

Claude Model

Opus

Is this a regression?

Yes, this worked in a previous version

Last Working Version

claude-opus-4-6 (Opus 4.6) — metering worked correctly until switching to 4.7 on Apr 16, 2026

Claude Code Version

2.1.111 (Claude Code)

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Terminal.app (macOS)

Additional Information

Support ticket already open — Anthropic support agent confirmed this pattern is inconsistent with documented cache pricing and directed me to file this bug.

Account plan: Max $100/month

Hypotheses (in order of likelihood):

  1. cache_read_input_tokens weighting regression in Opus 4.7 — being counted at input-token rate instead of reduced rate.
  2. Cache hits misreported — returning as cache_read in usage objects but billed as cache_create or full input.
  3. Model-switch invalidation — switching from 4.6 → 4.7 invalidates cache entries in a way that inflates metering beyond what the usage field shows.
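
Hypothesis 1 can be sanity-checked against the hourly totals reported above. The sketch below converts each window into "input-equivalent" tokens under two candidate weights for cache reads: 10% (the documented reduced rate cited in this report) and 100%. It makes a loud simplifying assumption: all non-cache-read tokens (fresh input, cache creation, output) are counted at weight 1.0, which is almost certainly not how the bucket is really metered. The helper name `input_equivalent_millions` is hypothetical.

```python
def input_equivalent_millions(total_m, cache_read_m, cache_read_weight):
    """Weight cache reads by `cache_read_weight`; count everything else at 1.0.

    Simplifying assumption: output and cache-creation tokens are treated
    like plain input tokens. Real metering weights are not known here.
    """
    other_m = total_m - cache_read_m
    return other_m + cache_read_weight * cache_read_m


# (total, cache_read) in millions of tokens, from the reported session data.
WINDOWS = {
    "Opus 4.6 (limit not hit)": (190.0, 186.0),
    "Opus 4.7 (limit hit)": (29.8, 25.2),
}

for label, (total_m, cache_read_m) in WINDOWS.items():
    reduced = input_equivalent_millions(total_m, cache_read_m, 0.10)
    full = input_equivalent_millions(total_m, cache_read_m, 1.00)
    print(f"{label}: {reduced:.1f}M at 10% weight, {full:.1f}M at full weight")
```

Under these assumptions the 4.6 window comes to about 22.6M input-equivalent tokens at the 10% weight without hitting the limit, while the 4.7 window comes to only about 7.1M at the 10% weight yet did hit it. The 4.7 figure at full weight (29.8M) exceeds the 4.6 figure at 10% weight, which is consistent with hypothesis 1 but does not prove it.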

Willing to provide:

  • Raw session JSONL (18MB) with all usage objects
  • CSV hourly breakdown
  • Screen recording of the 5h usage panel

Impact: On Max $100 plan I went from a normal working day (700+ responses) to hitting the limit in <2h with the same work patterns — effectively a 10x reduction in Opus throughput per dollar without any change on my end.

extent analysis

TL;DR

The most likely fix is to wait for Anthropic to address the cache_read_input_tokens weighting regression in Opus 4.7, which is causing the tokens to be counted at the full input-token rate instead of the reduced rate.

Guidance

  • Verify that the issue is indeed related to the cache_read_input_tokens weighting by analyzing the usage objects in the session JSONL file.
  • Check if the cache hits are being misreported as cache_read in the usage objects but billed as cache_create or full input.
  • Consider providing the raw session JSONL file and CSV hourly breakdown to Anthropic support for further investigation.
  • Monitor the issue and wait for an update from Anthropic on the regression fix.

Example

This issue concerns server-side metering of the 5-hour bucket, so there is no client-side code change that fixes it.

Notes

The issue seems to be specific to the Opus 4.7 model and the cache_read_input_tokens weighting. The fact that the issue is not present in Opus 4.6 suggests that it is a regression introduced in the newer version.

Recommendation

Apply workaround: Wait for Anthropic to address the regression and provide a fix for the cache_read_input_tokens weighting issue. This is the most likely solution, given that the issue is specific to the Opus 4.7 model and has been confirmed by Anthropic support as inconsistent with documented behavior.
