claude-code - 💡(How to fix) Fix [Bug] Opus 4.7 metering: cache_read_input_tokens consuming 5-hour bucket at input-token rate (Max $100) [2 comments, 1 participants]

claude-code · 2026-04-16 17:14:18


GitHub stats

anthropics/claude-code#49302 • Fetched 2026-04-17 08:45:07


Author: lorenzock2566

Participants: lorenzock2566

Timeline (top): labeled ×4, commented ×2, cross-referenced ×2, subscribed ×1


Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

After Opus 4.7 rolled out, my Claude Code 5-hour usage bucket is being consumed ~7x faster than with Opus 4.6 on identical workloads. The data strongly suggests cache_read_input_tokens are being metered against the 5-hour bucket at full input-token weight instead of the documented reduced rate.

Ratio observed on the same account, back-to-back days:

Opus 4.6 (Apr 15): ~190M total tokens in a 5h window → never hit the limit
Opus 4.7 (Apr 16): ~30M total tokens in under 2h → 100% of 5h bucket consumed
Effective cost-per-token is ~7x higher on 4.7 for workloads dominated by cache reads.
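A quick check of that ratio, using only the totals quoted above. Because the Opus 4.6 window never reached the limit, its ~190M is only a lower bound on that window's capacity, so the result is a floor rather than an exact multiplier. The constant and function names are illustrative.

```python
# Totals quoted in the report, in tokens. The 4.6 figure is a lower bound on
# that window's capacity, because the limit was never reached.
OPUS_46_TOTAL = 190_000_000
OPUS_47_TOTAL = 30_000_000


def capacity_ratio(tokens_without_limit: int, tokens_at_limit: int) -> float:
    """How many times more tokens fit in the window before the change."""
    return tokens_without_limit / tokens_at_limit


ratio = capacity_ratio(OPUS_46_TOTAL, OPUS_47_TOTAL)
print(f"4.6 absorbed at least {ratio:.1f}x the tokens that exhausted 4.7")
```

This prints 6.3x, the same order as the ~7x estimate; the real gap could be larger, since the 4.6 session never hit the cap.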

I already opened a support ticket with Anthropic. Support confirmed this behavior is "inconsistent with documented behavior" and referred me here.

What Should Happen?

Per Anthropic's documented pricing, cache_read_input_tokens should be charged at a heavily reduced rate (~10% of input tokens). This was clearly the case with Opus 4.6 — 186M cache-read tokens in a 5h window did not hit the limit.

Opus 4.7 should apply the same reduced rate. Metering behavior should not regress between model versions.
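To see why the report points at cache-read weighting, the sketch below converts each session's totals into "input-token equivalents" under two weightings: cache reads at 0.1x (the discount Anthropic documents for API cache hits) and at 1.0x (the suspected regression). It is a deliberate simplification: every token that is not a cache read is weighted 1.0, even though output tokens and cache writes are priced differently, and how the 5-hour bucket actually weights tokens is an assumption here, not something the report or this sketch establishes.

```python
def input_equivalents(total: float, cache_read: float, cache_weight: float) -> float:
    """Weighted token count, in the same unit as the inputs (millions here).

    Simplifying assumption: all non-cache-read tokens (fresh input, cache
    writes, output) count at weight 1.0; cache reads count at cache_weight.
    """
    other = total - cache_read
    return other + cache_read * cache_weight


# Session totals from the report, in millions of tokens: (total, cache_read).
sessions = {
    "Opus 4.6 (5h, no limit hit)": (190.0, 186.0),
    "Opus 4.7 (<2h, limit hit)": (29.8, 25.2),
}

for name, (total, cache_read) in sessions.items():
    at_tenth = input_equivalents(total, cache_read, 0.1)
    at_full = input_equivalents(total, cache_read, 1.0)
    print(f"{name}: {at_tenth:.1f}M at 0.1x, {at_full:.1f}M at 1.0x")
```

At 0.1x the 4.7 session comes to about 7.1M equivalents, roughly a third of the 22.6M that 4.6 consumed without hitting the limit. At 1.0x the 4.7 session comes to 29.8M, more than the 4.6 session at 0.1x. That is consistent with the first hypothesis below, but it does not prove it.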

Error Messages/Logs

No runtime errors — this is a metering/billing anomaly visible in the Claude Code usage panel.

5-hour limit panel shows 100% used, resets in 3h, after only ~60 assistant responses in Opus 4.7.

Steps to Reproduce

On Max $100 plan with default Claude Code config, work a full session with Opus 4.6 → observe that a workload of ~700+ assistant responses and ~180M+ cache_read_input_tokens fits comfortably inside a 5h window.
Switch to Opus 4.7 (via /model claude-opus-4-7[1m] or by starting a new session after the 4.7 rollout).
Send a handful of normal requests (~60 assistant responses) with a typical cached context (~400k cached tokens per request).
Observe the 5h usage panel jump to 100% in under 2 hours — dramatically faster than with 4.6 on the same workload.
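The per-request figures in step 3 roughly account for the ~25.2M cache_read total reported for the Opus 4.7 session. The sketch assumes every response re-reads the whole ~400k-token cached context; the function name is illustrative.

```python
def expected_cache_reads(responses: int, cached_per_request: int) -> int:
    """Cache-read tokens if every response re-reads the whole cached context."""
    return responses * cached_per_request


# Approximate figures from step 3, and the "Partial" row of the Opus 4.7 table.
expected = expected_cache_reads(60, 400_000)
measured = 25_200_000
print(f"expected ~{expected / 1e6:.1f}M vs measured ~{measured / 1e6:.1f}M cache_read tokens")
```

The two agree to within about 5%, which suggests the reported cache_read volume matches the described workload.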

Data from my session (JSONL `usage` objects aggregated by hour, UTC)

Opus 4.6 — no limit hit:

Hour UTC	Msgs	Total	cache_read
23:00 Apr 15	151	16.3M	15.5M
00:00 Apr 16	246	48.2M	47.4M
01:00 Apr 16	166	49.0M	48.7M
02:00 Apr 16	131	46.1M	44.7M
03:00 Apr 16	75	29.8M	29.7M
Total 5h	769	~190M	~186M

Opus 4.7 — 100% hit:

Hour UTC	Msgs	Total	cache_read
15:00 Apr 16	6	2.5M	1.5M
16:00 Apr 16	54	27.3M	23.7M
Partial	60	~29.8M	~25.2M
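Hourly tables like the two above can be regenerated from a session transcript with a short script. This is a sketch, not the author's tooling. It assumes each JSONL line is an object with `type`, an ISO-8601 `timestamp`, and, for assistant turns, a `message.usage` object holding input_tokens, cache_creation_input_tokens, cache_read_input_tokens and output_tokens. It also assumes "Total" means the sum of those four counters, since the report does not define that column.

```python
import json
import sys
from collections import defaultdict
from datetime import datetime, timezone

# Usage counters assumed to be present in each assistant message's "usage" object.
FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def hour_bucket(ts: str) -> str:
    """Truncate an ISO-8601 timestamp to its UTC hour, e.g. '2026-04-16 15:00'."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%d %H:00")


def aggregate_by_hour(path: str) -> dict:
    """Per UTC hour: assistant message count, summed tokens, summed cache reads."""
    buckets = defaultdict(lambda: {"msgs": 0, "total": 0, "cache_read": 0})
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            message = record.get("message")
            if record.get("type") != "assistant" or not isinstance(message, dict):
                continue
            usage = message.get("usage")
            if not usage:
                continue
            bucket = buckets[hour_bucket(record["timestamp"])]
            bucket["msgs"] += 1
            bucket["total"] += sum(usage.get(f) or 0 for f in FIELDS)
            bucket["cache_read"] += usage.get("cache_read_input_tokens") or 0
    # "YYYY-MM-DD HH:00" keys sort chronologically as plain strings.
    return dict(sorted(buckets.items()))


if __name__ == "__main__" and len(sys.argv) == 2:
    print("Hour UTC\tMsgs\tTotal\tcache_read")
    for hour, b in aggregate_by_hour(sys.argv[1]).items():
        print(f"{hour}\t{b['msgs']}\t{b['total'] / 1e6:.1f}M\t{b['cache_read'] / 1e6:.1f}M")
```

Run it with the session file as its only argument, e.g. `python hourly_usage.py ~/.claude/projects/<project>/<session>.jsonl` (the script name is hypothetical). If a transcript records the same API response on more than one line, the sums will be inflated, so it is worth checking for repeated message ids, if the transcript has them, before comparing against the usage panel.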

Session JSONL with per-message usage available on request: ~/.claude/projects/-Users-lck-Documents-Unna-Towers-Interfaz/19adaea6-....jsonl

Claude Model

Opus

Is this a regression?

Yes, this worked in a previous version

Last Working Version

claude-opus-4-6 (Opus 4.6) — metering worked correctly until switching to 4.7 on Apr 16, 2026

Claude Code Version

2.1.111 (Claude Code)

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Terminal.app (macOS)

Additional Information

Support ticket already open — Anthropic support agent confirmed this pattern is inconsistent with documented cache pricing and directed me to file this bug.

Account plan: Max $100/month

Hypotheses (in order of likelihood):

cache_read_input_tokens weighting regression in Opus 4.7 — being counted at input-token rate instead of reduced rate.
Cache hits misreported — returning as cache_read in usage objects but billed as cache_create or full input.
Model-switch invalidation — switching from 4.6 → 4.7 invalidates cache entries in a way that inflates metering beyond what the usage field shows.
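Hypothesis 3 leaves a trace that can be looked for in the session JSONL: if the model switch invalidated the cache, the first assistant message after the change should show a large cache_creation_input_tokens count and few or no cache reads. Hypothesis 2 concerns how tokens are billed, which the usage objects do not show, so it cannot be checked this way. A minimal sketch, assuming each line is a JSON object with `type` and, for assistant turns, a `message` object holding `model` and `usage`; those field names are assumptions about the transcript format, and the function name is illustrative.

```python
import json
import sys


def model_switch_events(path: str) -> list:
    """Usage of the first assistant message after each change of model."""
    events = []
    previous_model = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            message = record.get("message")
            if record.get("type") != "assistant" or not isinstance(message, dict):
                continue
            model = message.get("model")
            if model is None:
                continue  # skip entries that carry no model name
            usage = message.get("usage") or {}
            if previous_model is not None and model != previous_model:
                events.append({
                    "from": previous_model,
                    "to": model,
                    "cache_creation": usage.get("cache_creation_input_tokens") or 0,
                    "cache_read": usage.get("cache_read_input_tokens") or 0,
                })
            previous_model = model
    return events


if __name__ == "__main__" and len(sys.argv) == 2:
    for e in model_switch_events(sys.argv[1]):
        print(f"{e['from']} -> {e['to']}: "
              f"cache_creation={e['cache_creation']:,} cache_read={e['cache_read']:,}")
```

It reports only what the usage objects say; it cannot reveal how those tokens were actually metered against the 5-hour bucket.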

Willing to provide:

Raw session JSONL (18MB) with all usage objects
CSV hourly breakdown
Screen recording of the 5h usage panel

Impact: On Max $100 plan I went from a normal working day (700+ responses) to hitting the limit in <2h with the same work patterns — effectively a 10x reduction in Opus throughput per dollar without any change on my end.

extent analysis

TL;DR

The most likely resolution is a fix on Anthropic's side for the suspected cache_read_input_tokens weighting regression in Opus 4.7, which appears to count cache reads at the full input-token rate instead of the reduced rate.

Guidance

Verify that the issue is indeed related to the cache_read_input_tokens weighting by analyzing the usage objects in the session JSONL file.
Check whether cache hits reported as cache_read in the usage objects are being billed as cache creation or full input; this can only be confirmed from Anthropic's billing-side data.
Consider providing the raw session JSONL file and CSV hourly breakdown to Anthropic support for further investigation.
Monitor the issue and wait for an update from Anthropic on the regression fix.
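For the first guidance item, one way to compare the two models directly is to total each usage counter per model and look at the cache-read share. A sketch, assuming each JSONL line is an object with `type` and, for assistant turns, a `message` object holding `model` and a `usage` object with the four token counters; these field names are assumptions about the transcript format. It summarizes what the usage objects report and says nothing about how the 5-hour bucket weights them.

```python
import json
import sys
from collections import defaultdict

# Usage counters assumed to be present in each assistant message's "usage" object.
FIELDS = ("input_tokens", "cache_creation_input_tokens",
          "cache_read_input_tokens", "output_tokens")


def totals_by_model(path: str) -> dict:
    """Sum each usage counter per model across assistant messages."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            message = record.get("message")
            if record.get("type") != "assistant" or not isinstance(message, dict):
                continue
            usage = message.get("usage") or {}
            model = message.get("model") or "unknown"
            for field in FIELDS:
                totals[model][field] += usage.get(field) or 0
    return dict(totals)


def cache_read_share(fields: dict) -> float:
    """Fraction of all counted tokens that were cache reads (0.0 if none)."""
    total = sum(fields.values())
    return fields["cache_read_input_tokens"] / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) == 2:
    for model, fields in totals_by_model(sys.argv[1]).items():
        print(f"{model}: total={sum(fields.values()):,} "
              f"cache_read_share={cache_read_share(fields):.1%}")
```

With the report's own totals, the Opus 4.6 session would show a share of about 98% (186M of 190M), so any per-model difference in the usage panel would have to come from weighting rather than from a different token mix.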

Example

No client-side code fix is provided, since the anomaly lies in Anthropic's metering rather than in user code or configuration.

Notes

The issue seems to be specific to the Opus 4.7 model and the cache_read_input_tokens weighting. The fact that the issue is not present in Opus 4.6 suggests that it is a regression introduced in the newer version.

Recommendation

Workaround: wait for Anthropic to fix the cache_read_input_tokens weighting issue. This is the most likely resolution, given that the issue is specific to Opus 4.7 and Anthropic support has confirmed the pattern is inconsistent with documented behavior.
