claude-code - 💡(How to fix) Fix [BUG] Rapid quota consumption in Claude Code with Opus 4.7 1M context — minutes-scale 90%+ burn from few prompts [3 comments, 4 participants]

Code Example

No error output — issue is silent quota consumption, not a crash.

Observed in Anthropic Console (console.anthropic.com → Usage):
- Date: 2026-04-29
- Plan: Max 5x
- Quota consumed: ~90% within minutes of session start
- Number of user turns before exhaustion: ~[5]
- Model selected in Claude Code: claude-opus-4-7[1m] (Opus 4.7, 1M context)
- Prior occurrence: late March 2026, same project, same symptom

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

Environment:

Tool: Claude Code (CLI, VSCode extension) Model: claude-opus-4-7[1m] (Opus 4.7, 1M context) Platform: macOS Darwin 24.6.0, zsh Plan: [your Claude plan tier — Pro / Max / Team] Date observed: 2026-04-29 Prior occurrence: late March 2026 (same project) What happened: Within minutes of starting a fresh session, a small number of prompts (~[N] turns) consumed 90%+ of my usage allotment. No long-running agents, no large tool outputs visible to me. Same behavior observed end of March.

Project context that may be relevant:

Large CLAUDE.md (~projected token count from your repo — this file is substantial; you can run wc -c CLAUDE.md to get bytes) Large memory/MEMORY.md (system reminder showed it was 25.9KB, exceeding the 24.4KB index limit, so partial-load fallback fired) Many deferred tools available via ToolSearch Many user-level + project-level skills registered Auto Mode active Hypothesis: The 1M-context model appears to be billing on full attached context (CLAUDE.md + MEMORY.md + tool schemas + skills index + transcript) on every turn, even for trivial prompts, rather than caching effectively across rapid same-session turns. A handful of turns × ~1M tokens of attached context = full quota burn.

What I'd like Anthropic to investigate:

Is prompt caching working as expected for Claude Code sessions on Opus 4.7 1M? (Anthropic's published cache TTL is 5 minutes; rapid turns should hit cache.) Is the 1M-context surcharge being applied even when the actual conversation is small? Is there a quota-meter accounting bug specifically for Opus 4.7 1M in Claude Code? Logs/data I can provide on request:

Session transcript CLAUDE.md size + MEMORY.md size Approximate turn count before quota exhaustion Anthropic Console usage chart screenshot showing the spike

What Should Happen?

Context should not burn in a matter of a few prompts. This seems very similar to March 2026 issue.

Error Messages/Logs

No error output — issue is silent quota consumption, not a crash.

Observed in Anthropic Console (console.anthropic.com → Usage):
- Date: 2026-04-29
- Plan: Max 5x
- Quota consumed: ~90% within minutes of session start
- Number of user turns before exhaustion: ~[5]
- Model selected in Claude Code: claude-opus-4-7[1m] (Opus 4.7, 1M context)
- Prior occurrence: late March 2026, same project, same symptom

Steps to Reproduce

Open a project in Claude Code (VSCode extension on macOS Darwin 24.6.0). Project has substantial attached context:
- CLAUDE.md (~large project instructions file)
- memory/MEMORY.md (system reminder reported it was 25.9KB, exceeding the 24.4KB index limit; partial-load fallback fired)
- Many user-level + project-level skills registered
- Many deferred MCP tools available via ToolSearch
- Auto Mode active
Confirm model is set to Opus 4.7 with 1M context window (model ID: claude-opus-4-7[1m]).
Send a small number of routine prompts (~3-6 turns). No long-running agents, no large file reads, no big tool outputs.
Open Anthropic Console → Usage.

Expected: Usage proportional to actual conversation size (a handful of small turns should consume a small fraction of weekly quota, especially with prompt caching active — Anthropic docs state cache reads are billed at ~10% of base rate with a 5-minute TTL, so rapid same-session turns should hit cache).

Actual: ~90%+ of weekly quota consumed within minutes. Same behavior reproduced in late March 2026 on the same project.

Hypothesis: One or more of the following may be happening: (a) Prompt caching is not engaging on Opus 4.7 1M in Claude Code, so every turn rebills the full attached context. (b) The 1M-context surcharge is being applied to the full window even when the actual conversation is small. (c) MEMORY.md exceeding the index limit triggers some fallback path that defeats caching. (d) Claude Code's quota meter has an accounting bug specific to Opus 4.7 1M.

Possibly related: issue #54755 (weekly reset anchor shift on Apr 28-29 2026, labeled area:cost), issue #54761 (extreme token consumption).

Repro reliability: 2-for-2 — observed end of March 2026 and again 2026-04-29 on the same project.

Claude Model

Opus

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

2.1.123 (VSCode native extension, darwin-arm64)

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

VS Code integrated terminal

Additional Information

Surface: Claude Code VSCode native extension (NOT the standalone CLI).

Project context (likely relevant to the quota burn):

Project root: ARM (AdvisorRM) — large multi-tenant CRM codebase
CLAUDE.md: very large project instructions file (thousands of lines)
memory/MEMORY.md: 25.9 KB at session start, exceeded the 24.4 KB index limit — system reminder reported partial-load fallback fired
Many user-level skills registered (impeccable, audit, critique, polish, normalize, animate, adapt, arrange, bolder, clarify, colorize, delight, distill, extract, harden, onboard, optimize, overdrive, quieter, shape, typeset, plus project-level phase-start, phase-wrap, scan, deploy, etc.)
Many MCP servers connected (n8n-mcp, supabase, playwright, gcloud, Gmail, Google Calendar, Google Drive)
Many deferred tools available via ToolSearch
Auto Mode active

Model: claude-opus-4-7[1m] (Opus 4.7 with 1M context window)

Symptom: ~90%+ of weekly quota consumed within minutes from a small number of routine prompts. No long-running agents, no large file reads on my end, no large tool outputs visible to me. Same symptom observed late March 2026 on the same project.

Hypothesis: Prompt caching may not be engaging on Opus 4.7 1M in Claude Code, so every turn rebills the full attached context (CLAUDE.md + MEMORY.md + skills index + tool schemas + MCP descriptions). With a project this size on the 1M-context model, each uncached turn could conceivably bill hundreds of thousands of input tokens.

Possibly related already-open issues:

#54755 ("Weekly reset anchor shifted Thursday → Saturday in ~12hr window, Max 20x, no user action, Apr 28-29 2026") — labeled area:cost, opened ~1 hr before this report
#54761 ("Extreme token co...") — open

Repro reliability: 2-for-2 on this project (late March 2026 and 2026-04-29), but I cannot reproduce on demand — it appears intermittent across sessions.

What would help me diagnose:

Per-turn token counts visible in Claude Code itself (not just Console aggregate)
Confirmation that prompt caching is engaging on Opus 4.7 1M
Documentation of which tier (Pro/Max/Team) the 1M-context model is available on and at what billing rate

extent analysis

TL;DR

Investigate and potentially adjust the prompt caching behavior for Opus 4.7 1M in Claude Code to prevent excessive quota consumption.

Guidance

Verify if prompt caching is working as expected for Claude Code sessions on Opus 4.7 1M by checking the Anthropic Console usage chart for cache hits.
Check the billing rate for the 1M-context model on the user's plan (Pro/Max/Team) to understand the cost implications.
Review the project's context, including the size of CLAUDE.md and MEMORY.md, to determine if it's contributing to the excessive quota burn.
Consider testing with a smaller project context or a different model to isolate the issue.

Example

No code snippet is provided as the issue is related to the Anthropic API and Claude Code configuration.

Notes

The issue seems to be intermittent and specific to the Opus 4.7 1M model in Claude Code. The user has already reported similar behavior in late March 2026, suggesting a potential regression.

Recommendation

Apply a workaround by optimizing the project context, such as reducing the size of CLAUDE.md and MEMORY.md, or exploring alternative models until the prompt caching issue is resolved.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [BUG] Rapid quota consumption in Claude Code with Opus 4.7 1M context — minutes-scale 90%+ burn from few prompts [3 comments, 4 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Messages/Logs

Code Example

Preflight Checklist

What's Wrong?

What Should Happen?

Error Messages/Logs

Steps to Reproduce

Claude Model

Is this a regression?

Last Working Version

Claude Code Version

Platform

Operating System

Terminal/Shell

Additional Information

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [BUG] Rapid quota consumption in Claude Code with Opus 4.7 1M context — minutes-scale 90%+ burn from few prompts [3 comments, 4 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Messages/Logs

Code Example

Preflight Checklist

What's Wrong?

What Should Happen?

Error Messages/Logs

Steps to Reproduce

Claude Model

Is this a regression?

Last Working Version

Claude Code Version

Platform

Operating System

Terminal/Shell

Additional Information

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING