claude-code - 💡(How to fix) Fix Massive token burn: 80% → 11% in ~5 minutes on a single session

Q: Expected behavior

Token consumption should be roughly proportional to *new* content per turn. Cache-miss amplification on a >1 MB transcript across 225 rapid Bash iterations should not drain ~70% of a session budget in 5 minutes.

claude-code2026-04-20 15:59:55

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

A single Claude Code session consumed context from ~80% to ~11% of the session budget in roughly 5 minutes, despite no individual tool call returning a large payload. The burn appears to come from turn volume + repeated cache-miss re-sends, not a runaway tool result.

Error Message

Session transcript analysis (~/.claude/projects/.../8695a41b-*.jsonl):

Root Cause

RAW_BUFFERClick to expand / collapse

Summary

Environment

Claude Code CLI, macOS (Darwin 25.4.0)
Model: Claude Opus 4.7 (1M context)
Date: 2026-04-20
Session ID: 8695a41b-71c2-4a0e-b1bf-0ea6a0972c67
Account email: [email protected]

Observed behavior

Session transcript analysis (~/.claude/projects/.../8695a41b-*.jsonl):

760 assistant turns / 556 user turns in one session
Tool-use counts: 225 Bash, 71 Edit, 55 Read, 49 TaskUpdate, 32 TaskCreate, 27 Grep, 22 WebFetch, 17 Write, 5 Agent
Largest single tool_use payloads: ExitPlanMode 25 KB, Write 24 KB
Tool results stayed small (avg 769 chars, max 25 KB, total ~400 KB)
No single "giant blob" — but every new turn re-sent the growing transcript

Expected behavior

Token consumption should be roughly proportional to new content per turn. Cache-miss amplification on a >1 MB transcript across 225 rapid Bash iterations should not drain ~70% of a session budget in 5 minutes.

Hypothesis

Once a session crosses a few hundred turns, cache-miss re-sends of the full transcript dominate token cost. There does not appear to be an in-CLI warning or automatic compaction threshold that fires before the user notices the drain.

Asks

Can the CLI surface a proactive warning (or auto-compact suggestion) when per-turn token cost crosses a threshold?
Is there a known issue with cache invalidation on long sessions that would cause repeated full re-sends?
Billing clarity: for sessions that burn through context this fast, is there a path to review/refund?

Happy to share the .jsonl transcript privately if useful.

extent analysis

TL;DR

Implementing a proactive warning or auto-compact suggestion in the CLI when per-turn token cost crosses a threshold may help mitigate the issue of rapid context budget consumption.

Guidance

Investigate the cache invalidation mechanism to determine if it's causing repeated full re-sends of the transcript, leading to excessive token consumption.
Review the CLI's current threshold for warning users about high token costs and consider lowering it to prevent unexpected budget drains.
Analyze the transcript data to identify patterns or correlations between tool usage, turn volume, and token consumption that could inform optimization strategies.
Consider implementing a compaction or summarization mechanism for long transcripts to reduce the payload size and mitigate cache-miss re-sends.

Example

No code snippet is provided as the issue does not imply a specific code-related solution.

Notes

The issue seems to be related to the interaction between the CLI, cache invalidation, and token consumption. Without further information about the underlying implementation, it's challenging to provide a definitive solution. However, the suggested guidance points should help in identifying and mitigating the issue.

Recommendation

Apply workaround: Implement a proactive warning or auto-compact suggestion in the CLI to alert users when per-turn token cost crosses a threshold, allowing them to take corrective action and prevent unexpected budget drains. This approach addresses the immediate issue and provides a stopgap measure until a more comprehensive solution can be developed.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

#api #mixed precision #training loop #device allocation #model download

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Massive token burn: 80% → 11% in ~5 minutes on a single session

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Summary

Environment

Observed behavior

Expected behavior

Hypothesis

Asks

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix Massive token burn: 80% → 11% in ~5 minutes on a single session

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Summary

Environment

Observed behavior

Expected behavior

Hypothesis

Asks

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING