claude-code: [FEATURE] Usage display should surface cache read and write token consumption [3 comments, 2 participants]

GitHub stats
anthropics/claude-code#55133 (fetched 2026-05-01 05:45:25)
Comments: 3
Participants: 2
Timeline: 8
Reactions: 0
Timeline (top): commented ×3, labeled ×3, cross-referenced ×2

Is your feature request related to a problem?

The token counter and Plan usage display show output tokens but nothing about cache reads or cache writes. In practice, cache reads are often the dominant cost in a session — not output tokens — yet they are entirely invisible to the user.

This is not limited to large context windows or flagship models. Cache reads accumulate proportionally to context size on every model and at every plan tier. A user watching the current display has no way to understand, anticipate, or manage the costs that actually drive their bill.

Concretely: in a long agentic session, every single API call re-bills the user for the entire accumulated context at the cache-read rate. Compaction events write the full context to cache at the (higher) cache-write rate and fire automatically. Neither is visible anywhere in the UI.
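A rough sketch of why cache reads come to dominate, under a deliberately simplified model: every turn appends a fixed number of tokens to the context, the whole prior context is served from cache on each call, and only the newly appended tokens are written to cache. The function name and parameters are hypothetical, and compaction is ignored.

```python
def simulate_session(turns, added_per_turn, output_per_turn):
    """Return (cache_read, cache_write, output) token totals for a session.

    Simplified model (an assumption, not measured behaviour):
    each call re-reads all previously cached context and writes
    only the tokens appended since the previous call.
    """
    context = 0  # tokens accumulated (and cached) before the current call
    cache_read = cache_write = output = 0
    for _ in range(turns):
        cache_read += context          # entire prior context billed as cache read
        cache_write += added_per_turn  # only the new suffix is written to cache
        output += output_per_turn
        context += added_per_turn      # context grows for the next call
    return cache_read, cache_write, output
```

In this model, 100 turns that each add 2,000 tokens and emit 500 output tokens produce 9.9 million cache-read tokens against 50,000 output tokens. Cache reads grow roughly with the square of the turn count, while output grows linearly.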

Describe the solution you'd like

The session usage display should break out:

  • Cache read tokens (running total and per-turn)
  • Cache write tokens (with compaction events identifiable as the large discrete spikes they are)
  • Output tokens (already shown)

A single aggregate "total billable token-equivalents" at current model rates would also be more useful than output tokens alone.
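The breakdown requested above could be accumulated along the following lines. This is a minimal sketch, not Claude Code's implementation: it assumes each turn exposes a usage record with the field names used in the Anthropic Messages API response (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). The `SessionUsage` class is hypothetical, and the `WEIGHTS` values are illustrative placeholders rather than actual model rates.

```python
from dataclasses import dataclass, field

# Placeholder weights relative to one uncached input token.
# These are assumed values for illustration; substitute real model rates.
WEIGHTS = {"input": 1.0, "cache_write": 1.25, "cache_read": 0.1, "output": 5.0}


@dataclass
class SessionUsage:
    turns: list = field(default_factory=list)

    def record(self, usage: dict) -> dict:
        """Store one turn's usage and return its per-turn breakdown."""
        turn = {
            "input": usage.get("input_tokens", 0),
            "cache_write": usage.get("cache_creation_input_tokens", 0),
            "cache_read": usage.get("cache_read_input_tokens", 0),
            "output": usage.get("output_tokens", 0),
        }
        self.turns.append(turn)
        return turn

    def totals(self) -> dict:
        """Running totals per category across all recorded turns."""
        return {k: sum(t[k] for t in self.turns) for k in WEIGHTS}

    def token_equivalents(self) -> float:
        """Single aggregate: every category weighted by its relative rate."""
        totals = self.totals()
        return sum(totals[k] * WEIGHTS[k] for k in WEIGHTS)
```

Calling `record()` after each API response gives the per-turn view, `totals()` gives the running totals, and `token_equivalents()` gives the single aggregate figure suggested in the request.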

Additional context

This compounds #55121: sub-agent cache reads are doubly invisible — not counted in the main-thread total, and not broken out even when they are counted.

Independent analysis by community members corroborates the scale of the problem: one user documented that cache read tokens consumed 97.7% of their session costs, with actual API cost of $1.47 versus a total billed cost of $64.98 — a 44× ratio (source).
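The quoted figures can be sanity-checked with a few lines of arithmetic. The check below assumes that the $1.47 is the portion of the bill not attributable to cache reads; that reading is an interpretation of the quote, not something the source states explicitly.

```python
billed_total = 64.98  # total billed cost quoted above, in USD
non_cache = 1.47      # assumed: the part of the bill not due to cache reads

ratio = billed_total / non_cache                         # billed vs. non-cache cost
cache_share = (billed_total - non_cache) / billed_total  # fraction due to cache reads

print(f"ratio: {ratio:.1f}x, cache-read share: {cache_share:.1%}")
```

Under that assumption the two quoted numbers agree with each other: the ratio rounds to 44× and the cache-read share rounds to 97.7%.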

Prior art / not a duplicate

#44779 is related but scoped to 1M context + Opus + Linux, framing cache cost visibility as a niche concern. This request is general: cache reads dominate cost across all context sizes, all models, and all platforms. The visibility gap is not a special case.

#28723 is also related and platform-agnostic, but scoped specifically to Opus and the 1M context window. The problem it describes — quota depleting faster than expected with no visibility into why — is a symptom of the same root cause, but the fix is not Opus-specific or context-size-specific.

extent analysis

TL;DR

Displaying cache read and write tokens in the session usage display can help users understand and manage their actual costs.

Guidance

  • Break out cache read tokens and cache write tokens in the session usage display to provide visibility into the dominant cost factor.
  • Include a running total and per-turn breakdown for cache read tokens to help users anticipate costs.
  • Identify compaction events as large discrete spikes in cache write tokens to provide transparency into billing.
  • Consider displaying a single aggregate "total billable token-equivalents" at current model rates for a more accurate representation of costs.

Example

No code snippet is provided as the issue is focused on the display of usage metrics rather than a specific code implementation.
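Although the issue itself contains no code, the guidance on compaction spikes can be illustrated with a small heuristic. The sketch below flags turns whose cache-write count is far above the session median. The function name and the `factor` threshold are hypothetical. A real implementation would more likely tag compaction events directly at the point where they are triggered rather than infer them from the numbers.

```python
from statistics import median


def flag_compaction_spikes(cache_writes, factor=5.0):
    """Return indices of turns whose cache-write tokens exceed
    `factor` times the median cache write of the session.

    Heuristic only: the threshold is an assumed value, and a
    large non-compaction write would be flagged as well.
    """
    if not cache_writes:
        return []
    threshold = factor * median(cache_writes)
    return [i for i, w in enumerate(cache_writes) if w > threshold]
```

For a series such as `[2000, 1800, 2200, 150000, 2100, 1900]`, the median is 2,050 tokens, so with the default factor only the 150,000-token turn (index 3) is flagged.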

Notes

The issue highlights the importance of visibility into cache reads and writes, which can dominate costs across all context sizes, models, and platforms. The proposed solution aims to address this visibility gap.

Recommendation

Extend the session usage display to include cache read and cache write tokens. This is a change to the display itself rather than a workaround users can apply; it would give users a more accurate picture of their costs and help them manage usage more effectively.
