claude-code - Feature request: include thinking_tokens in API usage response [1 comment, 2 participants]

GitHub stats
anthropics/claude-code#49320 (fetched 2026-04-17 08:44:35)
Comments: 1
Participants: 2
Timeline events: 8
Reactions: 3
Timeline (top): labeled ×3, cross-referenced ×2, commented ×1, mentioned ×1

Summary

The API usage response object includes input_tokens, output_tokens, cache_read_input_tokens, and cache_creation_input_tokens — but does not include thinking tokens as a separate field. With Opus 4.7's shift to mandatory adaptive thinking, this gap has become a measurable cost transparency issue.

The problem

Our metered telemetry (claude-code-meter) shows that Opus 4.7 consumes Q5h quota at 2.4x the rate of Opus 4.6 for equivalent visible token counts:

| Metric | Opus 4.6 | Opus 4.7 |
|---|---|---|
| Avg Q5h per turn | ~0.3% | ~0.73% |
| Ratio | 1x | 2.4x |

The visible token counts don't account for this difference. The gap is consistent with invisible thinking token overhead being charged against quota but not reported in the usage response.
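The ratio in the table can be reproduced from the two per-turn averages. The sketch below is only arithmetic on the figures reported above. The "unexplained share" assumes that visible tokens cost the same quota on both models, which the issue implies but does not prove.

```python
# Average Q5h quota consumed per turn, in percent, as reported in the table above.
avg_q5h_per_turn = {"Opus 4.6": 0.30, "Opus 4.7": 0.73}

baseline = avg_q5h_per_turn["Opus 4.6"]
observed = avg_q5h_per_turn["Opus 4.7"]

# How many times faster Opus 4.7 consumes quota than Opus 4.6.
ratio = observed / baseline

# ASSUMPTION: equal visible token counts cost equal quota on both models.
# Under that assumption, this is the fraction of each Opus 4.7 turn's quota
# that the visible token counts do not explain.
unexplained_share = 1 - baseline / observed

print(f"ratio: {ratio:.1f}x")                          # ratio: 2.4x
print(f"unexplained share: {unexplained_share:.0%}")   # unexplained share: 59%
```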

Users cannot:

  • See how many thinking tokens were consumed per call
  • Distinguish thinking cost from inference cost
  • Optimize their thinking budget (manual control was removed in 4.7)
  • Verify that quota charges match visible usage

Under the new per-token enterprise billing, this is an invisible line item on the invoice.

Request

Add thinking_tokens (or adaptive_thinking_tokens) to the usage response object:

{
  "usage": {
    "input_tokens": 6,
    "output_tokens": 490,
    "cache_read_input_tokens": 67450,
    "cache_creation_input_tokens": 541,
    "thinking_tokens": 12500
  }
}

The data exists server-side — it's computed for billing. Exposing it is a documentation decision, not an engineering project.
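On the client side, code could read the proposed field defensively so that it works both before and after any such change. This is a minimal sketch: `thinking_tokens` is the field name proposed in this issue and is hypothetical, and the `Usage` type and `get_thinking_tokens` helper are illustrative names, not part of any SDK.

```python
import json
from typing import Optional, TypedDict


class Usage(TypedDict, total=False):
    input_tokens: int
    output_tokens: int
    cache_read_input_tokens: int
    cache_creation_input_tokens: int
    thinking_tokens: int  # HYPOTHETICAL: proposed in this issue, not returned today


def get_thinking_tokens(usage: Usage) -> Optional[int]:
    """Return the thinking token count, or None when the field is absent."""
    return usage.get("thinking_tokens")


# Response shape as it is described today (no thinking_tokens field).
current = json.loads("""
{"usage": {"input_tokens": 6, "output_tokens": 490,
           "cache_read_input_tokens": 67450,
           "cache_creation_input_tokens": 541}}
""")

# Response shape proposed in the request above.
proposed = json.loads("""
{"usage": {"input_tokens": 6, "output_tokens": 490,
           "cache_read_input_tokens": 67450,
           "cache_creation_input_tokens": 541,
           "thinking_tokens": 12500}}
""")

print(get_thinking_tokens(current["usage"]))   # None
print(get_thinking_tokens(proposed["usage"]))  # 12500
```

Returning `None` rather than `0` keeps "not reported" distinct from "no thinking occurred", which is the distinction the request is about.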

Related

extent analysis

TL;DR

Add thinking_tokens to the usage response object to provide transparency into the invisible thinking token overhead being charged against quota.

Guidance

  • Review the usage response object to confirm the absence of thinking_tokens and its impact on quota transparency.
  • Investigate the server-side computation of thinking tokens for billing to understand the data availability.
  • Consider the proposed JSON structure for the usage response object, including the addition of thinking_tokens.
  • Evaluate the potential benefits of exposing thinking token data, such as enabling users to optimize their thinking budget and verify quota charges.
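The first guidance item can be checked mechanically against a captured response. The sketch below audits a usage object against the four fields named in the issue; `audit_usage` is an illustrative helper and `thinking_tokens` is the hypothetical field under discussion. The sample values are the ones quoted in the issue.

```python
# The four fields the issue says the usage object includes today.
DOCUMENTED_FIELDS = {
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
}
PROPOSED_FIELD = "thinking_tokens"  # HYPOTHETICAL: requested, not confirmed to exist


def audit_usage(usage: dict) -> dict:
    """Report which documented fields are missing and whether the proposed one is present."""
    keys = set(usage)
    return {
        "missing_documented": sorted(DOCUMENTED_FIELDS - keys),
        "has_thinking_tokens": PROPOSED_FIELD in keys,
        "other_fields": sorted(keys - DOCUMENTED_FIELDS - {PROPOSED_FIELD}),
    }


# A usage object shaped like the one described in the issue, without the proposed field.
sample_usage = {
    "input_tokens": 6,
    "output_tokens": 490,
    "cache_read_input_tokens": 67450,
    "cache_creation_input_tokens": 541,
}

report = audit_usage(sample_usage)
print(report)
```

Listing `other_fields` separately means that a differently named counter (for example the alternative `adaptive_thinking_tokens` suggested in the request) would show up in the report instead of being silently ignored.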

Example

With thinking_tokens added, the proposed usage response object would look like this:

{
  "usage": {
    "input_tokens": 6,
    "output_tokens": 490,
    "cache_read_input_tokens": 67450,
    "cache_creation_input_tokens": 541,
    "thinking_tokens": 12500
  }
}

Notes

According to the issue author, the thinking token count already exists server-side because it is computed for billing, so exposing it would be a documentation decision rather than an engineering project.

Recommendation

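This is a feature request, not a client-side workaround: only the API provider can add thinking_tokens to the usage response object. Until the field is exposed, users can only infer thinking overhead indirectly, for example by comparing quota consumption against visible token counts as the issue's telemetry does.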
