claude-code - Feature request: include thinking_tokens in API usage response [1 comment, 2 participants]

claude-code • 2026-04-16 17:25:59


GitHub stats

anthropics/claude-code#49320•Fetched 2026-04-17 08:44:35


Author

cnighswonger

Participants

cnighswonger

github-actions[bot]

Timeline (top)

labeled ×3, cross-referenced ×2, commented ×1, mentioned ×1

The API usage response object includes input_tokens, output_tokens, cache_read_input_tokens, and cache_creation_input_tokens — but does not include thinking tokens as a separate field. With Opus 4.7's shift to mandatory adaptive thinking, this gap has become a measurable cost transparency issue.

Root Cause

Code Example

{
  "usage": {
    "input_tokens": 6,
    "output_tokens": 490,
    "cache_read_input_tokens": 67450,
    "cache_creation_input_tokens": 541,
    "thinking_tokens": 12500
  }
}


Summary

The problem

Our metered telemetry (claude-code-meter) shows that Opus 4.7 consumes Q5h quota at 2.4x the rate of Opus 4.6 for equivalent visible token counts:

Metric	Opus 4.6	Opus 4.7
Avg Q5h per turn	~0.3%	~0.73%
Ratio	1x	2.4x

The visible token counts don't account for this difference. The gap is consistent with invisible thinking token overhead being charged against quota but not reported in the usage response.
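As a rough check on the figures in the table, the ratio and the share of Opus 4.7 quota left unexplained can be worked out directly. This is only a sketch: the variable names are illustrative, and it assumes that visible token usage per turn is the same for both models, so that the Opus 4.6 figure stands in for the quota explained by visible tokens.

```python
# Average Q5h quota consumed per turn, in percent (values from the table above).
q5h_per_turn_opus_46 = 0.30
q5h_per_turn_opus_47 = 0.73

# How much faster Opus 4.7 consumes quota than Opus 4.6.
ratio = q5h_per_turn_opus_47 / q5h_per_turn_opus_46

# Assumption: visible token usage per turn is equal for both models, so the
# Opus 4.6 figure approximates the quota explained by visible tokens.
unexplained_share = (
    q5h_per_turn_opus_47 - q5h_per_turn_opus_46
) / q5h_per_turn_opus_47

print(f"ratio: {ratio:.2f}x")
print(f"unexplained share of Opus 4.7 quota per turn: {unexplained_share:.0%}")
```

Under that assumption, the ratio comes out at about 2.43x, and roughly 59% of the per-turn quota on Opus 4.7 is not accounted for by visible tokens.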

Users cannot:

See how many thinking tokens were consumed per call
Distinguish thinking cost from inference cost
Optimize their thinking budget (manual control was removed in 4.7)
Verify that quota charges match visible usage

Under the new per-token enterprise billing, this is an invisible line item on the invoice.

Request

Add thinking_tokens (or adaptive_thinking_tokens) to the usage response object:

{
  "usage": {
    "input_tokens": 6,
    "output_tokens": 490,
    "cache_read_input_tokens": 67450,
    "cache_creation_input_tokens": 541,
    "thinking_tokens": 12500
  }
}
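If the field were added, client code could read it defensively so that it keeps working against responses that omit it. The sketch below is hypothetical: `thinking_tokens` is the field proposed in this issue, not one the API is known to return, and `summarize_usage` is an illustrative helper rather than part of any SDK.

```python
from typing import Optional, TypedDict

# The four usage fields that the issue says are reported today.
VISIBLE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


class UsageSummary(TypedDict):
    visible_tokens: int
    thinking_tokens: Optional[int]


def summarize_usage(response: dict) -> UsageSummary:
    """Sum the reported fields and pass through thinking_tokens if present."""
    usage = response.get("usage", {})
    visible = sum(usage.get(field, 0) for field in VISIBLE_FIELDS)
    # Hypothetical field: None means the response did not report it.
    return {
        "visible_tokens": visible,
        "thinking_tokens": usage.get("thinking_tokens"),
    }


# Example payload copied from the request above.
example = {
    "usage": {
        "input_tokens": 6,
        "output_tokens": 490,
        "cache_read_input_tokens": 67450,
        "cache_creation_input_tokens": 541,
        "thinking_tokens": 12500,
    }
}

print(summarize_usage(example))
```

Returning `None` instead of `0` when the field is missing keeps "not reported" distinct from "no thinking tokens used", which is the distinction the request is about.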

The data exists server-side — it's computed for billing. Exposing it is a documentation decision, not an engineering project.

Related

anthropics/claude-code#42796 — Opus 4.7 quality regressions, adaptive thinking discussion
@ArkNill's thinking token blind spot analysis
claude-code-cache-fix Discussion #25 — first metered 4.7 session data

extent analysis

TL;DR

Add thinking_tokens to the usage response object to provide transparency into the invisible thinking token overhead being charged against quota.

Guidance

Review the usage response object to confirm the absence of thinking_tokens and its impact on quota transparency.
Investigate the server-side computation of thinking tokens for billing to understand the data availability.
Consider the proposed JSON structure for the usage response object, including the addition of thinking_tokens.
Evaluate the potential benefits of exposing thinking token data, such as enabling users to optimize their thinking budget and verify quota charges.

Example

The proposed usage response object with thinking_tokens could be implemented as follows:

{
  "usage": {
    "input_tokens": 6,
    "output_tokens": 490,
    "cache_read_input_tokens": 67450,
    "cache_creation_input_tokens": 541,
    "thinking_tokens": 12500
  }
}

Notes

According to the issue author, the thinking token count already exists server-side because it is computed for billing, so exposing it in the usage response object would be a documentation decision rather than an engineering project.

Recommendation

This is not a workaround users can apply themselves: the change has to be made by the API provider. Adding thinking_tokens to the usage response object would address the transparency issue directly and let users better understand and manage their quota usage.
