claude-code - 💡(How to fix) Fix [BUG] Opus 4.8 medium effort spends 46k output tokens on hidden thinking for a simple coding turn [1 pull requests]

Error Message

Error Messages/Logs

No API 400 or tool parsing error for the high-token request.

Fix Action

Fixed

Preflight Checklist

  • I have searched existing issues and did not find a clean duplicate for this specific failure mode
  • This is a single bug report
  • I am using the latest version of Claude Code

What's Wrong?

Opus 4.8 in Claude Code can spend an unexpectedly large amount of hidden thinking/output tokens on a routine coding turn even when effort is set to medium.

In one session, the visible task was a small mechanical code follow-up: inspect the impact of a rename / retry-label change before editing. The visible work was mostly grep/read/tool use and a short conclusion. No web search was involved.

The Claude Code UI showed the turn thinking for about 22m 43s and using about 105.6k tokens.

The underlying transcript has a completed request with:

  • model: claude-opus-4-8
  • input_tokens: 131
  • cache_read_input_tokens: 91,877
  • cache_creation_input_tokens: 4,054
  • output_tokens: 46,433
  • stop_reason: end_turn

The high cost appears to be mostly hidden thinking/output, not user-visible content. This happened under effort=medium, so medium effort behaved much more like a high/xhigh thinking budget than like a normal routine coding mode.
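To put the usage figures in proportion, here is a minimal sketch that tallies the fields quoted above. The field names and values come from the transcript; the `summarize` helper, and the choice to count cache reads and cache writes as part of one prompt-side total, are assumptions made for illustration and not a statement of how Claude Code accounts for usage.

```python
# Usage values copied from the transcript record quoted above.
usage = {
    "input_tokens": 131,
    "cache_read_input_tokens": 91_877,
    "cache_creation_input_tokens": 4_054,
    "output_tokens": 46_433,
}


def summarize(usage: dict) -> dict:
    """Hypothetical helper: total the prompt-side fields and compare with output."""
    prompt_total = (
        usage["input_tokens"]
        + usage["cache_read_input_tokens"]
        + usage["cache_creation_input_tokens"]
    )
    output = usage["output_tokens"]
    return {
        "prompt_total": prompt_total,
        "output_tokens": output,
        # Output as a fraction of all tokens attached to this request.
        "output_share": output / (prompt_total + output),
        # Plain (non-cache) input as a fraction of the prompt side.
        "uncached_input_share": usage["input_tokens"] / prompt_total,
    }


summary = summarize(usage)
print(summary)
```

Under these assumptions the prompt side totals 96,062 tokens, almost all of it served from cache, while the 46,433 output tokens are roughly a third of everything attached to the request.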

There was an ECONNRESET retry nearby in the session, but this specific high-token request completed normally with stop_reason=end_turn, so this does not look like purely a networking/retry issue.

Related but not identical issues:

  • #64102: excessive token consumption mixed with API disconnects
  • #63455: simple tasks consuming 40-50k tokens, but the title/model metadata are unclear
  • #44344: extended thinking hangs/drains tokens, but older/different hang behavior
  • #63954: abnormal session-limit drain, more about quota accounting

What Should Happen?

effort=medium should materially constrain hidden thinking for routine coding/tool turns.

A simple rename-impact scan or small code follow-up should not spend ~46k output tokens mostly as hidden thinking. If medium effort is only soft guidance, Claude Code should provide either:

  • a hard per-turn thinking/output budget,
  • a clearer warning when a turn is burning tens of thousands of hidden output tokens,
  • or automatic effort step-down for simple tool/refactor turns.
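As a rough illustration of the first two options, a per-turn check could look something like the sketch below. Everything here is hypothetical: `TurnBudget`, `check_output_budget`, and both thresholds are invented for this example and do not correspond to any existing Claude Code setting or API.

```python
from dataclasses import dataclass


@dataclass
class TurnBudget:
    """Hypothetical per-turn output budget; not a real Claude Code setting."""

    warn_at: int = 10_000   # assumed warning threshold, in output tokens
    hard_cap: int = 30_000  # assumed hard limit, in output tokens


def check_output_budget(output_tokens: int, budget: TurnBudget) -> str:
    """Return 'ok', 'warn', or 'stop' for a running output-token count."""
    if output_tokens >= budget.hard_cap:
        return "stop"
    if output_tokens >= budget.warn_at:
        return "warn"
    return "ok"


budget = TurnBudget()
for count in (2_000, 12_000, 46_433):
    print(count, check_output_budget(count, budget))
```

With these assumed thresholds, the 46,433-token request from this report would have crossed the warning level long before completion and would have been stopped at the cap.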

Error Messages/Logs

No API 400 or tool parsing error for the high-token request.

Transcript usage for the completed request:
input_tokens: 131
cache_read_input_tokens: 91877
cache_creation_input_tokens: 4054
output_tokens: 46433
stop_reason: end_turn

Steps to Reproduce

  1. Start or resume a non-trivial Claude Code session with Opus 4.8.
  2. Set effort to medium.
  3. Ask for a routine coding follow-up, such as checking the impact of a simple rename before editing.
  4. Observe that Claude Code may spend tens of thousands of output tokens in hidden thinking before doing ordinary grep/read/tool work.
  5. Inspect the session JSONL and group assistant records by requestId; check message.usage.output_tokens.
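Step 5 can be scripted. The sketch below is one possible way to do it, not an official tool. It relies on the `requestId` and `message.usage.output_tokens` fields named above; the `type == "assistant"` filter and the decision to keep the largest value per request (on the assumption that several records from one request may repeat or update the same usage figure) are guesses about the transcript layout and should be checked against a real session file.

```python
import json
from collections import defaultdict
from pathlib import Path


def output_tokens_by_request(jsonl_path: str) -> dict[str, int]:
    """Map each requestId to the largest output_tokens value seen for it."""
    per_request: dict[str, int] = defaultdict(int)
    text = Path(jsonl_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or malformed lines
        if not isinstance(record, dict):
            continue
        if record.get("type") != "assistant":  # assumed record type label
            continue
        request_id = record.get("requestId")
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        tokens = usage.get("output_tokens") if isinstance(usage, dict) else None
        if request_id is None or not isinstance(tokens, int):
            continue
        # Assumption: records of one request carry the same or a growing
        # usage figure, so the maximum avoids double counting.
        per_request[request_id] = max(per_request[request_id], tokens)
    return dict(per_request)


def top_requests(per_request: dict[str, int], limit: int = 5) -> list[tuple[str, int]]:
    """Return the requests with the highest output-token counts, largest first."""
    ranked = sorted(per_request.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

Calling `top_requests(output_tokens_by_request(path))` with the path to a session JSONL file lists the most expensive requests first, which makes a turn like the 46,433-token one easy to spot.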

Claude Model

Opus

Is this a regression?

Yes, this worked in a previous version.

Last Working Version

Opus 4.6 / Opus 4.7 did not show this level of hidden thinking cost for comparable routine coding turns in my usage.

Claude Code Version

2.1.158 (Claude Code)

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Other

Additional Information

I can provide sanitized transcript snippets if useful. The important distinction is that this was not a failed request hitting max_tokens or an API 400 loop; the request completed normally, but spent 46k output tokens largely on hidden thinking under medium effort.
