claude-code - 💡(How to fix) Fix [BUG] Thinking stalls exceeding conversation cache TTL [1 participants]

GitHub stats
anthropics/claude-code#54934 · Fetched 2026-05-01 05:50:36
Comments: 0 · Participants: 1 · Timeline: 5 (labeled ×5) · Reactions: 0

Error Message

API Error: Claude's response exceeded the 32000 output token maximum. To configure this behaviour, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

When Claude Code uses thinking (medium), the thinking process stalls repeatedly at round token counts (observed: 33.0k, 65.0k, 97.0k tokens). Each stall lasts long enough (more than 5 minutes) that the conversation cache expires. After a long wait I see tokens being spent again, but I also notice a jump of more than 20% in token usage for my 5-hour session. This makes Claude Code unusable on Pro for any non-trivial task where you wait for it to finish thinking without interrupting.

I tried waiting it out but got hit with API Error: Claude's response exceeded the 32000 output token maximum. To configure this behaviour, set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable

I've been experiencing the same issue in Claude.ai Code on desktop where my usage for the 5 hour interval is used up without a single line of code output. (Not exaggerating here)
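
The reported stall points can be compared against the 32000-token cap quoted in the error message. The following is a small arithmetic sketch, assuming the displayed counter values (33.0k, 65.0k, 97.0k) are exact rather than rounded:

```python
# Stall points reported in the issue, in tokens (33.0k, 65.0k, 97.0k).
# Assumption: the displayed values are exact, not rounded.
stall_points = [33_000, 65_000, 97_000]

# Output cap quoted in the error message (32000 output tokens).
output_cap = 32_000

# Spacing between consecutive stall points.
gaps = [b - a for a, b in zip(stall_points, stall_points[1:])]

# Distance of each stall point past the nearest lower multiple of the cap.
offsets = [p % output_cap for p in stall_points]

print(gaps)     # [32000, 32000]
print(offsets)  # [1000, 1000, 1000]
```

The stall points are spaced by exactly the cap value, and each sits 1,000 tokens past a multiple of it. This is a numerical pattern worth investigating; the report does not establish a causal link.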

What Should Happen?

Thinking should progress continuously, or resume within the cache TTL window.

Error Messages/Logs

Steps to Reproduce

  1. Start Claude Code CLI in terminal
  2. Give a complex implementation task
  3. Observe thinking token counter pausing at round numbers
  4. Wait until the stall duration exceeds the cache TTL, then observe the jump in session token usage

Claude Model

Sonnet (default)

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

2.1.123

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Other

Additional Information

No response

extent analysis

TL;DR

Set the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable to a higher value. This addresses the "exceeded the 32000 output token maximum" error quoted in the report; whether it also mitigates the stalls at round token counts is unconfirmed.

Guidance

  • Investigate the relationship between the stalling token counts (33.0k, 65.0k, 97.0k) and the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable to determine if increasing this value can prevent stalling.
  • Verify that the conversation cache expiration is indeed causing the token usage jumps by monitoring the cache TTL and token usage patterns.
  • Test with a smaller task to see if the stalling issue still occurs, which could help determine if the problem is task-size related.
  • Consider contacting Anthropic support to ask about any known issues with the Sonnet model and token usage.
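
One way to check whether stalls outlast the cache TTL is to note the time and token count whenever the counter changes, then look for long gaps. This is a minimal sketch: the 5-minute threshold comes from the report's "more than 5 minutes" remark and is an assumption about the actual TTL, and `find_stalls` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: 5 minutes, taken from the report's "more than 5 minutes" remark.
ASSUMED_CACHE_TTL = timedelta(minutes=5)


def find_stalls(samples, ttl=ASSUMED_CACHE_TTL):
    """Return (start, end, token_count) for each pause longer than ttl.

    samples: list of (timestamp, token_count) pairs sorted by timestamp.
    """
    stalls = []
    for (t0, n0), (t1, _n1) in zip(samples, samples[1:]):
        if t1 - t0 > ttl:
            stalls.append((t0, t1, n0))
    return stalls


# Hypothetical observations, noted by hand while watching the counter.
samples = [
    (datetime(2026, 1, 1, 10, 0, 0), 20_000),
    (datetime(2026, 1, 1, 10, 1, 0), 33_000),
    (datetime(2026, 1, 1, 10, 8, 0), 40_000),  # counter sat at 33.0k for 7 minutes
    (datetime(2026, 1, 1, 10, 9, 0), 65_000),
]

for start, end, tokens in find_stalls(samples):
    print(f"stall at {tokens} tokens lasted {end - start}")
```

Comparing the flagged stalls with the times at which session usage jumps would show whether the two actually coincide.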

Example

No code snippet is provided as the issue seems to be related to configuration and API interaction rather than code.
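
That said, the configuration change itself can be sketched. The example below builds an environment with CLAUDE_CODE_MAX_OUTPUT_TOKENS set, which is the variable named in the error message. The helper `env_with_output_cap` is hypothetical, the value 64000 is arbitrary, and the report does not state which values are accepted.

```python
import os


def env_with_output_cap(max_output_tokens, base_env=None):
    """Return a copy of the environment with CLAUDE_CODE_MAX_OUTPUT_TOKENS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLAUDE_CODE_MAX_OUTPUT_TOKENS"] = str(max_output_tokens)
    return env


# 64000 is an arbitrary illustrative value, not a recommended setting.
env = env_with_output_cap(64_000)
print(env["CLAUDE_CODE_MAX_OUTPUT_TOKENS"])  # 64000

# To launch the CLI with this environment (assumes the executable is named
# `claude` and is on PATH):
# import subprocess
# subprocess.run(["claude"], env=env)
```

Passing a copy of the environment keeps the change scoped to the launched process and leaves the parent shell untouched.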

Notes

The exact cause of the stalling issue is unclear, and it's uncertain whether increasing CLAUDE_CODE_MAX_OUTPUT_TOKENS will fully resolve the problem. The fact that this worked in a previous version suggests a potential regression, but without the last working version specified, it's difficult to pinpoint the change that introduced this issue.

Recommendation

Apply workaround: Increase the CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable to a higher value. This directly addresses the error message received; whether it also prevents the stalls that let the conversation cache expire is unconfirmed.
