claude-code - 💡(How to fix) Fix [BUG] "Prompt too long" with respect to EXTREMELY inflated token usage. [2 comments, 2 participants]

claude-code · 2026-04-24 01:06:28


GitHub stats

anthropics/claude-code#52646 · Fetched 2026-04-24 06:01:34

Author: SaturnityGL

Participants: github-actions[bot], SaturnityGL

Timeline (top): labeled ×4, commented ×2

Error Message

Claude Code is catastrophically inflating message token counts, reporting approximately 15-16x more tokens than the session actually contains. This is not a compaction issue, a context window sizing issue, or a configuration problem. The message history is being massively overcounted, which then triggers "Prompt too long" API errors well before the context window is legitimately full and burns through session usage limits on trivial operations. My original report focused on the "Prompt too long" error, which was kicking in at 170k tokens even though the limit is 200k. While looking into it, I believe I found an even worse, directly related bug.

Error Messages/Logs

After repeatedly hitting this, I asked Claude Code directly why auto-compaction wasn't triggering. It checked, found no threshold was configured, and wrote an `autoCompactThreshold` value to `settings.json`. The error persisted anyway, still hitting the wall at approximately 170k tokens. If the configured threshold is 180k and the session is bricking at 170k, then either the setting is being ignored entirely or the token inflation is so severe that the count blows past the compaction trigger before compaction ever has a chance to run. Given the inflation figures in this report, the latter seems more likely.
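The trigger-window argument can be illustrated with a small simulation. This is a hypothetical model, not Claude Code's actual compaction logic: it assumes the threshold is checked once between turns against the current context size, that requests fail above a hard limit (~170k, as observed here), and that compaction shrinks the context to ~20k (the figure the reporter saw after a forced compaction). Under those assumptions, one inflated turn can carry the context from below the threshold straight past the hard limit, and a threshold set above the failure point never fires at all.

```python
def simulate(turn_tokens, threshold, hard_limit, compacted_size=20_000):
    """Replay a session turn by turn under a hypothetical compaction policy.

    Returns a list of events: "ok", "compacted", or "prompt too long".
    """
    context = 0
    events = []
    for added in turn_tokens:
        # Assumption: the threshold is checked only between turns,
        # against the context size before the new turn is appended.
        if context >= threshold:
            context = compacted_size
            events.append("compacted")
        context += added
        if context > hard_limit:
            # The request is rejected before compaction gets another chance.
            events.append("prompt too long")
            break
        events.append("ok")
    return events


# Steady 30k turns: compaction fires before the hard limit is reached.
print(simulate([30_000] * 6, threshold=150_000, hard_limit=170_000))
# ['ok', 'ok', 'ok', 'ok', 'ok', 'compacted', 'ok']

# One inflated 60k turn jumps from 120k (below threshold) to 180k (over the limit).
print(simulate([40_000, 40_000, 40_000, 60_000], threshold=150_000, hard_limit=170_000))
# ['ok', 'ok', 'ok', 'prompt too long']

# A 180k threshold sits above the 170k failure point, so it can never fire.
print(simulate([30_000] * 7, threshold=180_000, hard_limit=170_000))
# ['ok', 'ok', 'ok', 'ok', 'ok', 'prompt too long']
```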

Fix Action

Fix / Workaround

Message token counts should reflect the actual content of the conversation. A session containing ~8,000-15,000 tokens of real content should not report 147,500 tokens in the messages field. Tool call results and file reads should be handled without being permanently appended to the message chain in full on every subsequent request. Auto-compaction should serve as a fallback for legitimately long sessions, but it should not need to exist as a workaround for a token counting bug that inflates a 9k session to 147k.
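The point about tool results and file reads being re-sent in full can be made concrete with simple arithmetic. This sketch is hypothetical: it assumes every request carries the entire accumulated history, ignores prompt caching (which changes billing but not context size), and compares that with an invented policy in which each tool output is sent in full once and then replaced by a short stub. The function names and the 200-token stub size are illustrative, not Claude Code behavior.

```python
def cumulative_input_tokens(turn_tokens):
    """Every request resends the full history so far (no caching, no pruning).

    Returns (final_history_size, total_input_tokens_sent).
    """
    history = 0
    total = 0
    for added in turn_tokens:
        history += added  # prompt, tool results and file reads all join the history
        total += history  # the whole history is sent with this request
    return history, total


def cumulative_with_stubs(turns, stub_tokens=200):
    """turns: list of (message_tokens, tool_output_tokens).

    Hypothetical policy: a tool output is sent in full on the request that
    uses it, then only a short stub is carried in later requests.
    """
    history = 0
    total = 0
    for message, tool_output in turns:
        total += history + message + tool_output  # this request sees the full output
        history += message + stub_tokens          # later requests carry only the stub
    return history, total


print(cumulative_input_tokens([5_000] * 10))         # (50000, 275000)
print(cumulative_with_stubs([(1_000, 4_000)] * 10))  # (12000, 104000)
```

With the same 50k tokens of content spread over ten requests, resending everything sends 275k input tokens in total versus 104k with stubs, and the gap widens as the session gets longer.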

Partial workaround discovered: "Prompt too long" blocks all normal commands, including `/compact` and debug tools, but MCP tools appear to remain callable in some cases. Manually invoking a systematic-debugging skill through a Superpowers plugin forced a compaction and brought the context from ~170k down to ~20k. This is not a real fix, just an accidental workaround I found while trying to figure out what was going on, but it does confirm that the underlying model is still functional and that the session data is not actually 147k tokens' worth of real content. It also suggests the block is happening at the message-handling layer rather than at the model level.

Other relevant details:

- `claude --version` currently fails because of the session usage limit, so the exact build cannot be confirmed. It was the most recent version as of April 23, 2026.
- "Prompt too long" blocks all commands within the session, including `/compact` and debug tools.
- High effort was used in Session 1, Medium in Sessions 2 and 3.
- The identical workflow ran without issue before.
- A non-Anthropic agent completed equivalent work in the same period without approaching its session limits, on a free tier. I had to use it to fix things every time a limit was hit in the middle of a task.


Original issue report

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

For reference, this session had the following context usage:

[Screenshot: context usage breakdown for this session]

147k tokens for Messages is... weird.

Full session text content: ~13,334 characters. At a standard average of ~4 characters per token for plain English prose, this comes to roughly 3,330 tokens of text. However, the session included code output, file reads, edit notations (+137 -12 style), file paths, and mixed technical content, all of which tokenize less efficiently than plain prose. Accounting for that variance, a realistic upper estimate for the text content is closer to ~5,000-6,000 tokens.
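The character-count estimate is simple division. The 2.5 characters-per-token figure for code-heavy text is an assumption chosen to land in the reporter's 5,000-6,000 range (13,334 characters correspond to about 2.2-2.7 characters per token there); an exact count would need the model's own tokenizer.

```python
def estimate_tokens(chars: int, chars_per_token: float) -> int:
    """Rough token estimate from a character count (a heuristic, not a tokenizer)."""
    return round(chars / chars_per_token)


SESSION_CHARS = 13_334  # from the Ctrl+A character count in this report

for label, ratio in [("plain prose", 4.0), ("code-heavy (assumed)", 2.5)]:
    print(f"{label}: ~{estimate_tokens(SESSION_CHARS, ratio):,} tokens")
# plain prose: ~3,334 tokens
# code-heavy (assumed): ~5,334 tokens
```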

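Total image cost, estimated with a tiling model (85 base tokens + 170 per tile): ~5,695 tokens. That tiling formula is the one OpenAI documents for its vision models; Anthropic's documentation instead approximates image cost as width × height / 750 tokens, so this figure is a rough estimate.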
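The two image formulas can be compared side by side. The report does not give the sizes of the session's images, so the 1092 × 1092 px example below is illustrative only, and both functions skip the resizing steps each provider applies to large images.

```python
import math


def anthropic_image_tokens(width: int, height: int) -> int:
    """Approximation from Anthropic's vision docs: tokens ~= width * height / 750.

    Ignores the downscaling applied to very large images.
    """
    return math.ceil(width * height / 750)


def tiled_image_tokens(width: int, height: int, tile: int = 512,
                       base: int = 85, per_tile: int = 170) -> int:
    """The 85 + 170-per-tile formula used in this report (OpenAI-style tiling),
    simplified: the image is not pre-scaled before counting 512 px tiles."""
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return base + per_tile * tiles


print(anthropic_image_tokens(1092, 1092))  # 1590
print(tiled_image_tokens(1092, 1092))      # 1615
```

For an image of this size the two formulas land close together; for small screenshots the tiling formula tends to give the larger number because of its fixed base cost.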

Conservative total estimate: ~12,000-15,000 tokens, accounting for tokenization variance on code-heavy content and generous rounding on overhead. Reported Messages: 147,500 tokens. That is roughly 10-12x more than the entire session content could reasonably account for, even under the most generous tokenization assumptions. This level of inflation cannot be explained by CLAUDE.md overhead, tool schema injection, or tokenization variance on any combination of content in this session.
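The inflation ratio is the reported Messages value divided by each end of the estimated range; a higher estimate of real content gives the lower ratio.

```python
REPORTED_MESSAGES = 147_500  # Messages value from the context breakdown


def inflation_range(reported: int, low_estimate: int, high_estimate: int):
    """Return (min_ratio, max_ratio) of reported tokens to estimated real tokens."""
    return reported / high_estimate, reported / low_estimate


low, high = inflation_range(REPORTED_MESSAGES, 12_000, 15_000)
print(f"{low:.1f}x to {high:.1f}x")  # 9.8x to 12.3x
```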

I am glad I found this in relation to my original bug, as it explains everything I observed:

- "Prompt too long" at ~170k tokens despite minimal actual content
- Sessions hitting usage limits on trivially small operations
- Auto-compaction doing nothing, even after an `autoCompactThreshold` value was explicitly written to `settings.json` by Claude Code itself during the affected session
- The context window reading as 90% full with content that should occupy maybe 5%

This is a clear and recent regression. The same workflow ran without any of these issues prior to approximately April 22, 2026.

What Should Happen?

By default, auto-compaction should fire proactively before context approaches the limit, with a fallback threshold when none is configured, freeing enough headroom to continue the session. If compaction genuinely cannot recover enough space, Claude Code should surface a clear, actionable warning before the session bricks, not a cryptic mid-task "Prompt too long" that leaves zero recovery options. At minimum, a manual `/compact` should still be executable even when context is high. Session usage accounting also should not inflate trivial operations to the cost of full project generations.
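One way to express the requested behavior is a pre-request check with a built-in default threshold. This is a hypothetical policy, not Claude Code's implementation: the 80% default, the `compact` callback, and the return labels are assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple

DEFAULT_COMPACT_FRACTION = 0.80  # hypothetical fallback when nothing is configured


def before_request(
    used: int,
    window: int,
    compact: Callable[[int], int],
    configured_fraction: Optional[float] = None,
) -> Tuple[int, str]:
    """Hypothetical pre-request policy: compact early, warn if that is not enough.

    Returns the context size after any action, plus what happened.
    """
    fraction = DEFAULT_COMPACT_FRACTION if configured_fraction is None else configured_fraction
    threshold = int(window * fraction)
    if used < threshold:
        return used, "continue"
    after = compact(used)
    if after < threshold:
        return after, "compacted"
    # Compaction could not free enough headroom: warn before the request fails,
    # instead of surfacing "Prompt too long" mid-task.
    return after, "warn"


def keep_summary(used: int) -> int:
    """Stand-in compaction that shrinks the context to ~20k tokens."""
    return min(used, 20_000)


print(before_request(100_000, 200_000, keep_summary))               # (100000, 'continue')
print(before_request(165_000, 200_000, keep_summary))               # (20000, 'compacted')
print(before_request(165_000, 200_000, lambda used: used - 5_000))  # (160000, 'warn')
```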


Steps to Reproduce

1. Start a Claude Code session with a CLAUDE.md present, using High effort mode.
2. Issue a large generative task spanning multiple files.
3. Observe the session usage limit being hit mid-execution.
4. When usage resets, open a new session and send a short briefing prompt summarizing external changes.
5. Observe "Prompt too long" immediately despite minimal context.
6. Open another fresh session; have Claude review the main project files and confirm state.
7. Issue a minor refining prompt resulting in ~150 lines changed in total.
8. Observe the context window already at approximately 50% before meaningful work has begun.
9. Issue a second minor prompt (a moderate redesign, approximately 300 lines changed).
10. Observe "Prompt too long" mid-execution with session usage at ~60%.
11. Open a third session and ask Claude to review the current project state.
12. Observe the session usage limit being hit after Claude reads under 150 lines.
13. Within any affected session, open the context window breakdown and observe Messages consuming a disproportionate share of total tokens.
14. Select all of the session text (Ctrl+A), count the characters, calculate the expected token cost, and compare it against the reported Messages value.
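The last two steps can be scripted. This sketch assumes the session text was pasted into a UTF-8 file (the `session.txt` name is hypothetical); the 2.5-4.0 characters-per-token bounds are rough assumptions, and images are not captured by a text copy, so their cost has to be added separately.

```python
from pathlib import Path


def compare_session(text: str, reported_tokens: int,
                    chars_per_token=(2.5, 4.0)) -> dict:
    """Estimate real tokens from copied session text and compare them to the
    reported Messages value. The chars-per-token bounds are rough assumptions."""
    chars = len(text)
    dense, sparse = chars_per_token
    high = chars / dense   # code-heavy text: more tokens per character
    low = chars / sparse   # plain prose: fewer tokens per character
    return {
        "chars": chars,
        "estimated_tokens": (round(low), round(high)),
        # Ratio of reported to estimated tokens, from best case to worst case.
        "inflation": (reported_tokens / high, reported_tokens / low),
    }


def compare_file(path: str, reported_tokens: int) -> dict:
    """Read the Ctrl+A copy of the session from disk and compare it."""
    return compare_session(Path(path).read_text(encoding="utf-8"), reported_tokens)
```

Usage, with the Messages value from this report: `compare_file("session.txt", 147_500)`.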

Claude Model

Sonnet (default)

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

Prompt is too long.

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

Windows Terminal

Additional Information

Impact across three sessions:

Session 1: Full project scaffold, High effort mode, thousands of lines across multiple files. The session usage limit was hit mid-execution, which is reluctantly acceptable given the scale, but the context window itself was still fine. I had to bring in an external non-Anthropic agent to finish what got cut off. When usage reset, I sent a short briefing prompt to update Claude on what had been fixed. Single prompt. "Prompt too long." The real content was nowhere near filling the context window, yet it read around 170k despite the brief being short. Really?

[Screenshot: context usage at the "Prompt too long" error, Session 1]

Session 2: Fresh session. I had Claude review the core ~1,500-line Python file and confirm current state. The context window was already around 50% full before any real work started. First refining prompt: 137 lines added, 12 removed; the app launched fine. Second prompt: moderate redesign, plan confirmed, 114 lines added, 194 removed. "Prompt too long" mid-execution, with session usage at ~60%. Reading under 1,500 lines and changing under 300 lines across two prompts somehow consumed 60% of a 5-hour session budget. Very useful.

[Screenshot: context usage, Session 2]

Session 3: I asked Claude to review the current state of the project and list issues. It found several crash-causing bugs, listed them, and gave a short plan. I said continue. It read under 150 lines, and the session usage limit was hit.

Extent analysis

TL;DR

The issue can reportedly be mitigated, temporarily, by manually invoking a systematic-debugging skill through a Superpowers plugin, which forced a compaction and reduced the context size in the reporter's case.

Guidance

Investigate the token counting mechanism to identify the cause of the inflation, as the reported token count is significantly higher than the estimated token count based on the session content.
Verify that the autoCompactThreshold setting is being applied correctly, as the issue persists even after configuring this setting.
Sessions 2 and 3 ran on Medium effort and still failed, so the issue does not appear to be specific to High effort mode; testing with Low effort could narrow it down further.
Consider using a non-Anthropic agent as a workaround, as it was able to complete equivalent work without approaching session limits.
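For the second point, a quick first check is whether the key is present in the settings files at all. The file locations below (user, project, and local project settings) are assumptions about where Claude Code reads settings, and finding the key does not prove that Claude Code recognizes or applies it.

```python
import json
from pathlib import Path

# Assumed settings locations: user-level, project-level, and local project overrides.
CANDIDATE_PATHS = [
    "~/.claude/settings.json",
    ".claude/settings.json",
    ".claude/settings.local.json",
]


def find_setting(key, paths=CANDIDATE_PATHS):
    """Return {path: value} for each existing settings file that defines `key`."""
    found = {}
    for raw in paths:
        path = Path(raw).expanduser()
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # A malformed file is worth flagging: its values cannot be read reliably.
            found[str(path)] = "<invalid JSON>"
            continue
        if isinstance(data, dict) and key in data:
            found[str(path)] = data[key]
    return found


print(find_setting("autoCompactThreshold"))
```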

Example

No fix snippet applies here, since the problem lies in Claude Code's token counting and session management rather than in user code.

Notes

The issue is likely a regression introduced in a recent version of Claude Code, as the same workflow ran without issues before approximately April 22, 2026. The exact cause of the token inflation is unknown and requires further investigation.

Recommendation

Until a fix is available, try manually invoking a systematic-debugging skill through a Superpowers plugin to force compaction and reduce the context size; the reporter found that this temporarily mitigated the issue, bringing the context from ~170k down to ~20k.
