claude-code - 💡(How to fix) Fix [BUG] "Prompt too long" with respect to EXTREMELY inflated token usage. [2 comments, 2 participants]

GitHub stats
anthropics/claude-code#52646 · Fetched 2026-04-24 06:01:34
Comments: 2 · Participants: 2 · Timeline events: 6 (labeled ×4, commented ×2) · Reactions: 0


Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Claude Code is catastrophically inflating message token counts, reporting approximately 15-16x more tokens than the session actually contains. This is not a compaction issue, not a context window sizing issue, and not a configuration problem. The message history is being massively overcounted, which then triggers "Prompt too long" API errors well before the context window is legitimately full, burning through session usage limits on trivial operations. My original report was focused on getting the "Prompt too long" error, as it was kicking in at 170k tokens even with the limit being 200k. When looking into it, I believe I found an even worse bug that directly correlates.

For reference, this session had this context usage:

<img width="347" height="260" alt="Image" src="https://github.com/user-attachments/assets/4ba6f5e2-d188-4da6-81ec-73b60371ff2e" />

147k for messages is... weird.

Full session text content: ~13,334 characters. At a standard average of ~4 characters per token for plain English prose, this comes to roughly 3,334 tokens of text. However, the session included code output, file reads, edit notations (+137 -12 style), file paths, and mixed technical content, all of which tokenize less efficiently than plain prose. Accounting for that variance, a realistic upper estimate for text content sits closer to ~5,000-6,000 tokens.

Image   | Dimensions | Tiles   | Est. tokens
Image 1 | 1270x804   | 6 tiles | ~1,105 tokens
Image 2 | 1262x1012  | 6 tiles | ~1,105 tokens
Image 3 | 1339x1036  | 9 tiles | ~1,615 tokens
Image 4 | 1264x935   | 6 tiles | ~1,105 tokens
Image 5 | 868x796    | 4 tiles | ~765 tokens

Total image cost using Claude's tiling model (170 tokens per tile + 85 base): ~5,695 tokens.
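The image total above can be re-derived with a few lines of arithmetic. This is only a sketch that reuses the report's own constants (170 tokens per tile plus an 85-token base) and tile counts as given; they are taken as-is and have not been checked against Anthropic's documentation.

```python
# Constants exactly as stated in the report (assumption: not verified elsewhere).
TOKENS_PER_TILE = 170
BASE_TOKENS = 85

# (label, tile count) pairs copied from the report's image list.
IMAGES = [
    ("Image 1", 6),
    ("Image 2", 6),
    ("Image 3", 9),
    ("Image 4", 6),
    ("Image 5", 4),
]


def image_tokens(tiles: int) -> int:
    """Estimated token cost of one image under the report's tiling formula."""
    return tiles * TOKENS_PER_TILE + BASE_TOKENS


per_image = {label: image_tokens(tiles) for label, tiles in IMAGES}
total_image_tokens = sum(per_image.values())

for label, tokens in per_image.items():
    print(f"{label}: ~{tokens:,} tokens")
print(f"Total: ~{total_image_tokens:,} tokens")
```

Under those constants the per-image figures and the ~5,695 total in the report are internally consistent.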

Conservative total estimate: ~12,000-15,000 tokens, accounting for tokenization variance on code-heavy content and generous rounding on overhead. Reported messages: 147,500 tokens. That is roughly 10-12x more than the entire session content could reasonably account for (147,500 / 15,000 ≈ 9.8; 147,500 / 12,000 ≈ 12.3), even applying the most generous possible tokenization assumptions. This level of inflation is not explainable by CLAUDE.md overhead, tool schema injection, or tokenization variance on any combination of content in this session.
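For anyone checking the ratio, here is a minimal sketch of the comparison using only the figures quoted in this report. The variable names are illustrative and nothing here touches Claude Code itself.

```python
REPORTED_MESSAGE_TOKENS = 147_500  # "Messages" value from the context breakdown
ESTIMATE_LOW = 12_000              # lower bound of the report's conservative estimate
ESTIMATE_HIGH = 15_000             # upper bound of the report's conservative estimate


def inflation_factor(reported: int, estimated: int) -> float:
    """How many times larger the reported count is than the estimate."""
    return reported / estimated


# The smallest factor comes from the largest estimate, and vice versa.
factor_min = inflation_factor(REPORTED_MESSAGE_TOKENS, ESTIMATE_HIGH)
factor_max = inflation_factor(REPORTED_MESSAGE_TOKENS, ESTIMATE_LOW)

print(f"Inflation: {factor_min:.1f}x to {factor_max:.1f}x")
```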

I am glad I found this with relation to my original bug, as it explains everything I observed:

  • "Prompt too long" at ~170k tokens despite minimal actual content
  • Sessions hitting usage limits on trivially small operations
  • Auto-compaction doing nothing even after an autoCompactThreshold was explicitly written to settings.json by Claude Code itself during the affected session
  • Context window reading as 90% full with content that should occupy maybe 5%

This is a clear and recent regression. The same workflow ran without any of these issues prior to approximately April 22, 2026.

What Should Happen?

Auto-compaction should fire proactively by default, before context approaches the limit, with a fallback threshold when nothing is explicitly configured, freeing enough headroom to continue the session. If compaction genuinely can't recover enough space, Claude Code should surface a clear, actionable warning before the session bricks, not a cryptic mid-task "Prompt too long" that leaves zero recovery options. At minimum, a manual /compact should still be executable even when context is high. Token and session usage accounting also should not inflate trivial operations to the cost of full project generations.

Error Messages/Logs

Message token counts should reflect the actual content of the conversation. A session containing ~8,000-15,000 tokens of real content should not report 147,500 tokens in the messages field. Tool call results and file reads should be handled without being permanently appended to the message chain in full on every subsequent request. Auto-compaction should serve as a fallback for legitimately long sessions, but it should not need to exist as a workaround for a token counting bug that inflates a 9k session to 147k.
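The concern about tool results being carried in full on every later request can be pictured with a toy model. This is a sketch of the general pattern where each request resends the whole history; it is not a description of Claude Code's internals, it ignores any caching, and every token size below is hypothetical.

```python
def tokens_sent_per_request(turn_sizes: list[int]) -> list[int]:
    """Input tokens sent on each request when the full history is resent every time."""
    sent = []
    history = 0
    for size in turn_sizes:
        history += size       # the new turn (prompt, tool result, reply) joins the history
        sent.append(history)  # the whole history becomes the next request's input
    return sent


# Hypothetical session: one large file read followed by four small follow-up turns.
turns = [15_000, 500, 500, 500, 500]
per_request = tokens_sent_per_request(turns)

context_size = per_request[-1]        # what the context window holds at the end
cumulative_input = sum(per_request)   # total input tokens sent across all requests

print(f"context at end: {context_size:,} tokens")
print(f"cumulative input sent: {cumulative_input:,} tokens")
```

In this toy case the context ends at 17,000 tokens while 80,000 input tokens were sent in total, which is why a single large tool result can weigh on usage long after it was read.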

Steps to Reproduce

  1. Start a Claude Code session with a CLAUDE.md present, using High effort mode
  2. Issue a large generative task spanning multiple files
  3. Observe session usage limit hit mid-execution
  4. When usage resets, open a new session and send a short briefing prompt summarizing external changes
  5. Observe "Prompt too long" immediately despite minimal context
  6. Open another fresh session, have Claude review main project files and confirm state
  7. Issue a minor refining prompt resulting in ~150 lines changed total
  8. Observe context window already at approximately 50% before meaningful work has begun
  9. Issue a second minor prompt, moderate redesign, approximately 300 lines changed
  10. Observe "Prompt too long" mid-execution with session usage at ~60%
  11. Open a third session, ask Claude to review current project state
  12. Observe session usage limit hit after Claude reads under 150 lines
  13. Within any affected session, open the context window breakdown and observe Messages consuming a disproportionate share of total tokens
  14. Do a full CTRL+A of the session text, count characters, calculate expected token cost, compare against reported Messages value
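Steps 13 and 14 can be scripted roughly as follows. This is a minimal sketch assuming the ~4 characters per token heuristic used in this report; it is not a real tokenizer, it only counts the text you paste in, and the helper names are made up for illustration.

```python
import math

CHARS_PER_TOKEN = 4.0  # rough heuristic from the report, not a real tokenizer


def estimate_text_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate for pasted session text, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def compare_to_reported(text: str, reported_tokens: int, extra_tokens: int = 0) -> dict:
    """Compare a heuristic estimate (text plus extras such as images) to the reported value."""
    estimated = estimate_text_tokens(text) + extra_tokens
    ratio = reported_tokens / estimated if estimated else float("inf")
    return {"estimated": estimated, "reported": reported_tokens, "ratio": ratio}


# Stand-in for the copied session text: same length as the report's 13,334 characters.
session_text = "x" * 13_334
result = compare_to_reported(session_text, reported_tokens=147_500, extra_tokens=5_695)

print(
    f"estimated ~{result['estimated']:,} tokens, "
    f"reported {result['reported']:,}, ratio {result['ratio']:.1f}x"
)
```

Replace `session_text` with the text copied from your own session and `reported_tokens` with the Messages value from the context breakdown.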

Claude Model

Sonnet (default)

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

Prompt is too long.

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

Windows Terminal

Additional Information

Impact across three sessions:

Session 1: Full project scaffold, High effort mode, thousands of lines across multiple files. Session usage limit hit mid-execution, which is reluctantly acceptable given the scale, but the context window itself was still fine. Had to bring in an external non-Anthropic agent to finish what got cut off. When usage reset, sent a short briefing prompt to update Claude on what had been fixed. Single prompt. "Prompt too long." Context window was nowhere near full, sitting around 170k despite the brief being short. Really??

<img width="525" height="266" alt="Image" src="https://github.com/user-attachments/assets/dac2c455-33f2-42b2-8e69-accfda3b4e3d" />

Session 2: Fresh session, had Claude review the core ~1500 line Python file and confirm current state. Context window was already around 50% full before any real work started. First refining prompt: 137 lines added, 12 removed, app launched fine. Second prompt: moderate redesign, plan confirmed, 114 lines added, 194 removed. "Prompt too long" mid-execution. Session usage at ~60%. Reading under 1500 lines and changing under 300 lines across two prompts somehow consumed 60% of a 5-hour session budget. Very useful.

<img width="350" height="247" alt="Image" src="https://github.com/user-attachments/assets/a10e6cb3-c558-4e38-863c-b34c2c1b3d31" />

Session 3: Asked Claude to review the current state of the project and list issues. It found several crash-causing bugs, listed them, gave a short plan. I said continue. It read under 150 lines. Session usage limit hit.

Attempted fix that also did nothing:

After repeatedly hitting this, I asked Claude Code directly why auto-compaction wasn't triggering. It checked, found no threshold was configured, and wrote an autoCompactThreshold value to settings.json. The error persisted anyway, still hitting the wall at approximately 170k tokens. If the configured threshold is 180k and the session is bricking at 170k, the setting is either being ignored entirely or the token inflation is so severe that it blows past the compaction trigger window before compaction ever has a chance to run. Given the inflation math stated above, the latter seems more likely.

Partial workaround discovered:

"Prompt too long" blocks all normal commands including /compact and debug tools, but MCP tools appear to remain callable in some cases. Manually invoking a systematic-debugging skill through a Superpowers plugin forced a compaction and brought the context from ~170k down to ~20k. This is not a real fix, it's an accidental workaround I found when trying to figure out what was going on, but it does confirm that the underlying model is still functional and the session data is not actually 147k tokens worth of real content. It also suggests the block is happening at the message handling layer rather than at the model level.

Other relevant details:

  • claude --version is currently throwing an error due to session usage limit, so the exact build cannot be confirmed. Most recent version as of April 23, 2026
  • "Prompt too long" blocks all commands within the session including /compact and debug tools
  • High effort used in Session 1, Medium in Sessions 2 and 3
  • Identical workflow ran without issue prior.
  • A non-Anthropic agent completed equivalent work in the same period without approaching its session limits, on a free tier. Had to use it to basically fix what was going on every time there was a limit of some sort hit in the literal middle of doing something.

Extent analysis

TL;DR

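Partial workaround only: in the reporter's session, manually invoking a systematic-debugging skill through a Superpowers plugin forced a compaction and cut the context from ~170k to ~20k tokens. The reporter notes that MCP tools remain callable only in some cases, so this may not work everywhere, and it does not address the underlying token counts.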

Guidance

  • Investigate the token counting mechanism to identify the cause of the inflation, as the reported token count is significantly higher than the estimated token count based on the session content.
  • Verify that the autoCompactThreshold setting is being applied correctly, as the issue persists even after configuring this setting.
  • Test the workflow with a different effort mode (e.g., Low or Medium) to see if the issue is specific to High effort mode.
  • Consider using a non-Anthropic agent as a workaround, as it was able to complete equivalent work without approaching session limits.
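As a first step for the second bullet, one can at least confirm that the key actually landed in the file. The sketch below only checks presence on disk; it cannot tell whether Claude Code reads or honors the value. The key name `autoCompactThreshold` and the 180000 value come from this report and are not confirmed here as a documented setting, and the demo writes to a temporary file rather than any real settings location.

```python
import json
import tempfile
from pathlib import Path


def read_setting(settings_path: Path, key: str) -> tuple[bool, object]:
    """Return (found, value) for a top-level key in a JSON settings file."""
    try:
        data = json.loads(settings_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return (False, None)
    if not isinstance(data, dict):
        return (False, None)
    return (key in data, data.get(key))


# Demo against a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "settings.json"
    demo_path.write_text(json.dumps({"autoCompactThreshold": 180000}), encoding="utf-8")
    found, value = read_setting(demo_path, "autoCompactThreshold")
    missing = read_setting(Path(tmp) / "absent.json", "autoCompactThreshold")

print(found, value, missing)
```

To use it for real, pass the path of your own settings.json instead of the temporary file.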

Example

No code snippet is provided, as the issue relates to Claude Code's token counting mechanism and session management.

Notes

The issue is likely related to a regression introduced in a recent version of Claude Code, as the same workflow ran without issues prior to April 22, 2026. The exact cause of the token inflation is unknown and requires further investigation.

Recommendation

Apply the workaround by manually invoking a systematic-debugging skill through a Superpowers plugin to force compaction and reduce the context size, as this has been shown to temporarily mitigate the issue.
