claude-code - 💡(How to fix) Fix [BUG] Prompt cache expires during active session, causing massive token spikes on next prompt



Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Problem: Claude Code's prompt cache expires a fixed TTL (5 min default) after the last API call, regardless of session activity. When a user pauses to think, review code, or handle interruptions for more than 5 minutes without sending a prompt, the entire context cache is invalidated. The next prompt, even a trivial one, is then re-processed at full input price.

Impact: With a ~900k token context on Opus 4.7 (1M beta), a single small prompt after a 45-minute pause consumed ~20% of the 5-hour usage window. The same prompt during continuous activity consumes <1%. The user has no visibility into cache state and no way to extend TTL from the CLI.

Reproduction:

  1. Build up context to 500k+ tokens (Opus 4.7, 1M context mode)
  2. Work actively for 30 min (cache stays warm, low token burn)
  3. Idle 10+ min without sending a prompt
  4. Send any small prompt → observe sudden large usage jump

Requested fixes (any one helps):

  • Make cache TTL configurable per session (expose the 1h option that already exists at API level)
  • Keep cache warm while the session is active (idle ping or session heartbeat)
  • Show cache expiry countdown in the statusline so users can act before it expires
  • Warn before sending a prompt that will hit a cold cache on a large context
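
For reference, the 1-hour option in the first bullet is set per cache breakpoint in a Messages API request, through the `ttl` field of `cache_control`. The sketch below only builds that request fragment as a plain dictionary. The helper name `cached_system_block` is hypothetical, and whether Claude Code can pass such a setting through per session is what this issue asks for.

```python
from typing import Literal


def cached_system_block(text: str, ttl: Literal["5m", "1h"] = "5m") -> dict:
    """Build one `system` content block carrying a prompt-cache breakpoint.

    Hypothetical helper for illustration only; just the dictionary shape
    follows the Messages API prompt-caching format.
    """
    if ttl not in ("5m", "1h"):
        raise ValueError("ttl must be '5m' or '1h'")
    return {
        "type": "text",
        "text": text,
        # "ephemeral" is the cache type; ttl selects the 5-minute or 1-hour lifetime.
        "cache_control": {"type": "ephemeral", "ttl": ttl},
    }


block = cached_system_block("<large, stable project context>", ttl="1h")
print(block["cache_control"])
```

A longer TTL is not free at the API level: cache writes with the 1-hour lifetime are priced higher than 5-minute writes, so a configurable setting would be a trade-off rather than a pure win.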

Why this matters: Users are forced to choose between breaking their workflow (/compact and /clear lose task continuity) or losing significant quota on pauses they cannot avoid. Cache lifetime should follow session activity, not wall-clock idle time.

Environment:

  • Claude Code 2.1.114
  • Opus 4.7 (1M context beta)
  • Windows

What Should Happen?

When a user is actively working in a Claude Code session, the prompt cache should remain valid regardless of how long the user pauses between prompts. As long as the session is active (terminal open, conversation ongoing), the cache should not expire based on wall-clock idle time.

Specifically, any one of the following would solve the issue:

  1. Cache TTL should be configurable from the CLI (the 1-hour TTL option already exists at the API level but is not exposed to Claude Code users).

  2. The session itself should act as a heartbeat — as long as the Claude Code process is running and the conversation is open, the cache should be kept warm automatically.

  3. If cache expiry must remain time-based, the statusline should display a countdown showing when the cache will expire, so users can send a keep-alive prompt before losing their cache.

  4. Before sending a prompt that would hit a cold cache on a large context, Claude Code should warn the user about the expected token cost, giving them a chance to /compact first.
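
Options 3 and 4 can be sketched as a small client-side estimate. This is a minimal illustration, not how Claude Code works internally: the real cache state lives on the server, so the client can only assume the cache was refreshed at the last API call. The names `cache_status` and `format_countdown` and the 200k-token warning threshold are hypothetical.

```python
from dataclasses import dataclass

DEFAULT_TTL_SECONDS = 5 * 60  # 5-minute default TTL described in this report


@dataclass
class CacheStatus:
    seconds_left: float  # estimated time until expiry (0 once cold)
    is_cold: bool        # True once the TTL has elapsed
    should_warn: bool    # True if a cold cache would re-process a large context


def cache_status(last_call_ts: float, now_ts: float, context_tokens: int,
                 ttl_seconds: float = DEFAULT_TTL_SECONDS,
                 warn_threshold_tokens: int = 200_000) -> CacheStatus:
    """Estimate cache state from the time elapsed since the last API call."""
    seconds_left = max(0.0, ttl_seconds - (now_ts - last_call_ts))
    is_cold = seconds_left == 0.0
    # Warn only when the cache is presumed cold AND the context is large.
    should_warn = is_cold and context_tokens >= warn_threshold_tokens
    return CacheStatus(seconds_left, is_cold, should_warn)


def format_countdown(status: CacheStatus) -> str:
    """Render a short statusline fragment such as 'cache: 3:00 left'."""
    if status.is_cold:
        return "cache: cold"
    minutes, seconds = divmod(int(status.seconds_left), 60)
    return f"cache: {minutes}:{seconds:02d} left"


# Two minutes after the last call, then after a 45-minute pause.
print(format_countdown(cache_status(0.0, 120.0, context_tokens=912_000)))
print(cache_status(0.0, 45 * 60.0, context_tokens=912_000).should_warn)
```

With the numbers from this report, the first call shows three minutes remaining, and the second flags the 45-minute pause as a cold-cache prompt worth warning about.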

The current behavior — where a 45-minute pause silently invalidates the cache and causes the next small prompt to consume 20%+ of the 5-hour usage window — is unexpected and punishes normal workflow patterns (reviewing code, handling interruptions, thinking between prompts).

Error Messages/Logs

Steps to Reproduce

Environment:

  • Claude Code version: 2.1.114
  • Model: Opus 4.7 with 1M context beta enabled
  • OS: Windows
  • Plan: Claude Pro/Max subscription

Steps:

  1. Start a new Claude Code session in a medium-to-large project directory (any codebase works; the bug is about cache behavior, not project specifics).

  2. Select Opus 4.7 with 1M context via /model.

  3. Work actively for 30-60 minutes: read multiple files, run tool calls, have back-and-forth conversation. Goal is to fill the context to 500k+ tokens. Verify with /context — the "Messages" category should show 500k+ tokens.

  4. Run /cost and note the current 5-hour usage percentage (e.g., 75%).

  5. Stop interacting with Claude Code. Do not send any prompts. Leave the terminal open and the session active. Wait 10-15 minutes (or longer — I experienced this after a ~45 minute pause).

  6. Send any small prompt, for example: "list the files in the current directory" or even a one-word message like "continue".

  7. Before the response even completes, run /cost again. Observe that the usage percentage has jumped significantly — in my case, from ~80% to 100% on a single trivial prompt.

  8. For comparison, send similar small prompts back-to-back without pauses and observe that each one consumes <1% of the window. This confirms the spike is caused by cache expiry during the idle period, not by the prompt itself.

Expected: Small prompts should consume a small amount of quota, regardless of how long the user paused before sending them — as long as the session is still open and active.

Actual: After roughly 5 minutes or more of idle time, the cache invalidates. The next prompt re-processes the entire context (900k+ tokens in my case) at full input price, with 1M context mode charging premium rates (2.5x) above 200k tokens. A 200-token prompt can end up billed as ~900k input tokens.

Minimal context for math:

  • Context at time of the problematic prompt: 912k tokens total (884.9k in Messages, per /context output)
  • Prompt size: ~200 tokens
  • Time since last prompt: ~45 minutes
  • Session was still open and active; no /clear or /compact had been run
  • Result: single prompt consumed ~20% of the 5-hour window
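
The gap between a warm and a cold prompt can be approximated with the figures above. The sketch assumes the published API pricing multipliers for prompt caching (cache reads at 0.1x and 5-minute cache writes at 1.25x the base input price). It ignores any long-context premium and does not model how subscription usage windows weight these tokens, so it does not try to reproduce the reported percentages. The function name `input_cost_units` is hypothetical.

```python
CACHE_READ_MULTIPLIER = 0.1        # assumed: cache hit, relative to base input price
CACHE_WRITE_5M_MULTIPLIER = 1.25   # assumed: cache write with the 5-minute TTL


def input_cost_units(context_tokens: int, prompt_tokens: int, cache_warm: bool) -> float:
    """Return input cost in units of 'one input token at base price'."""
    if cache_warm:
        # Warm cache: the existing context is read back from cache.
        context_cost = context_tokens * CACHE_READ_MULTIPLIER
    else:
        # Cold cache: the whole context is re-processed and written again.
        context_cost = context_tokens * CACHE_WRITE_5M_MULTIPLIER
    # Simplification: the new prompt tokens are counted at base price either way.
    return context_cost + prompt_tokens


warm = input_cost_units(912_000, 200, cache_warm=True)
cold = input_cost_units(912_000, 200, cache_warm=False)
print(f"warm: {warm:,.0f}  cold: {cold:,.0f}  ratio: {cold / warm:.1f}x")
```

Under these assumptions the same 200-token prompt costs on the order of 12 times more input after the cache has expired, which is the same direction as the jump described in this report.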

The bug is not in any specific code or file — it is in how cache TTL interacts with session activity. Any sufficiently large context + idle pause reproduces it.

Claude Model

None

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

2.1.114

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

Windows Terminal

Additional Information

No response

extent analysis

TL;DR

The most likely fix is to make the cache TTL configurable per session or implement a session heartbeat to keep the cache warm while the user is actively working.

Guidance

  • Expose the 1-hour TTL option, which already exists at the API level, to Claude Code users so the cache TTL can be configured per session.
  • Implement a session heartbeat or idle ping that keeps the cache warm while the session is open, so idle time alone does not expire it.
  • Display a cache expiry countdown in the statusline so users can send a keep-alive prompt before the cache expires.
  • Warn users before sending a prompt that will hit a cold cache on a large context, giving them a chance to compact the context first.
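
A heartbeat has a cost of its own, because every keep-alive request reads the full cached context. The sketch below estimates that cost for an idle period. It assumes the same API pricing multiplier for cache reads (0.1x base input price) and a hypothetical 30-second safety margin before the TTL; `keepalive_plan` and `keepalive_cost_units` are illustrative names, not part of Claude Code.

```python
import math


def keepalive_plan(idle_seconds: float, ttl_seconds: float = 300,
                   margin_seconds: float = 30) -> tuple[float, int]:
    """Return (ping interval, number of pings) needed to span an idle period."""
    interval = ttl_seconds - margin_seconds
    if interval <= 0:
        raise ValueError("margin must be smaller than the TTL")
    return interval, math.floor(idle_seconds / interval)


def keepalive_cost_units(context_tokens: int, pings: int,
                         read_multiplier: float = 0.1) -> float:
    """Total cost of the pings, in units of 'one input token at base price'."""
    return context_tokens * read_multiplier * pings


# The 45-minute pause and 912k-token context from the report.
interval, pings = keepalive_plan(idle_seconds=45 * 60)
heartbeat_cost = keepalive_cost_units(912_000, pings)
cold_rewrite_cost = 912_000 * 1.25  # assumed 5-minute cache-write multiplier
print(pings, round(heartbeat_cost), round(cold_rewrite_cost))
```

With these assumptions, ten pings over 45 minutes cost about 912k token-units against roughly 1.14M for a single cold re-write, so a naive heartbeat saves little on long pauses and would cost more than letting the cache expire once the pause runs much past an hour. This suggests pairing any heartbeat with an upper bound on idle time, or preferring the longer TTL.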

Example

No code fix is provided: the issue lies in how the cache TTL interacts with session activity, and resolving it requires changes inside Claude Code rather than in user code.

Notes

The issue is reported as a regression, but no last working version is given. Any fix should be verified against the reproduction steps above, using a large context and an idle pause longer than the TTL.

Recommendation

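Until a fix ships, the mitigations available from the report itself are to send a small keep-alive prompt before the 5-minute TTL lapses, or to run /compact before a long pause so that a cold-cache prompt re-processes less context. Of the proposed fixes, the session heartbeat and the statusline countdown look like the least invasive.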
