claude-code - 💡(How to fix) Fix [BUG] Prompt cache expires during active session, causing massive token spikes on next prompt

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

Problem: Claude Code's prompt cache TTL (5 min default) expires based on time since last API call, not session activity. When a user pauses to think, review code, or handle interruptions for >5 minutes without sending a prompt, the entire context cache invalidates. The next prompt — even a trivial one — is re-processed at full input price.

Impact: With a ~900k token context on Opus 4.7 (1M beta), a single small prompt after a 45-minute pause consumed ~20% of the 5-hour usage window. The same prompt during continuous activity consumes <1%. The user has no visibility into cache state and no way to extend TTL from the CLI.

Reproduction:

Build up context to 500k+ tokens (Opus 4.7, 1M context mode)
Work actively for 30 min (cache stays warm, low token burn)
Idle 10+ min without sending a prompt
Send any small prompt → observe sudden large usage jump

Requested fixes (any one helps):

Make cache TTL configurable per session (expose the 1h option that already exists at API level)
Keep cache warm while the session is active (idle ping or session heartbeat)
Show cache expiry countdown in the statusline so users can act before it expires
Warn before sending a prompt that will hit a cold cache on a large context

Why this matters: Users are forced to choose between breaking their workflow (/compact and /clear lose task continuity) or losing significant quota on pauses they cannot avoid. Cache lifetime should follow session activity, not wall-clock idle time.

Environment:

Claude Code 2.1.114
Opus 4.7 (1M context beta)
Windows

What Should Happen?

When a user is actively working in a Claude Code session, the prompt cache should remain valid regardless of how long the user pauses between prompts. As long as the session is active (terminal open, conversation ongoing), the cache should not expire based on wall-clock idle time.

Specifically, any one of the following would solve the issue:

Cache TTL should be configurable from the CLI (the 1-hour TTL option already exists at the API level but is not exposed to Claude Code users).
The session itself should act as a heartbeat — as long as the Claude Code process is running and the conversation is open, the cache should be kept warm automatically.
If cache expiry must remain time-based, the statusline should display a countdown showing when the cache will expire, so users can send a keep-alive prompt before losing their cache.
Before sending a prompt that would hit a cold cache on a large context, Claude Code should warn the user about the expected token cost, giving them a chance to /compact first.

The current behavior — where a 45-minute pause silently invalidates the cache and causes the next small prompt to consume 20%+ of the 5-hour usage window — is unexpected and punishes normal workflow patterns (reviewing code, handling interruptions, thinking between prompts).

Error Messages/Logs

Steps to Reproduce

Environment:

Claude Code version: 2.1.114
Model: Opus 4.7 with 1M context beta enabled
OS: Windows
Plan: Claude Pro/Max subscription

Steps:

Start a new Claude Code session in a medium-to-large project directory (any codebase works; the bug is about cache behavior, not project specifics).
Select Opus 4.7 with 1M context via /model.
Work actively for 30-60 minutes: read multiple files, run tool calls, have back-and-forth conversation. Goal is to fill the context to 500k+ tokens. Verify with /context — the "Messages" category should show 500k+ tokens.
Run /cost and note the current 5-hour usage percentage (e.g., 75%).
Stop interacting with Claude Code. Do not send any prompts. Leave the terminal open and the session active. Wait 10-15 minutes (or longer — I experienced this after a ~45 minute pause).
Send any small prompt, for example: "list the files in the current directory" or even a one-word message like "continue".
Before the response even completes, run /cost again. Observe that the usage percentage has jumped significantly — in my case, from ~80% to 100% on a single trivial prompt.
For comparison, send similar small prompts back-to-back without pauses and observe that each one consumes <1% of the window. This confirms the spike is caused by cache expiry during the idle period, not by the prompt itself.

Expected: Small prompts should consume a small amount of quota, regardless of how long the user paused before sending them — as long as the session is still open and active.

Actual: After ~5+ minutes of idle time, the cache invalidates. The next prompt re-processes the entire context (900k+ tokens in my case) at full input price, with 1M context mode charging premium rates (2.5x) above 200k tokens. A 200-token prompt can end up billed as ~900k input tokens.

Minimal context for math:

Context at time of the problematic prompt: 912k tokens total (884.9k in Messages, per /context output)
Prompt size: ~200 tokens
Time since last prompt: ~45 minutes
Session was still open and active; no /clear or /compact had been run
Result: single prompt consumed ~20% of the 5-hour window

The bug is not in any specific code or file — it is in how cache TTL interacts with session activity. Any sufficiently large context + idle pause reproduces it.

Claude Model

None

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

2.1.114

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

Windows Terminal

Additional Information

No response

extent analysis

TL;DR

The most likely fix is to make the cache TTL configurable per session or implement a session heartbeat to keep the cache warm while the user is actively working.

Guidance

Investigate the possibility of exposing the existing 1-hour TTL option at the API level to Claude Code users, allowing them to configure the cache TTL per session.
Consider implementing a session heartbeat or idle ping to keep the cache warm while the user is actively working, preventing cache expiry due to idle time.
Display a cache expiry countdown in the statusline to inform users when the cache will expire, allowing them to send a keep-alive prompt before losing their cache.
Warn users before sending a prompt that will hit a cold cache on a large context, giving them a chance to compact the context first.

Example

No code snippet is provided as the issue is related to the interaction between the cache TTL and session activity, and the solution requires changes to the underlying system rather than a specific code fix.

Notes

The issue is a regression, and the previous working version is not specified. The fix should be tested thoroughly to ensure it resolves the issue without introducing new problems.

Recommendation

Apply a workaround, such as implementing a session heartbeat or displaying a cache expiry countdown, as these solutions can be implemented without requiring significant changes to the underlying system.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [BUG] Prompt cache expires during active session, causing massive token spikes on next prompt

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Messages/Logs

Root Cause

Preflight Checklist

What's Wrong?

What Should Happen?

Error Messages/Logs

Steps to Reproduce

Claude Model

Is this a regression?

Last Working Version

Claude Code Version

Platform

Operating System

Terminal/Shell

Additional Information

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [BUG] Prompt cache expires during active session, causing massive token spikes on next prompt

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Messages/Logs

Root Cause

Preflight Checklist

What's Wrong?

What Should Happen?

Error Messages/Logs

Steps to Reproduce

Claude Model

Is this a regression?

Last Working Version

Claude Code Version

Platform

Operating System

Terminal/Shell

Additional Information

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING