claude-code: Long sessions cause dramatically increased response times (is caching/context size the culprit?)



Issue

After extended coding sessions, response times appear to degrade dramatically (roughly 10x slower than at the start of the session). I'm wondering whether others are experiencing this and what the root cause is.

Session details when noticed

  • Context: ~98k / 200k tokens (49% of window)
  • Cache hit rate: 100% (98k cached, ~185 new tokens per turn)
  • Compactions: 0 in the current session (prior session was compacted before handoff)
  • Model: claude-sonnet-4-6
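
The session figures above can be sanity-checked with a few lines of arithmetic. This is only a sketch using the numbers reported in this issue; the variable names are illustrative, and it assumes the ~185 new tokens per turn are counted on top of the 98k cached tokens.

```python
# Figures as reported in the issue (approximate).
cached_tokens = 98_000      # tokens served from cache
new_tokens_per_turn = 185   # uncached tokens added per turn
window_tokens = 200_000     # total context window

# Share of the context window currently in use.
window_used = cached_tokens / window_tokens

# Share of each turn's input that is read from cache, assuming the new
# tokens are counted on top of the cached ones (an assumption, see above).
cached_share = cached_tokens / (cached_tokens + new_tokens_per_turn)

print(f"window used:  {window_used:.0%}")
print(f"cached share: {cached_share:.2%}")
```

Under that assumption the cached share comes out just under 100%, so the reported 100% hit rate is presumably a rounded figure. Either way, almost all input on each turn is already cached.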

Observed behavior

At the start of a session, responses and code generation feel fast. As the session accumulates context (long debugging sessions, many tool calls, etc.), response times slow significantly — to the point where simple reasoning or short code edits take noticeably longer.

Questions

  • Is the large cached context (98k tokens) the primary driver of latency, even with a 100% cache hit rate?
  • Does conditioning on a large context window add non-trivial generation latency even when the tokens are cached?
  • Is anyone else seeing a ~10x slowdown as sessions approach 50%+ of the context window?
  • Is the recommended mitigation just /compact more aggressively, or is there a smarter strategy?

Workaround tried

Running /compact reduces context size and appears to help. Starting a fresh session also resolves it. But it would be helpful to understand the threshold at which this becomes noticeable and whether there's a way to get ahead of it automatically.
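
One way to get ahead of the slowdown would be a simple threshold check on context usage. The helper below is a hypothetical sketch and not part of Claude Code: `should_compact`, its parameters, and the 40% default are all assumptions for illustration, and reading the current token count out of a live session is left out.

```python
def should_compact(context_tokens: int,
                   window_tokens: int = 200_000,
                   threshold: float = 0.4) -> bool:
    """Return True once context usage reaches `threshold` of the window.

    The 0.4 default is an arbitrary placeholder, not a measured value.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return context_tokens / window_tokens >= threshold


# With the figures from this issue (98k of 200k tokens in use):
print(should_compact(98_000))
```

A check like this only says when to consider running /compact. Picking a sensible threshold still requires measuring where the slowdown actually begins.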


Happy to provide more diagnostics if useful.

extent analysis

TL;DR

Compacting the context more aggressively may help mitigate the significant slowdown in response times observed as the session accumulates context.

Guidance

  • Investigate the relationship between context size and response time to determine the threshold at which slowdowns become noticeable.
  • Consider implementing a strategy to automatically compact the context when it reaches a certain size or percentage of the window.
  • Monitor cache hit rates and compaction frequency to understand their impact on performance.
  • Experiment with different compaction intervals or thresholds to find a balance between performance and context retention.

Example

The issue itself provides no code snippet, since it concerns usage and configuration of the model rather than a specific code implementation.
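
The guidance on finding the point where slowdowns become noticeable can still be sketched. The hypothetical helper below assumes you have logged (context tokens, response latency) pairs yourself; it groups them into buckets by share of the context window and reports the first bucket whose median latency reaches a chosen multiple of the earliest bucket's median. The function name, parameters, and sample values are illustrative assumptions, not data from this issue.

```python
from collections import defaultdict
from statistics import median


def slowdown_threshold(samples, window_tokens=200_000, factor=2.0, buckets=10):
    """Return the lowest context share (0..1) at which median latency
    reaches `factor` times the median of the first occupied bucket,
    or None if no bucket does.

    samples: iterable of (context_tokens, latency_seconds) pairs.
    """
    by_bucket = defaultdict(list)
    for context_tokens, latency_s in samples:
        # Integer arithmetic avoids floating-point bucket-edge surprises.
        index = min(context_tokens * buckets // window_tokens, buckets - 1)
        by_bucket[index].append(latency_s)
    if not by_bucket:
        return None

    ordered = sorted(by_bucket)
    baseline = median(by_bucket[ordered[0]])
    for index in ordered[1:]:
        if median(by_bucket[index]) >= factor * baseline:
            return index / buckets
    return None


# Made-up measurements for illustration only, NOT data from the issue.
samples = [(5_000, 2.0), (12_000, 2.2), (45_000, 3.0),
           (70_000, 5.0), (98_000, 9.0)]
print(slowdown_threshold(samples))
```

Medians are used instead of means so that a single unusually slow turn does not shift the result. The bucket count and the `factor` are knobs to tune against real measurements.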

Notes

The exact threshold at which the slowdown becomes noticeable and the most effective compaction strategy may vary depending on the specific use case and model configuration. Further experimentation and diagnostics may be necessary to determine the optimal approach.

Recommendation

Apply workaround: Compact the context more aggressively. The reporter found that running /compact reduces context size and appears to help, and that starting a fresh session also resolves the slowdown. This can mitigate the problem until the root cause is understood.
