claude-code - 💡(How to fix) Fix Long sessions cause dramatically increased response times — is caching/context size the culprit?

claude-code, 2026-04-20 22:08:08


Error Message

At the start of a session, responses and code generation feel fast. As the session accumulates context (long debugging sessions, many tool calls, etc.), response times slow significantly — to the point where simple reasoning or short code edits take noticeably longer.

Root Cause

After extended coding sessions, response times appear to degrade dramatically (roughly 10x slower than at the start of the session). I'm wondering whether others are experiencing this and what the root cause is.

Issue

Session details when noticed

Context: ~98k / 200k tokens (49% of window)
Cache hit rate: 100% (98k cached, ~185 new tokens per turn)
Compactions: 0 in the current session (prior session was compacted before handoff)
Model: claude-sonnet-4-6
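
The figures above can be sanity-checked with a little arithmetic. This is a minimal sketch that takes the reported numbers at face value and assumes the "~185 new tokens per turn" are the only uncached input; the variable names are illustrative.

```python
# Numbers reported in the session details above (approximate).
context_tokens = 98_000      # tokens currently in context
window_tokens = 200_000      # context window size
new_tokens_per_turn = 185    # uncached tokens added per turn

# Fraction of the context window in use.
utilization = context_tokens / window_tokens

# Share of each turn's input served from cache, assuming everything
# except the new tokens is a cache hit.
cached_share = context_tokens / (context_tokens + new_tokens_per_turn)

print(f"window utilization: {utilization:.0%}")
print(f"cached share of input: {cached_share:.1%}")
```

Under these assumptions the cached share is just under 100%, which is consistent with the reported hit rate being a rounded figure.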

Observed behavior

Questions

Is the large cached context (98k tokens) the primary driver of latency, even with a 100% cache hit rate?
Does conditioning on a large context window add non-trivial generation latency even when the tokens are cached?
Is anyone else seeing a ~10x slowdown as sessions approach 50%+ of the context window?
Is the recommended mitigation just to run /compact more aggressively, or is there a smarter strategy?

Workaround tried

Running /compact reduces context size and appears to help. Starting a fresh session also resolves it. But it would be helpful to understand the threshold at which this becomes noticeable and whether there's a way to get ahead of it automatically.
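
One way to pin down the threshold is to record context size and response time for each turn and look for the point where latency first exceeds a multiple of the early-session baseline. The sketch below assumes the timings are collected by hand or from your own logging; it does not rely on any Claude Code API. `Turn`, `slowdown_threshold`, `factor`, and `baseline_turns` are hypothetical names, and the sample values are invented for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    context_tokens: int   # context size when the request was sent
    latency_s: float      # wall-clock seconds until the response finished


def slowdown_threshold(turns, factor=2.0, baseline_turns=3):
    """Return the smallest context size at which latency first exceeds
    `factor` times the early-session baseline, or None if it never does."""
    if len(turns) <= baseline_turns:
        return None
    ordered = sorted(turns, key=lambda t: t.context_tokens)
    # Baseline: mean latency of the turns with the smallest context.
    baseline = sum(t.latency_s for t in ordered[:baseline_turns]) / baseline_turns
    for turn in ordered[baseline_turns:]:
        if turn.latency_s > factor * baseline:
            return turn.context_tokens
    return None


# Invented sample data, for illustration only.
sample = [
    Turn(5_000, 2.0), Turn(10_000, 2.2), Turn(20_000, 2.1),
    Turn(50_000, 3.0), Turn(80_000, 5.0), Turn(98_000, 20.0),
]
print(slowdown_threshold(sample))
```

A multiple of the baseline is used rather than an absolute number of seconds so that the same check works across machines and network conditions; the right `factor` is a judgment call.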

Happy to provide more diagnostics if useful.

extent analysis

TL;DR

Compacting the context more aggressively may help mitigate the significant slowdown in response times observed as the session accumulates context.

Guidance

Investigate the relationship between context size and response time to determine the threshold at which slowdowns become noticeable.
Consider implementing a strategy to automatically compact the context when it reaches a certain size or percentage of the window.
Monitor cache hit rates and compaction frequency to understand their impact on performance.
Experiment with different compaction intervals or thresholds to find a balance between performance and context retention.
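
The threshold-based strategy in the guidance above could be sketched as a simple check on context usage. This is an illustration under stated assumptions, not a built-in feature: `should_compact` and its default `threshold` of 0.4 are hypothetical, the 200k window is taken from the issue, and the function only decides when to compact. Running /compact itself is left to the user.

```python
def should_compact(context_tokens, window_tokens=200_000, threshold=0.4):
    """Return True once context usage reaches `threshold` of the window.

    `threshold` is a hypothetical tuning knob; pick it from your own
    latency observations rather than treating 0.4 as a recommendation.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    return context_tokens / window_tokens >= threshold


# With the session reported in the issue (~98k of 200k tokens):
print(should_compact(98_000))
```

Lowering `threshold` trades retained context for responsiveness, which is the balance the last guidance item refers to.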

Example

The issue does not include a code snippet; it concerns usage and configuration of the model rather than a specific code implementation.

Notes

The exact threshold at which the slowdown becomes noticeable and the most effective compaction strategy may vary depending on the specific use case and model configuration. Further experimentation and diagnostics may be necessary to determine the optimal approach.

Recommendation

Apply workaround: compact the context more aggressively. The reporter found that running /compact reduces context size and appears to improve response times, and that starting a fresh session also resolves the slowdown. This can mitigate the problem until the root cause is understood.
