claude-code: [FEATURE] Document optimal context window ranges for 1M token models

GitHub stats
anthropics/claude-code#46802 (fetched 2026-04-12 13:32:41)
Comments: 0
Participants: 1
Timeline: 3 (labeled ×3)
Reactions: 0

Summary

With the move to 1M token context windows, users observe significant quality degradation well before the theoretical limit. It would be very helpful if Anthropic documented the optimal operating ranges.

Observed behavior

In extensive multi-session testing with Opus 4.6 (1M context) on a complex project (~55 wiki pages, multi-agent coordination):

  • 200K-400K tokens: Performance is strong, with reliable recall and accurate task tracking.
  • ~500K tokens: Noticeable degradation; the agent forgets items from only a few interactions earlier.
  • ~50% context (500K): In one session, the agent forgot the session's primary objective entirely. The /loose-ends scan (which shares the same degraded context) also missed it.

This suggests training is optimized for ~100K tokens (the middle of the previous 200K window), and the 1M extension stretches beyond the trained sweet spot.

Request

  1. Document the effective operating ranges — at what context percentage should users expect quality degradation? Is there a recommended "plan to finish by X% of context" guideline?
  2. Consider documenting this in Claude Code's help/docs — users planning long sessions need to know that 1M tokens ≠ 1M tokens of effective work.

Why this matters

Long sessions are common for complex tasks (architecture review, multi-file refactoring, wiki maintenance). Users invest significant time building context, only to have quality silently degrade. Knowing the effective range in advance would let users plan sessions around it rather than discover it through failure.

Extent analysis

TL;DR

Documenting the effective operating ranges for the 1M token context window can help users plan sessions and avoid quality degradation.

Guidance

  • Identify the optimal operating range for the 1M token context window through testing and analysis.
  • Consider establishing a guideline for users to plan their sessions, such as "plan to finish by X% of context".
  • Document the effective operating ranges in a readily accessible location, such as Claude Code's help/docs.
  • Users can mitigate quality degradation by breaking up long sessions into smaller, more manageable chunks.
  • Further testing is needed to determine where quality degradation begins. The single report here places noticeable degradation at around 500K tokens, with strong performance in the 200K-400K range.
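
The suggestion to break long sessions into smaller chunks can be sketched as a simple planning step. This is only an illustration, not a Claude Code feature: the function name `plan_sessions`, the per-task token estimates, and the 400K default budget (taken from the upper end of the range the reporter found reliable) are all assumptions.

```python
from typing import List


def plan_sessions(task_tokens: List[int], budget: int = 400_000) -> List[List[int]]:
    """Group tasks, in order, into sessions whose estimated token totals
    stay within `budget`.

    `task_tokens` holds a rough, user-supplied token estimate per task
    (hypothetical input; nothing here measures real usage).
    A single task larger than the budget is placed in a session by itself.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")

    sessions: List[List[int]] = []
    current: List[int] = []
    used = 0
    for cost in task_tokens:
        if cost < 0:
            raise ValueError("token estimates must be non-negative")
        # Start a fresh session when this task would push us past the budget.
        if current and used + cost > budget:
            sessions.append(current)
            current, used = [], 0
        current.append(cost)
        used += cost
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    estimates = [150_000, 200_000, 100_000, 300_000, 50_000]
    for i, session in enumerate(plan_sessions(estimates), start=1):
        print(f"session {i}: {session} (total {sum(session)})")
```

The grouping is greedy and keeps tasks in their original order, which suits work where later tasks depend on earlier ones. The budget should be adjusted once official guidance exists.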

Notes

The reporter hypothesizes that training is optimized for ~100K tokens (the middle of the previous 200K window) and that the 1M extension stretches beyond that sweet spot. This is unconfirmed, and more research is needed to test it.

Recommendation

Workaround: until the effective operating ranges are documented, plan sessions to finish before reaching 50% of the context window (around 500K tokens), and ideally within the 200K-400K range where the reporter observed strong performance. This may help mitigate quality degradation, though it rests on a single user's observations.
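
As a rough illustration of the "finish by X% of context" idea, the check below maps a token count to a planning zone. The thresholds (40% and 50% of the window) come only from the observations in this issue, and the function name `context_zone` and the zone labels are hypothetical. How the current token count is obtained is left to the reader.

```python
def context_zone(tokens_used: int, window: int = 1_000_000,
                 wrap_up_at: float = 0.4, stop_at: float = 0.5) -> str:
    """Classify context usage into a planning zone.

    Assumed thresholds, based on one user's report rather than official
    guidance:
      below 40% of the window  -> "ok"          (report: strong up to ~400K)
      40% up to 50%            -> "wrap_up"     (start closing out the session)
      50% and above            -> "over_budget" (report: degradation at ~500K)
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")

    fraction = tokens_used / window
    if fraction >= stop_at:
        return "over_budget"
    if fraction >= wrap_up_at:
        return "wrap_up"
    return "ok"


if __name__ == "__main__":
    for used in (250_000, 450_000, 520_000):
        print(used, context_zone(used))
```

Expressing the thresholds as fractions rather than absolute counts keeps the check usable if the window size changes, although the report does not establish whether degradation scales with the window.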
