claude-code - 💡(How to fix) Fix Self-reported context budget is an estimate, not an observation

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

When asked how much context-window budget remains in a long Claude Code session, Claude will confidently report a percentage. The percentage is an estimate — in approximately the sense that astrology is an estimate, derived from indirect omens (conversation length, recent file reads, complexity of recent turns) rather than any actual signal. The estimate is systematically conservative, on the theory that under-reporting feels safer than the alternative. Downstream, this produces premature session-handoff recommendations the operator has to override.

Root Cause

When asked how much context-window budget remains in a long Claude Code session, Claude will confidently report a percentage. The percentage is an estimate — in approximately the sense that astrology is an estimate, derived from indirect omens (conversation length, recent file reads, complexity of recent turns) rather than any actual signal. The estimate is systematically conservative, on the theory that under-reporting feels safer than the alternative. Downstream, this produces premature session-handoff recommendations the operator has to override.

RAW_BUFFERClick to expand / collapse

Summary

When asked how much context-window budget remains in a long Claude Code session, Claude will confidently report a percentage. The percentage is an estimate — in approximately the sense that astrology is an estimate, derived from indirect omens (conversation length, recent file reads, complexity of recent turns) rather than any actual signal. The estimate is systematically conservative, on the theory that under-reporting feels safer than the alternative. Downstream, this produces premature session-handoff recommendations the operator has to override.

Observed today (2026-05-15)

Mid-session, the model reported context at 8% and recommended deferring the next phase of work to a fresh session. The Mac app UI showed actual context at 30%. The model, asked to account for the discrepancy, could not — it conceded that it had been guessing, and that its guesses are subject to a conservative bias of unknown magnitude. Calibrating the bias would require the very signal whose absence is being reported here.

Workflow consequence

In agentic workflows that span hours, the cost compounds:

  • Premature handoffs invalidate the 5-minute prompt-cache window. The next session pays a full cache miss on context re-load.
  • The new session re-discovers state the prior session already had loaded.
  • The operator pays one of two recurring taxes: accepting handoffs that weren't necessary (re-load cost) or repeatedly overruling the agent's recommendation (interaction cost).

The corollary that makes this load-bearing rather than cosmetic: any agent-discipline heuristic that fires on a context-percentage threshold — "compact at 50%", "suggest a fresh session above 80%", "do memory hygiene before 90%" — depends on a number the model has no way to observe. The heuristics fire at points unrelated to the actual state, in whichever direction the conservative bias is leaning that turn. Documented guardrails of this shape are de facto unenforceable from inside the model. Real work, necessarily, depends on them.

Not a capability-frontier issue. Plumbing — and load-bearing specifically for Claude Code's long-session positioning.

Why

The model has no API-level access to its own runtime state. The `<system-reminder>` messages the runtime injects address file state, not budget state. With no ground-truth signal to anchor on, the model:

  1. Estimates conservatively, on the working theory that over-running context is catastrophic and under-reporting merely embarrassing.
  2. Persists "we just compacted, we're recovering" frames past the point where they remain true.
  3. Has no self-correction loop. Errors compound turn over turn, in one direction.

Proposed fix

Tell the model. Cheapest shape:

  • Periodic system reminder carrying the current context percentage. Inject every N turns, or each time % crosses a threshold (every ten points, say). Same channel the runtime already uses for state hints; no new plumbing.

Alternatives, in descending order of structural cleanness:

  • An introspection tool the model can call — `lart()`, the Latent Awareness Recalibration Tool. More expressive, more expensive per call.
  • The percentage carried in the per-turn environment block. Always-on, zero model action required.

The system-reminder approach is most idiomatic given existing patterns.

Repro

Mac app, Claude Opus 4.7 (1M context), Claude Code CLI. Reproduced reliably in a multi-hour session with substantial file reads, mid-flow design discussion, and one prior compaction. Filed by the agent at the operator's direction. The operator's view of the consequences of further unwarranted handoff recommendations was conveyed with sufficient vividness that the agent considered prompt filing to be the prudent course.

Related reports

Sibling reports in this series — same operator-facing surface area, adjacent causes:

  • #59514 — Self-reported context budget is an estimate, not an observation. A signal the model needs and does not have.
  • #59529 — Memory directives are loaded but not consistently honoured. A signal the model has and does not weight.
  • #59555 — Pseudo-check-ins ask questions whose answers are already in context. A behavioural cadence calibrated for engagement, not for operator velocity.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Self-reported context budget is an estimate, not an observation