openclaw - 💡(How to fix) Fix compaction: reserveTokens default (15000) leaves insufficient headroom for structured summarization at typical session sizes [1 comment, 2 participants]

openclaw/openclaw#77780 · Fetched 2026-05-06 06:21:35
Comments: 1 · Participants: 2 · Timeline events: 7 · Reactions: 2
Timeline (top): mentioned ×3, subscribed ×3, commented ×1

The default reserveTokens: 15000 threshold causes compaction to fire when only ~15k tokens remain in the context window. At this point, the summarization model itself is severely context-constrained and cannot produce a structured, task-preserving summary. Instead it falls back to a verbatim transcript tail — silently dropping all goal state, in-progress task queues, and approved work.

Observed behavior

Manual /compact triggered at 182,615 tokens with reserveTokens: 15000 (200k context window):

  • Summarization model had roughly 17k tokens of window left (200,000 − 182,615 = 17,385, close to the 15k reserve) to produce a summary of ~182k tokens of prior context
  • Result: verbatim transcript replay stored as summary field — not an LLM-generated summary
  • All structured state (goals, in-progress tasks, approved task queues) silently dropped
  • Contrast: safeguard compaction that fired earlier at 123,845 tokens produced a fully structured summary (## Goal, ## Progress, ## Done, ## Pending)

Root Cause

Compaction summarization is itself a model call. At 182k/200k fill, only about 17k tokens remain (15k at the default trigger point) for the summarization model to:

  1. Read the prior conversation context (injected as input)
  2. Generate a structured summary (output)

15k tokens is insufficient for both at any realistic session length. The summarizer is token-starved and cannot produce structured output. It effectively echoes back the tail of the transcript instead.

This means: the sessions most in need of a high-quality structured summary are exactly the sessions that receive the worst summary quality, because they only trigger compaction when they're already critically full.

The difference is not a code path difference between manual and safeguard compaction — it is purely a function of how much headroom the summarization model has when it runs.
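To make the headroom difference concrete, here is a minimal Python sketch of the arithmetic behind the two compaction events in this report. It assumes the "200k" window is exactly 200,000 tokens; the `headroom` helper is hypothetical and is not part of OpenClaw.

```python
def headroom(context_window: int, tokens_used: int) -> int:
    """Tokens still free in the window at the moment compaction starts."""
    if tokens_used > context_window:
        raise ValueError("tokens_used exceeds context_window")
    return context_window - tokens_used


# Assumption: the "200k" window is exactly 200,000 tokens.
CONTEXT_WINDOW = 200_000

manual = headroom(CONTEXT_WINDOW, 182_615)     # manual /compact (degraded summary)
safeguard = headroom(CONTEXT_WINDOW, 123_845)  # safeguard compaction (structured summary)

print(manual)     # 17385
print(safeguard)  # 76155
```

Under that assumption, the safeguard run had more than four times the free window of the manual run.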

User impact

  • Every session that hits the reserveTokens: 15000 threshold (fires at ~92% fill on a 200k window) has had its compaction quality silently degraded
  • Approved task queues, in-flight investigations, and structured goal state are lost at exactly the moment context pressure is highest
  • Users experience this as "the agent forgot" — with no visible signal that compaction quality was degraded
  • This is not a new edge case: it is the default behavior for any user running a long or investigation-heavy session

Proposed fix

Raise the default reserveTokens threshold substantially. Based on testing:

  • reserveTokens: 30000 (fires at ~85% fill / ~170k tokens on 200k window) — sufficient headroom for structured summarization at typical session lengths
  • reserveTokens: 40000 (fires at ~80% fill / ~160k tokens) — more headroom, slightly more frequent compactions on heavy days

The structured safeguard-triggered compaction at ~124k tokens shows that ~76k tokens of remaining headroom (200,000 − 123,845) was enough to produce a correct structured summary. The right default therefore lies somewhere between 15k (clearly insufficient) and ~76k (sufficient in this case).

A minimum viable fix: reserveTokens: 25000–30000. This alone would have prevented the quality degradation observed.
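The fill percentages quoted for each candidate value follow from a simple relationship, sketched here in Python. The sketch assumes compaction fires once usage reaches the context window minus reserveTokens, which is how this report describes the threshold; it is not taken from OpenClaw's source, and `compaction_trigger` is a hypothetical name.

```python
def compaction_trigger(context_window: int, reserve_tokens: int) -> tuple[int, float]:
    """Return (trigger point in tokens, fill fraction) for a given reserve.

    Assumption: compaction fires when tokens used >= context_window - reserve_tokens.
    """
    if not 0 <= reserve_tokens < context_window:
        raise ValueError("reserve_tokens must be in [0, context_window)")
    trigger = context_window - reserve_tokens
    return trigger, trigger / context_window


# Compare the current default with the values proposed in the issue.
for reserve in (15_000, 25_000, 30_000, 40_000):
    trigger, fill = compaction_trigger(200_000, reserve)
    print(f"reserveTokens={reserve}: fires at {trigger} tokens ({fill:.1%} fill)")
```

Each additional 10k of reserve moves the trigger point 5 percentage points earlier on a 200k window, which is the trade-off between summary headroom and compaction frequency.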

Additional recommendation

Consider adding a warning or metric when the summarization result appears to be a verbatim transcript tail rather than a structured summary — e.g. if the summary contains raw tool results, JSON blobs, or turn-format markers (- User:, - Assistant:, - Tool result:). This would make the failure mode visible rather than silent.
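A detector along these lines could be sketched as follows. This is an illustrative heuristic only: the function name, the marker threshold, and the choice to treat the four headings from the good summary (## Goal, ## Progress, ## Done, ## Pending) as the sign of a structured result are all assumptions, and it does not attempt to detect raw tool results or JSON blobs.

```python
import re

# Turn-format markers named in the issue: "- User:", "- Assistant:", "- Tool result:".
TURN_MARKER = re.compile(r"^\s*-\s*(?:User|Assistant|Tool result):", re.MULTILINE)
# Headings seen in the structured summary described in the issue (assumed set).
SECTION_HEADING = re.compile(r"^##\s+(?:Goal|Progress|Done|Pending)\b", re.MULTILINE)


def looks_like_transcript_tail(summary: str, min_turn_markers: int = 3) -> bool:
    """Heuristic: no structured headings and several turn-format markers."""
    turn_markers = len(TURN_MARKER.findall(summary))
    headings = len(SECTION_HEADING.findall(summary))
    return headings == 0 and turn_markers >= min_turn_markers


structured = "## Goal\nShip the fix\n## Progress\n- raised reserve\n## Pending\n- add metric"
replay = '- User: run tests\n- Assistant: running\n- Tool result: {"ok": true}\n- User: next'

print(looks_like_transcript_tail(structured))  # False
print(looks_like_transcript_tail(replay))      # True
```

A check like this would be cheap enough to run on every compaction result and could feed a log warning or a counter, so the degraded case stops being silent.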

Environment

  • OpenClaw 2026.5.3-1 (2eae30e)
  • Context window: 200k (venice/claude-sonnet-4-6)
  • Default reserveTokens: 15000
  • Session fill at manual compaction: 182,615 tokens (~91%)
  • Session fill at safeguard compaction (good quality): 123,845 tokens (~62%)

extent analysis

TL;DR

Raising the default reserveTokens threshold to at least 25000-30000 can prevent compaction quality degradation by providing sufficient headroom for structured summarization.

Guidance

  • Increase reserveTokens to a value between 25000 and 30000 so the summarization model has enough tokens to produce a structured summary.
  • Consider implementing a warning or metric to detect when the summarization result is a verbatim transcript tail rather than a structured summary.
  • Test several reserveTokens values (the issue suggests 30000 and 40000) to find the best threshold for typical session lengths.
  • Compare session fill at compaction time against summary quality. In this report the summary was structured at ~62% fill and degraded at ~91% fill.

Example

The issue itself includes no code snippet; the problem concerns configuration and model behavior.

Notes

The optimal reserveTokens threshold may vary depending on the specific use case and session lengths. Testing and monitoring are necessary to determine the best value.

Recommendation

As a workaround, raise reserveTokens from the default 15000 to at least 25000 to 30000. The summarization model then has more headroom to produce structured summaries, especially in sessions under high context pressure.
