openclaw - 💡(How to fix) Fix Compaction emits empty fallback summary; tokensBefore counts cacheRead, triggering premature compactions on Opus 1M [1 participants]

openclaw 2026-04-27 17:52:33

openclaw/openclaw#72964•Fetched 2026-04-28 06:29:25

Author

mrossit

Compaction generates empty fallback summary; tokensBefore counter sums cache_read tokens, triggering compaction prematurely on Opus 1M

Version: 2026.4.24 (cbcfdf6)
Severity: High. Silently destroys conversation context, breaking the "compaction memory first" guarantee.

Summary

Three related bugs observed together in production:

tokensBefore overcounts. The compaction trigger uses total = input + output + cacheRead + cacheWrite from the model usage block as the "context size" signal. With Anthropic prompt caching, cacheRead reflects billing (cached prompt content reused on each call), not new content occupying the context window. On a session running on Opus 4.7 (1M context), this caused tokensBefore=844282 reported by the compaction event, when the actual prompt window usage was ~206k input + cache.
Empty fallback summary on real conversations. When compactionSafeguardExtension runs isRealConversationMessage over preparation.messagesToSummarize and returns false for all entries, it writes a fallback summary "Goal: (none — conversation is empty)" regardless of tokensBefore. This fired even when tokensBefore was 209k and 844k. The user's last message (a research request) was discarded.
Memory flush executes AFTER the compaction it should precede. The Pre-compaction memory flush prompt arrived ~3 minutes after each compaction event. The agent saw an already-compacted (empty) conversation and responded with "no facts to store" — fulfilling the prompt while losing the user's request.

Reproduction (observed in production)

Workspace: a research agent ("Digger") with bootstrap files totaling ~30k tokens (system prompt 75k chars / ~19k tokens + 28 tools / 48k chars / ~12k tokens). Provider: Anthropic Opus 4.7 (1M context window) behind a self-hosted reverse proxy.

Trace from trajectory.jsonl:

ts=11:41:56 context.compiled msgCount=0 sysLen=75155 tools=28(48534)
ts=11:43:27 model.completed usage={input:1146 output:3047 cacheRead:162651 cacheWrite:42216 total:209060}
ts=11:43:27 compaction tokensBefore=209060 summary="(none — conversation is empty)"
... (real conversation continues for hours, with cacheRead steady ~11k per call)
ts=16:21:34 model.completed usage={input:3 output:301 cacheRead:11639 cacheWrite:36978 total:48921}
ts=16:23:16 model.completed usage={input:3 output:1627 cacheRead:11639 cacheWrite:38225 total:51494}
ts=16:28:22 model.completed usage={input:1127 output:5920 cacheRead:782227 cacheWrite:55008 total:844282}
ts=16:27:37 compaction tokensBefore=844282 summary="(none — conversation is empty)"
ts=16:30:36 user message: "Pre-compaction memory flush. Store durable memories ..."
ts=16:30:50 assistant response: writes "Sessão sem fatos novos" to memory

The jump from cacheRead=11639 (steady-state) to cacheRead=782227 in a single call is the proximate trigger. We have not yet identified whether this is the proxy double-billing, the SDK reporting a synthetic total, or the gateway summing across multiple internal sub-calls. In any case, the compaction trigger should not depend on cacheRead — that field reflects billing for cached prompt content, not active context window occupancy.
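The totals in the trace can be checked by hand. The following is a minimal sketch that uses only the usage values from the two `model.completed` events preceding each compaction; the object shape mirrors the trace, and the helper names `usageTotal` and `cacheReadShare` are illustrative, not part of openclaw.

```javascript
// Usage blocks copied from the two model.completed events that preceded a compaction.
const events = [
  { ts: "11:43:27", input: 1146, output: 3047, cacheRead: 162651, cacheWrite: 42216 },
  { ts: "16:28:22", input: 1127, output: 5920, cacheRead: 782227, cacheWrite: 55008 },
];

// Sum of all four token classes; this matches the reported tokensBefore values.
function usageTotal(u) {
  return u.input + u.output + u.cacheRead + u.cacheWrite;
}

// Fraction of the total contributed by cacheRead alone.
function cacheReadShare(u) {
  return u.cacheRead / usageTotal(u);
}

for (const u of events) {
  const pct = (cacheReadShare(u) * 100).toFixed(1);
  console.log(`${u.ts} total=${usageTotal(u)} cacheRead share=${pct}%`);
}
```

In both events the sum reproduces `tokensBefore` exactly (209060 and 844282), and cacheRead accounts for roughly 78% and 93% of it, which is consistent with the trigger being driven by `usage.total`.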

Suspected root causes

Bug 1 — `tokensBefore` formula

In dist/agent-runner.runtime-CbAg9IpO.js and the contextEngine compaction trigger, the token count used to decide whether to compact appears to come from usage.total, which aggregates all token classes from the provider response. For Anthropic, cacheRead is the cached prompt content reused without re-uploading — it should not count as new context. Recommended:

effectiveContextTokens = input + cacheRead + cacheWrite
// where cacheRead represents the cached portion of THIS call's prompt
// (not accumulated history)

Or, when prompt caching is active, derive context size from the messages + systemPrompt + tools envelope size before submission rather than from the post-call usage block.
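Both options above can be sketched briefly. The helper names below are hypothetical, and the envelope estimate assumes roughly 4 characters per token, a heuristic that happens to match the figures in this report (75k chars for about 19k tokens); a real tokenizer would be more accurate. Message `content` is assumed to be a plain string.

```javascript
// Hypothetical helper: per-call context size using the formula suggested above.
// Output tokens are excluded; cacheRead is the cached part of THIS call's prompt.
function effectiveContextTokens(usage) {
  return usage.input + usage.cacheRead + usage.cacheWrite;
}

// ASSUMPTION: ~4 characters per token. This is a rough heuristic, not a tokenizer.
const CHARS_PER_TOKEN = 4;

// Hypothetical pre-submission estimate from the envelope (system prompt + tools + messages).
function estimateEnvelopeTokens({ systemPrompt, toolsJson, messages }) {
  const messageChars = messages.reduce((sum, m) => sum + m.content.length, 0);
  const totalChars = systemPrompt.length + toolsJson.length + messageChars;
  return Math.ceil(totalChars / CHARS_PER_TOKEN);
}

// Sizes taken from the context.compiled trace line: sysLen=75155, tools=48534 chars, msgCount=0.
const estimate = estimateEnvelopeTokens({
  systemPrompt: "x".repeat(75155),
  toolsJson: "x".repeat(48534),
  messages: [],
});
console.log(estimate); // 30923
```

The estimate for the empty session lands near the ~30k bootstrap size reported in the reproduction, well below the 209060 that triggered the first compaction.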

Bug 2 — empty fallback summary safeguard

In dist/wait-for-idle-before-flush-CkZJsBmY.js:4596-4606, compactionSafeguardExtension returns buildStructuredFallbackSummary when no entry passes isRealConversationMessage:

const hasRealSummarizable = preparation.messagesToSummarize.some((m, i, ms) =>
    isRealConversationMessage(m, ms, i));
const hasRealTurnPrefix = preparation.turnPrefixMessages.some((m, i, ms) =>
    isRealConversationMessage(m, ms, i));
if (!hasRealSummarizable && !hasRealTurnPrefix) {
    return { compaction: {
        summary: buildStructuredFallbackSummary(preparation.previousSummary),
        firstKeptEntryId: preparation.firstKeptEntryId,
        tokensBefore: preparation.tokensBefore
    } };
}

hasMeaningfulText (line 3706) also rejects:

isSilentReplyText (NO_REPLY/ACK and likely MEMORY_FLUSHED)
stripHeartbeatToken zero-length results

Suggested: when tokensBefore > 50_000, the safeguard should fall through to the LLM summarizer rather than emit the fallback boundary, even if no single entry passes the strict "real conversation" filter. The boundary is meant for empty heartbeat sessions, not for sessions with real bulk that happens to be metadata-wrapped.
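The suggested gate could look roughly like this. It is a sketch only: `chooseCompactionPath` and `FALLBACK_MAX_TOKENS` are hypothetical names, and the 50_000 cutoff is the value proposed in this report, not one read from the codebase.

```javascript
// ASSUMPTION: cutoff proposed in this report; not an existing openclaw setting.
const FALLBACK_MAX_TOKENS = 50_000;

// Hypothetical decision helper. Returns "fallback" only when the session has no
// real conversation messages AND is small; otherwise returns "summarize" so the
// LLM summarizer handles it.
function chooseCompactionPath({ hasRealSummarizable, hasRealTurnPrefix, tokensBefore }) {
  const looksEmpty = !hasRealSummarizable && !hasRealTurnPrefix;
  if (looksEmpty && tokensBefore <= FALLBACK_MAX_TOKENS) {
    return "fallback";
  }
  return "summarize";
}

// The first compaction from the trace would no longer take the fallback path.
console.log(chooseCompactionPath({
  hasRealSummarizable: false,
  hasRealTurnPrefix: false,
  tokensBefore: 209060,
})); // "summarize"
```

Keeping the size check next to the emptiness check means a small heartbeat-only session still gets the cheap fallback boundary, while a large session that merely fails the strict filter is summarized.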

Bug 3 — memoryFlush ↔ compaction ordering

The agent-runner.runtime flush predicate computes threshold = contextWindow - reserveTokensFloor - softThresholdTokens, so softThresholdTokens acts as a margin, not a trigger value. When the contextEngine owns compaction, it can fire before the agent-runner loop re-evaluates the flush predicate, so the Pre-compaction memory flush prompt arrives after the conversation has already been replaced by the fallback summary.

Recommended: when memoryFlush.enabled === true, the compaction safeguard should refuse to commit a compaction (especially the empty fallback) unless entry.memoryFlushCompactionCount === entry.compactionCount — i.e. a flush has been recorded for this cycle.
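A minimal sketch of that guard follows. The field names (`memoryFlush.enabled`, `memoryFlushCompactionCount`, `compactionCount`) come from the text above; the function `canCommitCompaction` and the shape of its arguments are assumptions made for illustration.

```javascript
// Hypothetical guard: allow a compaction commit only if memory flush is disabled
// or a flush has already been recorded for the current compaction cycle.
function canCommitCompaction(entry, config) {
  if (!config.memoryFlush || config.memoryFlush.enabled !== true) {
    return true; // flush disabled: nothing to wait for
  }
  return entry.memoryFlushCompactionCount === entry.compactionCount;
}

// Flush enabled but not yet recorded for this cycle: the commit is refused.
console.log(canCommitCompaction(
  { compactionCount: 2, memoryFlushCompactionCount: 1 },
  { memoryFlush: { enabled: true } },
)); // false
```

A caller that gets `false` would be expected to run the memory flush first and retry, rather than drop the compaction altogether.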

Suggested vendor fixes

Stop including cacheRead in the token count used to decide compaction triggers; use envelope size or input + new only.
Gate the buildStructuredFallbackSummary path on tokensBefore < 50k (or previousSummary being missing); otherwise fall through to the real summarizer.
Coordinate flush ↔ compact ordering: do not commit a compaction boundary while a memory flush is pending for the current cycle.
Document softThresholdTokens as a margin, not a trigger value; surface the derived effective threshold in openclaw doctor.
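To illustrate why softThresholdTokens is a margin rather than a trigger, here is the threshold formula from Bug 3 as a small worked example. The reserve and soft-threshold numbers are made up for illustration and are not openclaw defaults; only the formula itself comes from this report.

```javascript
// threshold = contextWindow - reserveTokensFloor - softThresholdTokens (formula from Bug 3)
function flushThreshold({ contextWindow, reserveTokensFloor, softThresholdTokens }) {
  return contextWindow - reserveTokensFloor - softThresholdTokens;
}

// ASSUMPTION: illustrative values only. contextWindow reflects the 1M model;
// the other two numbers are hypothetical.
const example = {
  contextWindow: 1_000_000,
  reserveTokensFloor: 20_000,
  softThresholdTokens: 4_000,
};

console.log(flushThreshold(example)); // 976000
```

With these values the flush would trigger at 976000 tokens, not at 4000: raising softThresholdTokens moves the trigger earlier by the same amount.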

Evidence

Session JSONL + trajectory available on request (~4 MB). Metadata table:

| Compaction UTC | tokensBefore | input | output | cacheRead | cacheWrite |
| --- | --- | --- | --- | --- | --- |
| 11:43:27 | 209060 | 1146 | 3047 | 162651 | 42216 |
| 16:27:37 | 844282 | 1127 | 5920 | 782227 | 55008 |

Both compactions emitted summary="## Goal\n(none — conversation is empty)".

Workarounds applied locally

Trim bootstrap files (AGENTS.md, HEARTBEAT.md, MEMORY.md) per workspace.
Patch agent prompts to never respond NO_REPLY to memory flush triggers.
Considering disabling prompt caching for affected workspaces to avoid the cacheRead inflation, at the cost of higher token spend.

extent analysis

TL;DR

The most likely fix involves modifying the compaction trigger to exclude cacheRead from the token count and adjusting the fallback summary safeguard to handle large conversations.

Guidance

Review the tokensBefore formula to ensure it accurately reflects the context size, considering the impact of cacheRead on the calculation.
Adjust the compactionSafeguardExtension to fall through to the LLM summarizer when tokensBefore > 50_000, even if no single entry passes the "real conversation" filter.
Coordinate the flush and compaction ordering to prevent the Pre-compaction memory flush prompt from arriving after the conversation has been replaced by the fallback summary.
Consider disabling prompt caching for affected workspaces to avoid cacheRead inflation, although this may increase token spend.

Example

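// Modified tokensBefore calculation: a variant that drops cacheRead entirely.
// Note: this differs from the formula suggested in the issue
// (input + cacheRead + cacheWrite), which keeps the cached portion of the current call's prompt.
effectiveContextTokens = input + cacheWrite
// where cacheWrite represents the new context added in this call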

Notes

The guidance above covers the trigger formula, the fallback safeguard, and the flush ordering. A complete fix may need further changes depending on how the compaction trigger and the safeguard are actually implemented.

Recommendation

Change the tokensBefore formula and adjust the fallback summary safeguard first. These two changes address the primary causes reported in the issue and should help prevent premature compaction and empty fallback summaries.
