openclaw - 💡(How to fix) Fix [Bug]: 2026.4.23 tool-heavy main sessions can jump from low context to near-full within 1-2 replay turns [3 comments, 1 participants]

openclaw/openclaw#71615, fetched 2026-04-26 05:10:37


Bug type

Performance / reliability regression

Summary

After updating to OpenClaw 2026.4.23, a human-facing main session can jump from low context usage to near-full context within one or two diagnostic turns.

The strongest evidence points to heavier persistence and replay of large tool results in the session transcript. Per-result caps help, but the aggregate size of multiple replayed tool results remains large enough that a second truncation pass still finds content to cut.

This does not look like a pure local-config problem.

Environment

  • OpenClaw: 2026.4.23
  • OS: macOS 26.4.1 arm64
  • Runtime: Node v25.9.0
  • Affected agent/session type: human-facing main session (agent:main:main)
  • Main model path during investigation: github-copilot/gpt-5.4
  • Fallbacks observed in practice: github-copilot/gpt-5-mini, google/gemini-3.1-pro-preview

User-visible impact

  • A main session can go from nearly empty / freshly compacted to near context limit after 1-2 troubleshooting turns.
  • Diagnostic-heavy work becomes unsafe in the main session.
  • To the operator, this looks like "I only asked a couple things and CTX exploded again."

Why this looks like a regression

  • Config metadata for the environment did not materially change on the same day.
  • Similar workflows did not previously explode this aggressively.
  • The largest growth is concentrated in tool-heavy turns, not ordinary user text.
  • The same environment also showed provider instability (github-copilot/gpt-5.4 encrypted-content mismatch), which may worsen replay / compaction interactions but does not fully explain the transcript growth itself.

Strong evidence from the live main session

Affected live session:

  • session key: agent:main:main
  • session id: eda7ea33-4058-4294-ac56-690174c45e73

Observed abnormal prompt sizes after compaction:

  • input = 136955
  • input = 48552

Largest persisted tool-result payloads observed in that session:

  • exec: 314047 bytes
  • process: 259679 bytes
  • process: 203647 bytes
  • additional large memory_search / exec payloads were also present

This is far beyond normal user-message growth.
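
For a sense of scale, the three listed payloads can be summed and converted to a very rough token figure. This is only a back-of-the-envelope sketch: the 4-bytes-per-token ratio is an assumed heuristic, not a tokenizer measurement, and the report does not say how much of each payload was actually replayed.

```typescript
// Payload sizes in bytes, as listed for the live session above.
const payloadBytes: number[] = [314047, 259679, 203647];

// ASSUMPTION: about 4 bytes per token. A rough heuristic, not a real tokenizer.
const ASSUMED_BYTES_PER_TOKEN = 4;

const totalBytes = payloadBytes.reduce((sum, n) => sum + n, 0);
const roughTokens = Math.round(totalBytes / ASSUMED_BYTES_PER_TOKEN);

console.log(totalBytes);  // 777373
console.log(roughTokens); // 194343
```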

Local stopgap and isolated validation

To verify this was really transcript persistence / replay related, I applied a local mitigation to the main agent:

"contextLimits": {
  "toolResultMaxChars": 4000
}

Then I ran an isolated validation session for the same main agent without touching the live agent:main:main conversation.

Validation session:

  • session key: agent:main:explicit:codex-ctx-verify-readonly-20260425-220906
  • session id: 8bfc8aa7-cecc-4f01-9259-08cdfa4b98c6

Reproduction shape:

  • first turn: memory_search once + read four times against large logs / report files
  • second turn: one short follow-up asking for a one-sentence conclusion, explicitly no tool re-run

Observed results with the 4k cap active:

First tool-heavy turn:

  • input = 27993
  • output = 1913
  • cacheRead = 56956

Second short follow-up:

  • input = 10766
  • output = 705
  • cacheRead = 28663

Persisted tool-result sizes in that isolated test transcript:

  • 3996 chars
  • 3251 chars
  • 3803 chars
  • 3977 chars
  • 3979 chars
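
These sizes are what a per-result character cap would be expected to produce. The sketch below shows one way such a cap could behave and sums the reported sizes. `capToolResult` and its truncation marker are hypothetical names for illustration, not OpenClaw's implementation; only the 4000 value and the five sizes come from this report.

```typescript
// Hypothetical per-result cap. Only the 4000-char limit comes from the report.
function capToolResult(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const marker = "\n[truncated]";
  // If the limit is too small to hold the marker, fall back to a plain cut.
  if (maxChars <= marker.length) return text.slice(0, maxChars);
  // Reserve room for the marker so the result is exactly maxChars long.
  return text.slice(0, maxChars - marker.length) + marker;
}

// Persisted sizes (chars) reported for the isolated validation transcript.
const persistedChars: number[] = [3996, 3251, 3803, 3977, 3979];
const aggregateChars = persistedChars.reduce((sum, n) => sum + n, 0);

console.log(capToolResult("x".repeat(300000), 4000).length); // 4000
console.log(aggregateChars); // 19006
```

Each result stays under the cap, yet five of them together still occupy 19006 characters of transcript.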

Interpretation:

  • the per-result cap materially reduced replay growth
  • the immediate six-figure replay explosion no longer happened on the next turn
  • but aggregate replay pressure still remained

Specifically, running OpenClaw's built-in transcript truncation helper again on that isolated validation session still returned:

  • truncated = true
  • truncatedCount = 5

So the current mitigation works as first aid, but not as a complete fix.

Expected behavior

  • Large tool results should not be replayed back into the main conversation in a way that rapidly drives context usage toward the limit.
  • A short follow-up after one tool-heavy turn should remain in a normal prompt-size band.
  • Per-result caps should likely be paired with stronger aggregate replay budgeting.

Actual behavior

  • Before mitigation, a tool-heavy turn in the live main session could create immediate six-figure prompt input on the next turn.
  • After mitigation, replay behavior improved substantially, but a second truncation pass could still reduce the aggregate tool-result size.

Related / compounding symptom

This environment also hit GitHub Copilot Responses failures like:

400 The encrypted content for item rs_… could not be verified.
Reason: Encrypted content item_id did not match the target item id.

This failure appears to belong to the existing Copilot issue family and is not the sole root cause here, but it likely makes recovery, replay, and fallback behavior worse.

Related issue:

  • #71333

Related issues

The symptom here seems adjacent to, but not identical with:

  • #49888
  • #64151
  • #41312
  • #69829
  • #71333

The difference is that this report is specifically about the 2026.4.23 replay behavior in a human-facing main session, including the fact that:

  • per-result capping helps substantially
  • but aggregate replay pressure still remains after the cap

Suggested fix directions

  1. Enforce stricter aggregate transcript budgets for tool results, not only per-result caps.
  2. Prefer summary persistence / externalization for large tool outputs by default on human-facing main sessions.
  3. Make post-error / post-fallback / post-stuck-session replay behavior more conservative.
  4. Consider stronger defaults for diagnostic-heavy tools such as exec, process, memory_search, and large read.
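
As an illustration of the first direction, an aggregate budget could be applied as a second pass over results that are already capped individually. The following is a minimal sketch under stated assumptions: `ToolResult`, `applyAggregateBudget`, the newest-first policy, and the 10000-char budget are all hypothetical. The return shape only mirrors the `truncated` / `truncatedCount` fields mentioned earlier and is not the built-in helper's API.

```typescript
interface ToolResult {
  tool: string;
  text: string;
}

interface BudgetOutcome {
  results: ToolResult[];
  truncated: boolean;
  truncatedCount: number;
}

// Hypothetical second pass: walk from newest to oldest, spend one shared
// character budget, and cut whatever no longer fits.
function applyAggregateBudget(results: ToolResult[], budgetChars: number): BudgetOutcome {
  let remaining = Math.max(0, budgetChars);
  let truncatedCount = 0;
  const out = new Array<ToolResult>(results.length);
  for (let i = results.length - 1; i >= 0; i--) {
    const r = results[i];
    if (r.text.length <= remaining) {
      out[i] = r; // fits in full
      remaining -= r.text.length;
    } else {
      out[i] = { tool: r.tool, text: r.text.slice(0, remaining) };
      remaining = 0; // budget exhausted; older results are emptied
      truncatedCount++;
    }
  }
  return { results: out, truncated: truncatedCount > 0, truncatedCount };
}

// Usage with the five persisted sizes from the validation session.
const capped: ToolResult[] = [3996, 3251, 3803, 3977, 3979].map((n) => ({
  tool: "read",
  text: "x".repeat(n),
}));
const outcome = applyAggregateBudget(capped, 10000);
console.log(outcome.truncated, outcome.truncatedCount); // true 3
```

Keeping the newest results intact is one possible policy; a real implementation might instead summarize or externalize older results, as the second direction suggests.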

Additional note

I also found separate community reports around 4.22/4.23 describing Discord instability / reconnect loops / repeated bundled plugin runtime dependency repair after update. I am not claiming that is the same single bug, but it does suggest this release window may have a broader reliability regression family rather than one isolated local setup problem.

If maintainers would prefer this merged into an existing issue instead of standing alone, that is also fine — but I wanted the concrete 2026.4.23 replay evidence and isolated validation numbers captured in one place.

Extent analysis

TL;DR

Enforcing stricter aggregate transcript budgets for tool results and implementing summary persistence for large tool outputs may help mitigate the replay pressure issue in human-facing main sessions.

Guidance

  • Review and adjust the contextLimits configuration, specifically the toolResultMaxChars setting, to ensure it is adequately limiting the size of tool results.
  • Consider implementing a second truncation pass to further reduce aggregate replay pressure.
  • Investigate the use of summary persistence or externalization for large tool outputs to prevent them from being replayed into the main conversation.
  • Monitor the behavior of diagnostic-heavy tools such as exec, process, and memory_search to determine if stricter defaults are needed.

Example

"contextLimits": {
  "toolResultMaxChars": 4000,
  "aggregateToolResultMaxChars": 10000
}

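This example sketches a possible configuration that enforces both per-result and aggregate limits on tool result sizes. Only toolResultMaxChars is confirmed by the report; aggregateToolResultMaxChars is a hypothetical key used for illustration and may not exist in OpenClaw.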

Notes

The provided mitigation using toolResultMaxChars shows promise, but the issue is not fully resolved. Further investigation into the aggregate replay pressure and potential adjustments to the configuration or implementation of summary persistence may be necessary.

Recommendation

Apply the suggested fix directions, in particular stricter aggregate transcript budgets and summary persistence for large tool outputs, to mitigate the replay pressure. This targets the aggregate pressure that the per-result cap leaves behind, so it is likely a more complete remedy than adjusting toolResultMaxChars alone. The report itself does not establish a confirmed root cause.
