- Large tool results should not be replayed back into the main conversation in a way that rapidly drives context usage toward the limit. - A short follow-up after one tool-heavy turn should remain in a normal prompt-size band. - Per-result caps should likely be paired with stronger aggregate replay budgeting.

openclaw - 💡(How to fix) Fix [Bug]: 2026.4.23 tool-heavy main sessions can jump from low context to near-full within 1-2 replay turns [3 comments, 1 participants]

openclaw2026-04-25 14:16:53

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#71615•Fetched 2026-04-26 05:10:37

View on GitHub

Comments

Participants

Timeline

Reactions

Author

riosbotchen-source

Participants

riosbotchen-source

Timeline (top)

commented ×3cross-referenced ×2

After updating to OpenClaw 2026.4.23, a human-facing main session can jump from low context usage to near-full context within one or two diagnostic turns.

The strongest evidence points to heavier persistence + replay of large tool results in the session transcript. Per-result caps help, but aggregate replay pressure across multiple tool results still remains large enough to need a second truncation pass.

This does not look like a pure local-config problem.

Error Message

Make post-error / post-fallback / post-stuck-session replay behavior more conservative.

Root Cause

That appears related to the existing Copilot issue family rather than this being the sole root cause, but it likely makes recovery / replay / fallback behavior worse.

Fix Action

Fix / Workaround

To verify this was really transcript persistence / replay related, I applied a local mitigation to the main agent:

So the current mitigation works as first aid, but not as a complete fix.

Before mitigation, a tool-heavy turn in the live main session could create immediate six-figure prompt input on the next turn.
After mitigation, replay behavior improved substantially, but aggregate tool-result pressure still remained reducible by a second truncation pass.

Code Example

"contextLimits": {
  "toolResultMaxChars": 4000
}

---

400 The encrypted content for item rs_… could not be verified.
Reason: Encrypted content item_id did not match the target item id.

RAW_BUFFERClick to expand / collapse

Bug type

Performance / reliability regression

Summary

After updating to OpenClaw 2026.4.23, a human-facing main session can jump from low context usage to near-full context within one or two diagnostic turns.

This does not look like a pure local-config problem.

Environment

OpenClaw: 2026.4.23
OS: macOS 26.4.1 arm64
Runtime: Node v25.9.0
Affected agent/session type: human-facing main session (agent:main:main)
Main model path during investigation: github-copilot/gpt-5.4
Fallbacks observed in practice: github-copilot/gpt-5-mini, google/gemini-3.1-pro-preview

User-visible impact

A main session can go from nearly empty / freshly compacted to near context limit after 1-2 troubleshooting turns.
Diagnostic-heavy work becomes unsafe in the main session.
To the operator, this looks like "I only asked a couple things and CTX exploded again."

Why this looks like a regression

Config metadata for the environment did not materially change on the same day.
Similar workflows did not previously explode this aggressively.
The largest growth is concentrated in tool-heavy turns, not ordinary user text.
The same environment also showed provider instability (github-copilot/gpt-5.4 encrypted-content mismatch), which may worsen replay / compaction interactions but does not fully explain the transcript growth itself.

Strong evidence from the live main session

Affected live session:

session key: agent:main:main
session id: eda7ea33-4058-4294-ac56-690174c45e73

Observed abnormal prompt sizes after compaction:

input = 136955
input = 48552

Largest persisted tool-result payloads observed in that session:

exec: 314047 bytes
process: 259679 bytes
process: 203647 bytes
additional large memory_search / exec payloads were also present

This is far beyond normal user-message growth.

Local stopgap and isolated validation

To verify this was really transcript persistence / replay related, I applied a local mitigation to the main agent:

"contextLimits": {
  "toolResultMaxChars": 4000
}

Then I ran an isolated validation session for the same main agent without touching the live agent:main:main conversation.

Validation session:

session key: agent:main:explicit:codex-ctx-verify-readonly-20260425-220906
session id: 8bfc8aa7-cecc-4f01-9259-08cdfa4b98c6

Reproduction shape:

first turn: memory_search once + read four times against large logs / report files
second turn: one short follow-up asking for a one-sentence conclusion, explicitly no tool re-run

Observed results with the 4k cap active:

First tool-heavy turn:

input = 27993
output = 1913
cacheRead = 56956

Second short follow-up:

input = 10766
output = 705
cacheRead = 28663

Persisted tool-result sizes in that isolated test transcript:

3996 chars
3251 chars
3803 chars
3977 chars
3979 chars

Interpretation:

the per-result cap materially reduced replay growth
the immediate six-figure replay explosion no longer happened on the next turn
but aggregate replay pressure still remained

Specifically, running OpenClaw's built-in transcript truncation helper again on that isolated validation session still returned:

truncated = true
truncatedCount = 5

So the current mitigation works as first aid, but not as a complete fix.

Expected behavior

Large tool results should not be replayed back into the main conversation in a way that rapidly drives context usage toward the limit.
A short follow-up after one tool-heavy turn should remain in a normal prompt-size band.
Per-result caps should likely be paired with stronger aggregate replay budgeting.

Actual behavior

Before mitigation, a tool-heavy turn in the live main session could create immediate six-figure prompt input on the next turn.
After mitigation, replay behavior improved substantially, but aggregate tool-result pressure still remained reducible by a second truncation pass.

Related / compounding symptom

This environment also hit GitHub Copilot Responses failures like:

400 The encrypted content for item rs_… could not be verified.
Reason: Encrypted content item_id did not match the target item id.

That appears related to the existing Copilot issue family rather than this being the sole root cause, but it likely makes recovery / replay / fallback behavior worse.

Related issue:

#71333

Related issues

The symptom here seems adjacent to, but not identical with:

#49888
#64151
#41312
#69829
#71333

The difference is that this report is specifically about the 2026.4.23 replay behavior in a human-facing main session, including the fact that:

per-result capping helps substantially
but aggregate replay pressure still remains after the cap

Suggested fix directions

Enforce stricter aggregate transcript budgets for tool results, not only per-result caps.
Prefer summary persistence / externalization for large tool outputs by default on human-facing main sessions.
Make post-error / post-fallback / post-stuck-session replay behavior more conservative.
Consider stronger defaults for diagnostic-heavy tools such as exec, process, memory_search, and large read.

Additional note

I also found separate community reports around 4.22/4.23 describing Discord instability / reconnect loops / repeated bundled plugin runtime dependency repair after update. I am not claiming that is the same single bug, but it does suggest this release window may have a broader reliability regression family rather than one isolated local setup problem.

If maintainers would prefer this merged into an existing issue instead of standing alone, that is also fine — but I wanted the concrete 2026.4.23 replay evidence and isolated validation numbers captured in one place.

extent analysis

TL;DR

Enforcing stricter aggregate transcript budgets for tool results and implementing summary persistence for large tool outputs may help mitigate the replay pressure issue in human-facing main sessions.

Guidance

Review and adjust the contextLimits configuration, specifically the toolResultMaxChars setting, to ensure it is adequately limiting the size of tool results.
Consider implementing a second truncation pass to further reduce aggregate replay pressure.
Investigate the use of summary persistence or externalization for large tool outputs to prevent them from being replayed into the main conversation.
Monitor the behavior of diagnostic-heavy tools such as exec, process, and memory_search to determine if stricter defaults are needed.

Example

"contextLimits": {
  "toolResultMaxChars": 4000,
  "aggregateToolResultMaxChars": 10000
}

This example shows a potential configuration change to enforce both per-result and aggregate limits on tool result sizes.

Notes

The provided mitigation using toolResultMaxChars shows promise, but the issue is not fully resolved. Further investigation into the aggregate replay pressure and potential adjustments to the configuration or implementation of summary persistence may be necessary.

Recommendation

Apply the suggested fix directions, specifically enforcing stricter aggregate transcript budgets and implementing summary persistence for large tool outputs, to mitigate the replay pressure issue. This approach addresses the root cause of the problem and provides a more comprehensive solution than simply adjusting the toolResultMaxChars setting.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

Large tool results should not be replayed back into the main conversation in a way that rapidly drives context usage toward the limit.
A short follow-up after one tool-heavy turn should remain in a normal prompt-size band.
Per-result caps should likely be paired with stronger aggregate replay budgeting.

#api #SSR setup #ISR setup #authentication setup #request error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - 💡(How to fix) Fix [Bug]: 2026.4.23 tool-heavy main sessions can jump from low context to near-full within 1-2 replay turns [3 comments, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Code Example

Bug type

Summary

Environment

User-visible impact

Why this looks like a regression

Strong evidence from the live main session

Local stopgap and isolated validation

Expected behavior

Actual behavior

Related / compounding symptom

Related issues

Suggested fix directions

Additional note

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING