openclaw - 💡(How to fix) Fix Gateway retains raw chat buffers and stalls on stale session index growth [1 comments, 2 participants]

openclaw/openclaw#80282 (fetched 2026-05-11 03:16:54)


Summary

The gateway can become progressively memory/CPU heavy and eventually unresponsive when live chat runs leave behind orphaned streaming buffers and the session index grows with old “active” sessions.

I investigated this on a macOS OpenClaw install after repeated gateway heap pressure/restarts. This was not fixed by increasing Node heap. Evidence points to retained gateway JS heap and expensive session-list scans.

Environment

  • OpenClaw app: 2026.5.2
  • Node: 25.8.1
  • OS: macOS 26.4.1 (arm64)
  • Gateway command: /opt/homebrew/opt/node/bin/node /opt/homebrew/lib/node_modules/openclaw/dist/index.js gateway --port 18789
  • Dashboard: http://127.0.0.1:18789/

Observed symptoms

  • Gateway RSS/heap grew into multi-GB territory before restart:
    • RSS roughly 3.3-3.5GB
    • V8 heap used roughly 2.5-2.8GB
    • externalBytes / arrayBuffersBytes were small, so this looked like retained JS objects, not native buffers.
  • Fresh gateway later reached ~800-860MB RSS within ~30-40 minutes.
  • Gateway became unresponsive/slow before cleanup:
    • sessions.list calls repeatedly took 31-59s
    • gateway CPU pegged at 100%
    • openclaw status sometimes reported gateway unreachable (timeout) even though LaunchAgent was running.
  • Session store had hundreds of old entries shown as active/recent:
    • status before cleanup: Sessions 336 active · 3 stores
    • main store: 334 entries, including old stale running sessions.
  • Restart drain logs showed stuck active embedded run/task state:
    • still draining 2 active task(s) and 1 active embedded run(s) before restart
    • wait for active embedded runs timed out: activeRuns=1 timeoutMs=300000
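The heap-versus-native distinction above can be checked with Node's built-in process.memoryUsage(). The following is a minimal sketch, not part of the gateway: the helper name classifyMemory and the sample "external" figure are assumptions for illustration, while the RSS and heap figures are taken from the ranges reported above. In Node, the external value already includes arrayBuffers.

```javascript
// Sketch: decide whether memory growth is dominated by the V8 heap or by
// native (external) allocations. classifyMemory is a hypothetical helper.
function classifyMemory(usage) {
  const toMB = (bytes) => Math.round(bytes / (1024 * 1024));
  return {
    rssMB: toMB(usage.rss),
    heapUsedMB: toMB(usage.heapUsed),
    // `external` already includes `arrayBuffers` in Node's accounting.
    externalMB: toMB(usage.external),
    likelySource: usage.heapUsed > usage.external ? "js-heap" : "native-buffers",
  };
}

const GB = 1024 ** 3;

// Figures in the range reported in this issue; the external value is an
// illustrative assumption ("small"), not a measured number.
const reported = classifyMemory({
  rss: 3.4 * GB,
  heapTotal: 2.8 * GB,
  heapUsed: 2.6 * GB,
  external: 0.05 * GB,
  arrayBuffers: 0.01 * GB,
});
console.log(reported.likelySource);

// The same check against the current process.
const live = classifyMemory(process.memoryUsage());
console.log(live);
```

When heapUsed dominates like this, raising the heap limit only postpones the problem, and a heap snapshot is usually the next step for finding what is being retained.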

Root cause found locally

There appears to be a cleanup mismatch in gateway chat run state:

  • createChatRunState() has both:
    • rawBuffers
    • buffers
  • server-chat appends full unprojected assistant stream text into chatRunState.rawBuffers.
  • Maintenance/abort cleanup paths were pruning chatRunState.buffers via chatRunBuffers, but stale-run cleanup did not also prune chatRunState.rawBuffers.
  • This means orphaned/incomplete runs can retain the larger raw assistant stream indefinitely, while the visible/projected buffer is removed.
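The mismatch described above can be reproduced in isolation. This is a simplified sketch, assuming the run state is a set of Maps keyed by run ID: the field names follow the issue, but the function bodies (including pruneStaleRunBeforeFix, a hypothetical name) are stand-ins rather than the gateway's actual code.

```javascript
// Hypothetical stand-in for createChatRunState(); only the field names are
// taken from the issue.
function createChatRunState() {
  return {
    rawBuffers: new Map(),
    buffers: new Map(),
    deltaSentAt: new Map(),
    deltaLastBroadcastLen: new Map(),
  };
}

// Simplified emit: the raw stream is stored in full, the projected buffer
// holds a trimmed view of it.
function emitChatDelta(state, runId, text) {
  state.rawBuffers.set(runId, (state.rawBuffers.get(runId) ?? "") + text);
  state.buffers.set(runId, (state.buffers.get(runId) ?? "") + text.trim());
  state.deltaSentAt.set(runId, Date.now());
  state.deltaLastBroadcastLen.set(runId, state.buffers.get(runId).length);
}

// Stale-run cleanup as described before the hotfix: rawBuffers is never touched.
function pruneStaleRunBeforeFix(state, runId) {
  state.buffers.delete(runId);
  state.deltaSentAt.delete(runId);
  state.deltaLastBroadcastLen.delete(runId);
}

const state = createChatRunState();
emitChatDelta(state, "run-1", "  partial assistant output  ");
pruneStaleRunBeforeFix(state, "run-1");

console.log(state.buffers.has("run-1"));    // false: projected buffer is gone
console.log(state.rawBuffers.has("run-1")); // true: raw stream is still retained
```

Every orphaned run leaves one such entry behind, so retained memory grows with the number and length of incomplete runs.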

Relevant installed bundle paths/functions from 2026.5.2:

  • dist/server-chat-state-xIpd9Yv0.js
    • createChatRunState() defines rawBuffers, buffers, deltaSentAt, deltaLastBroadcastLen, abortedRuns.
  • dist/server-chat-CeSx1Sbx.js
    • emitChatDelta() writes to chatRunState.rawBuffers and chatRunState.buffers.
    • normal finalization clears both, but stale maintenance paths only had access to chatRunBuffers/buffers.
  • dist/server.impl-B11albXx.js
    • startGatewayMaintenanceTimers() prunes stale/aborted entries from params.chatRunBuffers, params.chatDeltaSentAt, params.chatDeltaLastBroadcastLen, but not rawBuffers.
  • dist/chat-abort-DqrwtdgZ.js
    • abortChatRunById() clears chatRunBuffers, chatDeltaSentAt, chatDeltaLastBroadcastLen, but not rawBuffers.

Local hotfix applied

I patched the installed bundle to thread chatRunState.rawBuffers into the same cleanup paths and delete raw buffers wherever stale/aborted run buffers are deleted.

Conceptual patch:

 const chatRunBuffers = chatRunState.buffers;
+const chatRunRawBuffers = chatRunState.rawBuffers;
 const chatDeltaSentAt = chatRunState.deltaSentAt;

Pass it through runtime context/ops:

 chatRunBuffers: params.chatRunBuffers,
+chatRunRawBuffers: params.chatRunRawBuffers,
 chatDeltaSentAt: params.chatDeltaSentAt,

Clean up stale/aborted runs:

 params.chatRunBuffers.delete(runId);
+params.chatRunRawBuffers?.delete(runId);
 params.chatDeltaSentAt.delete(runId);
 params.chatDeltaLastBroadcastLen.delete(runId);

Abort path:

 ops.chatRunBuffers.delete(runId);
+ops.chatRunRawBuffers?.delete(runId);
 ops.chatDeltaSentAt.delete(runId);
 ops.chatDeltaLastBroadcastLen.delete(runId);

Files locally patched:

  • /opt/homebrew/lib/node_modules/openclaw/dist/server.impl-B11albXx.js
  • /opt/homebrew/lib/node_modules/openclaw/dist/chat-abort-DqrwtdgZ.js
  • /opt/homebrew/lib/node_modules/openclaw/dist/chat-DNr22c3k.js
  • /opt/homebrew/lib/node_modules/openclaw/dist/server-methods-BDdu1Gof.js

Validation of hotfix syntax:

node --check /opt/homebrew/lib/node_modules/openclaw/dist/server.impl-B11albXx.js
node --check /opt/homebrew/lib/node_modules/openclaw/dist/chat-DNr22c3k.js
node --check /opt/homebrew/lib/node_modules/openclaw/dist/server-methods-BDdu1Gof.js
node --check /opt/homebrew/lib/node_modules/openclaw/dist/chat-abort-DqrwtdgZ.js

All passed.

Session index cleanup impact

Separately, pruning old session index entries fixed the immediate CPU saturation and unreachable-gateway behavior.

Before cleanup:

  • Sessions 336 active · 3 stores
  • sessions.list timings: 31822ms, 32775ms, 59268ms, 50570ms, etc.
  • Gateway CPU: 100%
  • Gateway occasionally unreachable (timeout)

After pruning stale session index entries and closing stale running entries:

  • Sessions 32 active · 3 stores
  • openclaw status completed in 2.59s
  • Gateway reachable in 40-50ms
  • Gateway CPU down to ~0.2-2%
  • RSS after controlled restart: ~685MB, later ~785MB; heap tail around 398MB used / 479MB total.

Important: transcript files were preserved; only active session store entries were pruned.
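The pruning step can be sketched as a pure partition over index entries. Everything here is an assumption for illustration: the entry shape ({ id, status, lastActivityMs }), the seven-day idle threshold, and the function name partitionSessionIndex do not come from the OpenClaw session store format, which this issue does not document. The sketch only decides which index entries to keep, and never touches transcript files.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Hypothetical entry shape: { id, status, lastActivityMs }.
// Entries idle for longer than maxIdleMs are removed from the active index
// and returned as closed copies; the input array is not mutated.
function partitionSessionIndex(entries, nowMs, maxIdleMs) {
  const kept = [];
  const pruned = [];
  for (const entry of entries) {
    const idleMs = nowMs - entry.lastActivityMs;
    if (idleMs > maxIdleMs) {
      pruned.push({ ...entry, status: "closed" });
    } else {
      kept.push(entry);
    }
  }
  return { kept, pruned };
}

const nowMs = Date.UTC(2026, 4, 11);
const index = [
  { id: "s-recent", status: "running", lastActivityMs: nowMs - HOUR_MS },
  { id: "s-stuck", status: "running", lastActivityMs: nowMs - 30 * DAY_MS },
  { id: "s-old", status: "idle", lastActivityMs: nowMs - 10 * DAY_MS },
];

const { kept, pruned } = partitionSessionIndex(index, nowMs, 7 * DAY_MS);
console.log(kept.map((e) => e.id));   // [ 's-recent' ]
console.log(pruned.map((e) => e.id)); // [ 's-stuck', 's-old' ]
```

Returning the pruned entries instead of discarding them makes it easy to log or archive what was removed before writing the smaller index back.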

Suggested upstream fix

  1. Thread rawBuffers into any maintenance/abort cleanup path that deletes buffers.
  2. Consider adding an invariant/test for createChatRunState() cleanup: any stale/aborted run cleanup must clear all run-keyed maps (rawBuffers, buffers, deltaSentAt, deltaLastBroadcastLen, and related registry/abort state as appropriate).
  3. Consider making session-store maintenance stricter by default or avoiding expensive transcript/child-session work in sessions.list for large stale stores. The session-list path was a major CPU amplifier once the active session index had hundreds of old entries.
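One way to express the invariant from suggestion 2 is a generic check over every run-keyed Map in the state object, so that a newly added map cannot be missed by a cleanup path. This is a sketch under assumptions: clearRunState and findLeakedKeys are hypothetical helpers, and the state is modeled as a plain object of Maps using the field names from this issue. Abort and registry state is left out because its shape is not described here.

```javascript
// Delete runId from every Map-valued field, whatever the field is called.
function clearRunState(state, runId) {
  for (const value of Object.values(state)) {
    if (value instanceof Map) value.delete(runId);
  }
}

// Return the names of the Map-valued fields that still hold runId.
function findLeakedKeys(state, runId) {
  return Object.entries(state)
    .filter(([, value]) => value instanceof Map && value.has(runId))
    .map(([name]) => name);
}

// Modeled state; the field names follow the issue, the contents are made up.
const runState = {
  rawBuffers: new Map([["run-1", "raw text"]]),
  buffers: new Map([["run-1", "text"]]),
  deltaSentAt: new Map([["run-1", 0]]),
  deltaLastBroadcastLen: new Map([["run-1", 4]]),
};

// Partial cleanup, as in the pre-fix maintenance path.
runState.buffers.delete("run-1");
runState.deltaSentAt.delete("run-1");
runState.deltaLastBroadcastLen.delete("run-1");
const leakedBefore = findLeakedKeys(runState, "run-1");
console.log(leakedBefore); // [ 'rawBuffers' ]

// Full cleanup satisfies the invariant.
clearRunState(runState, "run-1");
const leakedAfter = findLeakedKeys(runState, "run-1");
console.log(leakedAfter); // []
```

A test built on findLeakedKeys would fail as soon as any cleanup path skipped a map, which is the drift that caused the retained rawBuffers entries.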

Why this matters

The failure mode presents as heap pressure/crashes, but raising --max-old-space-size only delays it. The gateway can retain orphaned raw assistant output and then spend large amounts of CPU scanning stale session state, making the dashboard/status RPCs slow or unreachable.
