openclaw - 💡(How to fix) Fix [Bug]: Heap exhaustion after extended uptime — OOM during filesystem scan [1 comments, 1 participants]

openclaw/openclaw#57349 (fetched 2026-04-08 01:50:45)


Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

OpenClaw gateway hits JavaScript heap limit (~4GB) and OOMs after extended uptime (~17-20 hours). The crash occurs during a filesystem directory scan operation (AfterScanDir), suggesting memory accumulation from repeated fs operations or polling. This is distinct from the Discord stale-socket crash loop in #55274 — no Discord reconnection errors precede this crash.

Steps to reproduce

  1. Run the OpenClaw 2026.3.28 gateway with webchat and Discord enabled
  2. Keep webchat UI open (polls node.list every ~5 seconds)
  3. Use browser tool intermittently during the session
  4. Wait ~17-20 hours
  5. Gateway OOMs and crashes

Expected behavior

Gateway should run indefinitely without heap exhaustion. Memory should be reclaimed by GC.

Actual behavior

Heap grows to ~4GB and crashes. GC becomes ineffective (mu = 0.014). Process crashes during filesystem operation with: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

Crash frequency varies dramatically with usage:

  • First observed crash: reported as ~17.5 hours with moderate usage. Note that the GC timestamp of 1,052,687 ms corresponds to about 17.5 minutes since process start, so the hour figure may be mislabeled.
  • Second crash: after ~14.5 minutes (869,961 ms) with the webchat UI actively open.

Both crashes occurred during AfterScanDir filesystem operations with identical stack traces. The leak appears to accelerate significantly when webchat is actively polling node.list every ~5 seconds.
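
To make the GC figures quoted in this report easier to read, here is a minimal sketch that parses the Mark-Compact trace line from the logs. `parseGcLine` is a hypothetical helper, not part of OpenClaw or Node.js, and its regular expression assumes the exact trace layout shown in this issue. The leading number in the trace is milliseconds since the V8 isolate started, and "mu" is V8's mutator utilization (roughly, the share of time spent running application code rather than collecting garbage).

```javascript
// Hypothetical helper: parses one V8 "Last few GCs" trace line as quoted in this report.
// The regular expression assumes the layout shown in the issue and returns null otherwise.
function parseGcLine(line) {
  const m = line.match(
    /\]\s+(\d+) ms: ([\w-]+) ([\d.]+) \(([\d.]+)\) -> ([\d.]+) \(([\d.]+)\) MB.*current mu = ([\d.]+)/
  );
  if (!m) return null;
  // Capture groups 4 and 6 (total heap size before/after) are skipped here.
  const [, uptimeMs, kind, usedBefore, , usedAfter, , currentMu] = m;
  return {
    kind,
    uptimeMinutes: Number(uptimeMs) / 60000,
    reclaimedMb: Number(usedBefore) - Number(usedAfter),
    currentMu: Number(currentMu),
  };
}

const gcLine =
  "[16:0x46746000]  1052687 ms: Mark-Compact 4091.0 (4099.6) -> 4090.9 (4101.6) MB, " +
  "pooled: 0 MB, 892.63 / 0.00 ms  (average mu = 0.587, current mu = 0.014) " +
  "allocation failure; scavenge might not succeed";

const gc = parseGcLine(gcLine);
console.log(gc);
```

For the line above, the full Mark-Compact pass reclaims only about 0.1 MB at roughly 17.5 minutes of isolate uptime, which is consistent with nearly all of the heap still being reachable rather than with garbage that the collector has not yet freed.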

OpenClaw version

2026.3.28

Operating system

Linux 6.8.0-106-generic x64

Install method

Docker

Model

anthropic/claude-haiku-4-5 (default), github-copilot/claude-opus-4.5 (session override)

Provider / routing chain

openclaw -> anthropic, openclaw -> github-copilot

Additional provider/model setup details

  • Container: ghcr.io/openclaw/openclaw:2026.3.28
  • Browser: browserless/chromium via CDP (cdpUrl: http://browserless:3000)
  • Channels: Discord + Webchat enabled
  • No explicit NODE_OPTIONS set (default ~4GB heap)
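
Because the default heap limit depends on the Node.js version and the memory visible to the container, it may be worth confirming the effective limit rather than assuming ~4GB. The sketch below uses `v8.getHeapStatistics()` from Node's built-in `v8` module; `heapLimitMb` is a hypothetical helper name. It is meant to be run inside the same container as the gateway.

```javascript
const v8 = require("node:v8");

// Reports the V8 heap size limit that applies to the current process, in MB.
function heapLimitMb() {
  return v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
}

console.log(`heap_size_limit: ${heapLimitMb().toFixed(0)} MB`);
console.log(`NODE_OPTIONS: ${process.env.NODE_OPTIONS ?? "(not set)"}`);
```

The limit can be changed with Node's `--max-old-space-size=<MB>` flag, for example through `NODE_OPTIONS`. If the underlying cause is a leak, a larger limit would only delay the crash.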

Logs, screenshots, and evidence

GC state before crash:

<--- Last few GCs --->
[16:0x46746000]  1052687 ms: Mark-Compact 4091.0 (4099.6) -> 4090.9 (4101.6) MB, pooled: 0 MB, 892.63 / 0.00 ms  (average mu = 0.587, current mu = 0.014) allocation failure; scavenge might not succeed

Stack trace (partial):

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
 1: 0x735eec node::OOMErrorHandler(char const*, v8::OOMDetails const&)
...
10: 0xbcb66d v8::String::NewFromOneByte(v8::Isolate*, unsigned char const*, ...)
11: 0xa5ba40 node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, ...)
12: 0x8bdd19 node::fs::AfterScanDir(uv_fs_s*)
13: 0x89f50d node::MakeLibuvRequestCallback<uv_fs_s, void (*)(uv_fs_s*)>::Wrapper(uv_fs_s*)

Pre-crash log pattern (hundreds of these):

[ws] res node.list 330ms conn=adb062ec
[ws] res node.list 268ms conn=adb062ec
[ws] res node.list 280ms conn=adb062ec
... (every ~5 seconds for hours)
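
As an aid for anyone triaging similar logs, the following sketch summarizes node.list responses per connection. `summarizeNodeList` is a hypothetical helper and the pattern assumes the `[ws] res node.list <N>ms conn=<id>` format shown above; it is not an OpenClaw utility.

```javascript
// Hypothetical helper: counts node.list responses and averages their latency per connection.
function summarizeNodeList(logText) {
  const byConn = new Map();
  const pattern = /\[ws\] res node\.list (\d+)ms conn=(\w+)/g;
  for (const [, ms, conn] of logText.matchAll(pattern)) {
    const entry = byConn.get(conn) ?? { count: 0, totalMs: 0 };
    entry.count += 1;
    entry.totalMs += Number(ms);
    byConn.set(conn, entry);
  }
  return [...byConn].map(([conn, { count, totalMs }]) => ({
    conn,
    count,
    avgMs: totalMs / count,
  }));
}

const sample = [
  "[ws] res node.list 330ms conn=adb062ec",
  "[ws] res node.list 268ms conn=adb062ec",
  "[ws] res node.list 280ms conn=adb062ec",
].join("\n");

const summary = summarizeNodeList(sample);
console.log(summary);
```

Comparing the average latency early and late in a session might show whether responses slow down as the heap fills, although that correlation is an assumption and not something this report establishes.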

Browser tool errors shortly before crash:

[tools] browser failed: tab not found
[tools] browser failed: timed out

Post-crash restart and reconnection:

2026-03-29T16:37:19.818-07:00 [heartbeat] started
2026-03-29T16:37:19.837-07:00 [gateway] listening on ws://0.0.0.0:18789 (PID 16)
2026-03-29T16:37:24.521-07:00 [ws] webchat connected conn=e617f6a8...

Impact and severity

High — Gateway crashes frequently under normal webchat usage:

  • Crashes every ~15 minutes with webchat UI actively open
  • Interrupts active sessions and loses in-flight agent work
  • Webchat users see repeated disconnects
  • Makes webchat UI effectively unusable for extended sessions
  • Auto-recovery works but the rapid crash cycle degrades user experience significantly

Additional information

Potential leak sources to investigate:

  1. node.list polling — Webchat polls every 5s. Response objects may not be GC'd properly.
  2. Browser CDP sessions — Tab references or CDP handles may leak on timeout/close.
  3. Filesystem scans — The AfterScanDir in the stack trace suggests directory listing results accumulating. Possibly related to workspace/skills scanning?
  4. Session context — Long-running sessions with large conversation history.
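
One way to narrow down the candidates above is to measure how fast the heap grows while toggling each suspect (webchat open or closed, browser tool used or idle). The sketch below is an assumption about how that could be instrumented, not existing OpenClaw code: `heapGrowthMbPerMin` and `startHeapSampler` are hypothetical names, and only `process.memoryUsage()` is a real Node.js API here.

```javascript
// Least-squares slope of heapUsed over time, in MB per minute.
// samples: [{ t: <timestamp in ms>, heapUsed: <bytes> }, ...]
function heapGrowthMbPerMin(samples) {
  const n = samples.length;
  if (n < 2) return 0;
  const meanT = samples.reduce((sum, p) => sum + p.t, 0) / n;
  const meanH = samples.reduce((sum, p) => sum + p.heapUsed, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.t - meanT) * (p.heapUsed - meanH);
    den += (p.t - meanT) ** 2;
  }
  if (den === 0) return 0;
  const bytesPerMs = num / den;
  return (bytesPerMs * 60000) / (1024 * 1024);
}

// Samples heapUsed every intervalMs and logs the trend over a sliding window.
// Returns a function that stops the sampler.
function startHeapSampler(intervalMs = 30000, windowSize = 20) {
  const samples = [];
  const timer = setInterval(() => {
    samples.push({ t: Date.now(), heapUsed: process.memoryUsage().heapUsed });
    if (samples.length > windowSize) samples.shift();
    console.log(`[heap] growth ${heapGrowthMbPerMin(samples).toFixed(2)} MB/min`);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling
  return () => clearInterval(timer);
}
```

A sustained positive slope that appears only while webchat is polling would support the first suspect. Once a suspect is isolated, two heap snapshots taken with Node's `v8.writeHeapSnapshot()` and compared in Chrome DevTools could show which object types are accumulating.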

extent analysis

Fix Plan

To address the JavaScript heap limit issue, we'll focus on the following steps:

  • Increase the Node.js heap size
  • Optimize memory allocation in the node.list polling mechanism
  • Implement a mechanism to limit the number of concurrent filesystem scans
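
The last item could be sketched as a small queue in front of `fs.promises.readdir`. This is an illustration under assumptions: `createScanLimiter` is a hypothetical helper, and the issue does not show where OpenClaw actually performs its directory scans. It bounds how many scans are in flight at once; it would not help if scan results are being retained somewhere after they complete.

```javascript
const fs = require("node:fs");

// Hypothetical limiter: at most maxConcurrent readdir calls run at once;
// additional requests wait in a FIFO queue.
function createScanLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];

  function runNext() {
    if (active >= maxConcurrent || queue.length === 0) return;
    active += 1;
    const { dir, resolve, reject } = queue.shift();
    fs.promises
      .readdir(dir)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        runNext(); // start the next queued scan, if any
      });
  }

  return {
    scan(dir) {
      return new Promise((resolve, reject) => {
        queue.push({ dir, resolve, reject });
        runNext();
      });
    },
    stats() {
      return { active, queued: queue.length };
    },
  };
}

const limiter = createScanLimiter(2);
const pending = [".", ".", "."].map((dir) => limiter.scan(dir));
console.log(limiter.stats());
Promise.all(pending).then((lists) => console.log(lists.map((l) => l.length)));
```

With a limit of 2 and three requests, `stats()` reports two active scans and one queued immediately after the calls, and the queued scan starts as soon as one of the first two finishes.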

Code Changes

// Increase Node.js heap size
// Add the following ...
