openclaw - 💡(How to fix) Fix [Bug]: Heap exhaustion after extended uptime — OOM during filesystem scan [1 comments, 1 participants]

stevenepalmer · 2026-03-29T23:55:57Z



Root Cause

The OpenClaw gateway hits the JavaScript heap limit (~4 GB) and crashes with an out-of-memory (OOM) error after extended uptime (~17-20 hours). The crash occurs during a filesystem directory scan (AfterScanDir), suggesting memory accumulation from repeated fs operations or polling. This is distinct from the Discord stale-socket crash loop in #55274: no Discord reconnection errors precede this crash.


Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Steps to reproduce

1. Run the OpenClaw 2026.3.28 gateway with webchat and Discord enabled.
2. Keep the webchat UI open (it polls node.list every ~5 seconds).
3. Use the browser tool intermittently during the session.
4. Wait ~17-20 hours.
5. The gateway runs out of memory and crashes.

Expected behavior

The gateway should run indefinitely without exhausting the heap; memory should be reclaimed by the garbage collector.

Actual behavior

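The heap grows to ~4 GB and the process crashes. GC becomes ineffective: the last Mark-Compact only reduced the heap from 4091.0 MB to 4090.9 MB, and the current mutator utilization (mu) was 0.014, meaning only about 1.4% of the most recent interval went to running JavaScript and the rest to garbage collection. The process crashes during a filesystem operation with:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory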

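Crash frequency varies dramatically with usage:

First observed crash: reported as ~17.5 hours of uptime with moderate usage. Note that the GC trace timestamp for it, 1,052,687 ms (milliseconds since the V8 isolate started), is about 17.5 minutes, not hours.
Second crash: after ~14.5 minutes (869,961 ms) with the webchat UI actively open.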
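Both the timing and the mu figures come from V8's GC trace line. The TypeScript sketch below parses that line and converts the timestamp to minutes. The regex is written against the single line captured in this report and is an assumption; other Node.js/V8 versions may format trace output differently.

```typescript
// Sketch: parse a V8 --trace-gc line like the one captured before the crash.
// The regex matches the format seen in this report only; other versions may differ.
interface GcTraceEntry {
  pid: number;
  sinceStartMs: number; // milliseconds since the V8 isolate started
  kind: string; // e.g. "Mark-Compact" or "Scavenge"
  beforeMb: number; // heap used before this GC
  afterMb: number; // heap used after this GC
  averageMu: number; // average mutator utilization
  currentMu: number; // mutator utilization over the most recent interval
}

const GC_LINE =
  /\[(\d+):0x[0-9a-f]+\]\s+(\d+) ms: ([\w-]+) ([\d.]+) \([\d.]+\) -> ([\d.]+) \([\d.]+\) MB.*average mu = ([\d.]+), current mu = ([\d.]+)/;

function parseGcLine(line: string): GcTraceEntry | null {
  const m = GC_LINE.exec(line);
  if (!m) return null;
  return {
    pid: Number(m[1]),
    sinceStartMs: Number(m[2]),
    kind: m[3],
    beforeMb: Number(m[4]),
    afterMb: Number(m[5]),
    averageMu: Number(m[6]),
    currentMu: Number(m[7]),
  };
}

const traceLine =
  "[16:0x46746000]  1052687 ms: Mark-Compact 4091.0 (4099.6) -> 4090.9 (4101.6) MB, pooled: 0 MB, 892.63 / 0.00 ms  (average mu = 0.587, current mu = 0.014) allocation failure; scavenge might not succeed";

const entry = parseGcLine(traceLine);
if (entry) {
  const minutes = entry.sinceStartMs / 60_000;
  const reclaimedMb = entry.beforeMb - entry.afterMb;
  console.log(
    `${entry.kind} at ${minutes.toFixed(1)} min: reclaimed ${reclaimedMb.toFixed(1)} MB, current mu ${entry.currentMu}`,
  );
}
```

For the captured line this prints `Mark-Compact at 17.5 min: reclaimed 0.1 MB, current mu 0.014`: a full collection that freed almost nothing, the usual signature of memory that is still referenced rather than merely uncollected.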

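Both crashes occurred during AfterScanDir filesystem operations (AfterScanDir is the callback Node.js runs when an asynchronous fs.readdir completes) with identical stack traces. The stack only shows where the final allocation failed, not which objects were filling the heap, so the scan is not necessarily the source of the leak. The leak appears to accelerate significantly when webchat is actively polling node.list every ~5 seconds.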
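In the trace, AfterScanDir calls StringBytes::Encode and v8::String::NewFromOneByte, i.e. Node.js is turning directory entry names into JavaScript strings, because fs.readdir returns all names of a directory as one array. If large or frequent scans contribute to peak memory, one option is to read entries incrementally with fs.promises.opendir and cap how many directories are open at once. The sketch below is a hypothetical helper, not OpenClaw code; it reduces per-scan allocation but cannot fix references that are retained elsewhere.

```typescript
// Hypothetical helper (not OpenClaw code): walk a directory tree with fs.promises.opendir,
// which reads entries in small batches instead of returning one array of every name
// like fs.readdir, and cap how many directories are open at the same time.
import { opendir } from "node:fs/promises";
import { join } from "node:path";

// Minimal counting semaphore.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit straight to the next waiter
    else this.available++;
  }
}

async function countFiles(root: string, maxOpenDirs = 4): Promise<number> {
  const sem = new Semaphore(maxOpenDirs);

  async function walk(dir: string): Promise<number> {
    const subdirs: string[] = [];
    let files = 0;
    await sem.acquire();
    try {
      // Iterating the Dir closes it automatically when the loop ends or throws.
      for await (const dirent of await opendir(dir)) {
        if (dirent.isDirectory()) subdirs.push(join(dir, dirent.name));
        else files++;
      }
    } finally {
      sem.release(); // release before recursing so children cannot deadlock on permits
    }
    const counts = await Promise.all(subdirs.map(walk));
    return counts.reduce((sum, n) => sum + n, files);
  }

  return walk(root);
}
```

Releasing the permit before descending into subdirectories matters: a parent that held its permit while awaiting its children could starve them once the tree is deeper than `maxOpenDirs`.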

OpenClaw version

2026.3.28

Operating system

Linux 6.8.0-106-generic x64

Install method

Docker

Model

anthropic/claude-haiku-4-5 (default), github-copilot/claude-opus-4.5 (session override)

Provider / routing chain

openclaw -> anthropic, openclaw -> github-copilot

Additional provider/model setup details

Container: ghcr.io/openclaw/openclaw:2026.3.28
Browser: browserless/chromium via CDP (cdpUrl: http://browserless:3000)
Channels: Discord + Webchat enabled
No explicit NODE_OPTIONS set (default ~4GB heap)

Logs, screenshots, and evidence

GC state before crash:

<--- Last few GCs --->
[16:0x46746000]  1052687 ms: Mark-Compact 4091.0 (4099.6) -> 4090.9 (4101.6) MB, pooled: 0 MB, 892.63 / 0.00 ms  (average mu = 0.587, current mu = 0.014) allocation failure; scavenge might not succeed

Stack trace (partial):

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
 1: 0x735eec node::OOMErrorHandler(char const*, v8::OOMDetails const&)
...
10: 0xbcb66d v8::String::NewFromOneByte(v8::Isolate*, unsigned char const*, ...)
11: 0xa5ba40 node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, ...)
12: 0x8bdd19 node::fs::AfterScanDir(uv_fs_s*)
13: 0x89f50d node::MakeLibuvRequestCallback<uv_fs_s, void (*)(uv_fs_s*)>::Wrapper(uv_fs_s*)

Pre-crash log pattern (hundreds of these):

[ws] res node.list 330ms conn=adb062ec
[ws] res node.list 268ms conn=adb062ec
[ws] res node.list 280ms conn=adb062ec
... (every ~5 seconds for hours)

Browser tool errors shortly before crash:

[tools] browser failed: tab not found
[tools] browser failed: timed out

Post-crash restart and reconnection:

2026-03-29T16:37:19.818-07:00 [heartbeat] started
2026-03-29T16:37:19.837-07:00 [gateway] listening on ws://0.0.0.0:18789 (PID 16)
2026-03-29T16:37:24.521-07:00 [ws] webchat connected conn=e617f6a8...
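The browser tool errors in these logs ("tab not found", "timed out") line up with one suspected leak path: tab or CDP handles that are not released when a call fails or times out. A generic guard for that is to race the work against a timer and release the resource in a `finally` block on every path. The TypeScript sketch below assumes a resource with an async `close()`; `Closable` and `withTimeoutAndCleanup` are hypothetical names, not OpenClaw or Chrome DevTools Protocol APIs.

```typescript
// Sketch with hypothetical names (not OpenClaw or CDP APIs): run one browser-tool step
// under a timeout and always release the tab/session afterwards, whether the step
// succeeds, fails (e.g. "tab not found") or times out.
interface Closable {
  close(): Promise<void>;
}

class TimeoutError extends Error {
  name = "TimeoutError";
}

async function withTimeoutAndCleanup<R extends Closable, T>(
  open: () => Promise<R>,
  use: (resource: R) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const resource = await open();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    const work = use(resource);
    work.catch(() => undefined); // a late failure after a timeout must not become an unhandled rejection
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // clear the timer so it (and its closure) does not linger
    await resource.close().catch(() => undefined); // best effort: the tab may already be gone
  }
}
```

Closing the resource after a timeout also tends to make the abandoned `use` call fail quickly instead of holding references indefinitely.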

Impact and severity

High — Gateway crashes frequently under normal webchat usage:

Crashes every ~15 minutes with webchat UI actively open
Interrupts active sessions and loses in-flight agent work
Webchat users see repeated disconnects
Makes webchat UI effectively unusable for extended sessions
Auto-recovery works but the rapid crash cycle degrades user experience significantly

Additional information

Potential leak sources to investigate:

node.list polling — Webchat polls every 5s. Response objects may not be GC'd properly.
Browser CDP sessions — Tab references or CDP handles may leak on timeout/close.
Filesystem scans — The AfterScanDir in the stack trace suggests directory listing results accumulating. Possibly related to workspace/skills scanning?
Session context — Long-running sessions with large conversation history.
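To tell these candidates apart, one option is to sample heap usage at a fixed interval (for example `process.memoryUsage().heapUsed` once a minute) during runs with and without the webchat UI open, or with and without browser-tool use, and compare growth rates. The sketch below is a hypothetical diagnostic, not part of OpenClaw; it fits an ordinary least-squares slope to samples that have already been collected.

```typescript
// Sketch (hypothetical diagnostic, not part of OpenClaw): estimate heap growth
// from periodic samples so different usage patterns can be compared.
interface HeapSample {
  tMs: number; // sample time in milliseconds
  heapUsedMb: number; // heap in use at that time, in MB
}

// Ordinary least-squares slope of heap usage over time, returned in MB per minute.
function growthMbPerMinute(samples: HeapSample[]): number {
  const n = samples.length;
  if (n < 2) return 0;
  const meanT = samples.reduce((s, p) => s + p.tMs, 0) / n;
  const meanH = samples.reduce((s, p) => s + p.heapUsedMb, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.tMs - meanT) * (p.heapUsedMb - meanH);
    den += (p.tMs - meanT) ** 2;
  }
  return den === 0 ? 0 : (num / den) * 60_000;
}

// Synthetic example: 100 MB, 110 MB, 120 MB at one-minute intervals.
const slope = growthMbPerMinute([
  { tMs: 0, heapUsedMb: 100 },
  { tMs: 60_000, heapUsedMb: 110 },
  { tMs: 120_000, heapUsedMb: 120 },
]);
console.log(`heap growth: ${slope.toFixed(1)} MB/min`);
```

Because the webchat UI polls node.list about 12 times a minute, dividing a measured slope by 12 gives an upper bound on what each poll could be retaining; it equals the per-poll retention only if polling is the sole source of growth.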


Fix Plan

To address the JavaScript heap limit issue, we'll focus on the following steps:

Increase the Node.js heap size (a stopgap: if memory is genuinely leaking, a larger heap only delays the crash)
Optimize memory allocation in the node.list polling mechanism
Implement a mechanism to limit the number of concurrent filesystem scans
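For the node.list step, one possible approach is to coalesce requests: concurrent callers share a single in-flight load, and the latest result is reused for a short TTL, so a client polling every ~5 seconds does not rebuild the response each time. The sketch below uses hypothetical names (`CoalescingCache`, and a `load` function standing in for whatever builds the node.list response); it only helps if building that response is what allocates, and it does not address references retained elsewhere, such as per-connection state.

```typescript
// Sketch with hypothetical names (not OpenClaw code): coalesce repeated node.list work.
// Concurrent callers share one in-flight load; the latest result is reused for ttlMs.
class CoalescingCache<T> {
  private inFlight: Promise<T> | null = null;
  private cached: { value: T; at: number } | null = null;
  private readonly load: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: () => Promise<T>, ttlMs: number, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): Promise<T> {
    if (this.cached && this.now() - this.cached.at < this.ttlMs) {
      return Promise.resolve(this.cached.value);
    }
    if (!this.inFlight) {
      this.inFlight = this.load()
        .then((value) => {
          this.cached = { value, at: this.now() }; // replace the previous result, never append
          return value;
        })
        .finally(() => {
          this.inFlight = null; // allow the next load once this one settles
        });
    }
    return this.inFlight;
  }
}
```

Only one cached value is ever held, so memory use stays flat no matter how many polls arrive; a failed load leaves the previous value in place and lets the next call retry.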

Code Changes

// Increase Node.js heap size
// Add the following
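Node.js sets the old-generation heap size with the `--max-old-space-size=<MB>` flag, which can be passed to the gateway through the container's `NODE_OPTIONS` environment variable. Since the report notes no explicit `NODE_OPTIONS`, a quick check of the limit V8 actually applied helps confirm whether any change took effect. The sketch below assumes it is run with the same Node.js binary and environment as the gateway. For diagnosis rather than mitigation, Node.js also provides `--heapsnapshot-near-heap-limit=<count>`, which writes heap snapshots as the limit approaches.

```typescript
// Sketch: print the heap limit this Node.js process actually received, to confirm
// whether a NODE_OPTIONS change (e.g. --max-old-space-size=<MB>) took effect.
import { getHeapStatistics } from "node:v8";

function heapLimitMb(): number {
  return getHeapStatistics().heap_size_limit / (1024 * 1024); // bytes -> MiB
}

const nodeOptions = process.env.NODE_OPTIONS ?? "(unset)";
console.log(`NODE_OPTIONS: ${nodeOptions}`);
console.log(`V8 heap_size_limit: ${heapLimitMb().toFixed(0)} MB`);
```

`heap_size_limit` covers the whole V8 heap, so it reads somewhat higher than the `--max-old-space-size` value alone.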
