openclaw - ✅ (Solved) Fix web_fetch: Chrome renderer processes accumulate and are never cleaned up (memory leak) [1 pull request, 2 comments, 3 participants]

HendrikHarren · 2026-04-22T16:53:13Z


Issue: openclaw/openclaw#70270 (fetched 2026-04-23 07:26:55)

When using web_fetch (headless Chrome browser tool), renderer processes accumulate over time and are never terminated, causing a memory leak that can crash the gateway on constrained servers.

Error Message

After approximately 24 hours of normal use (briefing cron jobs, web_fetch calls), the Chrome renderer count grew to 46 processes consuming 2.6 GB RAM on a 3.7 GB VPS (Hetzner CX22). RAM usage reached 91%, with 1.8 GB swap in use.

Root Cause

When using web_fetch (headless Chrome browser tool), renderer processes accumulate over time and are never terminated, causing a memory leak that can crash the gateway on constrained servers.

Fix Action

Workaround

Daily gateway restart via cron:

45 3 * * * XDG_RUNTIME_DIR=/run/user/1000 systemctl --user restart openclaw-gateway.service

PR fix notes

PR #70419: fix(gateway): raise child oom_score_adj on linux to spare the gateway under OOM

Repository: openclaw/openclaw
Author: neeravmakwana
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/70419

Description (problem / solution / changelog)

Closes #70404.

Root Cause

On Linux, child processes inherit the gateway's oom_score_adj. In a memory-constrained cgroup, the gateway is often the largest-RSS process because it keeps long-lived WebSocket state and V8 heap resident, while transient children such as agent workers, MCP stdio servers, PTY shells, and Chrome/browser helpers are smaller individually. When the cgroup hits its memory limit, the kernel can therefore kill openclaw-gateway instead of the transient child that pushed the cgroup over the edge. The gateway exits with 137 and all connected sessions drop.

The important constraint: lowering the gateway's OOM score, or having the parent process write a lower score into children, is capability-sensitive in hardened containers. The reliable unprivileged operation is the opposite: a Linux process may voluntarily increase its own OOM kill likelihood.

Fix

Add a shared Linux-only spawn helper that wraps eligible child commands in a short /bin/sh shim:

/bin/sh -c 'echo 1000 > /proc/self/oom_score_adj 2>/dev/null; exec "$0" "$@"' <cmd> <args...>

The shim runs in the post-fork child, raises that child's own oom_score_adj, then execs the real command. No extra long-lived shell process remains, and after exec the process identity, PID, stdio, exit, and kill semantics are those of the target process.
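
The argv rewrite described above can be sketched as a small pure function. This is only an illustration, not the PR's actual code: the names `SpawnSpec`, `OOM_SHIM_SCRIPT`, and `wrapWithOomShim` are hypothetical, and the real helper in src/process/linux-oom-score.ts may be shaped differently.

```typescript
// Hypothetical sketch of the argv rewrite; names are illustrative only.
// The script text is fixed and copied from the PR description.
const OOM_SHIM_SCRIPT =
  'echo 1000 > /proc/self/oom_score_adj 2>/dev/null; exec "$0" "$@"';

interface SpawnSpec {
  command: string;
  args: string[];
}

// Wrap a command so that /bin/sh raises its own oom_score_adj and then
// replaces itself with the real command via exec.
function wrapWithOomShim(spec: SpawnSpec): SpawnSpec {
  return {
    command: "/bin/sh",
    // With `sh -c SCRIPT cmd a b`, cmd becomes $0 and a, b become "$@",
    // so the original arguments are never parsed as shell source.
    args: ["-c", OOM_SHIM_SCRIPT, spec.command, ...spec.args],
  };
}

const wrappedSpec = wrapWithOomShim({
  command: "chrome",
  args: ["--headless", "--no-sandbox"],
});
console.log(wrappedSpec.command, JSON.stringify(wrappedSpec.args));
```

Because the original command and arguments travel as positional parameters, an argument such as `a b; rm -rf /` stays a single argv entry and is not interpreted by the shell.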

Current covered spawn surfaces:

src/process/supervisor/adapters/child.ts for regular supervisor-managed children.
src/process/supervisor/adapters/pty.ts for PTY-backed shell children.
src/agents/mcp-stdio-transport.ts for MCP stdio server children.
extensions/browser/src/browser/chrome.ts for launched browser/Chrome processes, through the public plugin SDK process-runtime seam.

The helper is a no-op when:

the platform is not Linux,
OPENCLAW_CHILD_OOM_SCORE_ADJ=0 / false / no / off is set in the child env,
/bin/sh is unavailable, so distroless/scratch images degrade to previous behavior instead of failing with ENOENT,
the argv is already wrapped,
the command name starts with -, because POSIX sh implementations do not support exec -- and a leading-dash command could be parsed as an exec option.
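
The no-op conditions listed above could be expressed as a single eligibility check. The sketch below is an assumption about how such a check might look, not the helper's real code: `WrapContext`, `shouldWrap`, and `SHIM_SCRIPT_TEXT` are hypothetical names, and treating the opt-out values case-insensitively with surrounding whitespace trimmed is a guess that the source does not confirm.

```typescript
// Hypothetical eligibility check mirroring the listed no-op conditions.
const SHIM_SCRIPT_TEXT =
  'echo 1000 > /proc/self/oom_score_adj 2>/dev/null; exec "$0" "$@"';

// Opt-out values named in the PR description.
const OPT_OUT_VALUES = new Set(["0", "false", "no", "off"]);

interface WrapContext {
  platform: string; // e.g. the value of process.platform
  env: Record<string, string | undefined>; // the child's environment
  command: string;
  args: string[];
  shellAvailable: boolean; // whether /bin/sh exists on this image
}

function isAlreadyWrapped(command: string, args: string[]): boolean {
  return command === "/bin/sh" && args[0] === "-c" && args[1] === SHIM_SCRIPT_TEXT;
}

function shouldWrap(ctx: WrapContext): boolean {
  if (ctx.platform !== "linux") return false;
  const optOut = ctx.env["OPENCLAW_CHILD_OOM_SCORE_ADJ"];
  // Assumption: values are compared case-insensitively after trimming.
  if (optOut !== undefined && OPT_OUT_VALUES.has(optOut.trim().toLowerCase())) {
    return false;
  }
  if (!ctx.shellAvailable) return false; // distroless/scratch: keep direct spawn
  if (isAlreadyWrapped(ctx.command, ctx.args)) return false;
  if (ctx.command.startsWith("-")) return false; // avoid exec option parsing
  return true;
}

console.log(
  shouldWrap({
    platform: "linux",
    env: {},
    command: "node",
    args: ["worker.js"],
    shellAvailable: true,
  }),
);
```

Keeping the check separate from the wrapping step makes every fallback path return the original spawn shape unchanged, which matches the "degrade to previous behavior" intent stated above.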

Safety Notes

Linux-only behavior. macOS, Windows, and other platforms keep their existing spawn shape.
Argument-safe execution. The wrapper script is fixed text. The real command and args are passed as shell positional parameters and executed with POSIX-compatible exec "$0" "$@", so user args are not re-parsed as shell source. Leading-dash command names are intentionally left on the original direct-spawn path.
Shell env hardening. Wrapped spawns strip BASH_ENV, ENV, and CDPATH so the /bin/sh -c shim cannot source caller-influenced startup files before exec.
Transparent failure mode. If /proc/self/oom_score_adj is unavailable or unwritable, stderr is suppressed and the child still runs normally. It just does not get the OOM bias.
Plugin boundary kept clean. Browser plugin code uses openclaw/plugin-sdk/process-runtime; it does not deep-import core internals.
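
As an illustration of the shell env hardening note, stripping the three variables from a child environment might look like the following. `hardenShellEnv` and `STRIPPED_SHELL_ENV_KEYS` are hypothetical names used only for this sketch.

```typescript
// Hypothetical sketch: drop variables that could make /bin/sh source
// caller-influenced files or change path lookup before exec runs.
const STRIPPED_SHELL_ENV_KEYS = ["BASH_ENV", "ENV", "CDPATH"] as const;

type ChildEnv = Record<string, string | undefined>;

// Returns a copy; the caller's env object is left untouched.
function hardenShellEnv(env: ChildEnv): ChildEnv {
  const copy: ChildEnv = { ...env };
  for (const key of STRIPPED_SHELL_ENV_KEYS) {
    delete copy[key];
  }
  return copy;
}

const hardened = hardenShellEnv({
  PATH: "/usr/bin:/bin",
  BASH_ENV: "/tmp/untrusted.sh",
  CDPATH: "/tmp",
});
console.log(Object.keys(hardened)); // only PATH remains
```

Returning a copy rather than mutating in place keeps the gateway's own environment intact when the same object is reused for later spawns.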

Scope Boundary / Related Work

This PR is intentionally a kernel victim-selection fix. It does not try to solve every child-process OOM class.

Related issues/PRs that remain separate work:

#70400, #70389, #69145, #64169, #64984: MCP stdio/runtime lifecycle leaks. This PR makes leaked or transient MCP children better OOM victims than the gateway, but it does not replace proper runtime disposal and transport shutdown ordering.
#70270, #55698, #30130, #31504: browser/Chrome renderer cleanup and container hardening. This PR covers launched browser process trees with the OOM bias, but stale renderer cleanup/resource caps remain separate lifecycle work.
#23409, #28629: broader child resource controls such as cgroup v2 limits, systemd MemoryMax=, spawn caps, and watchdogs. Those are stronger resource-governance features and should not be folded into this focused fix.
#68680, #69242: SIGKILL observability. Once children are intentionally preferred OOM victims, surfacing signal-killed subprocesses clearly becomes more useful, but it is an independent reporting improvement.
#52205, #47776: process-group and orphan cleanup. The shim uses exec, so it preserves the existing process-tree cleanup model rather than changing it.

Documentation

Added Linux docs for OOM victim selection, covered child process surfaces, opt-out env values, and /proc/<pid>/oom_score_adj verification:

docs/platforms/linux.md
docs/vps.md

Live Linux Docker Validation

Ran on node:22-bookworm inside Docker and verified real /proc/<pid>/oom_score_adj values for all covered spawn paths:

direct shared helper wrapped spawn: 1000
direct helper opt-out with OPENCLAW_CHILD_OOM_SCORE_ADJ=0: 0
supervisor child adapter: 1000
PTY adapter: 1000
MCP stdio transport: 1000
browser launch path with a fake Chrome executable: 1000
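
A check like the one above reads /proc/<pid>/oom_score_adj for each spawned child. The sketch below shows one way to do that in a testable form; it is not the PR's harness. `readOomScoreAdj` and `parseOomScoreAdj` are hypothetical names, and the file reader is injected so the parsing can be exercised without a Linux /proc. The valid range of -1000 to 1000 is the kernel's documented range for this file.

```typescript
// Hypothetical verification helper; the reader is injected for testability.
type ReadText = (path: string) => string;

// Parse the file contents, rejecting anything outside the kernel's range.
function parseOomScoreAdj(raw: string): number {
  const text = raw.trim();
  if (!/^-?\d+$/.test(text)) {
    throw new Error(`unexpected oom_score_adj contents: ${JSON.stringify(raw)}`);
  }
  const value = Number.parseInt(text, 10);
  if (value < -1000 || value > 1000) {
    throw new Error(`oom_score_adj out of range: ${value}`);
  }
  return value;
}

function readOomScoreAdj(pid: number, readText: ReadText): number {
  return parseOomScoreAdj(readText(`/proc/${pid}/oom_score_adj`));
}

// Stand-in reader for illustration; on a real Linux host one would pass
// something like (p) => fs.readFileSync(p, "utf8") from node:fs instead.
const fakeProc: ReadText = (path) =>
  path === "/proc/4242/oom_score_adj" ? "1000\n" : "0\n";

console.log(readOomScoreAdj(4242, fakeProc)); // 1000
```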

Also ran a cgroup memory-pressure simulation with --memory=256m --memory-swap=256m, a gateway-like parent holding ~179 MB RSS, and a child allocating memory in 4 MB chunks:

baseline/no wrapper: child inherited oom_score_adj=0; the parent/container was killed with exit 137 while the child was around 141 MB RSS.
wrapper enabled: child had oom_score_adj=1000; the child was killed with SIGKILL while the parent stayed alive at ~179 MB RSS.

This live pass also caught a portability bug in the earlier wrapper: Debian's /bin/sh is dash and rejects exec --. The PR now uses portable exec "$0" "$@" and skips wrapping leading-dash command names.

Tests Run

pnpm docs:list
pnpm test src/process/linux-oom-score.test.ts src/process/supervisor/adapters/child.test.ts src/process/supervisor/adapters/pty.test.ts src/agents/mcp-stdio-transport.test.ts extensions/browser/src/browser/chrome.internal.test.ts
node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-browser.config.ts extensions/browser/src/browser/chrome.internal.test.ts
pnpm tsgo:prod
pnpm plugin-sdk:check-exports
pnpm plugin-sdk:api:check
pnpm check:changed
Linux Docker live harness against node:22-bookworm verifying /proc/<pid>/oom_score_adj for helper, opt-out, supervisor child, PTY, MCP stdio, and browser launch paths.
Linux Docker cgroup memory-pressure simulation with --memory=256m --memory-swap=256m, confirming the wrapper changes victim selection from parent/container to child.

Note: after the full pnpm check:changed passed locally on the prior commit, later repeated pnpm check:changed / combined targeted test invocations hit a Vitest unit-fast process stuck at 0% CPU. The focused test lanes above were rerun split by lane and passed.

Changed files

CHANGELOG.md (modified, +1/-0)
docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
docs/platforms/linux.md (modified, +37/-0)
docs/vps.md (modified, +3/-0)
extensions/browser/src/browser/chrome.ts (modified, +6/-2)
src/agents/mcp-stdio-transport.test.ts (modified, +20/-3)
src/agents/mcp-stdio-transport.ts (modified, +12/-5)
src/plugin-sdk/process-runtime.ts (modified, +2/-0)
src/process/linux-oom-score.test.ts (added, +105/-0)
src/process/linux-oom-score.ts (added, +143/-0)
src/process/supervisor/adapters/child.test.ts (modified, +53/-1)
src/process/supervisor/adapters/child.ts (modified, +7/-2)
src/process/supervisor/adapters/pty.test.ts (modified, +55/-8)
src/process/supervisor/adapters/pty.ts (modified, +5/-2)

Summary

When using web_fetch (headless Chrome browser tool), renderer processes accumulate over time and are never terminated, causing a memory leak that can crash the gateway on constrained servers.

Observed behavior

$ ps aux | grep 'chrome.*renderer' | wc -l
46

$ ps aux | grep 'chrome.*renderer' | awk '{sum += $6} END {print sum " KB"}'
2611628 KB   # ≈ 2.6 GB

All 43 stale renderer processes had been running since the previous day (Apr 21) — none were cleaned up after their web_fetch sessions completed.
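
For monitoring from code rather than the shell, the two `ps aux` pipelines above can be mirrored by a small parser. This is a sketch under assumptions: `summarizeRenderers` is a hypothetical name, the sample rows are made up for illustration, and the parser assumes the usual `ps aux` layout in which RSS (in KB) is the sixth column and the command starts at the eleventh. It also skips the `grep` process itself, which a plain `ps aux | grep` pipeline can count as a match.

```typescript
// Hypothetical parser for `ps aux` output; mirrors the grep/awk pipeline.
interface RendererSummary {
  count: number;
  rssKb: number;
}

function summarizeRenderers(psAuxOutput: string): RendererSummary {
  let count = 0;
  let rssKb = 0;
  for (const line of psAuxOutput.split("\n")) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 11) continue; // not a full process row
    if (fields[10] === "grep") continue; // do not count grep itself
    const command = fields.slice(10).join(" ");
    if (!/chrome.*renderer/.test(command)) continue;
    const rss = Number(fields[5]); // column 6 of ps aux is RSS in KB
    if (!Number.isFinite(rss)) continue;
    count += 1;
    rssKb += rss;
  }
  return { count, rssKb };
}

// Made-up sample rows, for illustration only.
const samplePsAux = [
  "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND",
  "claw 101 0.5 1.5 1184000 60000 ? Sl Apr21 0:12 /opt/google/chrome/chrome --type=renderer --headless",
  "claw 102 0.4 1.4 1184000 55000 ? Sl Apr21 0:10 /opt/google/chrome/chrome --type=renderer --headless",
  "claw 103 0.1 0.9 900000 35000 ? Sl Apr21 0:03 /opt/google/chrome/chrome --type=gpu-process",
  "claw 200 0.0 0.0 6000 2000 pts/0 S+ 10:00 0:00 grep chrome.*renderer",
].join("\n");

console.log(summarizeRenderers(samplePsAux)); // { count: 2, rssKb: 115000 }
```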

Steps to reproduce

Configure web_fetch with the browser provider
Run several cron sessions that call web_fetch (e.g. daily briefings, market monitors)
Wait 12–24 hours
Observe Chrome renderer processes accumulating (ps aux | grep chrome | wc -l)

Environment

OpenClaw: 2026.4.15
OS: Debian 13 (Trixie), Linux 6.12.74
Node: 22.22.2
Chrome: /opt/google/chrome (headless), --no-sandbox --disable-dev-shm-usage
VPS: Hetzner CX22 (2 vCPU, 4 GB RAM)

Impact

Gateway requires full restart to recover. On a 4 GB VPS this can cause OOM within 24 hours.

Workaround

Daily gateway restart via cron:

45 3 * * * XDG_RUNTIME_DIR=/run/user/1000 systemctl --user restart openclaw-gateway.service

Expected behavior

Chrome renderer processes should be terminated after their web_fetch session completes, not left running indefinitely.

extent analysis

TL;DR

The most likely fix is to implement a mechanism to terminate Chrome renderer processes after their web_fetch session completes, potentially through modifications to the web_fetch tool or its configuration.

Guidance

Investigate the web_fetch tool's configuration and documentation to see if there are any options for automatically terminating Chrome renderer processes after use.
Consider implementing a script or cron job to periodically clean up stale Chrome renderer processes, in addition to the existing daily gateway restart workaround.
Review the system's resource monitoring and alerting to ensure that memory usage thresholds are being tracked and alerted on to prevent unexpected out-of-memory errors.
Examine the Chrome launch flags (--no-sandbox --disable-dev-shm-usage) to determine if they may be contributing to the issue, and consider alternative flags or configurations.

Example

No specific code example is provided, as the issue is more related to system configuration and process management.

Notes

The provided workaround of daily gateway restarts via cron may not be suitable for all environments, and a more targeted solution to terminate Chrome renderer processes after use is desirable to prevent memory leaks and out-of-memory errors.

Recommendation

Apply a workaround, such as implementing a script to periodically clean up stale Chrome renderer processes, until a more permanent fix can be developed and implemented. This will help mitigate the memory leak issue and prevent out-of-memory errors.
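
One way the periodic cleanup recommended above might pick its targets is sketched here. It is an assumption-laden illustration, not part of OpenClaw: `parsePsRows` and `selectStaleRenderers` are hypothetical names, the input is assumed to come from `ps -eo pid,etimes,rss,args` (procps, with `etimes` giving elapsed seconds), and renderers are identified by Chrome's `--type=renderer` flag. The sketch only selects PIDs. Whether it is safe to signal them depends on whether a web_fetch session is still using them, which this code cannot know.

```typescript
// Hypothetical stale-renderer selection over `ps -eo pid,etimes,rss,args`.
interface ProcRow {
  pid: number;
  elapsedSeconds: number;
  rssKb: number;
  args: string;
}

function parsePsRows(output: string): ProcRow[] {
  const rows: ProcRow[] = [];
  for (const line of output.split("\n")) {
    // Three numeric columns followed by the full command line.
    const m = /^\s*(\d+)\s+(\d+)\s+(\d+)\s+(.*)$/.exec(line);
    if (m === null) continue; // header or malformed row
    rows.push({
      pid: Number(m[1]),
      elapsedSeconds: Number(m[2]),
      rssKb: Number(m[3]),
      args: m[4] ?? "",
    });
  }
  return rows;
}

// Return PIDs of Chrome renderers that have been alive longer than the cutoff.
function selectStaleRenderers(rows: ProcRow[], maxAgeSeconds: number): number[] {
  return rows
    .filter((r) => r.args.includes("chrome") && r.args.includes("--type=renderer"))
    .filter((r) => r.elapsedSeconds > maxAgeSeconds)
    .map((r) => r.pid);
}

// Made-up sample rows, for illustration only.
const samplePsRows = [
  "    PID ELAPSED   RSS COMMAND",
  "    101   90000 60000 /opt/google/chrome/chrome --type=renderer",
  "    102     120 55000 /opt/google/chrome/chrome --type=renderer",
  "    103   90000 35000 /opt/google/chrome/chrome --type=gpu-process",
  "    104   90000 80000 node openclaw-gateway",
].join("\n");

console.log(selectStaleRenderers(parsePsRows(samplePsRows), 6 * 3600)); // [ 101 ]
```

An age cutoff is a blunt heuristic. It narrows the blast radius compared with the daily gateway restart, but it does not replace terminating renderers when their web_fetch session completes.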
