openclaw - Browser plugin: stale SingletonLock + no retry backoff causes OOM-induced host reboot

Summary

A stale Chromium SingletonLock (left behind by an unclean shutdown or a previous failed-handshake spawn) causes every subsequent Chrome spawn to fail the CDP handshake. The browser plugin then retries indefinitely at ~12 s intervals with no backoff, leaking memory in the gateway process. After several hours the gateway is OOM-killed, the memory pressure takes down the desktop session and journald, and the box is forced to reboot.

Reproduced on [email protected], Node v25.8.1, Ubuntu 24.04, kernel 6.17.0-23-generic.

Impact

In our incident, the gateway accumulated 397 failed Chrome spawn attempts over ~6 hours, grew from baseline to ~2.5 GB RSS, and triggered a global OOM cascade that killed gnome-shell, pipewire, dbus-daemon, gcr-ssh-agent, and gnome-session, and tripped the journald (3 min) and snapd (5 min) watchdogs, forcing a reboot. The box has 61 GB RAM + 8 GB swap, so this is a runaway leak, not an undersized host.

Reproduction

  1. Run the gateway with the browser plugin enabled and a persistent profile under ~/.openclaw/browser/<profile>/user-data/.
  2. Kill the gateway uncleanly (SIGKILL, OOM, hard reboot) while a Chrome child process is alive — SingletonLock, SingletonCookie, SingletonSocket symlinks remain pointing at the dead PID / a now-wiped /tmp socket dir.
  3. Restart the gateway. Any tool call that needs the browser triggers browser.request, which spawns Chrome.
  4. Chrome aborts startup because the singleton it sees is "held". The plugin logs:
    [browser] 🦞 openclaw browser started (chrome) profile "openclaw" on 127.0.0.1:18800 (pid <N>)
    [ws] ⇄ res ✗ browser.request 8xxxms errorCode=UNAVAILABLE errorMessage=Error: Chrome CDP websocket for profile "openclaw" is not reachable after start.
  5. The plugin retries roughly every 12 s, forever. Memory in the gateway process grows monotonically (orphaned chromium processes, retained Promises / CDP buffers, fd / handle accumulation — needs a heap dump to pinpoint exactly which).

Evidence from our incident

$ ls -la ~/.openclaw/browser/openclaw/user-data/Singleton*
SingletonLock   -> openclaw-B650-GAMING-X-AX-207211     # PID 207211 = the last failed Chrome from the prior boot
SingletonCookie -> 11815413789327643664
SingletonSocket -> /tmp/com.google.Chrome.nWB6h9/SingletonSocket   # /tmp wiped on reboot

$ journalctl --user -u openclaw-gateway -b -1 | grep -c "browser started (chrome)"
397

May 09 09:59:09 kernel: gnome-shel:cs0 invoked oom-killer ...
May 09 09:59:09 kernel: oom-kill: ... task=openclaw-gatewa,pid=2398
May 09 09:59:09 kernel: Out of memory: Killed process 2398 (openclaw-gatewa) total-vm:12625900kB, anon-rss:2522760kB
[... cascade kills gnome-shell, dbus-daemon, pipewire, journald watchdog times out, forced reboot at 10:27 ...]

Removing the three stale Singleton* symlinks fixed the spawn failure immediately.

Requested fixes

1. Reap stale singletons on browser plugin startup. Before invoking Chrome, check <user-data>/SingletonLock. If it's a symlink whose target encodes a PID (<host>-<pid>) and that PID is not alive (or is alive but isn't a chromium process), remove SingletonLock, SingletonCookie, and SingletonSocket. Also remove SingletonSocket if its /tmp/... target no longer exists. This is the same recovery Chrome itself performs interactively when run from a desktop, but it does not happen reliably under headless/automation use.
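A minimal sketch of the check described in fix 1, written in Python purely for illustration (the plugin itself runs on Node, and none of these names come from the openclaw codebase). It assumes the lock target has the `<host>-<pid>` shape shown in the evidence above, treats a PID as dead when signal 0 cannot reach it, and omits the "alive but not chromium" check for brevity. The helper names `pid_is_alive` and `reap_stale_singletons` are hypothetical.

```python
import os
from pathlib import Path
from typing import Callable, List

SINGLETON_NAMES = ("SingletonLock", "SingletonCookie", "SingletonSocket")


def pid_is_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def reap_stale_singletons(user_data: Path,
                          alive: Callable[[int], bool] = pid_is_alive) -> List[str]:
    """Remove stale Singleton* entries and return the names that were removed."""
    removed: List[str] = []
    lock = user_data / "SingletonLock"
    if lock.is_symlink():
        # Target looks like "<host>-<pid>"; the host part may itself contain hyphens,
        # so split on the LAST hyphen only.
        pid_text = os.readlink(lock).rpartition("-")[2]
        if pid_text.isdigit() and not alive(int(pid_text)):
            for name in SINGLETON_NAMES:
                entry = user_data / name
                if entry.is_symlink() or entry.exists():
                    entry.unlink()
                    removed.append(name)
    sock = user_data / "SingletonSocket"
    # Dangling symlink: is_symlink() is True while exists() (which follows it) is False.
    if sock.is_symlink() and not sock.exists():
        sock.unlink()
        removed.append("SingletonSocket")
    return removed
```

The `alive` parameter is injectable so the logic can be exercised without a real dead process. Only SingletonSocket is tested for a dangling target, because SingletonCookie points at a bare token rather than a file, as the listing in the evidence section shows.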

2. Exponential backoff + circuit breaker on CDP-unreachable failures. The current ~12 s fixed retry interval means a broken profile produces ~300 spawn attempts per hour. Suggested behavior:

  • Backoff: 5 s → 15 s → 45 s → 2 m → 5 m (cap).
  • Circuit breaker: after N (e.g. 5) consecutive "CDP websocket ... not reachable" failures, stop retrying, mark the profile unhealthy, and fail subsequent browser.request calls fast with a clear, actionable error ("profile X failed N CDP handshakes: try `openclaw browser reset --profile X`"). Auto-reset after a long cooldown or on explicit user action.
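The backoff schedule and circuit breaker from fix 2 could be combined roughly as follows. This is a Python sketch of the logic only, not the plugin's code: the class name `ProfileBreaker`, the one-hour default cooldown, and the injectable clock are assumptions made for illustration, since the issue only says "a long cooldown".

```python
import time
from typing import Callable, Optional

# 5 s -> 15 s -> 45 s -> 2 m -> 5 m (cap), as proposed above.
BACKOFF_SECONDS = (5, 15, 45, 120, 300)


class ProfileBreaker:
    """Tracks consecutive CDP-unreachable failures for one browser profile."""

    def __init__(self, max_failures: int = 5, cooldown: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown = cooldown  # assumed value; the issue does not specify one
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def next_delay(self) -> float:
        """Seconds to wait before the next spawn attempt."""
        if self.failures == 0:
            return 0.0
        index = min(self.failures, len(BACKOFF_SECONDS)) - 1
        return float(BACKOFF_SECONDS[index])

    def allow(self) -> bool:
        """False while the breaker is open; auto-resets once the cooldown elapses."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            self.reset()
            return True
        return False

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # open the breaker

    def reset(self) -> None:
        """Call on a successful handshake or an explicit user reset."""
        self.failures = 0
        self.opened_at = None
```

A caller would check `allow()` before each spawn, sleep for `next_delay()` between attempts, and call `reset()` after a successful handshake. With the default of five failures the breaker opens before the 5 m cap is reached, so the cap only matters if N is raised.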

3. Defensive cleanup of orphaned chromium processes. When a spawn fails handshake, kill the Chrome process tree we just launched before retrying. From our top output during the leak window, chromium children were not being reaped, which is part of why parent RSS / pgtables grew so much (pgtables:69968 kB at OOM time).
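One common way to implement the cleanup in fix 3 on POSIX systems is to launch the browser in its own session, so that the whole process group can be signalled at once. The Python sketch below illustrates that idea under stated assumptions: the helper names are hypothetical, the 5 s grace period is arbitrary, and children that move themselves into a different process group would not be covered.

```python
import os
import signal
import subprocess
from typing import List, Optional


def spawn_in_own_group(argv: List[str]) -> subprocess.Popen:
    """Start the child as a session leader, so its PID is also its process-group ID."""
    return subprocess.Popen(argv, start_new_session=True)


def _signal_group(pgid: int, sig: int) -> None:
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # the group is already empty


def kill_tree(proc: subprocess.Popen, grace: float = 5.0) -> Optional[int]:
    """Terminate the child's whole process group, escalating to SIGKILL."""
    _signal_group(proc.pid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)  # reap the group leader if it exits in time
    except subprocess.TimeoutExpired:
        pass
    # Sweep anything in the group that ignored or outlived SIGTERM.
    _signal_group(proc.pid, signal.SIGKILL)
    return proc.wait()
```

Calling `kill_tree` on every failed handshake, before the next retry, keeps failed spawns from piling up as orphaned children of the gateway.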

4. (Stretch) openclaw browser doctor / openclaw browser reset --profile <name> subcommand that checks for stale singletons, dead pids, dangling /tmp sockets, and offers to clean them. Cheap to add and would surface this class of issue without a forensic dive into the journal.
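The read-only half of such a doctor command might look like the Python sketch below. It reports findings and deletes nothing. The function name `diagnose_profile` is hypothetical, and the liveness check relies on /proc, so it is Linux-only.

```python
import os
from pathlib import Path
from typing import List


def diagnose_profile(user_data: Path) -> List[str]:
    """Return human-readable findings; an empty list means nothing looked stale."""
    findings: List[str] = []
    lock = user_data / "SingletonLock"
    if lock.is_symlink():
        target = os.readlink(lock)
        pid_text = target.rpartition("-")[2]  # "<host>-<pid>": keep the last field
        if not pid_text.isdigit():
            findings.append(f"SingletonLock target {target!r} has no trailing PID")
        elif not Path("/proc", pid_text).exists():  # Linux-only liveness check
            findings.append(f"SingletonLock points at dead PID {pid_text}")
    sock = user_data / "SingletonSocket"
    # Only the socket link is expected to resolve to a real file.
    if sock.is_symlink() and not sock.exists():
        findings.append(f"SingletonSocket is dangling -> {os.readlink(sock)}")
    return findings
```

Run against the profile from the incident, this would be expected to flag both the dead PID and the dangling /tmp socket. A reset subcommand could then offer to remove exactly the entries that were reported.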

Workaround for affected users

systemctl --user stop openclaw-gateway
rm ~/.openclaw/browser/<profile>/user-data/SingletonLock \
   ~/.openclaw/browser/<profile>/user-data/SingletonCookie \
   ~/.openclaw/browser/<profile>/user-data/SingletonSocket
systemctl --user start openclaw-gateway

And add a memory ceiling to the unit so a future leak can't take the host down:

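# ~/.config/systemd/user/openclaw-gateway.service.d/20-memory-guardrails.conf
# Run `systemctl --user daemon-reload` after creating or editing this drop-in.
[Unit]
# Allow at most 10 starts per 600 s; beyond that systemd refuses further starts.
StartLimitBurst=10
StartLimitIntervalSec=600
[Service]
# Hard cap: above 6G the OOM killer acts inside this unit, not across the host.
MemoryMax=6G
# Soft cap: above 4G the unit is throttled and its memory reclaimed aggressively.
MemoryHigh=4G
# Keep the unit from spilling into swap.
MemorySwapMax=0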
