openclaw: Gateway wrapper's spawned child process can survive parent SIGTERM, blocking systemd restarts on EADDRINUSE

When openclaw.mjs gateway is run under a systemd unit (Type=simple, User=...) and the unit is stopped or restarted, the spawned node dist/index.js gateway child process can survive its parent and become an orphan reparented to PID 1. The orphan keeps the gateway port bound. systemd's auto-restart then fails repeatedly with EADDRINUSE, eventually marking the service as failed (Start request repeated too quickly).

I observed this twice during a v2026.3.x → v2026.5.7 upgrade where I was issuing several stop/start cycles in quick succession. Both times the orphan was a root-owned node process — the unit's ExecStartPre=/bin/bash -c 'fuser -k 18789/tcp ...' runs as the unit's User= (a non-root user) and silently fails to kill a root-owned orphan, so the next ExecStart keeps hitting EADDRINUSE.

The root-ownership of the orphan in my case came from running diagnostic commands via sudo while debugging the upgrade. So the failure mode is more general than just root orphans: any orphan with a different ownership than the unit's User= will not be reaped by the bundled ExecStartPre fuser -k.

Workaround

The most surgical fix at the systemd-unit layer is the + prefix on ExecStartPre/ExecStopPost, which makes those hooks run with full privileges (as root) regardless of User=. I combined it with explicit KillMode= and TimeoutStopSec= directives:

-ExecStartPre=/bin/bash -c 'fuser -k 18789/tcp 2>/dev/null || true'
+ExecStartPre=+/bin/bash -c 'fuser -k 18789/tcp 18791/tcp 18792/tcp 2>/dev/null || true'
 ExecStart=/usr/bin/node openclaw.mjs gateway
+ExecStopPost=+/bin/bash -c 'fuser -k 18789/tcp 18791/tcp 18792/tcp 2>/dev/null || true'
 Restart=on-failure
+KillMode=control-group
+TimeoutStopSec=20

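This reaps any port-holder regardless of ownership, which stopped the EADDRINUSE crash-loop in my setup. Note that KillMode=control-group is already systemd's default and only covers processes in the unit's own cgroup, so the root-run fuser -k hooks are what remove an orphan started outside the unit. Verified with a real stop/start cycle: port released cleanly, fresh PID came up bound in 4 seconds.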

But this is a defensive patch — it doesn't address the underlying reason the child process detaches from the wrapper in the first place. The real fix should be in openclaw.mjs's child-process lifecycle so the spawned node child can't outlive its parent.
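
To make the lifecycle idea concrete, here is a minimal sketch of a wrapper that forwards termination signals to its child and cleans up on its own exit. I have not checked how openclaw.mjs actually spawns the gateway, so spawnTiedChild and its options are hypothetical names, and CommonJS require is used only to keep the snippet short. It cannot cover SIGKILL of the wrapper, because no handler runs in that case.

```javascript
// Sketch only: one way a wrapper could tie a spawned child's lifetime to its own.
// spawnTiedChild and its options are hypothetical, not OpenClaw's actual code.
const { spawn } = require("node:child_process");
const os = require("node:os");

const FORWARDED_SIGNALS = ["SIGTERM", "SIGINT", "SIGHUP"];

function spawnTiedChild(command, args, { proc = process, spawnFn = spawn } = {}) {
  // detached: false (the default) keeps the child in the wrapper's process group.
  const child = spawnFn(command, args, { stdio: "inherit", detached: false });

  // Forward each termination signal the wrapper receives to the child.
  const handlers = new Map();
  for (const signal of FORWARDED_SIGNALS) {
    const handler = () => child.kill(signal);
    handlers.set(signal, handler);
    proc.on(signal, handler);
  }

  // Last-chance cleanup when the wrapper exits by itself (process.exit, uncaught error).
  // This does NOT run if the wrapper is killed with SIGKILL.
  const onExit = () => child.kill("SIGTERM");
  proc.on("exit", onExit);

  child.on("exit", (code, signal) => {
    // The child is gone: remove our listeners so the wrapper can exit normally.
    for (const [sig, handler] of handlers) proc.off(sig, handler);
    proc.off("exit", onExit);
    // Mirror the child's status; 128 + signal number is the usual shell convention.
    proc.exitCode = signal ? 128 + os.constants.signals[signal] : code ?? 0;
  });

  return child;
}
```

In the wrapper this would be called as something like spawnTiedChild(process.execPath, ["dist/index.js", "gateway"]). The proc and spawnFn parameters exist only so the behavior can be exercised without spawning a real process.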

Steps to reproduce

  1. Run OpenClaw gateway under a systemd unit similar to the bundled service (Type=simple, User=<non-root>, ExecStart=/usr/bin/node openclaw.mjs gateway, ExecStartPre=/bin/bash -c 'fuser -k 18789/tcp ...').
  2. From a root shell (e.g. inside sudo bash), run node openclaw.mjs gateway directly. This spawns a root-owned node dist/index.js gateway child.
  3. Kill the wrapper (Ctrl-C or kill <wrapper-pid>). The child often survives.
  4. systemctl stop openclaw.service && systemctl start openclaw.service.
  5. Observe: ExecStart exits with code 78 / EADDRINUSE. After 5 restart attempts, systemd marks the unit failed.
  6. Check ss -tlnp | grep 18789 — the port is held by the surviving root-owned node child, even though systemctl status openclaw.service reports failed.
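
For step 6, a small helper can confirm that the port holder really is an orphan owned by a different user than the unit. This is a sketch, assuming Linux with /proc mounted; parseProcStatus and describeProcess are hypothetical names, and the check ppid === 1 assumes the orphan was reparented to PID 1 rather than to a subreaper.

```javascript
// Sketch only: inspect /proc/<pid>/status to see who owns a process and who its parent is.
// Function names are hypothetical; Linux with /proc is assumed.
const { readFileSync } = require("node:fs");

function parseProcStatus(statusText) {
  // Each line looks like "Key:\tvalue"; collect them into a plain object.
  const fields = {};
  for (const line of statusText.split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) fields[line.slice(0, idx)] = line.slice(idx + 1).trim();
  }
  return {
    name: fields.Name,
    ppid: Number(fields.PPid),
    // The Uid line lists real, effective, saved and filesystem UIDs; take the real one.
    uid: Number(fields.Uid.split(/\s+/)[0]),
  };
}

function describeProcess(pid, unitUid) {
  const info = parseProcStatus(readFileSync(`/proc/${pid}/status`, "utf8"));
  return {
    pid,
    ...info,
    orphaned: info.ppid === 1,          // reparented to PID 1
    foreignOwner: info.uid !== unitUid, // a fuser -k run as the unit's User= cannot kill it
  };
}
```

Calling describeProcess with the PID reported by ss and the numeric UID of the unit's User= shows both conditions at once: orphaned and foreignOwner both true matches the case described in this report.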

Expected behavior

When the openclaw.mjs gateway wrapper exits (for any reason — SIGTERM, SIGKILL, crash), its spawned node dist/index.js gateway child should also terminate, so the gateway port is released and systemd can restart cleanly.

Actual behavior

The spawned child detaches from the wrapper and survives, holding the port. Subsequent restarts crash-loop.

Environment

  • OpenClaw: 2026.5.7 (eeef486) — also reproduced on 2026.3.3
  • Node: v22.22.1
  • OS: Ubuntu 24.04 LTS, kernel 6.8.0-106-generic
  • Systemd: 255

Suggested upstream remedies

  1. In openclaw.mjs gateway, attach a parent-death signal to the spawned child (Linux: prctl(PR_SET_PDEATHSIG, SIGTERM) from a tiny helper). On Linux the kernel then delivers SIGTERM to the child when the wrapper dies, regardless of how the wrapper exits (including SIGKILL).
  2. Or, document the +-prefix fuser -k workaround in the bundled systemd unit examples.
  3. Or, add an openclaw gateway stop command that idempotently kills any process owning the gateway port, and use it as ExecStop=.
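
As a rough illustration of remedy 3, the sketch below finds the PIDs listening on a port by parsing ss output and signals them, treating an already-exited process as success. The function names and result shape are hypothetical, not an existing openclaw command. Note that ss -p only reports process details the caller is allowed to see, so this would still need to run as root to catch a foreign-owned orphan.

```javascript
// Sketch only: an idempotent "stop whoever holds the port" helper.
// parseListenerPids and stopPortOwners are hypothetical names; assumes Linux with ss (iproute2).
const { execFileSync } = require("node:child_process");

function parseListenerPids(ssOutput) {
  // ss -p prints owners as users:(("node",pid=1234,fd=21)); collect the unique PIDs.
  const pids = new Set();
  for (const match of ssOutput.matchAll(/pid=(\d+)/g)) pids.add(Number(match[1]));
  return [...pids].sort((a, b) => a - b);
}

function stopPortOwners(port, signal = "SIGTERM") {
  // -H: no header, -t: TCP, -l: listening, -n: numeric, -p: show owning processes.
  const output = execFileSync("ss", ["-H", "-tlnp", `sport = :${port}`], {
    encoding: "utf8",
  });
  const results = [];
  for (const pid of parseListenerPids(output)) {
    try {
      process.kill(pid, signal);
      results.push({ pid, outcome: "signalled" });
    } catch (err) {
      if (err.code === "ESRCH") {
        results.push({ pid, outcome: "already-gone" }); // exited in the meantime
      } else if (err.code === "EPERM") {
        results.push({ pid, outcome: "not-permitted" }); // owned by another user
      } else {
        throw err;
      }
    }
  }
  return results;
}
```

Returning a per-PID outcome instead of throwing on EPERM makes the ownership mismatch visible to the caller, which is the case that fails silently with the bundled fuser -k hook.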

Happy to test any of these against my install if useful.
