openclaw - ✅(Solved) Fix Gateway left dead after update.run / SIGUSR1 supervisor restart — systemd sees clean exit, does not relaunch [1 pull requests, 1 comments, 2 participants]

openclaw · 2026-04-22 21:46:35

GitHub stats

openclaw/openclaw#70354 • Fetched 2026-04-23 07:25:46

Author

TomCruiseTorpedo

Participants

Kailigithub

TomCruiseTorpedo

Timeline (top)

commented ×1, cross-referenced ×1

When a user triggers an update from the Control UI, the gateway receives SIGUSR1 and performs an in-process supervisor restart. If the supervisor's re-exec fails, the process exits with code 0. A systemd user service with the typical Restart=on-failure policy treats exit 0 as "clean", does not relaunch, and leaves the gateway dead. The user sees 502 Bad Gateway on the tailnet-served dashboard URL and has no in-band way to recover — the Control UI itself can't fix it because the Control UI needs the gateway to talk to.
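
The interaction between exit code and restart policy can be sketched as a small model. This is a simplification that looks only at the main process's exit code and ignores signals, timeouts and watchdog events; the function name is made up for illustration.

```python
def systemd_would_restart(policy: str, exit_code: int) -> bool:
    """Simplified model: would systemd relaunch a unit with this Restart= policy?

    Only plain exit codes are considered. Signals, timeouts, watchdog
    expiry and SuccessExitStatus= overrides are deliberately left out.
    """
    clean_exit = exit_code == 0
    if policy == "always":
        return True
    if policy == "on-failure":
        return not clean_exit
    if policy == "on-success":
        return clean_exit
    if policy in ("no", "on-abnormal", "on-abort", "on-watchdog"):
        # None of these policies reacts to a plain exit code.
        return False
    raise ValueError(f"unknown Restart= policy: {policy}")


# The situation described in this issue: exit 0 under Restart=on-failure.
print(systemd_would_restart("on-failure", 0))
```

Under this model, the exit 0 described above is the one case where Restart=on-failure does nothing, while Restart=always relaunches regardless of the code.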

PR fix notes

PR #70466: fix(gateway): exit non-zero on supervised restart so systemd Restart=on-failure recovers

Repository: openclaw/openclaw
Author: Kailigithub
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/70466

Description (problem / solution / changelog)

After update.run / SIGUSR1 supervisor restart, the gateway exited with code 0. When running under systemd with Restart=on-failure (the default in the standard install), a clean exit does not trigger a restart, leaving the gateway dead until manual intervention.

Exit with code 1 instead so Restart=on-failure restarts the service. KeepAlive on launchd ignores the exit code and is unaffected. The spawned (non-supervised) case continues to exit 0, as the detached child takes over as the new gateway.
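
The decision the PR describes can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code in run-loop.ts; the enum and function names are hypothetical.

```python
from enum import Enum


class RestartMode(Enum):
    # Hypothetical names, chosen only to mirror the two cases in the PR text.
    SUPERVISED = "supervised"  # the service manager is expected to relaunch us
    SPAWNED = "spawned"        # a detached child has already taken over


def exit_code_for_restart(mode: RestartMode) -> int:
    """Pick the exit code for a restart, following the PR description."""
    if mode is RestartMode.SUPERVISED:
        # Non-zero so that Restart=on-failure treats the exit as a failure.
        return 1
    # The detached child is the new gateway, so the old process exits cleanly.
    return 0
```

The trade-off is that a supervised restart now shows up as a failed exit in the unit's status, in exchange for being recovered by Restart=on-failure.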

Closes #70354

Changed files

src/cli/gateway-cli/run-loop.test.ts (modified, +2/-2)
src/cli/gateway-cli/run-loop.ts (modified, +11/-11)

Impact

Hit deterministically on two separate nodes on 2026-04-22 when I pressed Update in the dashboard. Both stayed down for ~19 hours until noticed. For less-technical users who run OpenClaw behind Tailscale + HTTPS serve, this is a hard lock-out: the only recovery path requires SSH and knowledge of systemctl --user restart. Friends I was planning to hand this setup to would absolutely have been stuck here.

Reproduction

1. Run the OpenClaw gateway as a systemd user service with Restart=on-failure (the default if you follow the standard install path).
2. Open the Control UI, click Update.
3. Observe: dashboard returns 502. systemctl --user status openclaw-gateway shows inactive (dead) with status=0/SUCCESS (clean exit). Process does not come back.
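
The final observation can also be checked from a script. The sketch below parses the key=value output of `systemctl --user show openclaw-gateway -p ActiveState -p Result -p ExecMainStatus` and flags the "dead after a clean exit" state. The helper names are made up, and the sample text is illustrative rather than captured output.

```python
def parse_show_output(text: str) -> dict[str, str]:
    """Parse `systemctl show` key=value lines into a dict."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            props[key.strip()] = value.strip()
    return props


def died_with_clean_exit(props: dict[str, str]) -> bool:
    """True when the unit is inactive and its main process exited with 0."""
    return (
        props.get("ActiveState") == "inactive"
        and props.get("Result") == "success"
        and props.get("ExecMainStatus") == "0"
    )


# Illustrative sample, not real output from the affected nodes.
SAMPLE = "ActiveState=inactive\nResult=success\nExecMainStatus=0\n"
print(died_with_clean_exit(parse_show_output(SAMPLE)))
```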

Journal evidence (redacted, from my node)

Apr 22 00:43:24 node[1035]: [gateway] update.run completed ... status=ok
Apr 22 00:43:26 node[1035]: [gateway] signal SIGUSR1 received
Apr 22 00:43:26 node[1035]: [gateway] received SIGUSR1; restarting
Apr 22 00:43:26 node[1035]: [gateway] restart mode: full process restart (supervisor restart)
Apr 22 00:43:27 systemd[883]: openclaw-gateway.service: Consumed 3min 20.664s CPU time.
# ← no further journal entries from _UID=openclaw until manual intervention 19h later

The "Consumed ... CPU time" line is systemd's last word on the unit. There is no "Started openclaw-gateway.service." follow-up.
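
The same pattern can be detected mechanically in exported journal text. This is a rough sketch: it assumes the restart announcement and the "Started" message contain the substrings quoted in this issue, and both markers are parameters because the exact wording may differ on other systems.

```python
def restart_left_unit_dead(
    journal_lines: list[str],
    restart_marker: str = "restart mode: full process restart",
    started_marker: str = "Started openclaw-gateway.service",
) -> bool:
    """True if a restart was announced and no later 'Started' line follows."""
    restart_seen = False
    for line in journal_lines:
        if restart_marker in line:
            restart_seen = True
        elif restart_seen and started_marker in line:
            return False  # the unit came back after the restart
    return restart_seen


# Two lines taken from the journal excerpt above.
EXCERPT = [
    "Apr 22 00:43:26 node[1035]: [gateway] restart mode: full process restart (supervisor restart)",
    "Apr 22 00:43:27 systemd[883]: openclaw-gateway.service: Consumed 3min 20.664s CPU time.",
]
print(restart_left_unit_dead(EXCERPT))
```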

Expected

The gateway should come back after SIGUSR1, either by the supervisor succeeding at re-exec, or by systemd relaunching the unit. Neither happened here.

Possible fixes (pick one or both)

1. Exit non-zero when re-exec fails. If the supervisor cannot hand off cleanly, abort with exit 1 so systemd's default Restart=on-failure catches it. Current behavior exits 0 even when the post-update process is gone, which hides the failure from systemd.
2. Recommend Restart=always in the packaged openclaw-gateway.service template (not Restart=on-failure). Makes the exit code irrelevant; systemd always relaunches. This is the pragmatic fix and it's what I've now baked into my deploy kit.
3. Optional but nice: have the Control UI monitor the gateway's /health post-restart and display a "gateway recovering…" spinner + "click here if stuck for > 2 minutes" escape link. Right now the dashboard just hangs or shows 502 with no context.
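
The /health monitoring idea in the last item amounts to polling with a deadline. A minimal sketch follows, in Python rather than the Control UI's own code; the function name, the 2-second interval and the injectable probe are assumptions, and only the two-minute limit comes from the suggestion above.

```python
import time
from typing import Callable


def wait_for_gateway(
    probe: Callable[[], bool],
    timeout_s: float = 120.0,   # "stuck for > 2 minutes"
    interval_s: float = 2.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe() until it succeeds or the timeout elapses.

    Returns True if the gateway recovered, False if the caller should
    show the escape link.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

While this returns nothing yet, a UI would show the "gateway recovering…" state; a False result is the point at which the escape link becomes useful.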

Workaround I deployed

In ~/.config/systemd/user/openclaw-gateway.service.d/20-restart.conf:

[Service]
Restart=always
RestartSec=10s

[Unit]
StartLimitIntervalSec=300
StartLimitBurst=20

Plus a user-level 60-second /health watchdog timer as belt-and-suspenders for the StartLimitBurst edge case. Both reproducibly recover the gateway from a forced systemctl --user kill --signal=SIGUSR1 openclaw-gateway within 25–50 seconds on my two nodes.
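
The watchdog itself is not shown in the issue. One way such a check could look is sketched below, under stated assumptions: the health URL (host and port) is a placeholder, and the function names are made up. A 60-second timer would call watchdog_tick() once per run.

```python
import http.client
import subprocess
import urllib.request

# ASSUMPTION: placeholder address; the gateway's real host and port are not
# given in the issue.
HEALTH_URL = "http://127.0.0.1:8080/health"
UNIT = "openclaw-gateway"


def gateway_healthy(url: str = HEALTH_URL, timeout_s: float = 5.0) -> bool:
    """True if the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status or malformed reply.
        return False


def restart_gateway() -> None:
    """Ask the user-level systemd instance to restart the unit."""
    subprocess.run(["systemctl", "--user", "restart", UNIT], check=False)


def watchdog_tick(healthy=gateway_healthy, restart=restart_gateway) -> bool:
    """Run one check; restart the unit if unhealthy. Returns True if restarted."""
    if healthy():
        return False
    restart()
    return True
```

Passing the probe and the restart action as parameters keeps the check testable without a running gateway.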

Version

OpenClaw 2026.4.15 → updating to 2026.4.20 triggered this. Reproduced on two independent nodes (Oracle Cloud Ubuntu 24.04 aarch64, Node 24.15.0 via nvm).

extent analysis

TL;DR

To fix the issue where the OpenClaw gateway does not restart after receiving a SIGUSR1 signal, update the openclaw-gateway.service template to use Restart=always instead of the standard install's Restart=on-failure.

Guidance

The root cause is that when the supervisor's re-exec fails, the process still exits with code 0. systemd treats that as a clean exit and, under Restart=on-failure, does not restart the service.
To verify the fix, trigger an update from the Control UI and check if the gateway restarts successfully by running systemctl --user status openclaw-gateway.
Consider implementing a watchdog timer to monitor the gateway's /health endpoint and display a recovery message to the user if the gateway is stuck.
Update the openclaw-gateway.service template to include Restart=always and RestartSec=10s to ensure the service restarts after a failure.

Example

[Service]
Restart=always
RestartSec=10s

[Unit]
StartLimitIntervalSec=300
StartLimitBurst=20

This configuration can be added to a drop-in file (e.g., ~/.config/systemd/user/openclaw-gateway.service.d/20-restart.conf) to override the default service settings.
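
The start-limit values interact with RestartSec, which a rough calculation makes visible. The sketch below uses a simplified model in which every start attempt runs for a fixed time and then exits; the function name is made up, and real systemd timing has more moving parts.

```python
def start_limit_trips(
    restart_sec: float,
    run_secs: float,
    interval_sec: float,
    burst: int,
) -> bool:
    """Simplified check: can a crash loop exhaust StartLimitBurst?

    Starts happen every (restart_sec + run_secs) seconds. The limit trips
    when the start after the first `burst` ones still falls inside the
    StartLimitIntervalSec window that began with the first start.
    """
    period = restart_sec + run_secs
    return burst * period < interval_sec


# Values from the drop-in: RestartSec=10s, 300 s window, burst of 20.
# A gateway that dies immediately on every start: 20 * 10 s = 200 s < 300 s.
print(start_limit_trips(10, 0, 300, 20))
# A gateway that survives 10 s per attempt: 20 * 20 s = 400 s, outside the window.
print(start_limit_trips(10, 10, 300, 20))
```

Under this model, a gateway that fails instantly on every start would still exhaust the limit after roughly 200 seconds and stay down, so Restart=always alone does not cover every case.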

Notes

The provided workaround using Restart=always and a watchdog timer has been successfully tested on two independent nodes and can be used as a reliable fix for this issue.

Recommendation

Apply the workaround by updating the openclaw-gateway.service template to use Restart=always, as it provides a reliable and pragmatic fix for the issue.
