openclaw - ✅(Solved) Fix Internal restart via 'stop && start' permanently kills gateway on macOS [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#71929 · Fetched 2026-04-27 05:37:18
Comments: 1 · Participants: 2 · Timeline events: 3 · Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1

When OpenClaw internally triggers a gateway restart (e.g. after config changes like switching web-search provider), it executes:

openclaw gateway stop && sleep 2 && openclaw gateway start

On macOS with launchd, this sequence fails to bring the gateway back up, leaving the service permanently offline. Because OpenClaw runs inside the gateway, once the gateway dies it can no longer fix itself — this is a self-destruct mechanism.


Error Message

[restart-sentinel] Gateway restart restart ok (gateway.restart): continuation delivery failed: Error: restart continuation route unavailable

PR fix notes

PR #72174: fix(macos): keep attach-only from stopping gateway launchd

Description (problem / solution / changelog)

Summary

  • keep macOS attach-only mode from disabling/uninstalling the Gateway LaunchAgent
  • make the --attach-only / --no-launchd launch path and Debug Settings toggle only persist the disable marker
  • add a focused regression test proving attach-only writes the marker without issuing daemon commands

Related reports / duplicate review

  • Related: #42530 (Mac app kills running gateway on launch despite disable-launchagent sentinel)
  • Reviewed related launchd/LaunchAgent PRs/issues before opening: #44, #20379, #64447, #43660, #48455, #24123, #51811, #56035, #71929.
  • I did not find an open PR that narrowly fixes this attach-only/UI path; the nearest exact report is #42530, which is closed/locked but matches the behavior this patch prevents.

Behavior change

Attach-only should mean “attach to an existing Gateway; do not manage launchd”. Previously both the CLI flag path and Debug Settings toggle could still call GatewayLaunchAgentManager.set(enabled: false, ...), which can stop/disable/uninstall the Gateway LaunchAgent and drop active sessions. This patch routes attach-only through a marker-only helper instead.

Security / secrets check

  • No new permissions/capabilities.
  • No secrets/tokens/credentials added.
  • Secret-pattern scan of added diff lines found no matches.
  • Full changed-file scan only matched pre-existing dummy test literals (" secret ", "pw") in GatewayLaunchAgentManagerTests.swift, not real credentials.

Testing

  • swift test --filter GatewayLaunchAgentManagerTests
  • swift test --filter GatewayProcessManagerTests
  • git diff --check
  • Manual controlled canary from the packet: patched app launched with --attach-only; Gateway PID stayed stable; no new Gateway SIGTERM during the canary window; app node reconnected.

Changed files

  • apps/macos/Sources/OpenClaw/DebugSettings.swift (modified, +0/-8)
  • apps/macos/Sources/OpenClaw/GatewayLaunchAgentManager.swift (modified, +41/-1)
  • apps/macos/Sources/OpenClaw/MenuBar.swift (modified, +1/-7)
  • apps/macos/Tests/OpenClawIPCTests/GatewayLaunchAgentManagerTests.swift (modified, +23/-0)


Bug Report: Internal restart via stop && start permanently kills gateway on macOS

Version: OpenClaw 2026.4.22 (00bd2cf)
Platform: macOS 26.4.1 (arm64) · node 24.4.1
Severity: Critical (gateway self-destructs and never recovers)


Root Cause Analysis

1. stop && start is not atomic with launchd

openclaw gateway stop causes the process to exit cleanly, and launchd does not bring the service back afterwards. A job whose KeepAlive policy is SuccessfulExit=false treats exit code 0 as an intentional shutdown (a plain KeepAlive=true would restart it unconditionally), and a service that has been booted out is not restarted at all. The sleep 2 gap is too short for the old service to finish its bootout, so the subsequent start (bootstrap) often fails silently or leaves the service in a "not loaded" state.
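One way to avoid depending on a fixed sleep 2 is to wait until launchd no longer knows the old service before bootstrapping again. The sketch below is an illustration under assumptions, not OpenClaw's implementation: the helper names gateway_target and wait_until_gone are hypothetical, the label ai.openclaw.gateway is taken from the status output in this report, and it assumes launchctl print exits non-zero once the service has been booted out.

```shell
# Sketch only: wait for the old launchd service to disappear instead of
# sleeping for a fixed interval. Helper names are hypothetical.

# Label as it appears in the status output quoted in this report.
GATEWAY_LABEL="ai.openclaw.gateway"

# Build the launchd service target for the current GUI user,
# e.g. gui/501/ai.openclaw.gateway.
gateway_target() {
    printf 'gui/%s/%s\n' "$(id -u)" "$GATEWAY_LABEL"
}

# wait_until_gone TIMEOUT_SECONDS PROBE...
# Runs PROBE once per second while it keeps succeeding (service still loaded).
# Returns 0 as soon as PROBE fails, or 1 if TIMEOUT_SECONDS elapse first.
wait_until_gone() {
    timeout=$1
    shift
    elapsed=0
    while "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical usage on macOS (assumes `launchctl print` fails once the
# service is no longer loaded):
#   openclaw gateway stop
#   wait_until_gone 15 launchctl print "$(gateway_target)" && openclaw gateway start
```

Passing the probe as arguments keeps the waiting logic independent of launchctl, so the same loop can be exercised with any command.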

2. Contrast with openclaw gateway restart

The CLI restart command uses SIGUSR1 → "full process restart (supervisor restart)", which keeps the launchd service alive:

[gateway] received SIGUSR1; restarting
[gateway] restart mode: full process restart (supervisor restart)

This path works reliably because launchd never sees the process exit.

3. Restart continuation is also broken

Even when a restart succeeds via SIGUSR1, the restart-sentinel logs show:

[restart-sentinel] Gateway restart restart ok (gateway.restart):
  continuation delivery failed: Error: restart continuation route unavailable

This means interrupted sessions/tasks are lost after restart.


Evidence from Logs

Last successful internal restart (SIGUSR1 path)

2026-04-25T22:12:50.189+08:00 [gateway] received SIGUSR1; restarting
2026-04-25T22:12:55.542+08:00 [gateway] restart mode: full process restart (supervisor restart)

The failing stop && start sequence

2026-04-25T23:06:26.342+08:00 [exec] elevated command
  openclaw gateway stop 2>&1 && sleep 2 && openclaw gateway start 2>&1

2026-04-25T23:06:27.683+08:00 [gateway] signal SIGTERM received
2026-04-25T23:06:27.683+08:00 [gateway] received SIGTERM; shutting down
2026-04-25T23:06:27.769+08:00 [feishu] feishu[default]: abort signal received, stopping
2026-04-25T23:06:27.771+08:00 [ws] ws client closed manually

2026-04-25T23:06:31.804+08:00 restart scheduled, gateway will restart momentarily

After this point, no more gateway logs until manual intervention 8+ hours later.

Morning state: "not loaded"

$ openclaw gateway status
Service: LaunchAgent (not loaded)
...
Runtime: unknown (Could not find service "ai.openclaw.gateway" in domain for user gui: 501)
Connectivity probe: failed

Manual openclaw gateway start was required to recover:

Gateway LaunchAgent was installed but not loaded; re-bootstrapped launchd service.
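The "not loaded" state above can be detected mechanically from the status output. The following is a minimal sketch, assuming the output keeps the "Service: ... (not loaded)" shape shown in the excerpt; the function name gateway_service_loaded is hypothetical, and any wording other than "(not loaded)" is treated as loaded, which is an assumption rather than documented behavior.

```shell
# Sketch only: classify `openclaw gateway status` output read from stdin.
# Succeeds (exit 0) when a "Service:" line exists and does not report
# "(not loaded)"; fails (exit 1) otherwise.
gateway_service_loaded() {
    # Take the first line that starts with "Service:".
    service_line=$(grep '^Service:' | head -n 1)

    # No Service line at all: treat as not loaded.
    [ -n "$service_line" ] || return 1

    case "$service_line" in
        *"(not loaded)"*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical usage:
#   openclaw gateway status | gateway_service_loaded || echo "gateway is not loaded"
```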

Reproduction Steps

  1. Start gateway: openclaw gateway start
  2. Trigger any config change that causes an internal restart request (e.g. modify plugins.entries via an elevated exec tool that edits openclaw.json)
  3. Observe that OpenClaw schedules restart, but then executes stop && start
  4. Check openclaw gateway status — it shows "not loaded"

Expected Behavior

Gateway restart should be reliable and self-healing:

  • Internal tools should use openclaw gateway restart (SIGUSR1 / supervisor restart) instead of stop && start
  • If a full process restart is unavoidable, the restart command must verify the service is actually running after start, and retry if launchctl bootstrap fails
  • The restart sentinel should gracefully handle continuation delivery or at least surface the error to the user

Suggested Fixes

  1. Immediate: Change the elevated exec restart command from stop && sleep 2 && start to restart (or at least to stop && sleep 5 && start && openclaw gateway status, wrapped in a retry loop)
  2. Better: Implement a restart API endpoint on the gateway itself so internal tools can request a restart via WebSocket (gateway.restart tool) instead of shelling out
  3. Defensive: After any start command, poll launchctl list / curl http://127.0.0.1:18789/ for up to 10s; if the gateway is unreachable, retry start once more before giving up
  4. Continuation fix: Investigate why "restart continuation route unavailable" occurs and ensure in-flight tasks can resume after restart
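The defensive fix (item 3) could look roughly like the sketch below: start, poll the HTTP endpoint for up to 10s, and retry the start once if nothing answers. This is a sketch under assumptions, not the project's code. The function names gateway_start, gateway_probe, and start_and_verify are hypothetical, the URL is the one quoted in this report, and "reachable" is taken to mean that curl receives any HTTP response.

```shell
# Sketch only: start the gateway, verify it is reachable, retry if not.
GATEWAY_URL="http://127.0.0.1:18789/"

# Start command from this report, wrapped so it can be swapped out.
gateway_start() {
    openclaw gateway start
}

# Succeeds if the endpoint returns any HTTP response within 3 seconds.
gateway_probe() {
    curl --silent --max-time 3 --output /dev/null "$GATEWAY_URL"
}

# start_and_verify [ATTEMPTS] [TIMEOUT_SECONDS]
# Defaults: 2 attempts (one retry), 10 seconds of polling per attempt.
# Returns 0 once the probe succeeds, 1 if every attempt times out.
start_and_verify() {
    attempts=${1:-2}
    timeout=${2:-10}
    attempt=1
    while [ "$attempt" -le "$attempts" ]; do
        # Keep going even if the start command itself reports failure;
        # the probe below decides whether the gateway is really up.
        gateway_start || echo "start attempt $attempt reported an error" >&2
        elapsed=0
        while [ "$elapsed" -lt "$timeout" ]; do
            if gateway_probe; then
                return 0
            fi
            sleep 1
            elapsed=$((elapsed + 1))
        done
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical usage:
#   start_and_verify 2 10 || echo "gateway did not come back" >&2
```

Wrapping the start and probe commands in small functions keeps the retry logic separate from the CLI, so a launchctl-based probe could be substituted without changing the loop.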

Workaround Applied by User

Until this is fixed, the user has deployed an external launchd watchdog (ai.openclaw.watchdog) that runs two checks every 30s:

  • Is ai.openclaw.gateway loaded in launchd with state = running?
  • Does http://127.0.0.1:18789/ respond within 3s?

If either check fails, the watchdog runs openclaw gateway start.

This should not be necessary for a self-hosted gateway.
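A single pass of such a watchdog might be sketched as follows. The user's actual script is not shown in the report, so everything here is an assumption: the function names are hypothetical, the launchd check greps launchctl print output for "state = running", and the HTTP check accepts any response within 3s.

```shell
# Sketch only: one watchdog pass, meant to be run periodically.
GATEWAY_LABEL="ai.openclaw.gateway"
GATEWAY_URL="http://127.0.0.1:18789/"

# Check 1: is the service loaded for the current GUI user and running?
gateway_running_in_launchd() {
    launchctl print "gui/$(id -u)/$GATEWAY_LABEL" 2>/dev/null |
        grep -q 'state = running'
}

# Check 2: does the HTTP endpoint answer within 3 seconds?
gateway_http_ok() {
    curl --silent --max-time 3 --output /dev/null "$GATEWAY_URL"
}

# Recovery action named in this report.
gateway_start() {
    openclaw gateway start
}

# Run both checks; start the gateway if either one fails.
watchdog_tick() {
    if gateway_running_in_launchd && gateway_http_ok; then
        return 0
    fi
    gateway_start
}

# Hypothetical usage: call watchdog_tick from a script that launchd runs
# every 30 seconds.
```

The 30s cadence would come from whatever schedules the script (for a launchd job, presumably an interval setting in its plist), not from the script itself.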

extent analysis

TL;DR

Change the elevated exec restart command from stop && sleep 2 && start to restart to ensure a reliable and self-healing gateway restart.

Guidance

  • Use the openclaw gateway restart command instead of stop && start for internal restart requests to leverage the SIGUSR1/supervisor restart mechanism.
  • Implement a retry loop after start to verify the service is running and handle potential launchctl bootstrap failures.
  • Consider adding a restart API endpoint on the gateway for internal tools to request restarts via WebSocket.
  • Poll launchctl list or curl http://127.0.0.1:18789/ after start to ensure the service is reachable.

Example

openclaw gateway restart

or

openclaw gateway stop && sleep 5 && openclaw gateway start && openclaw gateway status

with a retry loop.

Notes

The current workaround using an external launchd watchdog (ai.openclaw.watchdog) should not be necessary for a self-hosted gateway. The suggested fixes aim to provide a more robust and self-healing restart mechanism.

Recommendation

Apply the suggested fix by changing the elevated exec restart command to restart to ensure a reliable gateway restart. This approach leverages the existing SIGUSR1/supervisor restart mechanism, providing a more robust solution.
