openclaw - (Solved) Fix Internal restart via 'stop && start' permanently kills gateway on macOS [1 pull request, 1 comment, 2 participants]

ygc3817922006-sketch · 2026-04-26T03:36:21Z




GitHub issue: openclaw/openclaw#71929 (fetched 2026-04-27 05:37:18)


Author: ygc3817922006-sketch

Participants: steipete, ygc3817922006-sketch

Timeline: closed ×1, commented ×1, cross-referenced ×1

When OpenClaw internally triggers a gateway restart (e.g. after config changes like switching web-search provider), it executes:

openclaw gateway stop && sleep 2 && openclaw gateway start

On macOS with launchd, this sequence fails to bring the gateway back up, leaving the service permanently offline. Because OpenClaw runs inside the gateway, once the gateway dies it can no longer fix itself — this is a self-destruct mechanism.

Error Message

[restart-sentinel] Gateway restart restart ok (gateway.restart): continuation delivery failed: Error: restart continuation route unavailable


PR fix notes

PR #72174: fix(macos): keep attach-only from stopping gateway launchd

Repository: openclaw/openclaw
Author: DolencLuka
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/72174

Description (problem / solution / changelog)

Summary

keep macOS attach-only mode from disabling/uninstalling the Gateway LaunchAgent
make the --attach-only / --no-launchd launch path and Debug Settings toggle only persist the disable marker
add a focused regression test proving attach-only writes the marker without issuing daemon commands

Related reports / duplicate review

Related: #42530 (Mac app kills running gateway on launch despite disable-launchagent sentinel)
Reviewed related launchd/LaunchAgent PRs/issues before opening: #44, #20379, #64447, #43660, #48455, #24123, #51811, #56035, #71929.
I did not find an open PR that narrowly fixes this attach-only/UI path; the nearest exact report is #42530, which is closed/locked but matches the behavior this patch prevents.

Behavior change

Attach-only should mean “attach to an existing Gateway; do not manage launchd”. Previously both the CLI flag path and Debug Settings toggle could still call GatewayLaunchAgentManager.set(enabled: false, ...), which can stop/disable/uninstall the Gateway LaunchAgent and drop active sessions. This patch routes attach-only through a marker-only helper instead.

Security / secrets check

No new permissions/capabilities.
No secrets/tokens/credentials added.
Secret-pattern scan of added diff lines found no matches.
Full changed-file scan only matched pre-existing dummy test literals (" secret ", "pw") in GatewayLaunchAgentManagerTests.swift, not real credentials.

Testing

swift test --filter GatewayLaunchAgentManagerTests
swift test --filter GatewayProcessManagerTests
git diff --check
Manual controlled canary from the packet: patched app launched with --attach-only; Gateway PID stayed stable; no new Gateway SIGTERM during the canary window; app node reconnected.

Changed files

apps/macos/Sources/OpenClaw/DebugSettings.swift (modified, +0/-8)
apps/macos/Sources/OpenClaw/GatewayLaunchAgentManager.swift (modified, +41/-1)
apps/macos/Sources/OpenClaw/MenuBar.swift (modified, +1/-7)
apps/macos/Tests/OpenClawIPCTests/GatewayLaunchAgentManagerTests.swift (modified, +23/-0)



Bug Report: Internal restart via `stop && start` permanently kills gateway on macOS

Version: OpenClaw 2026.4.22 (00bd2cf)
Platform: macOS 26.4.1 (arm64) · node 24.4.1
Severity: Critical; the gateway self-destructs and never recovers

Summary

When OpenClaw internally triggers a gateway restart (e.g. after config changes like switching web-search provider), it executes:

openclaw gateway stop && sleep 2 && openclaw gateway start

Root Cause Analysis

1. `stop && start` is not atomic with launchd

openclaw gateway stop causes the process to exit cleanly. On macOS, launchd with KeepAlive=true does not restart a service that exits with code 0 — it considers it an intentional shutdown. The sleep 2 gap is too short for the old service to fully bootout, so the subsequent start (bootstrap) often silently fails or leaves the service in a "not loaded" state.
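Rather than relying on a fixed sleep 2, a caller could wait until launchd has actually dropped the old job before issuing the start. Whatever the plist sets for KeepAlive, a job that has been booted out is no longer loaded, so launchd will not bring it back on its own. This bash sketch assumes the label ai.openclaw.gateway (taken from the status output later in this report) in the per-user GUI domain gui/<uid>; `wait_until` and `launchd_job_gone` are hypothetical helpers, not OpenClaw commands. It relies on `launchctl print` exiting non-zero when the service cannot be found.

```shell
#!/usr/bin/env bash
# Sketch: replace the fixed "sleep 2" with a bounded wait for launchd to drop the job.
# Assumption: label ai.openclaw.gateway in the per-user GUI domain gui/<uid>.

# wait_until TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
# Re-runs CMD every INTERVAL_SECS until it succeeds; returns 1 after roughly TIMEOUT_SECS.
wait_until() {
  local timeout=$1 interval=$2 waited=0
  shift 2
  (( interval > 0 )) || return 2   # a zero interval would never advance the clock
  until "$@"; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  return 0
}

# Succeeds while launchd still has the job registered; launchctl print fails once it is gone.
launchd_job_present() {
  launchctl print "gui/$(id -u)/${1:-ai.openclaw.gateway}" >/dev/null 2>&1
}

# Succeeds once the job has been fully booted out.
launchd_job_gone() {
  ! launchd_job_present "$@"
}
```

For example: `openclaw gateway stop && wait_until 15 1 launchd_job_gone && openclaw gateway start`. The 15-second bound is arbitrary; if it expires, the caller should report the failure rather than start blindly.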

2. Contrast with `openclaw gateway restart`

The CLI restart command uses SIGUSR1 → "full process restart (supervisor restart)", which keeps the launchd service alive:

[gateway] received SIGUSR1; restarting
[gateway] restart mode: full process restart (supervisor restart)

This path works reliably because launchd never sees the process exit.

3. Restart continuation is also broken

Even when a restart succeeds via SIGUSR1, the restart-sentinel logs show:

[restart-sentinel] Gateway restart restart ok (gateway.restart):
  continuation delivery failed: Error: restart continuation route unavailable

This means interrupted sessions/tasks are lost after restart.
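Until the restart sentinel surfaces this error itself, one stopgap is to scan gateway log text for the sentinel line quoted above. This report does not say where the gateway log file lives, so the hypothetical helper below reads log text on stdin and only matches the literal substring "continuation delivery failed".

```shell
#!/usr/bin/env bash
# Sketch: report restart-sentinel continuation failures found in log text on stdin.
# Prints each matching line, then returns 1 if any were found and 0 otherwise.
report_continuation_failures() {
  local hits count
  hits=$(grep -F 'continuation delivery failed')
  if [ -z "$hits" ]; then
    return 0
  fi
  count=$(printf '%s\n' "$hits" | grep -c .)
  printf '%s\n' "$hits"
  echo "found $count continuation failure(s)" >&2
  return 1
}
```

Feed it whichever log the gateway writes (the path is not given in this report); a non-zero exit means at least one continuation was dropped after a restart.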

Evidence from Logs

Last successful internal restart (SIGUSR1 path)

2026-04-25T22:12:50.189+08:00 [gateway] received SIGUSR1; restarting
2026-04-25T22:12:55.542+08:00 [gateway] restart mode: full process restart (supervisor restart)

The failing `stop && start` sequence

2026-04-25T23:06:26.342+08:00 [exec] elevated command
  openclaw gateway stop 2>&1 && sleep 2 && openclaw gateway start 2>&1

2026-04-25T23:06:27.683+08:00 [gateway] signal SIGTERM received
2026-04-25T23:06:27.683+08:00 [gateway] received SIGTERM; shutting down
2026-04-25T23:06:27.769+08:00 [feishu] feishu[default]: abort signal received, stopping
2026-04-25T23:06:27.771+08:00 [ws] ws client closed manually

2026-04-25T23:06:31.804+08:00 restart scheduled, gateway will restart momentarily

After this point, no more gateway logs until manual intervention 8+ hours later.

Morning state: "not loaded"

$ openclaw gateway status
Service: LaunchAgent (not loaded)
...
Runtime: unknown (Could not find service "ai.openclaw.gateway" in domain for user gui: 501)
Connectivity probe: failed

A manual openclaw gateway start was required to recover:

Gateway LaunchAgent was installed but not loaded; re-bootstrapped launchd service.
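The "not loaded" state above can be detected from the status text. A minimal bash sketch, assuming only the two failure markers shown in this report, "(not loaded)" and "Connectivity probe: failed". The CLI's full output format and exit codes are not documented here, so the hypothetical helper treats any output without those markers as healthy.

```shell
#!/usr/bin/env bash
# Sketch: classify `openclaw gateway status` output read from stdin.
# Only the two failure markers seen in this report are checked; the CLI's full
# output format is an assumption, so anything else is treated as healthy.
gateway_status_healthy() {
  local status
  status=$(cat)
  if printf '%s\n' "$status" | grep -qF '(not loaded)'; then
    echo "unhealthy: LaunchAgent not loaded" >&2
    return 1
  fi
  if printf '%s\n' "$status" | grep -qF 'Connectivity probe: failed'; then
    echo "unhealthy: connectivity probe failed" >&2
    return 1
  fi
  return 0
}
```

Usage: `openclaw gateway status | gateway_status_healthy || openclaw gateway start` runs the start only when one of the failure markers appears.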

Reproduction Steps

1. Start gateway: openclaw gateway start
2. Trigger any config change that causes an internal restart request (e.g. modify plugins.entries via an elevated exec tool that edits openclaw.json)
3. Observe that OpenClaw schedules a restart, but then executes stop && start
4. Check openclaw gateway status: it shows "not loaded"

Expected Behavior

Gateway restart should be reliable and self-healing:

Internal tools should use openclaw gateway restart (SIGUSR1 / supervisor restart) instead of stop && start
If a full process restart is unavoidable, the restart command must verify the service is actually running after start, and retry if launchctl bootstrap fails
The restart sentinel should gracefully handle continuation delivery or at least surface the error to the user

Suggested Fixes

Immediate: Change the elevated exec restart command from stop && sleep 2 && start to restart (or at least to stop && sleep 5 && start && openclaw gateway status with a retry loop)
Better: Implement a restart API endpoint on the gateway itself so internal tools can request restart via WebSocket (gateway.restart tool) instead of shelling out
Defensive: After any start command, poll launchctl list / curl http://127.0.0.1:18789/ for up to 10s; if unreachable, retry start once more before giving up
Continuation fix: Investigate why restart continuation route unavailable occurs and ensure in-flight tasks can resume after restart
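The Defensive item could be sketched in bash as follows: start, probe http://127.0.0.1:18789/ about once per second (ten tries by default), and retry the start once before giving up. The function names are hypothetical. Any HTTP response counts as reachable, because the report does not say which status code the root path returns. The start and probe functions are passed by name so the retry logic can be exercised without a real gateway.

```shell
#!/usr/bin/env bash
# Sketch of "start, verify, retry once"; not part of the OpenClaw CLI.

# Default probe: any HTTP reply from the local gateway within 3 seconds.
# No -f flag, so even a 4xx/5xx response counts as "the process is up".
gateway_http_probe() {
  curl -s -o /dev/null --max-time 3 "http://127.0.0.1:18789/"
}

# Default start action.
gateway_start() {
  openclaw gateway start
}

# poll_probe TRIES PROBE_FN: call PROBE_FN up to TRIES times, one second apart.
poll_probe() {
  local tries=$1 probe_fn=$2 i
  for (( i = 0; i < tries; i++ )); do
    "$probe_fn" && return 0
    sleep 1
  done
  return 1
}

# start_verified [START_FN] [PROBE_FN] [TRIES]
# Start, probe up to TRIES times (default 10), and retry the start once.
start_verified() {
  local start_fn=${1:-gateway_start} probe_fn=${2:-gateway_http_probe} tries=${3:-10} attempt
  for attempt in 1 2; do
    "$start_fn" || echo "start attempt $attempt exited non-zero" >&2
    if poll_probe "$tries" "$probe_fn"; then
      return 0
    fi
    echo "gateway unreachable after start attempt $attempt" >&2
  done
  return 1
}
```

Called with no arguments, `start_verified` returns 0 once the gateway answers and 1 after two failed attempts, so the caller can surface the error instead of leaving the service silently offline.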

Workaround Applied by User

Until this is fixed, user has deployed an external launchd watchdog (ai.openclaw.watchdog) that checks every 30s:

Is ai.openclaw.gateway loaded in launchd with state = running?
Does http://127.0.0.1:18789/ respond within 3s?

If either check fails, the watchdog runs openclaw gateway start.

This should not be necessary for a self-hosted gateway.
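One pass of such a watchdog might look like the bash sketch below, with launchd running it every 30 seconds (for example via a LaunchAgent that sets StartInterval to 30). The user's actual script was not shared, so the function names and the choice to match "state = running" in `launchctl print` output are assumptions; the label and port come from this report.

```shell
#!/usr/bin/env bash
# Sketch of one watchdog pass; not the user's actual ai.openclaw.watchdog script.

# Check 1: launchd has the job loaded and reports it as running.
job_running() {
  launchctl print "gui/$(id -u)/ai.openclaw.gateway" 2>/dev/null | grep -q 'state = running'
}

# Check 2: the gateway answers HTTP within 3 seconds (any status code counts).
gateway_responds() {
  curl -s -o /dev/null --max-time 3 "http://127.0.0.1:18789/"
}

# Recovery action used by the watchdog.
watchdog_start() {
  openclaw gateway start
}

# watchdog_pass [RUNNING_FN] [HTTP_FN] [START_FN]
# If either check fails, log a line and run the start action.
watchdog_pass() {
  local running_fn=${1:-job_running} http_fn=${2:-gateway_responds} start_fn=${3:-watchdog_start}
  if "$running_fn" && "$http_fn"; then
    return 0
  fi
  echo "$(date '+%Y-%m-%dT%H:%M:%S') gateway unhealthy; running start" >&2
  "$start_fn"
}
```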

extent analysis

TL;DR

Change the elevated exec restart command from stop && sleep 2 && start to restart to ensure a reliable and self-healing gateway restart.

Guidance

Use the openclaw gateway restart command instead of stop && start for internal restart requests to leverage the SIGUSR1/supervisor restart mechanism.
Implement a retry loop after start to verify the service is running and handle potential launchctl bootstrap failures.
Consider adding a restart API endpoint on the gateway for internal tools to request restarts via WebSocket.
Poll launchctl list or curl http://127.0.0.1:18789/ after start to ensure the service is reachable.

Example

openclaw gateway restart

or, as a fallback, wrapped in a retry loop:

openclaw gateway stop && sleep 5 && openclaw gateway start && openclaw gateway status
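A bounded retry loop around the fallback sequence could look like this bash sketch. `retry` and `start_and_check` are hypothetical helpers, and the sketch assumes `openclaw gateway status` exits non-zero when the gateway is down, which this report does not confirm; if it does not, the status text would need to be checked for "(not loaded)" instead.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECS CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping DELAY_SECS between failed attempts.
retry() {
  local attempts=$1 delay=$2 n
  shift 2
  for (( n = 1; n <= attempts; n++ )); do
    "$@" && return 0
    echo "attempt $n/$attempts failed" >&2
    if (( n < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Start half of the fallback sequence, verified by the status command.
# Assumption: `openclaw gateway status` exits non-zero when the gateway is down.
start_and_check() {
  openclaw gateway start && openclaw gateway status
}
```

For example, `openclaw gateway stop && sleep 5 && retry 3 5 start_and_check` retries only the start-and-verify half, since repeating stop on an already-stopped gateway may itself fail and short-circuit the chain.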

Notes

The current workaround using an external launchd watchdog (ai.openclaw.watchdog) should not be necessary for a self-hosted gateway. The suggested fixes aim to provide a more robust and self-healing restart mechanism.

Recommendation

Apply the suggested fix by changing the elevated exec restart command to restart to ensure a reliable gateway restart. This approach leverages the existing SIGUSR1/supervisor restart mechanism, providing a more robust solution.
