openclaw: Gateway launchd agent gets unloaded during self-update and never re-bootstrapped (macOS)


Summary

After an openclaw self-update on macOS, ~/Library/LaunchAgents/ai.openclaw.gateway.plist gets rewritten with a new Comment (version bump) but the launchd service ends up not registered in the user domain — so once the in-flight gateway process exits, KeepAlive=true has nothing to respawn and the gateway stays down indefinitely. No alert, no auto-recovery; users only notice when their integrations (Telegram, automation tools) silently stop responding.

Environment

  • macOS, user-domain LaunchAgent (no daemon, no root)
  • openclaw 2026.5.20 (installed via /opt/homebrew/lib/node_modules/openclaw)
  • node@22 (Homebrew)
  • Label: ai.openclaw.gateway
  • Plist path: ~/Library/LaunchAgents/ai.openclaw.gateway.plist

The plist correctly contains:

<key>RunAtLoad</key><true/>
<key>KeepAlive</key><true/>
<key>ThrottleInterval</key><integer>1</integer>

…so launchd would respawn on its own if the service were still bootstrapped.

Symptom

$ launchctl list ai.openclaw.gateway
Could not find service "ai.openclaw.gateway" in domain for port

$ curl -sf http://127.0.0.1:18789/health
(no response)

$ ps aux | grep -E "openclaw.*node|node.*openclaw" | grep -v grep
(nothing)

~/Library/LaunchAgents/ai.openclaw.gateway.plist exists and is valid — its mtime matches the moment the self-update ran. So the file is there, the service is just not loaded.
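The three checks above can be folded into one helper that names the state the gateway is in. This is a minimal sketch, not part of openclaw: the function name `gateway_state` and the state strings are made up for illustration, and it assumes `launchctl print` exits non-zero when the service is not registered in the domain.

```shell
# Defaults taken from this report; override via the environment if needed.
LABEL="${LABEL:-ai.openclaw.gateway}"
PLIST="${PLIST:-$HOME/Library/LaunchAgents/$LABEL.plist}"
HEALTH_URL="${HEALTH_URL:-http://127.0.0.1:18789/health}"

# Print one word describing the gateway's state (hypothetical helper).
gateway_state() {
  if [ ! -f "$PLIST" ]; then
    echo "plist-missing"
  elif ! launchctl print "gui/$(id -u)/$LABEL" >/dev/null 2>&1; then
    # Plist on disk but no registration: the failure mode in this report.
    echo "not-bootstrapped"
  elif ! curl -sf --max-time 2 "$HEALTH_URL" >/dev/null; then
    echo "registered-unhealthy"
  else
    echo "healthy"
  fi
}
```

In the incident described here, `gateway_state` would be expected to report `not-bootstrapped`, since the plist is present but launchd has no registration for the label.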

Timeline from one real incident

Plist mtime (= updater write time): 2026-05-21 17:01:05 PDT (Comment field now reads OpenClaw Gateway (v2026.5.20)).

From ~/.openclaw/logs/gateway.err.log around that window:

16:59:30 [reload] config change requires gateway restart (plugins.installs.codex) — deferring until 4 operation(s), 1 reply(ies), 2 embedded run(s), 1 background task run(s) complete
16:59:30 [reload] restart blocked by active background task run(s): … runtime=cron label=squad-monitor
17:01:13 [gateway] restart blocked by active background task run(s): … runtime=cron label=squad-monitor
17:01:19 [telegram] [diag] isolated polling ingress started …
17:02:29 [reload] config change requires gateway restart (commands.ownerAllowFrom) — deferring …
17:02:36 [plugins] plugins.allow is empty; discovered non-bundled plugins may auto-load: codex …
17:02:39 [telegram] [diag] isolated polling ingress started …
17:02:50 [plugins] plugins.allow is empty; discovered non-bundled plugins may auto-load …
<silence until manual recovery at 17:09:34>

So the in-process [reload] manager tried to apply config changes (plugins.installs.codex, commands.ownerAllowFrom) and kept getting deferred because a squad-monitor cron task was always running. At some point the process exited (or was killed), and then nothing brought it back, because launchd no longer had the registration.

The roughly 7-minute gap (17:02:50 → 17:09:34) is the user-visible outage; the 17:09:34 timestamp comes from my manual recovery (next section).

Manual recovery (workaround)

launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/ai.openclaw.gateway.plist

This immediately brings the service back: the PID populates and the health endpoint returns {"ok":true,"status":"live"}. (launchctl kickstart -k gui/$UID/ai.openclaw.gateway won't work in this state because the service isn't bootstrapped; kickstart requires a registered service.)
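Because the right command depends on whether the service is still registered, a recovery helper can check first and then pick between kickstart and bootstrap. The sketch below is an illustration only: `recover_gateway` is a hypothetical name, and it assumes `launchctl print` fails when the label is not registered.

```shell
LABEL="${LABEL:-ai.openclaw.gateway}"
PLIST="${PLIST:-$HOME/Library/LaunchAgents/$LABEL.plist}"

# Bring the gateway back regardless of whether launchd still knows about it.
recover_gateway() {
  domain="gui/$(id -u)"
  if launchctl print "$domain/$LABEL" >/dev/null 2>&1; then
    # Still registered: kill and restart the existing service in place.
    launchctl kickstart -k "$domain/$LABEL"
  else
    # Registration lost: load the plist into the user domain again.
    launchctl bootstrap "$domain" "$PLIST"
  fi
}
```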

Hypothesis on root cause

The self-updater path appears to:

  1. Write the new plist to disk (✅ — we see updated mtime + Comment).
  2. Tear down the running process (likely via launchctl bootout so the new ProgramArguments / env take effect on next start).
  3. Start a replacement — but this final step doesn't reliably re-bootstrap from the new plist. If the in-process [reload] deferral races with the updater's bootout, the gateway exits before bootstrap is reissued, and launchd is left empty.

Two contributing factors visible in the logs:

  • Reload deferral on long-lived crons: the gateway's internal hot-reload waits for "background task runs" to complete, and squad-monitor is on a recurring 10-minute cadence, so the quiet window for hot-reload is often never reached. The updater may give up on a graceful restart and force a bootout.
  • Updater doesn't verify post-update state: there's no launchctl print gui/$UID/ai.openclaw.gateway check after the rewrite, so a missing registration goes unnoticed.

Expected behavior

After any successful self-update, the gateway must be both bootstrapped in the current launchd domain and running. The updater should:

  1. Write the new plist atomically.
  2. launchctl bootout the old registration (if present).
  3. launchctl bootstrap gui/$UID <plist> the new one.
  4. Verify with launchctl print gui/$UID/ai.openclaw.gateway (or at least launchctl list <label>) that the PID is populated; retry/log loudly if not.
  5. Probe http://127.0.0.1:<port>/health for a final readiness check before declaring the update done.
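Steps 2 to 5 could be sketched in shell roughly as follows. This is not the updater's actual code: `retry`, `service_has_pid` and `reinstall_service` are hypothetical helpers, the retry counts are arbitrary, and the `pid = N` match relies on the human-readable output of `launchctl print`, which is not a stable interface. Step 1 (the atomic plist write) is assumed to have already happened.

```shell
LABEL="${LABEL:-ai.openclaw.gateway}"
PLIST="${PLIST:-$HOME/Library/LaunchAgents/$LABEL.plist}"
HEALTH_URL="${HEALTH_URL:-http://127.0.0.1:18789/health}"

# Run "$@" up to N times, sleeping between attempts; succeed on first success.
retry() {
  attempts="$1"; shift
  while [ "$attempts" -gt 0 ]; do
    "$@" && return 0
    attempts=$((attempts - 1))
    sleep "${RETRY_DELAY:-1}"
  done
  return 1
}

# True if launchd reports a PID for the service (assumed output format).
service_has_pid() {
  launchctl print "gui/$(id -u)/$LABEL" 2>/dev/null |
    grep -Eq '^[[:space:]]*pid = [0-9]+'
}

reinstall_service() {
  domain="gui/$(id -u)"
  # Step 2: drop the old registration; a missing one is not an error.
  launchctl bootout "$domain/$LABEL" 2>/dev/null || true
  # Step 3: register the new plist, retried in case the old
  # registration is still being torn down.
  retry 5 launchctl bootstrap "$domain" "$PLIST" ||
    { echo "bootstrap failed for $LABEL" >&2; return 1; }
  # Step 4: confirm that launchd actually started a process.
  retry 10 service_has_pid ||
    { echo "$LABEL registered but has no PID" >&2; return 2; }
  # Step 5: final readiness probe before declaring the update done.
  retry 10 curl -sf --max-time 2 "$HEALTH_URL" >/dev/null ||
    { echo "$LABEL running but /health is unreachable" >&2; return 3; }
}
```

The distinct return codes let a caller log which step failed instead of ending the update silently.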

If the hot-reload pathway is preferred to avoid disturbing background tasks, it should have a timeout — if deferral has been pending for, say, > 60s, fall through to the bootout/bootstrap path rather than indefinitely waiting for squad-monitor to be idle.
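The bounded deferral can be illustrated with a small polling loop. The gateway's reload manager is not written in shell, so this only shows the control flow: `wait_for_idle` and `restart_mode` are hypothetical names, the idleness check is whatever command the caller passes in, and the 60-second default mirrors the figure suggested above.

```shell
# Poll "$@" (an idleness check) until it succeeds or DEFER_TIMEOUT seconds
# have passed. Returns 0 when idle, 1 when the deadline is hit first.
wait_for_idle() {
  deadline=$(( $(date +%s) + ${DEFER_TIMEOUT:-60} ))
  until "$@"; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep "${POLL_INTERVAL:-2}"
  done
  return 0
}

# Report which restart path should be taken for the given idleness check.
restart_mode() {
  if wait_for_idle "$@"; then
    echo "graceful"   # background tasks drained: hot-reload is safe
  else
    echo "forced"     # deferral timed out: fall through to bootout/bootstrap
  fi
}
```

With a task like squad-monitor that is always in flight, `restart_mode` would be expected to print `forced` once the deadline passes rather than wait indefinitely.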

Suggested mitigations (regardless of fix choice)

  • Emit a gateway exit log line with reason whenever the process is about to terminate, including whether it was an internal reload, bootout, or unexpected crash. The current gap in gateway.err.log (silence from 17:02:50 → manual recovery) makes incidents hard to debug — there should always be a final line.
  • Surface a Telegram/notification alert if /health is unreachable for more than N seconds. Today the outage is silent until the user notices their automation has gone quiet.
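Until something like the second mitigation exists upstream, an external watchdog can approximate it. The sketch below is an assumption-laden example, not an openclaw feature: `check_once` and `notify` are hypothetical, `notify` only writes to stderr and would need to be replaced with a real notification channel, and the 30-second threshold stands in for the unspecified N.

```shell
HEALTH_URL="${HEALTH_URL:-http://127.0.0.1:18789/health}"
MAX_DOWN_SECONDS="${MAX_DOWN_SECONDS:-30}"

down_since=""   # epoch seconds when the current outage began; empty if healthy
alerted=0       # 1 once an alert has been sent for the current outage

# Placeholder alert; swap in a real notification command.
notify() { echo "ALERT: $*" >&2; }

# One probe. Returns 0 if healthy, 1 if not; alerts once per outage
# after the endpoint has been unreachable for MAX_DOWN_SECONDS.
check_once() {
  now="$(date +%s)"
  if curl -sf --max-time 2 "$HEALTH_URL" >/dev/null; then
    down_since=""
    alerted=0
    return 0
  fi
  [ -n "$down_since" ] || down_since="$now"
  if [ "$alerted" -eq 0 ] && [ $((now - down_since)) -ge "$MAX_DOWN_SECONDS" ]; then
    notify "gateway /health unreachable for $((now - down_since))s"
    alerted=1
  fi
  return 1
}

# Loop form, left commented out so that sourcing this file does not block:
# while true; do check_once; sleep 5; done
```

The watchdog has to run outside the gateway process (for example as its own LaunchAgent or a cron entry), otherwise it goes down together with the thing it monitors.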

Repro outline

Hard to reproduce on demand because it depends on the timing race between self-update and the in-process reload deferral, but you can probably trigger it by:

  1. Run a long-lived cron task that the gateway's [reload] manager defers on (anything that's perpetually "in flight"; squad-monitor on a tight cadence works).
  2. Trigger any config change that the reload manager classifies as requires gateway restart (e.g. commands.ownerAllowFrom, plugins.installs.codex).
  3. Bump the npm package while the deferral is active.
  4. Observe launchctl list ai.openclaw.gateway after ~1–2 minutes — registration gone.

Happy to attach more log around any specific reload event if useful.
