openclaw - ✅ (Solved) Fix Gateway silently dies after auto-update: launchd removes service due to rapid restart cycle [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#54861 · Fetched 2026-04-08 01:35:05
Comments: 0 · Participants: 1 · Timeline: 5 · Reactions: 0
Timeline (top): cross-referenced ×5

After auto-updating to v2026.3.24, the gateway was silently killed by macOS launchd ~3 hours later and never restarted, despite KeepAlive: true in the LaunchAgent plist. The service was dead for 4+ hours with zero alerts until manual recovery.

Root Cause

The v2026.3.24 auto-update + restart sentinel + Telegram polling stall recovery triggered 3 rapid restarts in 4 minutes. With ThrottleInterval: 1 in the plist, macOS launchd classified the service as "inefficient" (crash-loop heuristic).

~3 hours later, launchd proactively killed the gateway process and removed the service from its job table entirely. Once removed, KeepAlive: true is irrelevant — the service definition no longer exists. The gateway stayed dead until manual re-registration.

Fix Action

Workaround

After the gateway dies, run from Terminal:

openclaw gateway install   # rewrites plist + bootstraps service

PR fix notes

PR #58070: daemon: set macOS LaunchAgent ProcessType and WorkingDirectory

Description (problem / solution / changelog)

Summary

  • set generated macOS LaunchAgent plists to ProcessType=Interactive
  • always emit a WorkingDirectory, defaulting to / when no explicit directory is provided
  • cover the new plist defaults in launchd install tests
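The two plist defaults described above can be sketched as a small generator. This is an illustration only, not the code in src/daemon/launchd-plist.ts: the function name `buildLaunchAgentPlist`, its options, and the program path are hypothetical.

```javascript
// Hypothetical helper: illustrates the defaults from the PR summary,
// not the actual implementation in src/daemon/launchd-plist.ts.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function buildLaunchAgentPlist({ label, programArguments, workingDirectory }) {
  const args = programArguments
    .map((arg) => `    <string>${escapeXml(arg)}</string>`)
    .join('\n');
  // Always emit a WorkingDirectory; fall back to "/" when none is given.
  const cwd = workingDirectory || '/';
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">',
    '<plist version="1.0">',
    '<dict>',
    '  <key>Label</key>',
    `  <string>${escapeXml(label)}</string>`,
    '  <key>ProgramArguments</key>',
    '  <array>',
    args,
    '  </array>',
    '  <key>KeepAlive</key>',
    '  <true/>',
    '  <key>ProcessType</key>',
    '  <string>Interactive</string>',
    '  <key>WorkingDirectory</key>',
    `  <string>${escapeXml(cwd)}</string>`,
    '</dict>',
    '</plist>',
  ].join('\n');
}

// The program path below is a placeholder, not taken from the issue.
const plist = buildLaunchAgentPlist({
  label: 'ai.openclaw.gateway',
  programArguments: ['/usr/local/bin/openclaw', 'gateway'],
});
console.log(plist);
```

Because no `workingDirectory` is passed in the usage line, the generated plist falls back to `/`.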

Testing

  • pnpm exec vitest run src/daemon/launchd.test.ts
  • pnpm lint

Closes #58061. Related to #54861.

Changed files

  • src/daemon/launchd-plist.ts (modified, +6/-4)
  • src/daemon/launchd.test.ts (modified, +21/-0)


Summary

After auto-updating to v2026.3.24, the gateway was silently killed by macOS launchd ~3 hours later and never restarted, despite KeepAlive: true in the LaunchAgent plist. The service was dead for 4+ hours with zero alerts until manual recovery.

Environment

  • macOS 26.3.1 (arm64), Node 25.8.1
  • OpenClaw v2026.3.23 → v2026.3.24 (auto-update via openclaw doctor --fix)
  • LaunchAgent plist: KeepAlive: true, ThrottleInterval: 1

Reproduction Timeline (March 25, 2026 — all times CT)

Phase 1: Auto-update triggers rapid restart chain

| Time | Event |
| --- | --- |
| 15:04:24 | Auto-update from v2026.3.23 → v2026.3.24. Gateway receives SIGTERM via launchctl kickstart -k. |
| 15:04:33 | Gateway restarts (9s later). |
| 15:06:36 | Telegram polling stall detected (no getUpdates for 119.87s), forces restart. |
| 15:07:16 | Gateway receives SIGTERM again. |
| 15:07:21 | Second SIGTERM during shutdown, ignored. |
| 15:08:17 | SIGUSR1 from restart sentinel (new v2026.3.24 feature). Full process restart. |
| 15:08:21 | Gateway restarts. Runs stably for ~3 hours. |

3 restarts in 4 minutes.
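As a quick check of that figure, the sketch below takes the three restart trigger times from the Phase 1 table and counts how many fall inside a four-minute window. The helper name `countInWindow` is hypothetical, and the timestamps are parsed as UTC only so the differences are stable (the issue reports them in CT).

```javascript
// Restart trigger times from the Phase 1 table (March 25, 2026).
// Parsed as UTC purely to get stable differences; the issue lists them in CT.
const restartTriggers = [
  '2026-03-25T15:04:24Z', // SIGTERM via launchctl kickstart -k (auto-update)
  '2026-03-25T15:07:16Z', // second SIGTERM (after the 15:06:36 polling stall)
  '2026-03-25T15:08:17Z', // SIGUSR1 from the restart sentinel
].map((t) => Date.parse(t));

// Count how many timestamps fall within windowMs of the most recent one.
function countInWindow(timestamps, windowMs) {
  const latest = Math.max(...timestamps);
  return timestamps.filter((t) => latest - t <= windowMs).length;
}

const spanSeconds =
  (Math.max(...restartTriggers) - Math.min(...restartTriggers)) / 1000;
const restartsInFourMinutes = countInWindow(restartTriggers, 4 * 60 * 1000);

console.log(spanSeconds);           // 233 (3 min 53 s)
console.log(restartsInFourMinutes); // 3
```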

Phase 2: launchd kills the service

| Time | Event |
| --- | --- |
| 18:04:47 | Gateway still alive, serving WS requests (channels.status, doctor.memory.status). |
| 18:07:17 | Gateway receives SIGTERM. Clean shutdown logged. |

macOS system log at 18:07:17:

service inactive: ai.openclaw.gateway
removing service: ai.openclaw.gateway

launchd did not just stop the process — it unregistered the entire service. No Setting service to enabled follows (compare with the 15:04 cycle where re-registration happens immediately).

Phase 3: 4-hour silent outage

| Time | Event |
| --- | --- |
| 18:25 – 21:56 | Cron jobs attempt to connect every ~30 min, all fail with gateway closed (1006 abnormal closure). |
| 22:16:13 | Manual recovery via Terminal (openclaw gateway install rewrites plist + launchctl bootstrap). |

launchctl print gui/502/ai.openclaw.gateway after recovery shows immediate reason = inefficient and runs = 3.
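For anyone wanting to check those two fields programmatically, here is a minimal sketch that pulls `key = value` pairs out of `launchctl print` text. The output of `launchctl print` is not a stable interface, so treat this as best effort. The sample text is a hypothetical excerpt built only from the two fields quoted in the issue, and `parseLaunchctlFields` is an illustrative name.

```javascript
// Extract simple "key = value" lines from launchctl print output.
// Lines that open a nested block ("name = {") are skipped.
function parseLaunchctlFields(text) {
  const fields = {};
  for (const line of text.split('\n')) {
    const match = line.match(/^\s*([^=]+?)\s*=\s*(.*?)\s*$/);
    if (match && match[2] !== '{') {
      fields[match[1]] = match[2];
    }
  }
  return fields;
}

// Hypothetical excerpt: only the two fields reported in the issue.
const sample = [
  'gui/502/ai.openclaw.gateway = {',
  '\truns = 3',
  '\timmediate reason = inefficient',
  '}',
].join('\n');

const fields = parseLaunchctlFields(sample);
const flaggedInefficient = fields['immediate reason'] === 'inefficient';
console.log(Number(fields.runs), flaggedInefficient);
```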

Root Cause

The v2026.3.24 auto-update + restart sentinel + Telegram polling stall recovery triggered 3 rapid restarts in 4 minutes. With ThrottleInterval: 1 in the plist, macOS launchd classified the service as "inefficient" (crash-loop heuristic).

~3 hours later, launchd proactively killed the gateway process and removed the service from its job table entirely. Once removed, KeepAlive: true is irrelevant — the service definition no longer exists. The gateway stayed dead until manual re-registration.

Evidence

  • launchd system log confirms removing service at 18:07:17 with no subsequent re-registration until 22:16:13.
  • launchctl print shows immediate reason = inefficient — macOS's daemon throttle classification.
  • Gateway logs show clean SIGTERM shutdown (not a crash), confirming the kill came from launchd.
  • No crash-looping between 18:07 and 22:16 — zero gateway startup attempts. The service was simply gone from launchd's job table.
  • Mac was awake throughout (caffeinate running 62+ hours).

Suggested Fix

  1. Raise ThrottleInterval to 10–15 seconds in the plist template. ThrottleInterval: 1 tells launchd rapid restarts are expected, but macOS interprets multiple restarts within the throttle window as a crash loop and penalizes the service.

  2. Add ProcessType: Interactive to the plist. This changes launchd's resource classification heuristic and raises the threshold for "inefficient" termination.

  3. Consolidate the update restart path. The current v2026.3.24 update triggers up to 3 separate restarts (kickstart + Telegram polling stall + restart sentinel SIGUSR1). These should be coalesced into a single restart to avoid tripping launchd's throttle.

  4. Add a self-check. If the gateway detects it was removed from launchd (e.g., periodic launchctl print self-check), it could re-register itself before dying.

Impact

  • 4+ hours of complete silence — no cron jobs, no Telegram messages, no heartbeats.
  • 9 cron jobs left in zombie "running" state after recovery, requiring manual disable/enable to clear.
  • No alert was sent before the outage (the gateway was killed, so it couldn't alert).

Workaround

After the gateway dies, run from Terminal:

openclaw gateway install   # rewrites plist + bootstraps service

extent analysis

Fix Plan

To resolve the issue, follow these steps:

  1. Update the LaunchAgent plist:
    • Raise ThrottleInterval to 10-15 seconds.
    • Add ProcessType: Interactive.

Example plist changes:

<key>ThrottleInterval</key>
<integer>10</integer>
<key>ProcessType</key>
<string>Interactive</string>
  2. Consolidate the update restart path:
    • Modify the update script to trigger a single restart instead of multiple restarts.

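Example code snippet (Node.js):

// Consolidate restart triggers: only the first request sends the signal.
// Note: this flag only de-duplicates requests within one process lifetime.
let restartRequested = false;

function requestRestart(reason) {
  if (restartRequested) return; // a restart is already pending; drop duplicates
  restartRequested = true;
  console.log(`restart requested: ${reason}`);
  // Trigger a single restart
  process.kill(process.pid, 'SIGUSR1');
}

// The update, Telegram polling stall, and sentinel paths would all call requestRestart()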
  3. Add a self-check:
    • Implement a periodic launchctl print self-check to detect if the gateway was removed from launchd.

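Example code snippet (Node.js):

const { execFile } = require('child_process');

// Build the service target from the current user's UID instead of hard-coding 502
const target = `gui/${process.getuid()}/ai.openclaw.gateway`;

// Periodic self-check (every hour)
setInterval(() => {
  execFile('launchctl', ['print', target], (error) => {
    if (!error) return; // service is still registered
    // launchctl print failed, so assume the service was removed and re-register it
    execFile('openclaw', ['gateway', 'install'], (installError) => {
      if (installError) {
        console.error('re-registration failed:', installError.message);
      }
    });
  });
}, 60 * 60 * 1000); // 1 hour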

Verification

To verify the fix, monitor the gateway's behavior after applying the changes:

  • Check the system log for any removing service messages.
  • Verify that the gateway remains running and responsive after updates and restarts.
  • Test the self-check mechanism by manually removing the service and verifying that it re-registers itself.
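The first check above can be scripted as a plain text scan. The sketch below assumes the log text has already been collected by some means (how it is collected is left open) and simply filters for the `removing service` message quoted in the issue. The function name `findServiceRemovals` and the extra sample line are hypothetical.

```javascript
// Return every log line that records launchd removing the given service.
function findServiceRemovals(logText, label) {
  const needle = `removing service: ${label}`;
  return logText.split('\n').filter((line) => line.includes(needle));
}

// First two lines are the messages quoted in the issue; the third is made up.
const sampleLog = [
  'service inactive: ai.openclaw.gateway',
  'removing service: ai.openclaw.gateway',
  'service inactive: com.example.other',
].join('\n');

const removals = findServiceRemovals(sampleLog, 'ai.openclaw.gateway');
console.log(removals.length); // 1
```

An empty result over the observation period is the expected outcome once the fix is in place.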

Extra Tips

  • Regularly review and update the LaunchAgent plist to ensure it reflects the latest requirements.
  • Consider implementing additional monitoring and alerting mechanisms to detect and respond to service outages.
  • Test the fix thoroughly to ensure it resolves the issue and does not introduce any new problems.
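Since a killed gateway cannot alert on its own outage, the monitoring tip implies a check that runs outside the gateway process. A minimal sketch of the decision logic follows, assuming some external job records the last time the gateway was seen alive. The function name `isStale` and the 10-minute threshold are illustrative assumptions, and the timestamps come from the issue timeline (parsed as UTC only for stable arithmetic).

```javascript
// True when the gateway has been silent for longer than maxSilenceMs.
function isStale(lastSeenMs, nowMs, maxSilenceMs) {
  return nowMs - lastSeenMs > maxSilenceMs;
}

// Timeline values from the issue: last logged activity and first failed cron run.
const lastSeen = Date.parse('2026-03-25T18:04:47Z');
const firstFailedCron = Date.parse('2026-03-25T18:25:00Z');
const maxSilenceMs = 10 * 60 * 1000; // assumed threshold: 10 minutes

const shouldAlert = isStale(lastSeen, firstFailedCron, maxSilenceMs);
console.log(shouldAlert); // true: 20 min 13 s of silence exceeds 10 min
```

With a threshold like this, the outage in the timeline would have been flagged by the time of the first failed cron run rather than at manual recovery.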
