openclaw: [Bug] Auto-upgrade v2026.3.24→2026.3.28: plist race condition kills gateway for 9+ hours on macOS (KeepAlive not respected)

GitHub stats
openclaw/openclaw#57379 · Fetched 2026-04-08 01:50:26
Comments: 0 · Participants: 1 · Timeline events: 1 (cross-referenced ×1) · Reactions: 0


Summary

After auto-upgrading from v2026.3.24 to v2026.3.28, the gateway received a SIGTERM at 11:35 AM ET, exited with a config validation error, and never restarted for 9+ hours despite KeepAlive: true in the LaunchAgent plist. Manual intervention was required at 8:51 PM ET.

This appears to be a new confirmed occurrence of the race condition described in #28335, now on the 2026.3.24→2026.3.28 upgrade path.

Environment

  • macOS Darwin 25.3.0 (arm64) — MacBook Pro
  • Node.js v25.8.0
  • OpenClaw: v2026.3.24 → v2026.3.28 (auto-upgrade enabled)
  • LaunchAgent: KeepAlive: true, ThrottleInterval: 1
  • Installed via npm (global)

Timeline (2026-03-29, all times ET)

| Time | Event |
| --- | --- |
| ~10:44 AM | Last successful agent response |
| 11:35:21 AM | Auto-upgrade runs, installs v2026.3.28, rewrites plist with new OPENCLAW_SERVICE_VERSION=2026.3.28 |
| 11:35:26 AM | Gateway (still running v2026.3.24 binary) receives SIGTERM |
| 11:35:26 AM | Gateway attempts shutdown, exits with error due to config already referencing v2026.3.28 plugin versions it cannot load |
| 11:35 AM–8:51 PM | Zero log entries. Gateway completely dead. 9h 16m outage. |
| 8:51 PM | User manually starts gateway via openclaw gateway restart |

Relevant Log Entries

Last entries before outage (from gateway.log):

2026-03-29T11:35:22.604-04:00 [ws] ⇄ res ✓ doctor.memory.status
2026-03-29T11:35:26.352-04:00 [gateway] signal SIGTERM received
2026-03-29T11:35:26.352-04:00 [gateway] received SIGTERM; shutting down

Error on shutdown (from gateway.err.log):

2026-03-29T11:35:26.381-04:00 [gateway] shutdown error: Error: Invalid config at /Users/.../openclaw.json:
- plugins.entries.bluebubbles: plugin requires OpenClaw >=2026.3.28, but this host is 2026.3.24; skipping load
- plugins.entries.discord: plugin requires OpenClaw >=2026.3.28, but this host is 2026.3.24; skipping load
[... 17 more plugin version errors ...]

First entry after 9h gap:

2026-03-29T20:53:39.171-04:00 [gateway] listening on ws://127.0.0.1:18789 (PID 63206)
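
The size of the gap can be measured directly from the log timestamps. The following is a minimal sketch, assuming every relevant log line begins with an ISO 8601 timestamp followed by a space, as in the excerpts above; `largestGap` is a hypothetical helper, not part of OpenClaw.

```javascript
// Matches a leading ISO 8601 timestamp such as 2026-03-29T11:35:26.352-04:00
const STAMP = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2}))\s/;

// Returns the longest silence between two consecutive timestamped lines.
function largestGap(lines) {
  const times = [];
  for (const line of lines) {
    const m = STAMP.exec(line);
    if (m) times.push(Date.parse(m[1])); // lines without a timestamp are ignored
  }
  let worst = { gapMs: 0, from: null, to: null };
  for (let i = 1; i < times.length; i++) {
    const gapMs = times[i] - times[i - 1];
    if (gapMs > worst.gapMs) {
      worst = { gapMs, from: new Date(times[i - 1]), to: new Date(times[i]) };
    }
  }
  return worst;
}

// Lines copied from the log excerpts in this issue.
const sample = [
  '2026-03-29T11:35:22.604-04:00 [ws] ⇄ res ✓ doctor.memory.status',
  '2026-03-29T11:35:26.352-04:00 [gateway] signal SIGTERM received',
  '2026-03-29T11:35:26.352-04:00 [gateway] received SIGTERM; shutting down',
  '2026-03-29T20:53:39.171-04:00 [gateway] listening on ws://127.0.0.1:18789 (PID 63206)',
];

const gap = largestGap(sample);
const hours = Math.floor(gap.gapMs / 3_600_000);
const minutes = Math.floor((gap.gapMs % 3_600_000) / 60_000);
console.log(`largest gap: ${hours}h ${minutes}m`); // prints "largest gap: 9h 18m"
```

The log-to-log gap is slightly longer than the 9h 16m in the timeline because it runs to the first "listening" entry at 20:53:39 rather than to the manual restart command at 8:51 PM.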

Root Cause Analysis

The upgrade sequence appears to be:

  1. npm installs v2026.3.28 binary
  2. Config is updated with plugin entries requiring >=2026.3.28
  3. Plist is rewritten (updating OPENCLAW_SERVICE_VERSION)
  4. launchd detects plist change and internally unloads the job (as described in #28335)
  5. SIGTERM sent to running v2026.3.24 gateway process
  6. Gateway exits with a non-zero status (config validation failure against the already-upgraded config)
  7. launchd's kickstart operates on the now-unloaded job → fails silently
  8. KeepAlive is never triggered because the job is unloaded, not crashed
  9. Gateway dead. No alerts. No recovery.

The additional wrinkle vs #28335: the config is written before the new binary is active, so the old binary exits with an error code (config references plugins it can't load). This non-zero exit may further confuse launchd's restart logic.
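
Steps 7 and 8 suggest the restart step should first check whether the job is still loaded, and reload it if not, rather than kickstarting blindly. The sketch below illustrates that idea under stated assumptions: `launchctl print`, `launchctl bootstrap`, and `launchctl kickstart -k` are real launchctl subcommands, but `ensureJobLoaded`, the label, and the plist path are hypothetical placeholders, and this is not how OpenClaw's updater is known to work.

```javascript
const { execFileSync } = require('node:child_process');

// Default runner: true when the command exits 0, false on any failure.
function runQuiet(cmd, args) {
  try {
    execFileSync(cmd, args, { stdio: 'ignore' });
    return true;
  } catch {
    return false;
  }
}

// Hypothetical post-upgrade step. `label` and `plistPath` are placeholders,
// not OpenClaw's real values. `run` is injectable so the logic can be tested
// without touching launchd.
function ensureJobLoaded({ uid, label, plistPath, run = runQuiet }) {
  const domain = `gui/${uid}`;
  const target = `${domain}/${label}`;

  // `launchctl print` exits non-zero when the service is not loaded.
  if (!run('launchctl', ['print', target])) {
    // The job was unloaded: load it again from the plist instead of kickstarting.
    return run('launchctl', ['bootstrap', domain, plistPath])
      ? 'bootstrapped'
      : 'bootstrap-failed';
  }

  // The job is still loaded: kill and restart the running instance.
  return run('launchctl', ['kickstart', '-k', target])
    ? 'kickstarted'
    : 'kickstart-failed';
}
```

A caller would pass the current user's UID (for example from `process.getuid()`) and treat either `-failed` result as a reason to alert rather than exit silently.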

Impact

  • 9h 16m complete outage with no notification
  • All inbound messages (Telegram, iMessage) silently dropped during window
  • User had no indication the gateway was down

Expected Behavior

  • Auto-upgrade should not produce a 9-hour silent outage
  • At minimum, launchd KeepAlive should eventually recover the process
  • Ideally, the upgrade sequence should ensure the new binary is running before the config is updated, or the old binary should tolerate the new config gracefully (warn, don't error)

Related Issues

  • #28335 — macOS: auto-updater restart fails silently due to plist race condition with launchd (same root cause, different version)
  • #54861 — Gateway silently dies after auto-update: launchd removes service (similar symptom)
  • #50070 — Gateway fails to auto-restart after SIGTERM — launchd KeepAlive ineffective

extent analysis

Fix Plan

To resolve the issue, we need to modify the upgrade sequence to ensure the new binary is running before the config is updated. Here are the steps:

  • Modify the auto-upgrade script:
    • Install the new binary
    • Start the new binary in a separate process
    • Wait for the new binary to confirm it's running and ready
    • Update the config to reference the new plugin versions
    • Rewrite the plist with the new OPENCLAW_SERVICE_VERSION
    • Send a SIGTERM to the old binary process

Example code snippet in Node.js:

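const { spawn } = require('child_process');

// installNewBinary, updateConfig, and rewritePlist are placeholders for the
// updater's own steps; oldGatewayPid is the PID of the running old gateway.
// The --ready-when-started flag and the 'Ready' marker are illustrative only.
function upgrade({ installNewBinary, updateConfig, rewritePlist, oldGatewayPid }) {
  // Install new binary
  const newBinaryPath = installNewBinary();

  // Start new binary in a separate process
  const newProcess = spawn(newBinaryPath, ['--ready-when-started']);
  newProcess.on('error', (err) => {
    console.error(`Failed to start new binary: ${err.message}`);
  });

  let output = '';
  let switched = false;

  // Wait for new binary to confirm it's running and ready
  newProcess.stdout.on('data', (data) => {
    output += data.toString(); // buffer chunks: the marker may span two reads
    if (switched || !output.includes('Ready')) return;
    switched = true; // perform the switch-over only once

    // Update config to reference new plugin versions
    updateConfig();

    // Rewrite plist with new OPENCLAW_SERVICE_VERSION
    rewritePlist();

    // Send SIGTERM to old binary process
    process.kill(oldGatewayPid, 'SIGTERM');
  });
}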
  • Modify the gateway code to tolerate the new config gracefully:
    • Add a warning for plugin version mismatches instead of exiting with an error
    • Example code snippet:
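// Check the host version against the plugin's minimum, comparing numerically
// (plain string comparison would rank "2026.3.9" above "2026.3.24")
const compareVersions = (a, b) => a.localeCompare(b, undefined, { numeric: true });
if (compareVersions(currentVersion, requiredVersion) < 0) {
  console.warn(`Plugin ${pluginName} requires OpenClaw >=${requiredVersion}, but this host is ${currentVersion}; skipping load`);
  // Continue running with a warning instead of exiting
}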
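
A fuller, self-contained sketch of the same idea follows: it splits plugin entries into those the host can load and those it should skip with a warning, so a config written for a newer version does not abort the old binary. The entry shape (`minHostVersion`) and the `legacy` entry are assumptions made for illustration; OpenClaw's real config schema is not shown in this issue.

```javascript
// Compare dotted numeric versions such as "2026.3.24".
// Returns a negative number, zero, or a positive number.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Hypothetical shape: entries maps plugin name -> { minHostVersion }.
function partitionPlugins(entries, hostVersion) {
  const loadable = [];
  const skipped = [];
  for (const [name, { minHostVersion }] of Object.entries(entries)) {
    if (compareVersions(hostVersion, minHostVersion) < 0) {
      // Warn and skip instead of failing config validation.
      console.warn(
        `plugins.entries.${name}: plugin requires OpenClaw >=${minHostVersion}, ` +
          `but this host is ${hostVersion}; skipping load`
      );
      skipped.push(name);
    } else {
      loadable.push(name);
    }
  }
  return { loadable, skipped };
}

const entries = {
  bluebubbles: { minHostVersion: '2026.3.28' },
  discord: { minHostVersion: '2026.3.28' },
  legacy: { minHostVersion: '2026.3.9' }, // made-up entry for illustration
};

const result = partitionPlugins(entries, '2026.3.24');
console.log(result); // loadable: ['legacy'], skipped: ['bluebubbles', 'discord']
```

Returning both lists lets the caller keep running with the loadable plugins while still reporting which ones were skipped.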

Verification

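To verify the fix, run the auto-upgrade on a macOS host with the LaunchAgent installed and confirm that the gateway restarts: gateway.log should show a "[gateway] listening on" entry shortly after the "received SIGTERM; shutting down" entry. Check gateway.err.log for config validation errors, and confirm that the new binary is running and the config has been updated.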
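
The log check can be automated. The following is a minimal sketch, assuming the marker strings seen in this issue's log excerpts ("received SIGTERM; shutting down" and "[gateway] listening on") and a leading ISO 8601 timestamp on each line; `findUnrecoveredShutdowns` and the 60-second threshold are hypothetical choices.

```javascript
const STAMP = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2}))\s/;

// Returns shutdowns that were not followed by a "listening on" entry
// within maxGapMs (restartedAfterMs is null if no restart was logged at all).
function findUnrecoveredShutdowns(lines, maxGapMs) {
  const failures = [];
  let pending = null; // timestamp of a shutdown still waiting for a restart
  for (const line of lines) {
    const m = STAMP.exec(line);
    if (!m) continue;
    const t = Date.parse(m[1]);
    if (line.includes('received SIGTERM; shutting down')) {
      if (pending === null) pending = t;
    } else if (line.includes('[gateway] listening on') && pending !== null) {
      if (t - pending > maxGapMs) {
        failures.push({ shutdownAt: pending, restartedAfterMs: t - pending });
      }
      pending = null;
    }
  }
  if (pending !== null) failures.push({ shutdownAt: pending, restartedAfterMs: null });
  return failures;
}

// Lines copied from the log excerpts in this issue.
const lines = [
  '2026-03-29T11:35:26.352-04:00 [gateway] received SIGTERM; shutting down',
  '2026-03-29T20:53:39.171-04:00 [gateway] listening on ws://127.0.0.1:18789 (PID 63206)',
];

const failures = findUnrecoveredShutdowns(lines, 60_000);
console.log(failures.length === 0 ? 'restart OK' : `unrecovered shutdowns: ${failures.length}`);
// prints "unrecovered shutdowns: 1" for the outage described here
```

Run against the logs of a fixed build, the same check should report no failures.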

Extra Tips

  • Consider adding additional logging and monitoring to detect and alert on similar issues in the future.
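  • Review the launchd configuration and ensure that the KeepAlive and ThrottleInterval settings are correctly configured to restart the gateway process in case of a failure. Note that, per the root cause analysis above, KeepAlive does not restart a job that launchd has unloaded, so these settings alone would not have prevented this outage.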
