openclaw - ✅ (Solved) Gateway fails to auto-restart after SIGTERM: launchd KeepAlive ineffective (2-hour outage) [1 pull request, 5 comments, 3 participants]

openclaw/openclaw#50070 (fetched 2026-04-08 00:59:32)

Gateway receives SIGTERM during config hot-reload and shuts down cleanly, but launchd does not restart it despite KeepAlive: true and ThrottleInterval: 1. Results in multi-hour outages requiring manual openclaw gateway start to recover.

Root Cause

openclaw config set writes the config file, which the running gateway detects via file watcher. The gateway hot-reloads some settings, but then receives SIGTERM ~8 seconds later. The SIGTERM appears to come from the config set CLI process itself (triggering a full restart via launchctl kickstart -k).

The gateway handles SIGTERM gracefully and exits with code 0. macOS launchd has a known quirk where a clean exit (code 0) after SIGTERM is sometimes treated as an intentional stop rather than a crash requiring restart — even with KeepAlive: true.

Fix Action

Workaround

We implemented an external watchdog script (runs every 30s via separate LaunchAgent) that checks port 18789 and runs launchctl kickstart -k if the gateway is down. This caps outage time at ~45 seconds.
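
The watchdog script itself is not included in the issue. A minimal sketch of the port check it describes, in Node.js, might look like the following. The function name `isPortOpen` and the 2-second timeout are assumptions, and the restart step (`launchctl kickstart -k`) is left to the caller.

```javascript
const net = require('node:net');

// Resolve true if something accepts a TCP connection on host:port,
// false on connection error or timeout. Never rejects.
function isPortOpen(port, host = '127.0.0.1', timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open) => {
      socket.destroy();
      resolve(open);
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false));
  });
}

// Example: check the gateway port named in the issue.
// isPortOpen(18789).then((up) => console.log(up ? 'gateway up' : 'gateway down'));
```

A bare TCP connect only shows that something is listening on the port. It does not show that the gateway is handling requests.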

PR fix notes

PR #1: fix(daemon): use launchctl kickstart instead of direct SIGTERM

Description (problem / solution / changelog)

Summary

Direct SIGTERM bypasses launchd supervision, causing gateway to not restart properly on macOS (#50070). Using kickstart ensures launchd properly handles the restart cycle.

Root Cause

In src/cli/daemon-cli/lifecycle.ts, stopGatewayWithoutServiceManager() sends SIGTERM directly via process.kill(pid, 'SIGTERM'). When the gateway exits cleanly with code 0, macOS launchd sometimes treats this as an intentional stop rather than a crash — even with KeepAlive: true.

Fix

Replace direct SIGTERM with launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway, which properly signals launchd to restart the service.
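
As a rough illustration of the replacement described above (not the PR's actual diff, which is not reproduced on this page), the restart call could be assembled as follows. The helper names `kickstartArgs` and `restartViaLaunchd` are hypothetical. The service label and the `gui/<uid>/<label>` target come from the command quoted above.

```javascript
const { execFile } = require('node:child_process');

const GATEWAY_LABEL = 'ai.openclaw.gateway';

// Build the argument list for `launchctl kickstart -k gui/<uid>/<label>`.
function kickstartArgs(uid, label = GATEWAY_LABEL) {
  return ['kickstart', '-k', `gui/${uid}/${label}`];
}

// Ask launchd to kill and restart the service instead of signalling the
// process directly. Resolves on success, rejects if launchctl fails.
function restartViaLaunchd(uid = process.getuid()) {
  return new Promise((resolve, reject) => {
    execFile('launchctl', kickstartArgs(uid), (error) => {
      if (error) reject(error);
      else resolve();
    });
  });
}
```

Passing the arguments as an array to `execFile` avoids a shell, so the `$(id -u)` substitution is replaced here by `process.getuid()`.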

Testing

  • Verified launchctl kickstart works on macOS with launchd service
  • Gateway successfully restarts via kickstart
  • Needs CI/CD testing on Linux (systemd path)

Fixes: Gateway fails to auto-restart after SIGTERM (#50070)

Changed files

  • src/agents/pi-embedded-runner/thinking.ts (modified, +1/-1)
  • src/agents/pi-tools.host-edit.ts (modified, +46/-9)
  • src/cli/daemon-cli/lifecycle.ts (modified, +12/-2)
  • src/gateway/session-utils.ts (modified, +2/-0)
  • src/gateway/session-utils.types.ts (modified, +2/-0)
  • ui/src/ui/views/chat.ts (modified, +3/-4)


OpenClaw Bug Report: Gateway fails to auto-restart after SIGTERM (launchd KeepAlive ineffective)

Summary

Gateway receives SIGTERM during config hot-reload and shuts down cleanly, but launchd does not restart it despite KeepAlive: true and ThrottleInterval: 1. Results in multi-hour outages requiring manual openclaw gateway start to recover.

Environment

  • OpenClaw: 2026.3.7 (42a1394)
  • OS: macOS 15.5 (arm64) — Mac Mini M4
  • Node: 22.22.1
  • Gateway mode: local
  • LaunchAgent: ai.openclaw.gateway

Reproduction

  1. Gateway running normally via LaunchAgent with KeepAlive: true
  2. Run openclaw config set channels.telegram.historyLimit 30 (or any config change)
  3. Gateway detects config change, hot-reloads, then receives SIGTERM ~8 seconds later
  4. Gateway shuts down cleanly (exit code 0)
  5. launchd does not restart the process

Evidence

Incident 1 — March 18, 2026 (2-hour outage)

2026-03-18T14:05:25.775 [reload] config change detected; evaluating reload (historyLimit)
2026-03-18T14:05:25.777 [gateway/channels] restarting telegram channel
2026-03-18T14:05:25.779 [reload] config hot reload applied (channels.telegram.historyLimit)
2026-03-18T14:05:26.455 [reload] config change detected; evaluating reload (contextPruning)
2026-03-18T14:05:26.457 [reload] config change applied
2026-03-18T14:05:34.173 [gateway] signal SIGTERM received
2026-03-18T14:05:34.174 [gateway] received SIGTERM; shutting down
--- 2 HOUR GAP - NO LOG ENTRIES ---
2026-03-18T16:07:06.562 [heartbeat] started  ← manual recovery by user

Incident 2 — March 17, 2026 (6-minute outage)

SIGTERM at 2026-03-17T21:56:04.832
Gateway back at 2026-03-17T22:02:25.668 (6 min 21 sec gap)
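
The outage lengths follow directly from the log timestamps. A small worked check is shown below. The logs do not state a time zone, so both ends of each gap are assumed to share one, and a `Z` suffix is appended only to make parsing deterministic.

```javascript
// Milliseconds between two timestamps from the gateway log.
function gapMs(from, to) {
  return Date.parse(`${to}Z`) - Date.parse(`${from}Z`);
}

// Format a millisecond gap as "Hh Mm S.SSSs".
function formatGap(ms) {
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = ((ms % 60000) / 1000).toFixed(3);
  return `${hours}h ${minutes}m ${seconds}s`;
}

// Incident 1: shutdown log line to the first heartbeat after manual recovery.
const incident1 = gapMs('2026-03-18T14:05:34.174', '2026-03-18T16:07:06.562');
// Incident 2: SIGTERM to gateway back.
const incident2 = gapMs('2026-03-17T21:56:04.832', '2026-03-17T22:02:25.668');

console.log(formatGap(incident1)); // 2h 1m 32.388s
console.log(formatGap(incident2)); // 0h 6m 20.836s
```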

launchctl state after failure

state = running          ← only after manual restart
runs = 2                 ← launchd only started it twice total (boot + manual)
last terminating signal = Terminated: 15
properties = keepalive | runatload
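
Fields such as `runs` can be pulled out of this kind of output programmatically, for example to notice that the run count did not increase after a shutdown. The sketch below assumes the `key = value` line format shown above. The function name is hypothetical, and the sample string mirrors the excerpt with its annotations removed.

```javascript
// Parse "key = value" lines into a plain object.
// Lines without " = " are ignored.
function parseLaunchctlFields(text) {
  const fields = {};
  for (const line of text.split('\n')) {
    const index = line.indexOf(' = ');
    if (index === -1) continue;
    fields[line.slice(0, index).trim()] = line.slice(index + 3).trim();
  }
  return fields;
}

const sample = [
  'state = running',
  'runs = 2',
  'last terminating signal = Terminated: 15',
  'properties = keepalive | runatload',
].join('\n');

const fields = parseLaunchctlFields(sample);
console.log(Number(fields.runs)); // 2
console.log(fields['last terminating signal']); // Terminated: 15
```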

LaunchAgent plist (relevant sections)

<key>KeepAlive</key>
<true/>
<key>ThrottleInterval</key>
<integer>1</integer>

Root Cause Analysis

openclaw config set writes the config file, which the running gateway detects via file watcher. The gateway hot-reloads some settings, but then receives SIGTERM ~8 seconds later. The SIGTERM appears to come from the config set CLI process itself (triggering a full restart via launchctl kickstart -k).

The gateway handles SIGTERM gracefully and exits with code 0. macOS launchd has a known quirk where a clean exit (code 0) after SIGTERM is sometimes treated as an intentional stop rather than a crash requiring restart — even with KeepAlive: true.

Expected Behavior

  1. openclaw config set should hot-reload config without killing the gateway process (it already does this for some settings — historyLimit was hot-reloaded successfully before the SIGTERM arrived)
  2. If a full restart IS needed, the restart mechanism should be reliable — the gateway should come back within seconds, not hours
  3. KeepAlive: true should guarantee auto-restart regardless of exit code

Workaround

We implemented an external watchdog script (runs every 30s via separate LaunchAgent) that checks port 18789 and runs launchctl kickstart -k if the gateway is down. This caps outage time at ~45 seconds.

Impact

  • 2-hour outage on March 18 — user's messages went unanswered, missed a business demo window
  • 6-minute outage on March 17
  • User had to manually run openclaw config set gateway.mode local && openclaw gateway install --force && openclaw gateway start each time
  • This is a production agent handling business operations via Telegram — reliability is critical

Suggested Fix

Any of the following:

  1. Don't send SIGTERM on config changes that can be hot-reloaded (the reload already works — the SIGTERM is redundant)
  2. If SIGTERM is necessary, ensure the restart mechanism (launchctl kickstart -k) actually verifies the new process starts
  3. Add a built-in watchdog/health-check that auto-recovers if the gateway dies
  4. Exit with a non-zero code when killed externally, so launchd treats it as a crash requiring restart
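
Option 4 could be sketched as below. The 128 + signal number convention is what shells report for a process killed by a signal. Whether launchd reacts differently to such an exit code is the issue's hypothesis and is not confirmed here. `shutdown` is a hypothetical cleanup hook supplied by the caller.

```javascript
const os = require('node:os');

// Conventional exit code for a process terminated by a signal: 128 + signo.
function exitCodeForSignal(signalName) {
  const signo = os.constants.signals[signalName];
  if (signo === undefined) throw new Error(`unknown signal: ${signalName}`);
  return 128 + signo;
}

// Install a SIGTERM handler that runs cleanup, then exits non-zero.
// `shutdown` is a hypothetical async cleanup function.
function exitNonZeroOnSigterm(shutdown) {
  process.once('SIGTERM', async () => {
    try {
      await shutdown();
    } finally {
      process.exit(exitCodeForSignal('SIGTERM'));
    }
  });
}
```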

extent analysis

Fix Plan

To address the issue of the gateway not auto-restarting after receiving SIGTERM, we will implement the following steps:

  • Modify the openclaw config set command to not send SIGTERM on config changes that can be hot-reloaded.
  • Implement a built-in watchdog/health-check to auto-recover if the gateway dies.

Code Changes

We will modify the openclaw config set command to check if the config change can be hot-reloaded before sending SIGTERM. If it can be hot-reloaded, we will skip sending SIGTERM.

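// config-set.js
// Settings the incident log shows being applied without a restart.
// The real list is likely longer.
const hotReloadableSettings = ['historyLimit', 'contextPruning'];

// Settings arrive as dotted paths (e.g. "channels.telegram.historyLimit"),
// so match on path segments rather than on the whole string.
function getConfigChangeType(setting) {
  const isHotReloadable = setting
    .split('.')
    .some((segment) => hotReloadableSettings.includes(segment));
  return isHotReloadable ? 'hot-reload' : 'full-restart';
}

// reloadSetting and sendSigterm are placeholders for the CLI's existing
// reload and restart paths; they are not defined here.
function handleConfigChange(setting, value) {
  if (getConfigChangeType(setting) === 'hot-reload') {
    // Hot-reload the setting; the gateway keeps running
    reloadSetting(setting, value);
  } else {
    // Send SIGTERM to restart the gateway
    sendSigterm();
  }
}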

We will also implement a watchdog/health-check that polls the gateway's status every 10 seconds on a timer. It has to run as a separate process from the gateway, because a timer inside the gateway stops when the gateway exits.

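// watchdog.js
// Run this as a separate process (for example its own LaunchAgent):
// a timer inside the gateway stops when the gateway does.
const { execFile } = require('node:child_process');

// Assumption: the gateway serves a health endpoint at this path on port 18789.
const HEALTH_URL = 'http://localhost:18789/healthcheck';
// launchctl kickstart needs the full service target, not the bare label.
const SERVICE_TARGET = `gui/${process.getuid()}/ai.openclaw.gateway`;

function restartGateway() {
  // Restart the gateway using launchctl
  execFile('launchctl', ['kickstart', '-k', SERVICE_TARGET], (error) => {
    if (error) console.error(`kickstart failed: ${error.message}`);
  });
}

function startWatchdog(intervalMs = 10000) {
  return setInterval(async () => {
    try {
      // fetch is built into Node 18+; give up on a hung request after 5 seconds
      const response = await fetch(HEALTH_URL, { signal: AbortSignal.timeout(5000) });
      if (!response.ok) restartGateway();
    } catch {
      // Connection refused or timed out: treat the gateway as down
      restartGateway();
    }
  }, intervalMs);
}

startWatchdog();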

Verification

To verify that the fix worked, we will test the following scenarios:

  • Run openclaw config set channels.telegram.historyLimit 30 and verify that the gateway does not receive SIGTERM.
  • Run openclaw config set gateway.mode local and verify that the gateway restarts correctly.
  • Kill the gateway process manually and verify that the watchdog restarts it within one 10-second check interval, plus gateway startup time.

Extra Tips

To prevent similar issues in the future, we recommend:

  • Implementing a more robust restart mechanism that verifies the new process starts correctly.
  • Using a more reliable health-check mechanism, such as a TCP check or a custom health-check endpoint.
  • Monitoring the gateway's logs and metrics to detect potential issues before they cause outages.
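
The first tip, verifying that the new process actually comes up, could be sketched as a polling loop. The names, the 500 ms poll interval, and the 15-second deadline are assumptions. The probe is passed in so that a TCP check or a health endpoint can be used interchangeably.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `probe` (a function returning, or resolving to, true when the gateway
// is healthy) until it succeeds or `deadlineMs` elapses. Resolves true on
// success, false if the deadline passes first. A throwing probe counts as
// "not up yet".
async function waitUntilUp(probe, { deadlineMs = 15000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + deadlineMs;
  for (;;) {
    try {
      if (await probe()) return true;
    } catch {
      // Probe failed; fall through and retry
    }
    if (Date.now() + intervalMs > deadline) return false;
    await sleep(intervalMs);
  }
}
```

After a `launchctl kickstart -k`, a caller would await `waitUntilUp(...)` and report an error if it resolves to false, rather than assuming the restart succeeded.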
