openclaw - 💡(How to fix) Fix [Bug]: Custom watchdog kills healthy gateway during slow startup (2026.4.26 + 20 plugin deps) [2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#74281Fetched 2026-04-30 06:26:12
View on GitHub
Comments
2
Participants
2
Timeline
7
Reactions
2
Timeline (top)
commented ×2cross-referenced ×2labeled ×2closed ×1

Component: openclaw-ops skill / watchdog.sh
Date: April 29, 2026

The watchdog script at ~/.openclaw/skills/openclaw-ops/scripts/watchdog.sh uses a 5-second HTTP health check timeout (curl --max-time 5). In OpenClaw 4.26, the gateway takes 3-4 minutes to fully initialize with 20 bundled plugin deps. The watchdog fires on its 5-minute interval, the health check times out (HTTP 000), and it calls openclaw gateway restart which sends SIGKILL. This creates a kill cycle: the gateway starts, the watchdog kills it 5 minutes later, repeat indefinitely.

Error Message

  • ~/.openclaw/logs/gateway.log shows http server listening followed minutes later by signal SIGTERM received with no error Severity: High — causes complete Telegram/Discord outage with no obvious error

Root Cause

4.26 expanded bundled plugin deps from 9 (4.24) to 20. First boot after upgrade or restart takes 3-4 minutes. The watchdog's 5-second health check fires before the gateway is fully initialized and misclassifies a healthy-but-loading gateway as dead.


Fix Action

Fix

In ~/.openclaw/skills/openclaw-ops/scripts/watchdog.sh, change:

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 5 "$GATEWAY_URL" 2>/dev/null || echo "000")

to:

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 30 "$GATEWAY_URL" 2>/dev/null || echo "000")

Alternatively, add a startup grace period check: if the gateway process has been running for less than 5 minutes, skip the health check entirely.

Workaround

Disable the watchdog until the fix is applied:

launchctl bootout "gui/$(id -u)/ai.openclaw.watchdog"

The LaunchAgent's KeepAlive: true provides basic crash recovery without the watchdog.

Code Example

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 5 "$GATEWAY_URL" 2>/dev/null || echo "000")

---

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 30 "$GATEWAY_URL" 2>/dev/null || echo "000")

---

launchctl bootout "gui/$(id -u)/ai.openclaw.watchdog"

---
RAW_BUFFERClick to expand / collapse

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

Component: openclaw-ops skill / watchdog.sh
Date: April 29, 2026

The watchdog script at ~/.openclaw/skills/openclaw-ops/scripts/watchdog.sh uses a 5-second HTTP health check timeout (curl --max-time 5). In OpenClaw 4.26, the gateway takes 3-4 minutes to fully initialize with 20 bundled plugin deps. The watchdog fires on its 5-minute interval, the health check times out (HTTP 000), and it calls openclaw gateway restart which sends SIGKILL. This creates a kill cycle: the gateway starts, the watchdog kills it 5 minutes later, repeat indefinitely.

Steps to reproduce

  1. Install OpenClaw 4.26 on macOS with the openclaw-ops skill and its watchdog LaunchAgent enabled
  2. Restart the gateway (fresh boot or launchctl kickstart -k)
  3. Wait 5 minutes for the watchdog's first tick
  4. Observe gateway exits with code -9 (SIGKILL) in launchctl list | grep ai.openclaw.gateway
  5. Channels go silent; Control UI shows "Probe failed · This operation was aborted"

Expected behavior

No crash!

Actual behavior

Crash with code -9 (SIGKILL) . Telegram and Discord dead.

Root Cause

4.26 expanded bundled plugin deps from 9 (4.24) to 20. First boot after upgrade or restart takes 3-4 minutes. The watchdog's 5-second health check fires before the gateway is fully initialized and misclassifies a healthy-but-loading gateway as dead.


Diagnosis Signals

  • launchctl list | grep ai.openclaw.gateway shows exit code -9
  • ~/.openclaw/logs/watchdog.log shows Gateway unreachable (HTTP 000) followed by Attempting gateway restart
  • ~/.openclaw/logs/gateway.log shows http server listening followed minutes later by signal SIGTERM received with no error
  • No crash in gateway logs — it dies cleanly because the SIGKILL comes from outside

OpenClaw version

2026.4.26

Operating system

macOS 15.7.3 (x64), Node 22.22.0

Install method

npm global

Model

openai-codex/gpt-5.4

Provider / routing chain

openclaw -> codex-gpt5.4

Additional provider/model setup details

Fix

In ~/.openclaw/skills/openclaw-ops/scripts/watchdog.sh, change:

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 5 "$GATEWAY_URL" 2>/dev/null || echo "000")

to:

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 30 "$GATEWAY_URL" 2>/dev/null || echo "000")

Alternatively, add a startup grace period check: if the gateway process has been running for less than 5 minutes, skip the health check entirely.

Workaround

Disable the watchdog until the fix is applied:

launchctl bootout "gui/$(id -u)/ai.openclaw.watchdog"

The LaunchAgent's KeepAlive: true provides basic crash recovery without the watchdog.

Logs, screenshots, and evidence

Impact and severity

Severity: High — causes complete Telegram/Discord outage with no obvious error

Additional information

Environment

  • macOS 15.7.3 (x64), Node 22.22.0
  • OpenClaw 2026.4.26 (be8c246)
  • openclaw-ops skill with watchdog LaunchAgent
  • 20 bundled plugin deps staged at ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.26-<hash>/

extent analysis

TL;DR

Increase the HTTP health check timeout in the watchdog.sh script to accommodate the longer initialization time of the OpenClaw gateway.

Guidance

  • Modify the watchdog.sh script to increase the --max-time parameter for the curl command to 30 seconds, as shown in the provided fix.
  • Alternatively, consider implementing a startup grace period check to skip the health check if the gateway process has been running for less than 5 minutes.
  • To verify the fix, restart the gateway and wait for 5 minutes to ensure it doesn't exit with code -9 (SIGKILL).
  • If the issue persists, check the watchdog.log and gateway.log for any error messages or signals that may indicate the cause of the crash.

Example

The modified curl command with an increased timeout:

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 30 "$GATEWAY_URL" 2>/dev/null || echo "000")

Notes

The provided fix assumes that increasing the timeout will allow the gateway to fully initialize before the health check times out. However, if the issue persists, further investigation into the gateway's initialization process may be necessary.

Recommendation

Apply the workaround by increasing the HTTP health check timeout to 30 seconds, as this is a straightforward and targeted fix that addresses the identified root cause.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

No crash!

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: Custom watchdog kills healthy gateway during slow startup (2026.4.26 + 20 plugin deps) [2 comments, 2 participants]