openclaw - 💡(How to fix) Fix [Bug]: Custom watchdog kills healthy gateway during slow startup (2026.4.26 + 20 plugin deps) [2 comments, 2 participants]

drmarcopapa · 2026-04-29T10:43:40Z

GitHub stats

openclaw/openclaw#74281 • Fetched 2026-04-30 06:26:12

Author

drmarcopapa

Participants

clawsweeper[bot]

drmarcopapa

Timeline (top)

commented ×2, cross-referenced ×2, labeled ×2, closed ×1

Component: openclaw-ops skill / watchdog.sh
Date: April 29, 2026

The watchdog script at ~/.openclaw/skills/openclaw-ops/scripts/watchdog.sh uses a 5-second HTTP health check timeout (curl --max-time 5). In OpenClaw 2026.4.26, the gateway takes 3-4 minutes to fully initialize with 20 bundled plugin deps. When the watchdog fires on its 5-minute interval, the health check times out (HTTP 000) and the watchdog calls openclaw gateway restart, which sends SIGKILL. The result is a kill cycle: the gateway starts, the watchdog kills it 5 minutes later, and this repeats indefinitely.

Error Message

~/.openclaw/logs/gateway.log shows http server listening followed minutes later by signal SIGTERM received with no error.

Severity: High. Causes a complete Telegram/Discord outage with no obvious error.

Root Cause

OpenClaw 2026.4.26 (4.26) expanded the bundled plugin deps from 9 (in 4.24) to 20. The first boot after an upgrade or restart now takes 3-4 minutes. The watchdog's 5-second health check fires before the gateway is fully initialized and misclassifies a healthy-but-loading gateway as dead.

Fix Action

Fix

In ~/.openclaw/skills/openclaw-ops/scripts/watchdog.sh, change:

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 5 "$GATEWAY_URL" 2>/dev/null || echo "000")

to:

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 30 "$GATEWAY_URL" 2>/dev/null || echo "000")

Alternatively, add a startup grace period check: if the gateway process has been running for less than 5 minutes, skip the health check entirely.
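
The grace-period alternative could look roughly like the sketch below. It is not taken from the shipped watchdog.sh: GRACE_SECONDS, etime_to_seconds, and in_grace_period are hypothetical names, and the sketch assumes the gateway PID is already known to the script. It relies on ps -o etime=, which prints elapsed time as [[dd-]hh:]mm:ss.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; not part of the shipped watchdog.sh.
GRACE_SECONDS="${GRACE_SECONDS:-300}"   # assumed 5-minute grace period

# Convert ps "etime" output ([[dd-]hh:]mm:ss) to whole seconds.
etime_to_seconds() {
  local etime="${1//[[:space:]]/}" days=0 hours=0 mins=0 secs=0 parts
  if [[ "$etime" == *-* ]]; then
    days="${etime%%-*}"
    etime="${etime#*-}"
  fi
  IFS=: read -r -a parts <<< "$etime"
  case "${#parts[@]}" in
    3) hours="${parts[0]}"; mins="${parts[1]}"; secs="${parts[2]}" ;;
    2) mins="${parts[0]}"; secs="${parts[1]}" ;;
    *) return 1 ;;
  esac
  # 10# forces base 10 so values like "08" are not parsed as octal.
  echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$mins * 60 + 10#$secs ))
}

# Return 0 if the given PID has been running for less than GRACE_SECONDS.
in_grace_period() {
  local pid="$1" etime uptime
  etime="$(ps -o etime= -p "$pid" 2>/dev/null)" || return 1
  uptime="$(etime_to_seconds "$etime")" || return 1
  [ "$uptime" -lt "$GRACE_SECONDS" ]
}
```

In the watchdog, a guard such as `if in_grace_period "$GATEWAY_PID"; then exit 0; fi` would sit before the curl health check, where GATEWAY_PID stands for whatever PID lookup the script already performs.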

Workaround

Disable the watchdog until the fix is applied:

launchctl bootout "gui/$(id -u)/ai.openclaw.watchdog"

The LaunchAgent's KeepAlive: true provides basic crash recovery without the watchdog.
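
Once the timeout fix is in place, the watchdog has to be loaded again. The sketch below shows one way to check and restore it. The label comes from the issue; the plist path is an assumption (the default per-user LaunchAgents directory), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Label from the issue; the plist location is an assumption.
WATCHDOG_LABEL="ai.openclaw.watchdog"
WATCHDOG_PLIST="${WATCHDOG_PLIST:-$HOME/Library/LaunchAgents/${WATCHDOG_LABEL}.plist}"

# Build the launchd service target for the current GUI user session.
watchdog_target() {
  echo "gui/$(id -u)/${WATCHDOG_LABEL}"
}

# Return 0 if launchd currently has the watchdog loaded.
watchdog_loaded() {
  launchctl print "$(watchdog_target)" >/dev/null 2>&1
}

# Load the watchdog again (the reverse of the bootout workaround).
watchdog_enable() {
  launchctl bootstrap "gui/$(id -u)" "$WATCHDOG_PLIST"
}
```

A typical sequence would be `watchdog_loaded || watchdog_enable`, run only after watchdog.sh has been edited.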

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

Component: openclaw-ops skill / watchdog.sh
Date: April 29, 2026

Steps to reproduce

1. Install OpenClaw 2026.4.26 on macOS with the openclaw-ops skill and its watchdog LaunchAgent enabled
2. Restart the gateway (fresh boot or launchctl kickstart -k)
3. Wait 5 minutes for the watchdog's first tick
4. Observe that the gateway exits with code -9 (SIGKILL) in launchctl list | grep ai.openclaw.gateway
5. Channels go silent; Control UI shows "Probe failed · This operation was aborted"
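
The exit-code check in the reproduction steps can be scripted. This is a minimal sketch, assuming the usual three-column launchctl list output (PID, last exit status, label); last_exit_status is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Read `launchctl list` output (PID, last exit status, label) on stdin and
# print the last exit status recorded for the given label.
last_exit_status() {
  local label="$1"
  awk -v label="$label" '$3 == label { print $2 }'
}

# Return 0 if the status means "terminated by SIGKILL" (-9).
killed_by_sigkill() {
  [ "$1" = "-9" ]
}
```

For example, `status="$(launchctl list | last_exit_status ai.openclaw.gateway)"` followed by `killed_by_sigkill "$status" && echo "gateway was SIGKILLed"`.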

Expected behavior

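The gateway keeps running. The watchdog does not restart a gateway that is healthy but still starting up.

Actual behavior

The gateway exits with code -9 (SIGKILL). Telegram and Discord channels stop responding.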

Diagnosis Signals

- launchctl list | grep ai.openclaw.gateway shows exit code -9
- ~/.openclaw/logs/watchdog.log shows Gateway unreachable (HTTP 000) followed by Attempting gateway restart
- ~/.openclaw/logs/gateway.log shows http server listening followed minutes later by signal SIGTERM received with no error
- No crash appears in the gateway logs: the process is killed from outside (SIGKILL), not by an internal error
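
The log signals above can be checked together with a small script. The log paths and message strings are the ones quoted in the issue; the function name and the idea of combining them into one check are assumptions of this sketch.

```bash
#!/usr/bin/env bash
# Log paths and message strings are taken from the issue; the function is hypothetical.
WATCHDOG_LOG="${WATCHDOG_LOG:-$HOME/.openclaw/logs/watchdog.log}"
GATEWAY_LOG="${GATEWAY_LOG:-$HOME/.openclaw/logs/gateway.log}"

# Return 0 when both logs contain the watchdog-kill signature:
# a failed probe and restart attempt on one side, a signal with no error on the other.
looks_like_watchdog_kill() {
  local watchdog_log="${1:-$WATCHDOG_LOG}" gateway_log="${2:-$GATEWAY_LOG}"
  grep -qF "Gateway unreachable (HTTP 000)" "$watchdog_log" 2>/dev/null &&
    grep -qF "Attempting gateway restart" "$watchdog_log" 2>/dev/null &&
    grep -qF "signal SIGTERM received" "$gateway_log" 2>/dev/null
}
```

Running `looks_like_watchdog_kill && echo "watchdog likely killed the gateway"` gives a quick yes/no; it does not compare timestamps, so a match is a hint rather than proof.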

OpenClaw version

2026.4.26

Operating system

macOS 15.7.3 (x64), Node 22.22.0

Install method

npm global

Model

openai-codex/gpt-5.4

Provider / routing chain

openclaw -> codex-gpt5.4

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Severity: High — causes complete Telegram/Discord outage with no obvious error

Additional information

Environment

macOS 15.7.3 (x64), Node 22.22.0
OpenClaw 2026.4.26 (be8c246)
openclaw-ops skill with watchdog LaunchAgent
20 bundled plugin deps staged at ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.26-<hash>/

extent analysis

TL;DR

Increase the HTTP health check timeout in the watchdog.sh script to accommodate the longer initialization time of the OpenClaw gateway.

Guidance

Modify the watchdog.sh script to increase the --max-time parameter for the curl command to 30 seconds, as shown in the provided fix.
Alternatively, consider implementing a startup grace period check to skip the health check if the gateway process has been running for less than 5 minutes.
To verify the fix, restart the gateway and wait for 5 minutes to ensure it doesn't exit with code -9 (SIGKILL).
If the issue persists, check the watchdog.log and gateway.log for any error messages or signals that may indicate the cause of the crash.
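
To verify the fix without watching the clock, the health probe can be polled until the gateway answers. This is a sketch only: wait_until_healthy and probe_gateway are hypothetical helpers, GATEWAY_URL is the variable used by watchdog.sh, and the attempt count and delay are arbitrary.

```bash
#!/usr/bin/env bash
# Run a probe command repeatedly until it succeeds or attempts run out.
# Usage: wait_until_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until_healthy() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  return 1
}

# Probe with the same curl flags and timeout as the fixed health check.
probe_gateway() {
  curl -sf -o /dev/null --max-time 30 "$GATEWAY_URL"
}
```

After restarting the gateway, `wait_until_healthy 10 30 probe_gateway` polls for up to roughly five minutes of waiting, in addition to however long each probe takes.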

Example

The modified curl command with an increased timeout:

HTTP_STATUS=$(curl -sf -o /dev/null -w "%{http_code}" --max-time 30 "$GATEWAY_URL" 2>/dev/null || echo "000")

Notes

The provided fix assumes that increasing the timeout will allow the gateway to fully initialize before the health check times out. However, if the issue persists, further investigation into the gateway's initialization process may be necessary.

Recommendation

Apply the fix by increasing the HTTP health check timeout to 30 seconds; it is a small, targeted change that addresses the identified root cause. Disabling the watchdog remains available as a temporary workaround until the fix is applied.
