openclaw - [Bug]: Isolated cron runs can wedge gateway

Error Message

Observed after upgrade from: OpenClaw 2026.5.18 (50a2481)

Observed on: OpenClaw 2026.5.27 (27ae826)

Gateway/service state during failure:

openclaw status: gateway reachable=false
openclaw status error: timeout
systemd user service: openclaw-gateway.service active (running)
Gateway bind: 127.0.0.1:18789
Webchat websocket handshakes timed out locally
Tailscale Serve was still configured/reachable; this did not appear to be a Tailscale failure

Resource usage during failure:

Gateway node process consuming about 1 full CPU core
RSS around 1.4G
systemd stats showed about 8h 49m CPU over 7h 57m wall time
Memory peak around 2.2G
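
The two systemd durations above imply an average CPU load slightly above one core over the whole uptime of the service. A quick check of that arithmetic, using only the figures from the report (the helper name `to_minutes` is purely illustrative):

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours/minutes duration to whole minutes."""
    return hours * 60 + minutes

cpu_minutes = to_minutes(8, 49)    # CPU time reported by systemd
wall_minutes = to_minutes(7, 57)   # wall-clock time over the same period

average_cores = cpu_minutes / wall_minutes
print(f"average CPU usage: {average_cores:.2f} cores")  # prints 1.11
```

An average above 1.0 suggests the process was busy for most of the time since it started, not only at the moment the outage was noticed. The report does not include a per-interval breakdown, so this is an inference from the totals.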

Likely involved isolated cron jobs:

LAN new device watcher, every 60 seconds
arpwatch LAN alert check, every 15 minutes

Recovery: systemctl --user restart openclaw-gateway.service

Then disabled the two cron jobs listed above.

After recovery:

Gateway reachable=true
Websocket latency about 57ms
CPU settled to roughly 0.5-2%
RSS around 454M
Tailscale Serve/webchat worked again

Root Cause

Affected: Webchat/Tailscale users relying on the local OpenClaw gateway, especially with isolated cron jobs enabled.
Severity: High for affected users, because it breaks remote chat access while the service still appears active under systemd.
Frequency: Observed once after upgrading to 2026.5.27; not yet deterministically reproduced.
Consequence: Webchat becomes unreachable until the gateway is manually restarted, and systemd does not automatically recover because the process remains active.
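
systemd treats a service as healthy while its main process exists, which is why a busy but unresponsive gateway is never restarted. systemd also offers a watchdog mechanism: with `WatchdogSec=` set in the unit, the service must periodically send `WATCHDOG=1` to the socket named in `NOTIFY_SOCKET`, and a `Restart=` policy such as `on-watchdog` restarts it when the heartbeats stop. The sketch below shows only the heartbeat side of that protocol. Whether the OpenClaw gateway sends such heartbeats is not stated in the report. The gateway is a Node process, so this Python version is an illustration of the protocol, not its implementation, and the function names are hypothetical.

```python
import os
import socket

def sd_notify(message: str) -> bool:
    """Send one datagram to systemd's notification socket, if one is configured."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False                      # not running under a notify-enabled unit
    if path.startswith("@"):
        path = "\0" + path[1:]            # abstract-namespace socket address
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), path)
    return True

def heartbeat() -> bool:
    """Meant to be called from the main loop; a blocked loop stops calling it."""
    return sd_notify("WATCHDOG=1")
```

The heartbeat only helps if it is sent from the same loop that serves requests. A heartbeat sent from a separate thread or timer would keep reporting healthy while the request path is wedged.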

Bug type

Crash (process/app exits or hangs)

Beta release blocker

Summary

After upgrading from OpenClaw 2026.5.18 to 2026.5.27, isolated cron runs appeared to wedge the gateway: systemd still reported the service active, but openclaw status and webchat websocket handshakes timed out.

Steps to reproduce

1. Run OpenClaw 2026.5.27 with the gateway managed by the systemd user service.
2. Have isolated cron jobs enabled, including a frequent job such as "LAN new device watcher" every 60 seconds and another such as "arpwatch LAN alert check" every 15 minutes.
3. Let the system run for several hours after the upgrade.
4. Observe webchat becoming unreachable.
5. Run openclaw status and observe gateway reachable=false with a timeout while systemd still reports openclaw-gateway.service active.

Expected behavior

The gateway should remain responsive even if an isolated cron run stalls or loops. If a cron child task wedges, OpenClaw should time out or kill that run, trip a health check, or restart and recover, instead of leaving the gateway active but unreachable.
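
As a minimal sketch of the "time out or kill" behavior described above, the following runs a job in a child process with a hard deadline. It assumes the job can be expressed as a standalone command. The function name `run_with_deadline` and the stand-in looping job are hypothetical and do not reflect how OpenClaw actually runs isolated cron jobs.

```python
import subprocess
import sys

def run_with_deadline(cmd: list[str], deadline_s: float) -> str:
    """Run one job in a child process; return 'ok', 'failed', or 'timed_out'."""
    try:
        # subprocess.run kills the child and waits for it if the timeout expires.
        result = subprocess.run(
            cmd,
            timeout=deadline_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "timed_out"
    return "ok" if result.returncode == 0 else "failed"

if __name__ == "__main__":
    # Stand-in for a cron job that loops forever.
    looping_job = [sys.executable, "-c", "while True: pass"]
    print(run_with_deadline(looping_job, deadline_s=1.0))  # prints timed_out
```

Note that this kills only the direct child. A job that spawns its own subprocesses would need process-group handling, which is outside this sketch.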

Actual behavior

The gateway process remained active under systemd, but openclaw status timed out, local webchat websocket handshakes timed out, and the gateway node process consumed sustained high CPU and memory until manually restarted.

OpenClaw version

2026.5.27 (27ae826)

Operating system

Ubuntu 26.04

Install method

npm global, upgraded with openclaw update

Model

gpt-5 / Codex via OpenClaw

Provider / routing chain

webchat -> Tailscale Serve -> local OpenClaw gateway on 127.0.0.1:18789 -> Codex/OpenAI

Additional provider/model setup details

The failure did not appear to be provider/model-specific. The observable outage was at the local gateway/webchat layer: Tailscale still routed to the local gateway, but the gateway did not respond fast enough for status/websocket handshakes.

Impact and severity

Additional information

Last known good: 2026.5.18 (50a2481)
First observed bad: 2026.5.27 (27ae826)

Temporary workaround: restart openclaw-gateway.service, then disable or slow the frequent isolated cron jobs.
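
Until the gateway recovers on its own, the manual restart can be automated with an external probe run from a timer. The sketch below assumes the gateway answers a plain HTTP request on its bind address with at least some bytes. That is an assumption, not something the report confirms, and the names `gateway_answers` and `check_and_recover` are hypothetical. A bare TCP connect is not enough as a probe, because the kernel completes the handshake for a listening socket even when the process is too busy to accept the connection.

```python
import socket
import subprocess

GATEWAY_ADDR = ("127.0.0.1", 18789)   # bind address from the report
RESTART_CMD = ["systemctl", "--user", "restart", "openclaw-gateway.service"]

def gateway_answers(addr=GATEWAY_ADDR, timeout_s: float = 5.0) -> bool:
    """True if the server sends back any bytes to a plain HTTP request in time."""
    request = f"GET / HTTP/1.1\r\nHost: {addr[0]}\r\nConnection: close\r\n\r\n"
    try:
        # The timeout applies to the connect and to the later send/recv calls.
        with socket.create_connection(addr, timeout=timeout_s) as conn:
            conn.sendall(request.encode("ascii"))
            return len(conn.recv(1)) > 0
    except OSError:
        # Covers refused connections and socket timeouts alike.
        return False

def check_and_recover(probe=gateway_answers, restart_cmd=RESTART_CMD) -> bool:
    """Restart the service if the probe fails; return True if a restart was issued."""
    if probe():
        return False
    subprocess.run(restart_cmd, check=False)
    return True
```

Running `check_and_recover()` every minute or so from a separate timer keeps the probe independent of the gateway process itself. The timeout should sit well above the normal latency (about 57ms after recovery in this report) to avoid restarting a gateway that is merely slow.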

This looks more like a gateway/cron isolation issue than a Tailscale issue. Tailscale still routed to the local gateway, but the gateway was not responding. The gateway should remain responsive or self-recover if a cron child task stalls.
