openclaw: HTTP server becomes unresponsive while WebSocket continues working (event loop blocking) [2 comments, 2 participants]

GitHub stats
openclaw/openclaw#77635 (fetched 2026-05-06 06:23:40)
Comments: 2 · Participants: 2 · Timeline events: 7 · Reactions: 2
Timeline (top): commented ×2, mentioned ×2, subscribed ×2, closed ×1


Summary

After upgrading to OpenClaw 2026.5.3, the gateway HTTP server silently stops responding to inbound requests (e.g., BlueBubbles webhook POSTs) while the WebSocket server continues functioning normally. This causes all webhook-based channels to stop receiving messages with no error logged.

Environment

  • OpenClaw 2026.5.3-1 (2eae30e)
  • Node v25.8.2
  • macOS 26.4.1 (arm64, Mac mini)
  • @openclaw/bluebubbles 2026.5.3
  • Gateway port: 18789 (HTTP + WS on same port)

Reproduction

The issue reproduces consistently within 30-60 minutes of a fresh gateway start. It has occurred 4+ times today.

  1. Gateway starts normally, HTTP server listening, BlueBubbles webhook registered
  2. After 30-60 min, HTTP POST requests to the webhook endpoint stop getting a valid HTTP response (curl reports "Received HTTP/0.9 when not allowed" or an empty reply; netcat receives binary WebSocket upgrade bytes instead of HTTP)
  3. WebSocket connections remain fully functional (Control UI works fine)
  4. No errors in gateway.log — the HTTP server just silently stops processing requests
  5. The only way to recover is a full openclaw gateway restart
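
The check in step 2 can be scripted. The following is a minimal sketch, assuming that a healthy listener answers a plain GET with bytes that start with "HTTP/" (which is what curl expects before it complains about HTTP/0.9). The names classifyFirstBytes and probeRaw, the verdict labels, and the request path "/" are hypothetical and not part of OpenClaw.

```typescript
import { createConnection } from "node:net";

// Hypothetical verdict labels for a raw TCP probe of the gateway port.
type ProbeVerdict = "http" | "not-http" | "no-response" | "unreachable";

// A valid HTTP/1.x response starts with the ASCII bytes "HTTP/".
// Anything else (for example binary WebSocket frames) is "not-http".
function classifyFirstBytes(bytes: Uint8Array): ProbeVerdict {
  if (bytes.length === 0) return "no-response";
  const prefix = "HTTP/";
  if (bytes.length < prefix.length) return "not-http";
  for (let i = 0; i < prefix.length; i++) {
    if (bytes[i] !== prefix.charCodeAt(i)) return "not-http";
  }
  return "http";
}

// Send a bare GET and classify the first chunk that comes back.
// The path "/" is an assumption; the issue does not name the webhook path.
function probeRaw(host: string, port: number, timeoutMs: number): Promise<ProbeVerdict> {
  return new Promise((resolve) => {
    const socket = createConnection({ host, port });
    let settled = false;
    const finish = (verdict: ProbeVerdict): void => {
      if (settled) return;
      settled = true;
      socket.destroy();
      resolve(verdict);
    };
    socket.setTimeout(timeoutMs, () => finish("no-response"));
    socket.once("connect", () => {
      socket.write(`GET / HTTP/1.1\r\nHost: ${host}:${port}\r\nConnection: close\r\n\r\n`);
    });
    socket.once("data", (chunk) => finish(classifyFirstBytes(chunk)));
    socket.once("end", () => finish("no-response"));
    socket.once("error", () => finish("unreachable"));
  });
}
```

Calling probeRaw("127.0.0.1", 18789, 5000) against the port from the Environment section would then separate "the listener answers with something that is not HTTP" from "the listener does not answer at all", which the curl and netcat observations above conflate.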

Diagnostics

The verbose log shows event loop blocking coinciding with the failures:

[diagnostic] liveness warning: reasons=event_loop_delay interval=30s eventLoopDelayP99Ms=152.7 eventLoopDelayMaxMs=2424.3 eventLoopUtilization=0.33 cpuCoreRatio=0.376 active=1 waiting=0 queued=2

A maximum event loop delay of 2.4 seconds means the loop was blocked for up to 2.4 seconds at a time in that interval, and while it is blocked the HTTP server cannot accept or process incoming connections.

The log also contains multiple fetch-timeout warnings (5000 ms timeouts on external fetches), which suggests heavy I/O contention.
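
To correlate these warnings with the outages, the liveness line can be parsed into fields. This is a minimal sketch that assumes the line keeps the key=value layout quoted above; parseLivenessWarning and the LivenessWarning shape are hypothetical helpers, not an OpenClaw API.

```typescript
// Parsed form of a "[diagnostic] liveness warning" line.
// Field names are taken from the log line quoted in the issue.
interface LivenessWarning {
  reasons: string[];
  numeric: Record<string, number>;
}

function parseLivenessWarning(line: string): LivenessWarning | null {
  const marker = "liveness warning:";
  const start = line.indexOf(marker);
  if (start === -1) return null; // not a liveness warning line

  const reasons: string[] = [];
  const numeric: Record<string, number> = {};
  const body = line.slice(start + marker.length).trim();

  for (const pair of body.split(/\s+/)) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip tokens that are not key=value
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    if (key === "reasons") {
      reasons.push(...value.split(","));
      continue;
    }
    // parseFloat tolerates a unit suffix, so "30s" becomes 30.
    const n = Number.parseFloat(value);
    if (!Number.isNaN(n)) numeric[key] = n;
  }
  return { reasons, numeric };
}
```

Applied to the line above, numeric["eventLoopDelayMaxMs"] is 2424.3 and numeric["interval"] is 30, so a script can flag every interval whose maximum delay exceeds a chosen threshold and compare the timestamps with the times the webhook went quiet.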

Timeline (2026-05-04)

  • 11:31 AM — Gateway restart after update. BlueBubbles working.
  • ~1:15 PM — HTTP server dies silently. No webhooks received. WS still fine. No errors logged.
  • 1:15 PM — Manual restart fixes it. Catchup replays missed messages.
  • ~7:44 PM — Restart. Working again.
  • ~8:16 PM — Dead again (32 min). Watchdog cron detects and restarts.
  • ~8:19 PM — Restart. Working again.
  • ~8:19 PM — Dead again almost immediately after restart.

Workaround

A cron watchdog pings the HTTP endpoint every 5 minutes and issues openclaw gateway restart when the endpoint is unresponsive. This limits message loss but is not a real fix.
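
The reporter's watchdog script is not shown. A minimal sketch of the same idea, written as a small Node script instead of a cron entry, might look like the following. The probe URL path, the 5-second timeout, and the rule that any HTTP response counts as alive are assumptions; only the port, the 5-minute interval, and the openclaw gateway restart command come from the issue.

```typescript
import { execFile } from "node:child_process";

const PROBE_URL = "http://127.0.0.1:18789/"; // port from the issue; the path is an assumption
const PROBE_TIMEOUT_MS = 5_000;              // assumed timeout
const CHECK_INTERVAL_MS = 5 * 60 * 1000;     // every 5 minutes, as in the workaround

type Probe = () => Promise<boolean>;
type Restart = () => Promise<void>;

// Assumption: any parseable HTTP response, whatever its status, means "alive".
// A timeout, a refused connection, or a non-HTTP reply makes fetch reject.
async function httpProbe(): Promise<boolean> {
  try {
    await fetch(PROBE_URL, { signal: AbortSignal.timeout(PROBE_TIMEOUT_MS) });
    return true;
  } catch {
    return false;
  }
}

// Runs the restart command named in the issue.
function restartGateway(): Promise<void> {
  return new Promise((resolve, reject) => {
    execFile("openclaw", ["gateway", "restart"], (err) => (err ? reject(err) : resolve()));
  });
}

// One watchdog pass: probe, and restart only when the probe fails.
async function watchdogTick(
  probe: Probe = httpProbe,
  restart: Restart = restartGateway,
): Promise<"ok" | "restarted"> {
  if (await probe()) return "ok";
  await restart();
  return "restarted";
}

// Not called here; invoke it from a long-running script to start the loop.
function startWatchdog(): () => void {
  const timer = setInterval(() => {
    watchdogTick().catch((err) => console.error("watchdog restart failed:", err));
  }, CHECK_INTERVAL_MS);
  return () => clearInterval(timer);
}
```

Passing the probe and restart functions as parameters keeps the decision logic testable without touching the real gateway. With a 5-minute interval, up to 5 minutes of webhooks can still be missed before a restart, which matches the reporter's point that this only limits message loss.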

Suspected Cause

The gateway appears to share HTTP and WebSocket on the same port (18789). When the Node.js event loop blocks (from embedded agent runs, cron job execution, or model calls), the HTTP request handler becomes unresponsive while established WebSocket connections survive due to their persistent nature.
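
The effect of a blocked loop on pending callbacks can be shown in isolation. This is a generic Node.js sketch, not OpenClaw code, and measureTimerLateness is a hypothetical name: it schedules a zero-delay timer, blocks the thread with a busy loop, and reports how late the timer fired. Any callback queued on the same loop, including a handler for a newly accepted HTTP connection, waits in the same way.

```typescript
// Resolves with the number of milliseconds a 0 ms timer was delayed
// because synchronous work kept the event loop busy.
function measureTimerLateness(blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const scheduledAt = performance.now();
    // The timer is due immediately, but it cannot fire until the
    // synchronous loop below returns control to the event loop.
    setTimeout(() => resolve(performance.now() - scheduledAt), 0);

    const blockUntil = scheduledAt + blockMs;
    while (performance.now() < blockUntil) {
      // Busy-wait: stands in for any long synchronous task.
    }
  });
}
```

For example, measureTimerLateness(2400) resolves with a value of at least 2400, mirroring the 2.4-second maximum delay in the diagnostic line. A delay of that size explains stalled requests during the block, but on its own it does not show why the listener stays unresponsive afterwards.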

This may be a regression in 2026.5.3 — the issue was not observed before the upgrade (previously on 2026.5.2).

Expected Behavior

The HTTP server should remain responsive to webhook POSTs regardless of agent/cron workload, or the gateway should detect and self-heal when the HTTP listener becomes unresponsive.

extent analysis

TL;DR

The most likely fix is to address the event loop blocking issue, potentially by optimizing or separating the workload of the HTTP and WebSocket servers.

Guidance

  • Investigate and optimize the embedded agent runs, cron job execution, or model calls that may be causing the event loop blocking.
  • Consider separating the HTTP and WebSocket servers to run on different ports or processes to prevent the event loop blocking from affecting the HTTP server.
  • Review the fetch-timeout warnings and optimize the external fetches to reduce I/O contention.
  • Monitor the event loop delay and utilization to identify patterns or correlations with the HTTP server failures.
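
The monitoring suggestion can be sketched with Node's built-in perf_hooks histogram. This is an illustration under stated assumptions, not OpenClaw's own diagnostic code: startLoopDelayMonitor, isStalled, the 20 ms sampling resolution, and any threshold passed in are hypothetical choices.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

const NS_PER_MS = 1e6; // the histogram reports nanoseconds

interface LoopDelaySample {
  p99Ms: number;
  maxMs: number;
}

// Hypothetical rule: treat an interval as stalled when its worst delay
// reaches the given threshold.
function isStalled(sample: LoopDelaySample, maxMsThreshold: number): boolean {
  return sample.maxMs >= maxMsThreshold;
}

// Samples event loop delay and reports one summary per interval.
// Returns a function that stops the monitor.
function startLoopDelayMonitor(
  intervalMs: number,
  onSample: (sample: LoopDelaySample) => void,
): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    onSample({
      p99Ms: histogram.percentile(99) / NS_PER_MS,
      maxMs: histogram.max / NS_PER_MS,
    });
    histogram.reset(); // start a fresh window for the next interval
  }, intervalMs);
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

A caller could start it with a 30-second interval, matching the interval=30s field in the diagnostic line, log each sample with a timestamp, and compare the stalled intervals against the times at which webhook delivery stopped.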

Example

The analysis supplies no code snippet of its own, treating the issue as one of system configuration and optimization.

Notes

The issue may be specific to the 2026.5.3 version of OpenClaw, and reverting to a previous version may be a temporary workaround. However, this is not a recommended long-term solution.

Recommendation

Reduce event loop blocking and I/O contention rather than treating this as a simple version-upgrade issue, since the root cause is likely the blocked event loop. This allows a more targeted fix and lowers the risk of introducing new issues.
