openclaw: Gateway event-loop starvation can make Discord appear hung during long model/network timeouts [2 comments, 2 participants]

GitHub stats
openclaw/openclaw#84226 (fetched 2026-05-20 03:42:24)
Comments: 2 · Participants: 2 · Timeline events: 13 · Reactions: 1
Timeline (top): labeled ×8, commented ×2, closed ×1, mentioned ×1


Summary

A live OpenClaw Gateway can appear running and eventually recover, but Discord can stop responding for minutes when long model/network calls starve the Node event loop. This matches user-visible reports where Discord messages to Jarvis received no response for hours or until later pokes/restarts.

Evidence

Observed on May 18-19, 2026, on OpenClaw 2026.5.12:

  • openclaw health --json reported severe event-loop degradation before restart:
    • eventLoop.degraded=true
    • reasons included event_loop_delay, event_loop_utilization, and cpu
    • p99/max delay reached ~14.1s
    • utilization was ~0.997 in one sample
  • Gateway logs showed model fetch timers firing extremely late, e.g.:
    • fetch timeout reached; aborting operation
    • eventLoopDelayHint="timer delayed 898065ms, likely event-loop starvation"
    • url="https://api.z.ai/api/coding/paas/v4/chat/completions"
    • model/provider warnings for zai/glm-4.7 after elapsed times far beyond the intended timeout
  • Gateway diagnostics logged stalled sessions and command-lane timeouts:
    • stalled session ... activeWorkKind=model_call lastProgress=model_call:started
    • CommandLaneTaskTimeoutError: Command lane "main" task timed out after 330000ms
  • Discord transport also logged heartbeat / websocket failures around the same window:
    • discord gateway: Gateway websocket closed: 1006
    • earlier Gateway heartbeat ACK timeout entries
  • User-visible symptom: Discord messages to Jarvis got no response for a while, then later responses resumed after the system recovered/restarted.
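
The event-loop figures cited above (p99/max delay and utilization) can be sampled in Node with the built-in perf_hooks module. The following is a minimal sketch, not the Gateway's actual health check: the thresholds, the EventLoopSample shape, and the function names are assumptions made for illustration. Only the reason strings event_loop_delay and event_loop_utilization are taken from the report.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Assumed thresholds; the Gateway's real limits are not stated in the report.
const MAX_P99_DELAY_MS = 1_000;
const MAX_UTILIZATION = 0.95;

interface EventLoopSample {
  p99DelayMs: number;
  maxDelayMs: number;
  utilization: number; // 0..1, share of time the loop was busy
}

// Pure classification step, kept separate from sampling so it is easy to test.
function degradedReasons(sample: EventLoopSample): string[] {
  const reasons: string[] = [];
  if (sample.p99DelayMs > MAX_P99_DELAY_MS) reasons.push("event_loop_delay");
  if (sample.utilization > MAX_UTILIZATION) reasons.push("event_loop_utilization");
  return reasons;
}

// Starts sampling and returns a reader that reports metrics since the previous read.
function startEventLoopSampler(): () => EventLoopSample {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  let lastElu = performance.eventLoopUtilization();
  return () => {
    const currentElu = performance.eventLoopUtilization();
    // With two arguments, the result is the delta between the two readings.
    const delta = performance.eventLoopUtilization(currentElu, lastElu);
    lastElu = currentElu;
    const sample: EventLoopSample = {
      // The histogram reports nanoseconds; convert to milliseconds.
      p99DelayMs: histogram.percentile(99) / 1e6,
      maxDelayMs: histogram.max / 1e6,
      utilization: delta.utilization,
    };
    histogram.reset();
    return sample;
  };
}
```

Applied to the values in the report (p99 delay of about 14.1s, utilization of about 0.997), degradedReasons would return both reasons under these assumed thresholds.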

What was ruled out

A later openclaw channels status --json failure with 1006 looked suspicious at first, but rerunning it outside the Codex sandbox showed that it was a local sandbox socket-denial artifact. With escalation, channels.status and gateway call commands.list both succeeded, and all enabled Discord accounts were connected. The actionable bug is therefore not the channel status RPC itself.

Local mitigation applied

Restarting the Gateway cleared the immediate degraded event-loop state. After restart:

  • openclaw health --json showed eventLoop.degraded=false
  • all enabled Discord accounts reconnected
  • escalated openclaw channels status --json succeeded

I also applied a local defensive guard around the channels.status method handler so that unexpected handler exceptions return a structured UNAVAILABLE error instead of escaping the method. That did not address the root cause and should be treated only as a defensive hardening idea.

Patched local file:

  • /opt/homebrew/lib/node_modules/openclaw/dist/server-methods-CxcGaVP0.js

Backup:

  • /opt/homebrew/lib/node_modules/openclaw/dist/server-methods-CxcGaVP0.js.bak-before-channels-status-guard-2026-05-19T16-42-27-852Z
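
The patched code itself is not included in the report. As a rough illustration of the defensive guard described above, a handler wrapper could look like the sketch below. The RpcResult and Handler types and the guardHandler name are hypothetical and do not come from the OpenClaw source. Only the UNAVAILABLE code and the channels.status method name appear in the report.

```typescript
// Hypothetical result shape; the Gateway's real RPC envelope is not shown in the report.
type RpcResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: { code: "UNAVAILABLE"; message: string } };

type Handler<P, T> = (params: P) => Promise<T> | T;

// Wraps a method handler so that any thrown exception or rejected promise
// becomes a structured UNAVAILABLE result instead of escaping the method.
function guardHandler<P, T>(
  method: string,
  handler: Handler<P, T>,
): (params: P) => Promise<RpcResult<T>> {
  return async (params: P): Promise<RpcResult<T>> => {
    try {
      return { ok: true, value: await handler(params) };
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      return {
        ok: false,
        error: { code: "UNAVAILABLE", message: `${method} failed: ${detail}` },
      };
    }
  };
}
```

A guard like this only changes how a failure is reported. It cannot help when the event loop is starved, because the handler never gets scheduled in the first place, which is consistent with the report's note that the guard did not address the root cause.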

Expected behavior

Long model/provider/network calls should not starve the Gateway event loop enough to break Discord heartbeats, CLI/admin RPC responsiveness, or message delivery. Timeouts should fire near their configured deadline, and command-lane timeouts should not leave the Gateway unable to service channel traffic.
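
To make "timeouts should fire near their configured deadline" concrete, the sketch below pairs an abortable fetch with a check of how late the timeout callback actually ran. It is an assumption-laden illustration: fetchWithDeadline, timerLatenessMs, and the tolerance value are hypothetical, and the warning text merely mirrors the eventLoopDelayHint wording quoted in the evidence. Only fetch, AbortController, setTimeout, and performance.now are standard APIs.

```typescript
// Assumed tolerance before a late timer is reported as probable starvation.
const LATE_TIMER_TOLERANCE_MS = 1_000;

// How many milliseconds after its deadline a timer actually fired (never negative).
function timerLatenessMs(scheduledAtMs: number, timeoutMs: number, firedAtMs: number): number {
  return Math.max(0, firedAtMs - scheduledAtMs - timeoutMs);
}

// Aborts the request when the deadline passes and logs if the timer itself ran late.
async function fetchWithDeadline(
  url: string,
  init: RequestInit,
  timeoutMs: number,
): Promise<Response> {
  const controller = new AbortController();
  const scheduledAt = performance.now();
  const timer = setTimeout(() => {
    const late = timerLatenessMs(scheduledAt, timeoutMs, performance.now());
    if (late > LATE_TIMER_TOLERANCE_MS) {
      console.warn(`timer delayed ${Math.round(late)}ms, likely event-loop starvation`);
    }
    controller.abort();
  }, timeoutMs);
  try {
    // Resolves once response headers arrive; rejects if the signal aborts first.
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```

Note that this version clears the timer as soon as headers arrive, so a stalled response body is not covered. A fuller version would keep the deadline active until the body has been read. The lateness check also cannot prevent starvation: it only makes a delayed timer visible after the fact.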

Suggested fixes

  • Ensure model/provider calls are fully abortable and do not block or monopolize the main Gateway event loop.
  • Add watchdog recovery when event-loop delay exceeds a threshold while active model calls are stale.
  • Consider isolating long model calls from channel transport / admin RPC handling.
  • Surface a clear health warning when Discord is connected but recent model-call starvation caused missed heartbeat ACKs or delayed message processing.
  • Add regression coverage for a hung provider request where the Gateway must continue responding to health, channels.status, and Discord heartbeats.
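
As one possible shape for the watchdog suggestion above, the following sketch aborts model calls that have made no progress while the event loop is also reporting high delay. Every name here (ActiveCall, WatchdogConfig, selectStaleCalls, runWatchdogTick) and both thresholds are hypothetical. The report does not describe how the Gateway tracks active work.

```typescript
// Hypothetical record of in-flight work; the Gateway's real bookkeeping is not shown.
interface ActiveCall {
  id: string;
  kind: string; // e.g. "model_call"
  lastProgressAtMs: number;
  abort: () => void;
}

interface WatchdogConfig {
  maxLoopDelayMs: number; // event-loop delay above which the watchdog acts
  maxStaleMs: number; // time without progress before a call counts as stale
}

// Returns the calls to abort: none unless the loop is degraded, and then only stale ones.
function selectStaleCalls(
  calls: ActiveCall[],
  loopDelayMs: number,
  nowMs: number,
  cfg: WatchdogConfig,
): ActiveCall[] {
  if (loopDelayMs <= cfg.maxLoopDelayMs) return [];
  return calls.filter((call) => nowMs - call.lastProgressAtMs > cfg.maxStaleMs);
}

// One watchdog pass: aborts the selected calls and returns their ids for logging.
function runWatchdogTick(
  calls: ActiveCall[],
  loopDelayMs: number,
  nowMs: number,
  cfg: WatchdogConfig,
): string[] {
  const stale = selectStaleCalls(calls, loopDelayMs, nowMs, cfg);
  for (const call of stale) call.abort();
  return stale.map((call) => call.id);
}
```

A watchdog driven by a timer on the same event loop is itself delayed during starvation, so in practice the tick would likely need to run from a worker thread or an external supervisor to fire on time.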
