openclaw - 💡(How to fix) Fix [Bug]: Gateway liveness reports severe event-loop stalls under subagent load

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Gateway diagnostics reported repeated multi-second event-loop stalls while several agent/subagent runs were active.

Root Cause

Affected: Gateway users running concurrent agent/subagent workloads with diagnostics enabled. Severity: High, because seconds-long event-loop stalls can delay polling, streaming, queue handling, and cleanup. Frequency: 154 captured liveness lines matched eventLoopDelayP99Ms= in the observed log. Consequence: Gateway responsiveness degrades while active agent/subagent work is running.

Fix Action

Fix / Workaround

The implicated source path is the diagnostic liveness and diagnostic event dispatch path. High-frequency diagnostic events were deferred, but the async queue drained the whole backlog in a single setImmediate turn, which can starve other gateway work during concurrent agent/subagent bursts.

Code Example

Trace/proof:
- gateway-dev.log:10274
  "liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=30s eventLoopDelayP99Ms=6123.7 eventLoopDelayMaxMs=7709.1 eventLoopUtilization=0.975 cpuCoreRatio=0.983 active=5 waiting=0 queued=1 ..."
- gateway-dev.log:11542
  "liveness warning: reasons=event_loop_delay,event_loop_utilization interval=329s eventLoopDelayP99Ms=12750.7 eventLoopDelayMaxMs=12750.7 eventLoopUtilization=1 ..."
- gateway-dev.log:4190
  "liveness warning: reasons=event_loop_delay,cpu interval=30s eventLoopDelayP99Ms=1780.5 eventLoopDelayMaxMs=2128.6 eventLoopUtilization=0.892 cpuCoreRatio=0.951 active=6 ..."
RAW_BUFFERClick to expand / collapse

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Gateway diagnostics reported repeated multi-second event-loop stalls while several agent/subagent runs were active.

Steps to reproduce

  1. Start the gateway with diagnostics enabled.
  2. Run a workload with several active agent/subagent sessions.
  3. Observe gateway liveness warnings reporting multi-second event-loop delay while active work is present.

Expected behavior

The gateway diagnostic event path should not monopolize the main event loop during bursts from concurrent agent/subagent activity.

Actual behavior

The captured gateway log contains repeated liveness warnings with event-loop delay measured in seconds, including samples with active agent/subagent work and queued work.

OpenClaw version

NOT_ENOUGH_INFO

Operating system

NOT_ENOUGH_INFO

Install method

pnpm dev

Model

NOT_ENOUGH_INFO

Provider / routing chain

NOT_ENOUGH_INFO

Additional provider/model setup details

NOT_ENOUGH_INFO

Logs, screenshots, and evidence

Trace/proof:
- gateway-dev.log:10274
  "liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=30s eventLoopDelayP99Ms=6123.7 eventLoopDelayMaxMs=7709.1 eventLoopUtilization=0.975 cpuCoreRatio=0.983 active=5 waiting=0 queued=1 ..."
- gateway-dev.log:11542
  "liveness warning: reasons=event_loop_delay,event_loop_utilization interval=329s eventLoopDelayP99Ms=12750.7 eventLoopDelayMaxMs=12750.7 eventLoopUtilization=1 ..."
- gateway-dev.log:4190
  "liveness warning: reasons=event_loop_delay,cpu interval=30s eventLoopDelayP99Ms=1780.5 eventLoopDelayMaxMs=2128.6 eventLoopUtilization=0.892 cpuCoreRatio=0.951 active=6 ..."

Impact and severity

Affected: Gateway users running concurrent agent/subagent workloads with diagnostics enabled. Severity: High, because seconds-long event-loop stalls can delay polling, streaming, queue handling, and cleanup. Frequency: 154 captured liveness lines matched eventLoopDelayP99Ms= in the observed log. Consequence: Gateway responsiveness degrades while active agent/subagent work is running.

Additional information

The implicated source path is the diagnostic liveness and diagnostic event dispatch path. High-frequency diagnostic events were deferred, but the async queue drained the whole backlog in a single setImmediate turn, which can starve other gateway work during concurrent agent/subagent bursts.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

The gateway diagnostic event path should not monopolize the main event loop during bursts from concurrent agent/subagent activity.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: Gateway liveness reports severe event-loop stalls under subagent load