openclaw - [Bug]: event-loop starvation delays fetch timeouts during gateway load [1 pull request]

Root Cause

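fetchWithTimeout relies on a timer-based timeout. While the gateway process was CPU-saturated, the timer callback for a 10s timeout ran 7.175s late, so the timeout fired after 17.175s. The drift was recorded only on the fetch-timeout log record; no dedicated structured diagnostic event was emitted.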

Fix Action

Fixed

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

A fetch timeout configured for 10s fired after 17.175s while the gateway process was CPU-saturated, so timeout drift from event-loop starvation was only visible as an unstructured log line.

Steps to reproduce

  1. Run the gateway under live agent/tool load.
  2. Start a provider/channel request that uses fetchWithTimeout with a 10s timeout.
  3. Saturate the gateway event loop during the timeout window.
  4. Observe the timeout firing multiple seconds late with an event-loop starvation hint.
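The steps above can be approximated outside the gateway with a small sketch. It is not the gateway's code: `measureTimerDrift` and `DriftResult` are hypothetical names, and the busy-wait loop only stands in for whatever CPU-bound work saturates the real event loop. The field names mirror those in the log record (timeoutMs, elapsedMs, timerDelayMs).

```typescript
// Hypothetical result shape; field names follow the fetch-timeout log record.
interface DriftResult {
  timeoutMs: number;
  elapsedMs: number;
  timerDelayMs: number;
}

// Arms a timer, then blocks the event loop synchronously so the timer
// callback cannot run until the blocking work has finished.
function measureTimerDrift(timeoutMs: number, blockMs: number): Promise<DriftResult> {
  return new Promise<DriftResult>((resolve) => {
    const startedAt = Date.now();

    setTimeout(() => {
      const elapsedMs = Date.now() - startedAt;
      resolve({
        timeoutMs,
        elapsedMs,
        timerDelayMs: Math.max(0, elapsedMs - timeoutMs),
      });
    }, timeoutMs);

    // Busy-wait: a stand-in for CPU-bound work that starves the event loop.
    const blockUntil = startedAt + blockMs;
    while (Date.now() < blockUntil) {
      // spin
    }
  });
}

// A 50 ms timeout with 200 ms of blocking work fires roughly 150 ms late.
measureTimerDrift(50, 200).then((r) => {
  console.log(
    `timeoutMs=${r.timeoutMs} elapsedMs=${r.elapsedMs} timerDelayMs=${r.timerDelayMs}`
  );
});
```

The timer cannot fire before the blocking loop returns, so elapsedMs is bounded below by the blocking time rather than by the configured timeout.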

Expected behavior

When a fetch timeout fires late enough to indicate event-loop starvation, the gateway should surface a structured diagnostic event with timeout, elapsed, and timer-delay fields so diagnostics/stability consumers can correlate timeout drift with gateway pressure.
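One possible shape for such an event is sketched below. Everything here is an assumption made for illustration: the `FetchTimeoutDriftEvent` type, the `fetch.timeout.drift` event name, the `buildDriftEvent` helper, and the 1000 ms threshold are hypothetical and are not taken from the gateway or from the linked pull request. Only the three numeric field names come from the existing log record.

```typescript
// Hypothetical structured event; only the three *Ms field names come from the log.
interface FetchTimeoutDriftEvent {
  type: "fetch.timeout.drift"; // assumed event name
  operation: string;
  timeoutMs: number;
  elapsedMs: number;
  timerDelayMs: number;
}

// Assumed threshold above which a late timer is treated as starvation.
const DRIFT_THRESHOLD_MS = 1000;

// Returns an event when the timer fired late enough, otherwise null.
function buildDriftEvent(
  operation: string,
  timeoutMs: number,
  elapsedMs: number,
  thresholdMs: number = DRIFT_THRESHOLD_MS
): FetchTimeoutDriftEvent | null {
  const timerDelayMs = Math.max(0, elapsedMs - timeoutMs);
  if (timerDelayMs < thresholdMs) {
    return null;
  }
  return {
    type: "fetch.timeout.drift",
    operation,
    timeoutMs,
    elapsedMs,
    timerDelayMs,
  };
}

// Values from the report: a 10s timeout that fired after 17.175s.
const driftEvent = buildDriftEvent("fetchWithTimeout", 10000, 17175);
console.log(JSON.stringify(driftEvent));
```

Returning null below the threshold keeps ordinary scheduling jitter out of the diagnostic stream; the right threshold would depend on what the diagnostics consumers consider actionable.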

Actual behavior

The live log reported a 10s timeout firing after 17.175s and included timerDelayMs=7175, but this was only attached to the fetch-timeout log record. There was no dedicated structured diagnostic event for the delayed timeout.
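Without a dedicated event, a consumer that wants these numbers has to scrape them from the log line. The sketch below shows what that might look like; `parseLogFields` is a hypothetical helper, the key=value parsing rules are inferred from the single log line in this report, and the sample line is trimmed (the url and message fields are omitted).

```typescript
// Hypothetical scraper for key=value pairs, with optional double-quoted values.
function parseLogFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /(\w+)=("([^"]*)"|\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    const key = match[1] ?? "";
    // Prefer the unquoted capture for quoted values, else the bare token.
    const value = match[3] ?? match[2] ?? "";
    fields[key] = value;
  }
  return fields;
}

// Trimmed copy of the fetch-timeout record from the report.
const sampleLine =
  '2026-05-21T05:08:04.306+00:00 fetch-timeout timeoutMs=10000 elapsedMs=17175 ' +
  'timerDelayMs=7175 eventLoopDelayHint="timer delayed 7175ms, likely event-loop starvation" ' +
  'operation=fetchWithTimeout';

const parsedFields = parseLogFields(sampleLine);
const parsedTimeoutMs = Number(parsedFields["timeoutMs"]);
const parsedElapsedMs = Number(parsedFields["elapsedMs"]);
const parsedTimerDelayMs = Number(parsedFields["timerDelayMs"]);

// The logged delay equals elapsed minus configured timeout: 17175 - 10000 = 7175.
console.log(parsedElapsedMs - parsedTimeoutMs === parsedTimerDelayMs);
```

This kind of scraping depends on the exact log format, which is part of why the report asks for a structured event instead.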

OpenClaw version

Development checkout at 79be9401306f9e6c18937850b32629f0c3029fa4.

Operating system

Linux environment with pidstat; exact distribution/version: NOT_ENOUGH_INFO

Install method

Development gateway; exact launch method: NOT_ENOUGH_INFO

Model

NOT_ENOUGH_INFO

Provider / routing chain

Telegram API request through fetchWithTimeout; full provider/model route: NOT_ENOUGH_INFO

Additional provider/model setup details

NOT_ENOUGH_INFO

Logs, screenshots, and evidence

Command run:
pidstat -h -u -r -d -p [redacted gateway pid] 1 60

Pidstat summary from 60 rows:
avg_cpu=83.66
avg_usr=79.42
avg_sys=4.25
cpu_ge_100_count=42
avg_rd=0.00
avg_majflt=0.00
max_cpu=190.0 at 05:07:47

Selected raw pidstat rows around timeout drift:
05:07:47 1000 [redacted gateway pid] 177.00 13.00 0.00 0.00 190.00 3 7079.00 0.00 21237004 1265796 3.85 0.00 8.00 0.00 0 openclaw
05:07:48 1000 [redacted gateway pid] 100.00 1.00 0.00 0.00 101.00 3 60.00 0.00 21237004 1265796 3.85 0.00 0.00 0.00 0 openclaw
05:08:04 1000 [redacted gateway pid] 96.00 5.00 0.00 0.00 101.00 15 1264.00 0.00 21237004 1265884 3.85 0.00 28.00 4.00 0 openclaw
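For reference, summary figures like those above can be derived from raw rows along the following lines. This is a sketch, not the script used for the report: `parsePidstatRow` and `summarize` are hypothetical names, the column positions (Time, UID, PID, %usr, %system, %guest, %wait, %CPU, ...) are assumed from the rows shown, and the PID 12345 is a placeholder because the real PID is redacted. It uses only the three selected rows, so its averages differ from the 60-row summary.

```typescript
interface PidstatSample {
  time: string;
  usr: number;
  system: number;
  cpu: number;
}

// Assumed column order: Time UID PID %usr %system %guest %wait %CPU ...
function parsePidstatRow(row: string): PidstatSample {
  const cols = row.trim().split(/\s+/);
  return {
    time: cols[0] ?? "",
    usr: Number(cols[3]),
    system: Number(cols[4]),
    cpu: Number(cols[7]),
  };
}

function summarize(samples: PidstatSample[]) {
  let cpuSum = 0;
  let cpuGe100Count = 0;
  let maxCpu = -Infinity;
  let maxCpuTime = "";
  for (const s of samples) {
    cpuSum += s.cpu;
    if (s.cpu >= 100) cpuGe100Count += 1;
    if (s.cpu > maxCpu) {
      maxCpu = s.cpu;
      maxCpuTime = s.time;
    }
  }
  const avgCpu = samples.length > 0 ? cpuSum / samples.length : 0;
  return { avgCpu, cpuGe100Count, maxCpu, maxCpuTime };
}

// The three selected rows, with a placeholder PID (12345) for the redacted one.
const pidstatRows = [
  "05:07:47 1000 12345 177.00 13.00 0.00 0.00 190.00 3 7079.00 0.00 21237004 1265796 3.85 0.00 8.00 0.00 0 openclaw",
  "05:07:48 1000 12345 100.00 1.00 0.00 0.00 101.00 3 60.00 0.00 21237004 1265796 3.85 0.00 0.00 0.00 0 openclaw",
  "05:08:04 1000 12345 96.00 5.00 0.00 0.00 101.00 15 1264.00 0.00 21237004 1265884 3.85 0.00 28.00 4.00 0 openclaw",
];

const pidstatSummary = summarize(pidstatRows.map(parsePidstatRow));
console.log(
  `cpu_ge_100_count=${pidstatSummary.cpuGe100Count} max_cpu=${pidstatSummary.maxCpu} at ${pidstatSummary.maxCpuTime}`
);
```

In each row %usr plus %system equals %CPU (for example 177.00 + 13.00 = 190.00), which supports the assumed column positions.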

Gateway log correlation:
2026-05-21T05:08:04.306+00:00 fetch-timeout timeoutMs=10000 elapsedMs=17175 timerDelayMs=7175 eventLoopDelayHint="timer delayed 7175ms, likely event-loop starvation" operation=fetchWithTimeout url=https://api.telegram.org/bot[REDACTED]/getMe message="fetch timeout reached; aborting operation"
2026-05-21T05:08:04.316+00:00 diagnostic heartbeat: webhooks=0/0/0 active=1 waiting=0 queued=3
2026-05-21T05:08:04.330+00:00 agent/embedded embedded run tool end: runId=[redacted run id] tool=exec toolCallId=REDACTED

Impact and severity

Affected: Gateway network requests and channel/provider health checks that rely on timer-based fetch timeouts.
Severity: High; configured timeout budgets cannot be enforced reliably when the event loop is saturated.
Frequency: Observed once in the 2026-05-21 05:07-05:08 pidstat window.
Consequence: User-visible failures, reconnects, and health checks can appear several seconds later than intended, and support diagnostics cannot query delayed timeout events directly.

Additional information

Related historical issue: #78695 reported larger event-loop starvation symptoms. This report is for the narrower missing structured diagnostic event when fetchWithTimeout itself detects timeout timer drift.
