openclaw: [P0] 16 Telegram Accounts Cause Total Event Loop Starvation: System Unusable (ELU=1.0, 48s Timer Delays, 63s Fetch Timeouts)

GitHub stats
openclaw/openclaw#78695 · Fetched 2026-05-07 03:33:42
Comments: 1 · Participants: 2 · Timeline events: 1 (commented ×1) · Reactions: 2


🔴 P0: 16 Telegram Accounts Concurrent Polling Causes Total Event Loop Starvation — System Unusable After v2026.5.4 Upgrade

Severity: 🔴 P0 (complete system unusability)
Affected Versions: v2026.5.4 → v2026.5.6 (latest)
Platform: Windows 10.0.26200 x64, Node.js v25.9.0


TL;DR

Configuring 16 Telegram bot accounts simultaneously causes catastrophic event loop starvation: 100% ELU, 48-second P99 timer delays, 63-second fetch timeouts (configured for 10s), 164 polling stall/restart loops per startup. Agent task prep inflates from ~2s to 22.8s. The entire Gateway process becomes effectively unusable. This is a systemic design flaw in the Telegram channel's concurrent polling mechanism — no rate limiting, no backpressure, no event-loop protection.


Environment

| Item | Value |
| --- | --- |
| OpenClaw Version | 2026.5.6 (c97b9f7) |
| Node.js | v25.9.0 |
| OS | Windows 10.0.26200 (x64) |
| Telegram Accounts | 16 bot accounts (16 channels.telegram entries) |
| HTTP Proxy | http://127.0.0.1:10808 |
| Agents | 16 (main, analyst, ufc, edu, ppe, auto, learning, translator, food, social, trade, coding, assistant, geek, business, data) |
| Config Size | 41KB |
| Log File | ~1.3MB / 1,113 lines per startup session |

Symptoms

| Metric | Value | Expected | Severity |
| --- | --- | --- | --- |
| Event Loop Utilization | 1.0 (100%) sustained | <0.3 | 🔴 Critical |
| Event Loop Delay P99 | 48,150ms (48 seconds) | <100ms | 🔴 Critical |
| CPU Usage | 92.8-93.5% | <30% | 🔴 Critical |
| Telegram Polling Stall/Restart Cycles | 164 per startup | 0 | 🔴 Critical |
| Fetch Timeouts (configured 10s) | 109 occurrences, actual elapsed up to 63s | 0 | 🔴 Critical |
| WebSocket Handshake Failures | 9 (cause: startup-sidecars-pending) | 0 | 🟠 High |
| Agent Prep Time (stream-ready only) | 22,832ms | <2,000ms | 🔴 Critical |
| User Task Response Time | Tens of minutes | Seconds | 🔴 Critical |

Root Cause Analysis

The Death Spiral

16 Telegram accounts × concurrent long-poll getUpdates
    → All requests routed through local proxy (127.0.0.1:10808)
    → Proxy cannot sustain 16 concurrent long-poll connections
    → Network requests pile up on the single-threaded event loop
    → Timer callbacks delayed 50+ seconds
    → 10s fetch timeouts actually take 63s to fire
    → Each stalled getUpdates triggers "polling stall detected" → force restart
    → 16 accounts × restart cycles = 164 stall/restart loops
    → Event loop is permanently saturated (ELU = 1.0)
    → Agent startup stages starve for event loop time:
       - model-resolution: 15,973ms (expected <500ms)
       - system-prompt: 13,046ms (expected <200ms)
       - core-plugin-tools: 7,059ms (expected <100ms)
    → WebSocket handshakes fail (startup-sidecars-pending)
    → Control UI cannot connect
    → User tasks queue for tens of minutes
    → System is functionally dead

Design Flaws Identified

  1. No per-account polling rate limiting — All 16 Telegram bots fire getUpdates simultaneously with no stagger or backoff.
  2. No global channel concurrency cap — No limit on how many accounts can poll concurrently.
  3. No event-loop starvation protection — When ELU approaches 1.0, no channel degradation or throttling kicks in.
  4. No proxy-aware configuration — The Telegram channel doesn't account for proxy capacity constraints.
  5. Startup order issue — Telegram channels start before Agent sidecars are ready, blocking startup-sidecars-pending handshakes.

Log Evidence

1. Event Loop Starvation (48-second P99 delay)

{
  "subsystem": "diagnostic",
  "eventLoopDelayP99Ms": 48150.6,
  "eventLoopDelayMaxMs": 48150.6,
  "eventLoopUtilization": 1.0,
  "cpuCoreRatio": 0.928,
  "phase": "channels.telegram.start-account"
}
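
For readers who want to collect comparable numbers outside OpenClaw, here is a minimal sketch of sampling event-loop utilization and delay with Node's built-in `perf_hooks` module. This issue does not show how OpenClaw's diagnostic subsystem computes these values, so `sampleEventLoop` is a hypothetical helper and its field names are chosen only to mirror the log entry above.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Histogram of event-loop delay; Node records these values in nanoseconds.
const delayHistogram = monitorEventLoopDelay({ resolution: 20 });
delayHistogram.enable();

// Baseline used to compute utilization over the interval between two samples.
let previousElu = performance.eventLoopUtilization();

// Hypothetical helper (not an OpenClaw API): returns metrics for the interval
// since the previous call, then resets the delay histogram.
function sampleEventLoop() {
  const currentElu = performance.eventLoopUtilization();
  const delta = performance.eventLoopUtilization(currentElu, previousElu);
  previousElu = currentElu;

  const sample = {
    eventLoopUtilization: delta.utilization,
    eventLoopDelayP99Ms: delayHistogram.percentile(99) / 1e6, // ns -> ms
    eventLoopDelayMaxMs: delayHistogram.max / 1e6, // ns -> ms
  };
  delayHistogram.reset();
  return sample;
}
```

A caller would typically invoke `sampleEventLoop()` from a periodic timer. Note that under starvation as severe as the one reported here, that timer is itself delayed, so samples arrive late and cover longer intervals than intended.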

2. Fetch Timeout — 10s Configured, 63s Actual Elapsed

{
  "subsystem": "fetch-timeout",
  "timeoutMs": 10000,
  "elapsedMs": 63048,
  "timerDelayMs": 53048,
  "eventLoopDelayHint": "timer delayed 53048ms, likely event-loop starvation",
  "operation": "fetchWithTimeout",
  "url": "https://api.telegram.org/bot809652.../getMe"
}

| Configured Timeout | Actual Elapsed | Timer Delay | Event-Loop Starvation |
| --- | --- | --- | --- |
| 10,000ms | 63,048ms | 53,048ms | 53 seconds |
| 10,000ms | 48,476ms | 38,476ms | 38 seconds |
| 10,000ms | 38,654ms | 28,654ms | 28 seconds |
| 10,000ms | 29,568ms | 19,568ms | 19 seconds |
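
The timer delay in each row is the actual elapsed time minus the configured timeout. The sketch below reproduces that arithmetic for the four observations above; `FetchTimeoutEntry` and `timerDelayMs` are illustrative names, not OpenClaw APIs.

```typescript
// Shape loosely modeled on the "fetch-timeout" log entry; only two fields are needed.
interface FetchTimeoutEntry {
  timeoutMs: number;
  elapsedMs: number;
}

// How long the timeout timer fired after it was supposed to.
function timerDelayMs(entry: FetchTimeoutEntry): number {
  return Math.max(0, entry.elapsedMs - entry.timeoutMs);
}

// The four observations reported in this issue.
const observed: FetchTimeoutEntry[] = [
  { timeoutMs: 10_000, elapsedMs: 63_048 },
  { timeoutMs: 10_000, elapsedMs: 48_476 },
  { timeoutMs: 10_000, elapsedMs: 38_654 },
  { timeoutMs: 10_000, elapsedMs: 29_568 },
];

const delays = observed.map(timerDelayMs);
console.log(delays); // [ 53048, 38476, 28654, 19568 ]
```

A timer that fires tens of seconds late means the event loop did not return to its timers phase for that long, which is why these entries are treated as evidence of starvation rather than of slow network responses.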

3. Telegram Polling Storm (164 cycles per startup)

[telegram] Polling stall detected (active getUpdates stuck for 123.13s); forcing restart.
[telegram][diag] polling cycle finished reason=polling stall detected inFlight=0 outcome=error durationMs=123122
Telegram polling runner stopped (polling stall detected); restarting in 4.43s.
[telegram][diag] closing stale transport before rebuild
[telegram][diag] rebuilding transport for next polling cycle
→ Repeats 164 times
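
To count stall cycles in a log file, a line-matching sketch along these lines may help. The regular expression is derived only from the single stall message shown above, so it is an assumption that every stall line has this shape. `parseStallSeconds` and `summarizeStalls` are hypothetical helper names.

```typescript
// Matches e.g. "Polling stall detected (active getUpdates stuck for 123.13s)".
const STALL_PATTERN = /Polling stall detected \(active getUpdates stuck for ([\d.]+)s\)/;

// Returns the reported stuck duration in seconds, or null for non-stall lines.
function parseStallSeconds(line: string): number | null {
  const match = STALL_PATTERN.exec(line);
  return match ? Number(match[1]) : null;
}

// Counts stall lines and tracks the longest reported stall.
function summarizeStalls(lines: string[]): { count: number; maxStuckSeconds: number } {
  let count = 0;
  let maxStuckSeconds = 0;
  for (const line of lines) {
    const seconds = parseStallSeconds(line);
    if (seconds === null) continue;
    count += 1;
    maxStuckSeconds = Math.max(maxStuckSeconds, seconds);
  }
  return { count, maxStuckSeconds };
}
```

Feeding the lines of a startup session's log into `summarizeStalls` should give a count comparable to the 164 cycles reported here, provided the message format assumption holds.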

4. Agent Startup Prep Inflated 11x

{
  "subsystem": "agent/embedded",
  "phase": "stream-ready",
  "totalMs": 22832,
  "stages": {
    "workspace-sandbox": "135ms",
    "core-plugin-tools": "7059ms",
    "bootstrap-context": "664ms",
    "bundle-tools": "1517ms",
    "system-prompt": "13046ms",
    "model-resolution": "15973ms"
  }
}

| Stage | Actual | Expected | Inflation |
| --- | --- | --- | --- |
| model-resolution | 15,973ms | <500ms | 32x |
| system-prompt | 13,046ms | <200ms | 65x |
| core-plugin-tools | 7,059ms | <100ms | 70x |
| Total stream-ready | 22,832ms | <2,000ms | 11x |

This is BEFORE any model API call. The actual model request hasn't even started.

5. WebSocket Handshake Failures

{
  "subsystem": "gateway/ws",
  "cause": "startup-sidecars-pending",
  "handshake": "failed",
  "durationMs": 2795
}

9 WebSocket connection failures — all due to sidecars being blocked by the starving event loop.


v2026.5.5 / v2026.5.6 Assessment

Neither release addresses this issue. The recent patch notes cover:

  • Exec approvals fallback (Windows rename-overwrite) — unrelated
  • Control UI chat history rendering — unrelated
  • Web fetch timeout cleanup (5.6, #78439) — tangentially related, only cleans up after timeout, doesn't prevent starvation
  • Plugin runtime fetch fix (5.6) — unrelated
  • Doctor/Codex route rollback (5.6) — unrelated

This is a Telegram-channel-specific systemic design gap that has never been addressed in any release.


Requested Fixes

Must Have (P0)

  1. Staggered polling startup — Stagger getUpdates initiation across accounts (e.g., 2-3 second intervals) to prevent simultaneous burst.
  2. Per-account exponential backoff — On polling stall/failure, apply exponential backoff per account (not just a fixed 4s restart delay).
  3. Global channel concurrency limiter — Cap concurrent active getUpdates requests (e.g., max 6-8 simultaneous). Queue remaining accounts.
  4. Event-loop starvation circuit breaker — When eventLoopUtilization > 0.8 or eventLoopDelayP99 > 5000ms, automatically suspend non-critical channel polling until ELU recovers.
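
The timing logic behind items 1, 2, and 4 can be sketched as three pure functions. This is only an illustration under stated assumptions, not OpenClaw's implementation: all function names are hypothetical, the 2.5s stagger interval is picked from the 2-3 second range suggested above, the 4s backoff base mirrors the observed restart delay, and the 5-minute cap and the recovery thresholds (ELU below 0.5, P99 below 1000ms) are invented for the example. Only the trip thresholds (ELU > 0.8, P99 > 5000ms) come from the request itself.

```typescript
// All names and default values here are illustrative assumptions.

// Item 1: delay before account N (0-based) sends its first getUpdates.
function staggerDelayMs(accountIndex: number, intervalMs = 2_500): number {
  return accountIndex * intervalMs;
}

// Item 2: per-account exponential backoff, doubling per consecutive failure
// and capped at maxMs. The first failure waits baseMs.
function backoffDelayMs(consecutiveFailures: number, baseMs = 4_000, maxMs = 300_000): number {
  const exponent = Math.max(0, consecutiveFailures - 1);
  return Math.min(maxMs, baseMs * 2 ** exponent);
}

interface LoopHealth {
  eventLoopUtilization: number;
  eventLoopDelayP99Ms: number;
}

// Item 4: circuit breaker with hysteresis. Returns true while non-critical
// polling should stay suspended. It trips at the thresholds proposed above and
// resumes only after both metrics fall well below them, to avoid flapping.
function nextBreakerState(suspended: boolean, health: LoopHealth): boolean {
  const overloaded = health.eventLoopUtilization > 0.8 || health.eventLoopDelayP99Ms > 5_000;
  const recovered = health.eventLoopUtilization < 0.5 && health.eventLoopDelayP99Ms < 1_000;
  return suspended ? !recovered : overloaded;
}
```

With 16 accounts and a 2.5s interval, the last account would begin polling 37.5s after the first. Production code would usually add random jitter to the backoff so that accounts that failed together do not retry together. The health inputs could come from Node's `perf_hooks` (`performance.eventLoopUtilization()` and `monitorEventLoopDelay()`).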

Should Have

  1. Proxy-aware polling configuration — Option to set polling.concurrentAccounts cap based on proxy capacity.
  2. Health-based auto-degradation — When proxy or Telegram API is degraded, automatically reduce polling frequency.
  3. Startup phase isolation — Agent sidecar initialization should not be blocked by Telegram channel startup failures.
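
The global concurrency limiter (Must Have item 3), with its cap driven by something like the proposed `polling.concurrentAccounts` option, could take the shape of a small promise-based semaphore. The sketch below is a generic technique rather than anything taken from OpenClaw. `PollLimiter` and its members are hypothetical names, and the caller is assumed to wrap each getUpdates request in `run`.

```typescript
// Hypothetical limiter: at most maxConcurrent tasks run at once; the rest
// wait in FIFO order until a slot is released.
class PollLimiter {
  private readonly maxConcurrent: number;
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(maxConcurrent: number) {
    if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
      throw new RangeError("maxConcurrent must be a positive integer");
    }
    this.maxConcurrent = maxConcurrent;
  }

  get activeCount(): number {
    return this.active;
  }

  get queuedCount(): number {
    return this.waiters.length;
  }

  // Resolves with a release function once a slot is available.
  acquire(): Promise<() => void> {
    return new Promise((resolve) => {
      const grant = () => {
        this.active += 1;
        let released = false;
        resolve(() => {
          if (released) return; // make release idempotent
          released = true;
          this.active -= 1;
          const next = this.waiters.shift();
          if (next) next(); // hand the freed slot to the oldest waiter
        });
      };
      if (this.active < this.maxConcurrent) grant();
      else this.waiters.push(grant);
    });
  }

  // Runs a task inside a slot and always releases it, even on failure.
  async run<T>(task: () => Promise<T>): Promise<T> {
    const release = await this.acquire();
    try {
      return await task();
    } finally {
      release();
    }
  }
}
```

With a cap of 6 to 8 as suggested above, 16 accounts would each call `limiter.run(...)` around their long-poll request, so no more than the cap would ever be in flight through the proxy at once. One trade-off is that queued accounts see added latency of up to one long-poll timeout per waiting round.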

Temporary Workarounds

  1. Reduce active Telegram accounts from 16 to 2-4
  2. Increase polling interval in Telegram channel config
  3. Verify proxy capacity — Confirm 127.0.0.1:10808 can handle 16 concurrent long-poll connections
  4. Disable unused plugins — Remove active-memory config residue

Reproduction Steps

  1. Configure 16 Telegram bot accounts in channels.telegram
  2. All accounts route through a local HTTP proxy (127.0.0.1:10808)
  3. Start Gateway: openclaw gateway start
  4. Observe: Event loop hits 100% utilization within 30 seconds, all agent tasks become unusably slow

Reproducibility: 100% — occurs on every startup with this configuration.


Attachments

  • Full log: openclaw-2026-05-07.log (~1.3MB, available from C:\Users\Spring666\AppData\Local\Temp\openclaw\)
  • Config: openclaw.json (41KB, sanitized version available on request)
