openclaw - 💡(How to fix) Fix [P0] 16 Telegram Accounts Cause Total Event Loop Starvation — System Unusable (ELU=1.0, 48s Timer Delays, 63s Fetch Timeouts) [1 comment, 2 participants]

xiaobu1112 · 2026-05-07T01:23:03Z



Issue: openclaw/openclaw#78695 · Participants: xiaobu1112 (author), clawsweeper[bot]


# 🔴 P0: 16 Telegram Accounts Concurrent Polling Causes Total Event Loop Starvation — System Unusable After v2026.5.4 Upgrade

**Severity**: 🔴 P0 — Complete system unusability
**Affected Versions**: v2026.5.4 → v2026.5.6 (latest)
**Platform**: Windows 10.0.26200 x64, Node.js v25.9.0

## TL;DR

Configuring **16 Telegram bot accounts** simultaneously causes **catastrophic event loop starvation**: 100% event-loop utilization (ELU), 48-second P99 timer delays, fetch timeouts configured for 10s that take up to 63s to fire, and 164 polling stall/restart loops per startup. Agent task prep inflates from ~2s to **22.8s**, and the entire Gateway process becomes effectively unusable. This is a systemic design flaw in the Telegram channel's concurrent polling mechanism: no rate limiting, no backpressure, no event-loop protection.

## Environment

| Item | Value |
|------|-------|
| **OpenClaw Version** | 2026.5.6 (c97b9f7) |
| **Node.js** | v25.9.0 |
| **OS** | Windows 10.0.26200 (x64) |
| **Telegram Accounts** | **16 bot accounts** (16 `channels.telegram` entries) |
| **HTTP Proxy** | http://127.0.0.1:10808 |
| **Agents** | 16 (main, analyst, ufc, edu, ppe, auto, learning, translator, food, social, trade, coding, assistant, geek, business, data) |
| **Config Size** | 41KB |
| **Log File** | ~1.3MB / 1,113 lines per startup session |

## Symptoms

| Metric | Value | Expected | Severity |
|--------|-------|----------|----------|
| Event Loop Utilization | **1.0 (100%)** sustained | <0.3 | 🔴 Critical |
| Event Loop Delay P99 | **48,150ms** (48 seconds) | <100ms | 🔴 Critical |
| CPU Usage | **92.8-93.5%** | <30% | 🔴 Critical |
| Telegram Polling Stall/Restart Cycles | **164** per startup | 0 | 🔴 Critical |
| Fetch Timeouts (configured 10s) | **109 occurrences**, actual elapsed up to **63s** | 0 | 🔴 Critical |
| WebSocket Handshake Failures | **9** (cause: `startup-sidecars-pending`) | 0 | 🟠 High |
| Agent Prep Time (stream-ready only) | **22,832ms** | <2,000ms | 🔴 Critical |
| User Task Response Time | **Tens of minutes** | Seconds | 🔴 Critical |

## Root Cause Analysis

### The Death Spiral

```
16 Telegram accounts × concurrent long-poll getUpdates
    → All requests routed through local proxy (127.0.0.1:10808)
    → Proxy cannot sustain 16 concurrent long-poll connections
    → Network requests pile up on the single-threaded event loop
    → Timer callbacks delayed 50+ seconds
    → 10s fetch timeouts actually take 63s to fire
    → Each stalled getUpdates triggers "polling stall detected" → force restart
    → 16 accounts × restart cycles = 164 stall/restart loops
    → Event loop is permanently saturated (ELU = 1.0)
    → Agent startup stages starve for event loop time:
       - model-resolution: 15,973ms (expected <500ms)
       - system-prompt: 13,046ms (expected <200ms)
       - core-plugin-tools: 7,059ms (expected <100ms)
    → WebSocket handshakes fail (startup-sidecars-pending)
    → Control UI cannot connect
    → User tasks queue for tens of minutes
    → System is functionally dead
```

### Design Flaws Identified

1. **No per-account polling rate limiting** — All 16 Telegram bots fire `getUpdates` simultaneously with no stagger or backoff.
2. **No global channel concurrency cap** — No limit on how many accounts can poll concurrently.
3. **No event-loop starvation protection** — When ELU approaches 1.0, no channel degradation or throttling kicks in.
4. **No proxy-aware configuration** — The Telegram channel doesn't account for proxy capacity constraints.
5. **Startup order issue** — Telegram channels start before Agent sidecars are ready, so Gateway WebSocket handshakes fail with `startup-sidecars-pending`.

## Log Evidence

### 1. Event Loop Starvation (48-second P99 delay)

```json
{
  "subsystem": "diagnostic",
  "eventLoopDelayP99Ms": 48150.6,
  "eventLoopDelayMaxMs": 48150.6,
  "eventLoopUtilization": 1.0,
  "cpuCoreRatio": 0.928,
  "phase": "channels.telegram.start-account"
}
```
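To check these two numbers independently of OpenClaw's own diagnostics, Node.js exposes both through `node:perf_hooks`: `monitorEventLoopDelay()` records a histogram of event-loop delay in nanoseconds, and `performance.eventLoopUtilization()` reports the fraction of time the loop was busy. The sketch below produces fields shaped like the diagnostic record above. It is an assumption about how such a record could be computed, not OpenClaw's actual diagnostic code, and `EventLoopSampler` is a hypothetical name.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Field names mirror the diagnostic record above, for side-by-side comparison only.
interface LoopSample {
  eventLoopDelayP99Ms: number;
  eventLoopDelayMaxMs: number;
  eventLoopUtilization: number;
}

// Hypothetical sampler: each sample() covers the interval since the previous call.
class EventLoopSampler {
  private readonly histogram = monitorEventLoopDelay({ resolution: 20 });
  private lastElu = performance.eventLoopUtilization();

  constructor() {
    this.histogram.enable();
  }

  sample(): LoopSample {
    // Passing the previous reading returns only the delta since that reading.
    const elu = performance.eventLoopUtilization(this.lastElu);
    this.lastElu = performance.eventLoopUtilization();
    // Histogram values are in nanoseconds; convert to milliseconds.
    const p99Ms = this.histogram.percentile(99) / 1e6;
    const maxMs = this.histogram.max / 1e6;
    this.histogram.reset();
    return {
      eventLoopDelayP99Ms: Number.isFinite(p99Ms) ? p99Ms : 0,
      eventLoopDelayMaxMs: Number.isFinite(maxMs) ? maxMs : 0,
      // Guard against a non-finite ratio when the measured interval is empty.
      eventLoopUtilization: Number.isFinite(elu.utilization) ? elu.utilization : 0,
    };
  }

  stop(): void {
    this.histogram.disable();
  }
}
```

If `sample()` is called from a `setInterval`, keep in mind that under starvation the sampling timer itself fires late, which is the same effect the `fetch-timeout` records capture.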

### 2. Fetch Timeout — 10s Configured, 63s Actual Elapsed

```json
{
  "subsystem": "fetch-timeout",
  "timeoutMs": 10000,
  "elapsedMs": 63048,
  "timerDelayMs": 53048,
  "eventLoopDelayHint": "timer delayed 53048ms, likely event-loop starvation",
  "operation": "fetchWithTimeout",
  "url": "https://api.telegram.org/bot809652.../getMe"
}
```

| Configured Timeout | Actual Elapsed | Timer Delay | EL Starvation |
|--------------------|----------------|-------------|---------------|
| 10,000ms | 63,048ms | 53,048ms | 53 seconds |
| 10,000ms | 48,476ms | 38,476ms | 38 seconds |
| 10,000ms | 38,654ms | 28,654ms | 28 seconds |
| 10,000ms | 29,568ms | 19,568ms | 19 seconds |
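The Timer Delay column is the elapsed time minus the configured timeout: a 10s abort timer that fires after 63s spent 53s waiting for the event loop to become free. A minimal sketch of that arithmetic, assuming records shaped like the `fetch-timeout` entry above (`FetchTimeoutRecord`, `timerDelayMs`, and `starvationSeconds` are illustrative names, not OpenClaw APIs):

```typescript
interface FetchTimeoutRecord {
  timeoutMs: number;
  elapsedMs: number;
}

// How long the timer callback waited past its scheduled deadline.
function timerDelayMs(r: FetchTimeoutRecord): number {
  return Math.max(0, r.elapsedMs - r.timeoutMs);
}

// Whole seconds of starvation, truncated the same way as the table above.
function starvationSeconds(r: FetchTimeoutRecord): number {
  return Math.floor(timerDelayMs(r) / 1000);
}

// The four rows from the table above.
const rows: FetchTimeoutRecord[] = [
  { timeoutMs: 10_000, elapsedMs: 63_048 },
  { timeoutMs: 10_000, elapsedMs: 48_476 },
  { timeoutMs: 10_000, elapsedMs: 38_654 },
  { timeoutMs: 10_000, elapsedMs: 29_568 },
];

for (const r of rows) {
  console.log(`${r.elapsedMs}ms elapsed -> ${timerDelayMs(r)}ms delay, ${starvationSeconds(r)}s starved`);
}
```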

### 3. Telegram Polling Storm (164 cycles per startup)

```
[telegram] Polling stall detected (active getUpdates stuck for 123.13s); forcing restart.
[telegram][diag] polling cycle finished reason=polling stall detected inFlight=0 outcome=error durationMs=123122
Telegram polling runner stopped (polling stall detected); restarting in 4.43s.
[telegram][diag] closing stale transport before rebuild
[telegram][diag] rebuilding transport for next polling cycle
→ Repeats 164 times
```
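To reproduce the cycle count from a saved log, one approach is to count the forced-restart lines and collect the restart delays. The sketch below assumes the log lines appear verbatim as in the excerpt above (possibly with a timestamp prefix); `summarizePollingStalls` is a hypothetical helper, and in practice the text would be read from the attached log file.

```typescript
interface StallSummary {
  stalls: number;
  restartDelaysSec: number[];
}

// Counts forced restarts and extracts the "restarting in Ns" delays from raw log text.
function summarizePollingStalls(logText: string): StallSummary {
  const lines = logText.split(/\r?\n/);
  // Case-sensitive match: only the "[telegram] Polling stall detected ..." line counts,
  // not the lowercase "polling stall detected" mentions on the follow-up lines.
  const stalls = lines.filter((line) => line.includes("Polling stall detected")).length;
  const restartDelaysSec: number[] = [];
  for (const line of lines) {
    const match = /restarting in ([\d.]+)s/.exec(line);
    if (match) restartDelaysSec.push(Number(match[1]));
  }
  return { stalls, restartDelaysSec };
}
```

If the collected delays all cluster around 4s, that is consistent with the report's point that restarts use a near-fixed delay rather than a growing backoff.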

### 4. Agent Startup Prep Inflated 10x

```json
{
  "subsystem": "agent/embedded",
  "phase": "stream-ready",
  "totalMs": 22832,
  "stages": {
    "workspace-sandbox": "135ms",
    "core-plugin-tools": "7059ms",
    "bootstrap-context": "664ms",
    "bundle-tools": "1517ms",
    "system-prompt": "13046ms",
    "model-resolution": "15973ms"
  }
}
```

| Stage | Actual | Expected | Inflation |
|-------|--------|----------|-----------|
| model-resolution | 15,973ms | <500ms | 32x |
| system-prompt | 13,046ms | <200ms | 65x |
| core-plugin-tools | 7,059ms | <100ms | 71x |
| Total stream-ready | 22,832ms | <2,000ms | 11x |

This is BEFORE any model API call. The actual model request hasn't even started.
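The inflation factors divide each measured stage by the reporter's expected budget. A sketch of that calculation, assuming the `stages` values are strings of the form `"<n>ms"` as in the record above; the budgets are the report's own expectations rather than documented OpenClaw targets, and `parseMs` and `inflation` are hypothetical helpers.

```typescript
// Subset of the stream-ready stages, copied as logged.
const stages: Record<string, string> = {
  "core-plugin-tools": "7059ms",
  "system-prompt": "13046ms",
  "model-resolution": "15973ms",
};

// Expected upper bounds from the report's table.
const budgetsMs: Record<string, number> = {
  "core-plugin-tools": 100,
  "system-prompt": 200,
  "model-resolution": 500,
};

// Parses "7059ms" into 7059; rejects anything else.
function parseMs(value: string): number {
  const match = /^(\d+(?:\.\d+)?)ms$/.exec(value.trim());
  if (!match) throw new Error(`unexpected duration: ${value}`);
  return Number(match[1]);
}

// Measured time divided by budget, rounded to one decimal place.
function inflation(stage: string): number {
  const ratio = parseMs(stages[stage]) / budgetsMs[stage];
  return Math.round(ratio * 10) / 10;
}

for (const stage of Object.keys(budgetsMs)) {
  console.log(`${stage}: ${inflation(stage)}x`);
}
```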

### 5. WebSocket Handshake Failures

```json
{
  "subsystem": "gateway/ws",
  "cause": "startup-sidecars-pending",
  "handshake": "failed",
  "durationMs": 2795
}
```

9 WebSocket connection failures — all due to sidecars being blocked by the starving event loop.

## v2026.5.5 / v2026.5.6 Assessment

**Neither release addresses this issue.** The recent patch notes cover:

- Exec approvals fallback (Windows rename-overwrite) — unrelated
- Control UI chat history rendering — unrelated
- Web fetch timeout cleanup (5.6, #78439) — tangentially related, only cleans up after timeout, doesn't prevent starvation
- Plugin runtime fetch fix (5.6) — unrelated
- Doctor/Codex route rollback (5.6) — unrelated

This is a Telegram-channel-specific systemic design gap that has never been addressed in any release.

## Requested Fixes

### Must Have (P0)

1. **Staggered polling startup** — Stagger `getUpdates` initiation across accounts (e.g., 2-3 second intervals) to prevent a simultaneous burst.
2. **Per-account exponential backoff** — On polling stall/failure, apply exponential backoff per account (not just a fixed 4s restart delay).
3. **Global channel concurrency limiter** — Cap concurrent active `getUpdates` requests (e.g., max 6-8 simultaneous). Queue remaining accounts.
4. **Event-loop starvation circuit breaker** — When `eventLoopUtilization > 0.8` or `eventLoopDelayP99 > 5000ms`, automatically suspend non-critical channel polling until ELU recovers.
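Items 1, 2, and 4 can be expressed as small, testable building blocks. This is a hedged sketch, not OpenClaw code: every name is hypothetical, the 2.5s stagger sits inside the suggested 2-3s range, the backoff base (4s, close to the current restart delay) and 5-minute cap are assumptions, and the breaker trips on the thresholds proposed above but resumes only below assumed recovery thresholds (ELU < 0.5 and P99 < 1000ms) so it does not flap.

```typescript
// 1. Staggered startup: account `index` waits index * intervalMs before its first getUpdates.
function staggerDelayMs(index: number, intervalMs = 2_500): number {
  return index * intervalMs;
}

// 2. Exponential backoff with "equal jitter": half of the window is fixed and half random,
// so restarts spread out without ever dropping below half the current window.
function backoffDelayMs(
  attempt: number,
  baseMs = 4_000,
  capMs = 300_000,
  random: () => number = Math.random,
): number {
  const windowMs = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(windowMs / 2 + random() * (windowMs / 2));
}

// Per-account state: each account backs off independently and resets after a healthy poll.
class AccountBackoff {
  private attempt = 0;
  private readonly random: () => number;

  constructor(random: () => number = Math.random) {
    this.random = random;
  }

  nextDelayMs(): number {
    const delay = backoffDelayMs(this.attempt, 4_000, 300_000, this.random);
    this.attempt++;
    return delay;
  }

  reset(): void {
    this.attempt = 0;
  }
}

interface LoopHealth {
  eventLoopUtilization: number;
  eventLoopDelayP99Ms: number;
}

// 4. Circuit breaker with hysteresis: opens on the proposed thresholds and closes
// only after a clear recovery (the recovery thresholds are assumptions).
class PollingBreaker {
  private open = false;

  // Returns true while non-critical polling should stay suspended.
  update(h: LoopHealth): boolean {
    if (!this.open && (h.eventLoopUtilization > 0.8 || h.eventLoopDelayP99Ms > 5_000)) {
      this.open = true;
    } else if (this.open && h.eventLoopUtilization < 0.5 && h.eventLoopDelayP99Ms < 1_000) {
      this.open = false;
    }
    return this.open;
  }
}
```

With 16 accounts and a 2.5s interval, the last account starts 37.5s after the first, so the initial `getUpdates` burst is spread out instead of arriving all at once.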

### Should Have

- **Proxy-aware polling configuration** — Option to set a `polling.concurrentAccounts` cap based on proxy capacity.
- **Health-based auto-degradation** — When proxy or Telegram API is degraded, automatically reduce polling frequency.
- **Startup phase isolation** — Agent sidecar initialization should not be blocked by Telegram channel startup failures.
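The global limiter from Must Have item 3 and a proxy-aware cap such as the proposed `polling.concurrentAccounts` option both reduce to a counting semaphore. A minimal sketch under that assumption: `PollingLimiter` and `withSlot` are hypothetical names, the cap would come from that proposed setting (6 fits the suggested 6-8 range), and each account would wrap its `getUpdates` call in `withSlot` so that a stalled or failed request still frees its slot.

```typescript
// Counting semaphore: at most `max` getUpdates requests in flight; the rest wait in FIFO order.
class PollingLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly max: number;

  constructor(max: number) {
    if (!Number.isInteger(max) || max < 1) throw new Error("max must be a positive integer");
    this.max = max;
  }

  acquire(): Promise<void> {
    if (this.active < this.max) {
      this.active++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the slot straight to the next waiter; `active` stays the same
    } else {
      this.active = Math.max(0, this.active - 1);
    }
  }

  get inFlight(): number {
    return this.active;
  }

  get queued(): number {
    return this.waiters.length;
  }
}

// Runs one polling request inside a slot, releasing it even on error or timeout.
async function withSlot<T>(limiter: PollingLimiter, task: () => Promise<T>): Promise<T> {
  await limiter.acquire();
  try {
    return await task();
  } finally {
    limiter.release();
  }
}
```

One trade-off: a long poll holds its slot for the whole server-side wait, so with 16 accounts and a cap of 6, up to 10 accounts are queued at any moment and their updates arrive later.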

## Temporary Workarounds

- Reduce active Telegram accounts from 16 to 2-4
- Increase polling interval in Telegram channel config
- **Verify proxy capacity** — Confirm 127.0.0.1:10808 can handle 16 concurrent long-poll connections
- **Disable unused plugins** — Remove `active-memory` config residue

## Reproduction Steps

1. Configure 16 Telegram bot accounts in `channels.telegram`
2. All accounts route through a local HTTP proxy (127.0.0.1:10808)
3. Start Gateway: `openclaw gateway start`
4. Observe: Event loop hits 100% utilization within 30 seconds, all agent tasks become unusably slow

**Reproducibility**: 100% — occurs on every startup with this configuration.

## Attachments

- **Full log**: `openclaw-2026-05-07.log` (~1.3MB, available from `C:\Users\Spring666\AppData\Local\Temp\openclaw\`)
- **Config**: `openclaw.json` (41KB, sanitized version available on request)
