openclaw - 💡(How to fix) Fix [Bug]: 2026.5.22 Docker/WSL2 gateway event-loop starvation, 284s provider-auth prewarm, slow Telegram turn, and local RPC timeouts [1 pull request]




Fix Action

Fixed


Bug type

Performance / responsiveness regression

Summary

On OpenClaw 2026.5.22 running in Docker on a Windows 11 + WSL2 host, the gateway can become severely unresponsive even after reducing the runtime surface area.

The strongest signal is gateway-level event-loop starvation / CPU pressure around provider-auth prewarm and embedded-run startup/context preparation. This affects both user-facing Telegram responsiveness and local gateway control-plane RPCs.

This appears related to, but not fully covered by, existing 2026.5.22 event-loop / provider-auth reports such as #85999, #86201, #86512, and #86073. The distinct data point here is a Dockerized WSL2 setup with a lightweight Telegram agent probe, large cached runtime context despite a small visible prompt, and local node-management RPCs timing out while the gateway is under pressure.

Environment

  • OpenClaw: 2026.5.22
  • Host: Windows 11 with WSL2
  • Runtime: Docker container gateway
  • Node in container: v24.x
  • Primary operational channel: Telegram
  • Gateway bind: LAN/loopback-style local deployment
  • Model routing: OpenAI/Codex OAuth path with fallbacks configured
  • Built-in memory/wiki/dreaming/session-memory paths: disabled during the investigation
  • Plugin discovery: allowlist-based
  • WhatsApp slash-command troubleshooting: parked during this investigation

What was simplified before the latest observation

The following were disabled or reduced before the latest test window:

  • built-in memory-core
  • built-in memory-wiki
  • memory-wiki bridge indexing / auto-compile behavior
  • memory-core dreaming phases
  • session-memory hook
  • default automatic memory search
  • compaction memory flush
  • memory plugins removed from active plugin allowlist
  • operational testing shifted to Telegram rather than WhatsApp slash commands

A stale Windows nodehost scheduled task from an older setup was also found and disabled. It was not running at the time of the latest observations.

Observed symptoms

  1. Slow lightweight Telegram agent turn

A lightweight agent used as a probe had a small visible prompt/bootstrap, but still showed large cached runtime context and slow end-to-end latency.

Observed values:

  • visible/tracked prompt estimate: around 2.9k tokens
  • actual cached runtime context in observed session logs: roughly 20k-33k tokens
  • one Telegram inbound-to-outbound turn: roughly 151 seconds
  • RAG tools executed successfully in that session, so retrieval execution itself did not appear to be the main delay
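The gap between the visible prompt and the cached runtime context can be put into numbers using only the figures above. This is a minimal sketch; `contextGap` is a hypothetical helper for illustration, not an OpenClaw API.

```typescript
// Hypothetical helper: how much cached runtime context is not explained
// by the visible/tracked prompt estimate. Inputs are token counts.
interface ContextGap {
  unexplainedTokens: number; // cached minus visible
  ratio: number; // cached / visible
}

function contextGap(visibleTokens: number, cachedTokens: number): ContextGap {
  if (visibleTokens <= 0) {
    throw new RangeError("visibleTokens must be positive");
  }
  return {
    unexplainedTokens: cachedTokens - visibleTokens,
    ratio: cachedTokens / visibleTokens,
  };
}

// Figures from this report: ~2.9k visible, ~20k-33k cached.
const low = contextGap(2_900, 20_000);
const high = contextGap(2_900, 33_000);
console.log(low.unexplainedTokens, low.ratio.toFixed(1)); // 17100 6.9
console.log(high.unexplainedTokens, high.ratio.toFixed(1)); // 30100 11.4
```

So roughly 17k to 30k tokens (about 7x to 11x the visible prompt) come from a source the visible prompt estimate does not show.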

  2. Embedded startup overhead before useful work

For the same slow turn, embedded startup trace showed approximately:

total startup: about 36.4s
model-resolution: about 16.7s
auth: under 1s
attempt-workspace: about 18.8s

This suggests substantial gateway/runtime overhead before useful model/tool work.
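Auth is reported only as "under 1s", so the two phases with concrete values can be subtracted from the total to bound everything else. A minimal arithmetic sketch using the trace values above, in milliseconds to avoid floating-point noise:

```typescript
// Startup trace values from this report, in milliseconds.
const totalStartupMs = 36_400;
const namedPhases: Array<[string, number]> = [
  ["model-resolution", 16_700],
  ["attempt-workspace", 18_800],
];

// Sum the phases that were reported with a concrete value.
const namedSumMs = namedPhases.reduce((sum, [, ms]) => sum + ms, 0);

// What is left covers auth (reported as under 1s) plus any unlisted phases.
const remainderMs = totalStartupMs - namedSumMs;

console.log(namedSumMs, remainderMs); // 35500 900
console.log(((namedSumMs / totalStartupMs) * 100).toFixed(1) + "%"); // 97.5%
```

Model resolution and workspace preparation together account for roughly 97% of the startup time, which suggests those two phases are where most of the pre-work overhead sits.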

  3. Delayed Telegram timer / event-loop starvation evidence

During the same window, a 10s timeout on the Telegram getMe call fired after roughly 65s, i.e. with about 55s of timer delay. The gateway log labelled this as likely event-loop starvation.

That makes the Telegram timeout look like a symptom of a blocked/delayed Node event loop, not a primary Telegram network issue.
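The timer delay is simply `delay = actual - configured` (65s - 10s = 55s). The self-contained Node sketch below reproduces the mechanism at small scale: a synchronous busy loop keeps the event loop occupied, so a short timer cannot fire until the loop is released. The durations and helper names are illustrative assumptions, not OpenClaw settings or code.

```typescript
// Timer delay = when the callback actually ran minus when it was due.
function timerDelayMs(configuredMs: number, actualMs: number): number {
  return actualMs - configuredMs;
}

// Blocks the event loop with synchronous work for roughly `ms` milliseconds.
function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: no timer or I/O callback can run while this loop executes
  }
}

// Schedule a short timer, then starve the loop; resolve with the measured delay.
function demoStarvation(configuredMs: number, blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => {
      resolve(timerDelayMs(configuredMs, Date.now() - start));
    }, configuredMs);
    busyWait(blockMs); // the timer is due long before this returns
  });
}

console.log(timerDelayMs(10_000, 65_000)); // 55000, matching the report
demoStarvation(50, 300).then((delay) => {
  console.log(`50ms timer fired about ${delay}ms late`);
});
```

The 50ms timer cannot run until the 300ms spin ends, so it reports at least 250ms of delay: the same pattern as a 10s getMe timeout surfacing after 65s.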

  4. Severe provider-auth prewarm after restart

A later gateway startup showed provider-auth prewarm around 284 seconds with high event-loop max delay.

Approximate observation:

provider auth state pre-warmed in about 284s
high eventLoopMax during the same startup/prewarm window
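A metric like `eventLoopMax` is commonly derived by sampling how late a periodic timer runs. The sketch below shows that generic technique with a hypothetical `createLagSampler` helper built only on `setInterval`; it is an assumption about the general approach, not OpenClaw's implementation. Node also ships a built-in histogram for this purpose (`monitorEventLoopDelay` in `perf_hooks`).

```typescript
interface LagSampler {
  stop(): number; // stops sampling and returns the max lag seen, in ms
}

// Hypothetical helper: every `intervalMs`, record how much later than
// expected the interval callback ran. Large values mean the event loop
// was busy (for example with CPU-heavy prewarm work).
function createLagSampler(intervalMs: number): LagSampler {
  let last = Date.now();
  let maxLagMs = 0;
  const handle = setInterval(() => {
    const now = Date.now();
    const lag = now - last - intervalMs; // extra time beyond the interval
    if (lag > maxLagMs) maxLagMs = lag;
    last = now;
  }, intervalMs);
  return {
    stop(): number {
      clearInterval(handle);
      return maxLagMs;
    },
  };
}

// Usage: sample while synchronous work blocks the loop for ~200ms.
const sampler = createLagSampler(20);
setTimeout(() => {
  const end = Date.now() + 200;
  while (Date.now() < end) {} // simulated blocking prewarm work
  setTimeout(() => console.log(`max lag ~${sampler.stop()}ms`), 50);
}, 30);
```

Because the interval cannot run during the 200ms block, the next sample sees at least about 180ms of lag; a prewarm phase that blocks for minutes would show up the same way, only larger.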

  5. Local gateway management RPCs timed out

While the gateway was in this degraded state, simple local node-management RPCs from inside the gateway container timed out after 10s.

Examples:

openclaw nodes remove --node <stale-disconnected-node>  -> gateway timeout after 10000ms
openclaw nodes list                                    -> timed out / unresponsive
openclaw nodes status                                  -> timed out / unresponsive

This indicates the problem is not only Telegram UX. The local gateway control plane itself can become unresponsive.
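On the client side, "degrade gracefully" could mean racing a call against a deadline and returning an explicit degraded result instead of hanging. This is a minimal sketch assuming a hypothetical `withDeadline` helper and a simulated slow RPC; neither is part of the OpenClaw CLI. It only bounds how long the caller waits: if the gateway's event loop is starved, the server still cannot answer until the loop is free.

```typescript
type Outcome<T> =
  | { ok: true; value: T }
  | { ok: false; reason: "timeout"; afterMs: number };

// Hypothetical helper: resolve with the call's result, or with a
// structured timeout outcome once `deadlineMs` elapses, whichever is first.
function withDeadline<T>(call: Promise<T>, deadlineMs: number): Promise<Outcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(
      () => resolve({ ok: false, reason: "timeout", afterMs: deadlineMs }),
      deadlineMs,
    );
  });
  const result = call.then((value): Outcome<T> => ({ ok: true, value }));
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([result, timeout]).finally(() => clearTimeout(timer));
}

// Simulated RPC that answers after `ms` milliseconds (stand-in for a gateway call).
function fakeRpc(ms: number): Promise<string[]> {
  return new Promise((resolve) => setTimeout(() => resolve(["node-a"]), ms));
}

withDeadline(fakeRpc(10), 100).then((o) =>
  console.log(o.ok ? "ok" : `degraded: ${o.reason}`),
); // ok
withDeadline(fakeRpc(500), 100).then((o) =>
  console.log(o.ok ? "ok" : `degraded: ${o.reason}`),
); // degraded: timeout
```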

Expected behavior

  • Provider-auth prewarm should not block or starve the gateway event loop for minutes.
  • Channel timers such as Telegram getMe should not fire tens of seconds late because the Node event loop is blocked.
  • A lightweight agent with a small visible prompt should not carry unexpectedly large cached runtime context unless the extra context source is visible/diagnosable.
  • Local gateway RPCs such as nodes list, nodes status, and nodes remove should remain responsive or degrade gracefully even while provider auth or agent startup work is happening.
  • RAG/tool availability should not by itself cause a tiny greeting-style Telegram turn to take roughly 151s.
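For CPU-heavy startup work such as a prewarm, one generic mitigation is to split the work into small batches and yield to the event loop between them, so timers and RPC handlers can run in the gaps. A minimal sketch of that pattern, assuming the work can be expressed as independent items; `processInBatches` is a hypothetical helper, not OpenClaw's prewarm code.

```typescript
// Yield to the event loop so pending timers and I/O callbacks can run.
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Hypothetical helper: apply `work` to every item, yielding after each
// batch instead of running everything in one synchronous pass.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  work: (item: T) => R,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    for (const item of items.slice(i, i + batchSize)) {
      results.push(work(item));
    }
    await yieldToEventLoop(); // let timers, channel polls and RPCs run
  }
  return results;
}

// Usage: a heartbeat timer keeps firing while the batched work runs.
let heartbeats = 0;
const heartbeat = setInterval(() => heartbeats++, 1);
const workItems = Array.from({ length: 200 }, (_, i) => i);
processInBatches(workItems, 20, (n) => n * n).then((squares) => {
  clearInterval(heartbeat);
  console.log(squares.length, heartbeats > 0); // 200 true
});
```

Batching trades a little total throughput for bounded per-slice blocking, which is what keeps timers like the Telegram getMe timeout close to their configured values.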

Actual behavior

  • Provider-auth prewarm was observed taking around 284s.
  • Event-loop timer delay was observed in a Telegram fetch timeout: a 10s timeout fired after about 65s.
  • A lightweight Telegram agent turn took about 151s inbound-to-outbound.
  • Embedded startup showed about 36.4s before useful work, including about 16.7s in model resolution and about 18.8s in attempt workspace.
  • Local node-management RPCs timed out, showing that the gateway control plane became unresponsive.

Current interpretation

The best current interpretation is that the gateway is experiencing event-loop starvation / CPU pressure in 2026.5.22, likely amplified by provider-auth prewarm, model/provider resolution, workspace/run setup, context assembly, and/or loaded runtime surfaces.

The symptoms line up with other public 2026.5.22 reports, but this case adds Docker/WSL2 evidence and a clean lightweight-agent probe where visible prompt size is small but cached runtime context is unexpectedly large.

Planned local isolation test

The next planned local diagnostic is an isolated clean-gateway comparison:

same OpenClaw version
same lightweight agent files
same model initially
separate state directory
separate container/compose project
separate gateway port
Telegram only for that one agent
no WhatsApp
no other Telegram accounts
no webhooks
no cron/background jobs
no old sessions/transcripts/cache
minimal plugins/tools
Phase 1: greeting only, no RAG/MCP if possible
Phase 2: add RAG/MCP and test one small corpus query

Interpretation of that test:

  • If the agent is fast in the clean gateway, the full current gateway environment is likely amplifying the regression.
  • If it remains slow in the clean gateway, focus should shift to the embedded runtime path, provider/model-resolution, context assembly, and provider-auth behavior.

Related issues / likely overlap

  • #85999: provider-auth prewarm blocking event loop on 2026.5.22
  • #86201: WSL2 slow responses / high CPU after 2026.5.22
  • #86512: 2026.5.22 high CPU and request latency regression
  • #86073: Windows 11 WebUI severe slowness on 2026.5.22

This report is filed separately because the Docker/WSL2 setup, lightweight-agent probe, large cached runtime context despite small visible prompt, and local gateway RPC timeout combination may help narrow the scope.

Sanitization note

No bot tokens, phone numbers, raw session IDs, raw node IDs, private config contents, private logs, or personal/group identifiers are included in this report.


