openclaw - [Bug]: 2026.5.22 Docker/WSL2 gateway event-loop starvation, 284s provider-auth prewarm, slow Telegram turn, and local RPC timeouts

openclaw, 2026-05-26 04:21:54

On OpenClaw 2026.5.22 running in Docker on a Windows 11 + WSL2 host, the gateway can become severely unresponsive even after reducing the runtime surface area.

The strongest signal is gateway-level event-loop starvation / CPU pressure around provider-auth prewarm and embedded-run startup/context preparation. This affects both user-facing Telegram responsiveness and local gateway control-plane RPCs.

This appears related to, but not fully covered by, existing 2026.5.22 event-loop / provider-auth reports such as #85999, #86201, #86512, and #86073. The distinct data point here is a Dockerized WSL2 setup with a lightweight Telegram agent probe, large cached runtime context despite a small visible prompt, and local node-management RPCs timing out while the gateway is under pressure.

Fix Action

Fixed

Fixed by PR: WORKING: All Microsoft Issues and PRs (refresh) (https://github.com/openclaw/openclaw/pull/74163)

Bug type

Performance / responsiveness regression

Environment

OpenClaw: 2026.5.22
Host: Windows 11 with WSL2
Runtime: Docker container gateway
Node in container: v24.x
Primary operational channel: Telegram
Gateway bind: LAN/loopback style local deployment
Model routing: OpenAI/Codex OAuth path with fallbacks configured
Built-in memory/wiki/dreaming/session-memory paths: disabled during the investigation
Plugin discovery: allowlist-based
WhatsApp slash-command troubleshooting: parked during this investigation

What was simplified before the latest observation

The following were disabled or reduced before the latest test window:

built-in memory-core
built-in memory-wiki
memory-wiki bridge indexing / auto-compile behavior
memory-core dreaming phases
session-memory hook
default automatic memory search
compaction memory flush
memory plugins removed from active plugin allowlist
operational testing shifted to Telegram rather than WhatsApp slash commands

A stale Windows nodehost scheduled task from an older setup was also found and disabled. It was not running at the time of the latest observations.

Observed symptoms

Slow lightweight Telegram agent turn

A lightweight agent used as a probe had a small visible prompt/bootstrap, but still showed large cached runtime context and slow end-to-end latency.

Observed values:

visible/tracked prompt estimate: around 2.9k tokens
actual cached runtime context in observed session logs: roughly 20k-33k tokens
one Telegram inbound-to-outbound turn: roughly 151 seconds
RAG tools executed successfully in that session, so retrieval execution itself did not appear to be the main delay
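
To make the gap between the visible prompt and the cached context concrete, the arithmetic can be sketched as follows. The figures are the approximate values quoted above; the function names are illustrative only and are not part of OpenClaw.

```typescript
// Approximate figures quoted in the report.
const visiblePromptTokens = 2900;
const cachedContextLow = 20000;
const cachedContextHigh = 33000;

// Tokens in the cached context that the visible prompt does not explain.
function unexplainedTokens(cached: number, visible: number): number {
  return Math.max(0, cached - visible);
}

// How many times larger the cached context is than the visible prompt.
function inflationFactor(cached: number, visible: number): number {
  return cached / visible;
}

[cachedContextLow, cachedContextHigh].forEach((cached) => {
  const gap = unexplainedTokens(cached, visiblePromptTokens);
  const factor = inflationFactor(cached, visiblePromptTokens).toFixed(1);
  console.log("cached=" + cached + " unexplained=" + gap + " factor=" + factor + "x");
});
```

On these numbers, roughly 17.1k to 30.1k tokens of cached context (about 6.9x to 11.4x the visible prompt) have no visible source, which is the part the report asks to be diagnosable.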

Embedded startup overhead before useful work

For the same slow turn, embedded startup trace showed approximately:

total startup: about 36.4s
model-resolution: about 16.7s
auth: under 1s
attempt-workspace: about 18.8s

This suggests substantial gateway/runtime overhead before useful model/tool work.
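
As a rough breakdown of the trace above, the share of startup time per phase can be computed as in this sketch. The timings are the approximate values from the trace, "auth: under 1s" is treated as an upper bound of 1s, and the helper name is illustrative.

```typescript
// Approximate phase timings in seconds, taken from the startup trace.
const totalStartupSec = 36.4;
const modelResolutionSec = 16.7;
const authUpperBoundSec = 1; // reported only as "under 1s"
const attemptWorkspaceSec = 18.8;

// Share of total startup spent in one phase, as a percentage.
function shareOfTotal(phaseSec: number, totalSec: number): number {
  return (phaseSec / totalSec) * 100;
}

console.log("model-resolution: " + shareOfTotal(modelResolutionSec, totalStartupSec).toFixed(1) + "%");
console.log("auth (upper bound): " + shareOfTotal(authUpperBoundSec, totalStartupSec).toFixed(1) + "%");
console.log("attempt-workspace: " + shareOfTotal(attemptWorkspaceSec, totalStartupSec).toFixed(1) + "%");

// The two slow phases together.
const slowPhasesSec = modelResolutionSec + attemptWorkspaceSec;
console.log("two slowest phases: " + shareOfTotal(slowPhasesSec, totalStartupSec).toFixed(1) + "%");
```

Model resolution and attempt-workspace together account for about 35.5s of the 36.4s, so auth itself is not where this particular startup spent its time.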

Delayed Telegram timer / event-loop starvation evidence

During the same window, a Telegram getMe 10s timeout fired after roughly 65s, with about 55s timer delay. The gateway log labelled this as likely event-loop starvation.

That makes the Telegram timeout look like a symptom of a blocked/delayed Node event loop, not a primary Telegram network issue.
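
The reasoning here is that a timer can only fire when the event loop is free to run it, so the gap between the requested delay and the observed firing time is a direct measure of how long the loop was held up. A minimal sketch of that measurement follows. The helper names and the 1s threshold are assumptions for illustration and do not describe how the gateway produces its own log label.

```typescript
// How late a timer fired relative to its requested delay, in milliseconds.
function timerLatenessMs(requestedMs: number, firedAfterMs: number): number {
  return Math.max(0, firedAfterMs - requestedMs);
}

// Hypothetical threshold: lateness beyond this suggests a blocked event loop
// rather than ordinary scheduling jitter. The value is an assumption.
const STARVATION_THRESHOLD_MS = 1000;

function looksLikeStarvation(requestedMs: number, firedAfterMs: number): boolean {
  return timerLatenessMs(requestedMs, firedAfterMs) > STARVATION_THRESHOLD_MS;
}

// Wrap setTimeout so that each firing reports its own lateness.
function setTimeoutWithLateness(
  requestedMs: number,
  onFire: (latenessMs: number) => void
): void {
  const scheduledAt = Date.now();
  setTimeout(() => {
    onFire(timerLatenessMs(requestedMs, Date.now() - scheduledAt));
  }, requestedMs);
}

// The reported case: a 10s timeout that fired after roughly 65s.
console.log(timerLatenessMs(10000, 65000)); // 55000
console.log(looksLikeStarvation(10000, 65000)); // true
```

A network-side Telegram problem would make the request slow but would still let the 10s timer fire close to 10s. A timer that is itself 55s late points at the process, not the network.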

Severe provider-auth prewarm after restart

A later gateway startup showed provider-auth prewarm around 284 seconds with high event-loop max delay.

Approximate observation:

provider auth state pre-warmed in about 284s
high eventLoopMax during the same startup/prewarm window

Local gateway management RPCs timed out

While the gateway was in this degraded state, simple local node-management RPCs from inside the gateway container timed out after 10s.

Examples:

openclaw nodes remove --node <stale-disconnected-node>  -> gateway timeout after 10000ms
openclaw nodes list                                    -> timed out / unresponsive
openclaw nodes status                                  -> timed out / unresponsive

This indicates the problem is not only Telegram UX. The local gateway control plane itself can become unresponsive.

Expected behavior

Provider-auth prewarm should not block or starve the gateway event loop for minutes.
Channel timers such as Telegram getMe should not fire tens of seconds late because the Node event loop is blocked.
A lightweight agent with a small visible prompt should not carry unexpectedly large cached runtime context unless the extra context source is visible/diagnosable.
Local gateway RPCs such as nodes list, nodes status, and nodes remove should remain responsive or degrade gracefully even while provider auth or agent startup work is happening.
RAG/tool availability should not by itself cause a tiny greeting-style Telegram turn to take roughly 151s.
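
One common way to meet the first two expectations in a Node process is to split long synchronous work into small batches and yield to the event loop between them, so timers and RPC handlers get a turn. The sketch below shows that general technique only. The report does not say how provider-auth prewarm is implemented, and `warmOne` is a hypothetical stand-in for whatever per-provider work it does.

```typescript
// Split a list into fixed-size chunks.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Let pending timers and I/O callbacks run before continuing.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Run a synchronous step over many items without holding the event loop
// for the whole batch. `warmOne` is a hypothetical placeholder.
async function prewarmInChunks<T>(
  items: T[],
  chunkSize: number,
  warmOne: (item: T) => void
): Promise<number> {
  let done = 0;
  for (const group of chunk(items, chunkSize)) {
    for (const item of group) {
      warmOne(item);
      done += 1;
    }
    await yieldToEventLoop();
  }
  return done;
}
```

Chunking only helps when each individual step is short. If a single step is itself CPU-heavy, moving it off the main thread (for example with Node's worker_threads) is the usual alternative.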

Actual behavior

Provider-auth prewarm was observed taking around 284s.
Event-loop timer delay was observed in a Telegram fetch timeout: a 10s timeout fired after about 65s.
A lightweight Telegram agent turn took about 151s inbound-to-outbound.
Embedded startup showed about 36.4s before useful work, including about 16.7s in model-resolution and about 18.8s in attempt-workspace.
Local node-management RPCs timed out, showing that the gateway control plane became unresponsive.

Current interpretation

The best current interpretation is that the gateway is experiencing event-loop starvation / CPU pressure in 2026.5.22, likely amplified by provider-auth prewarm, model/provider resolution, workspace/run setup, context assembly, and/or loaded runtime surfaces.

The symptoms line up with other public 2026.5.22 reports, but this case adds Docker/WSL2 evidence and a clean lightweight-agent probe where visible prompt size is small but cached runtime context is unexpectedly large.

Planned local isolation test

The next planned local diagnostic is an isolated clean-gateway comparison:

same OpenClaw version
same lightweight agent files
same model initially
separate state directory
separate container/compose project
separate gateway port
Telegram only for that one agent
no WhatsApp
no other Telegram accounts
no webhooks
no cron/background jobs
no old sessions/transcripts/cache
minimal plugins/tools
Phase 1: greeting only, no RAG/MCP if possible
Phase 2: add RAG/MCP and test one small corpus query

Interpretation of that test:

If the agent is fast in the clean gateway, the full current gateway environment is likely amplifying the regression.
If it remains slow in the clean gateway, focus should shift to the embedded runtime path, provider/model-resolution, context assembly, and provider-auth behavior.

Related issues / likely overlap

#85999: provider-auth prewarm blocking event loop on 2026.5.22
#86201: WSL2 slow responses / high CPU after 2026.5.22
#86512: 2026.5.22 high CPU and request latency regression
#86073: Windows 11 WebUI severe slowness on 2026.5.22

This report is filed separately because the Docker/WSL2 setup, lightweight-agent probe, large cached runtime context despite small visible prompt, and local gateway RPC timeout combination may help narrow the scope.

Sanitization note

No bot tokens, phone numbers, raw session IDs, raw node IDs, private config contents, private logs, or personal/group identifiers are included in this report.
