openclaw: [BUG] Auth pre-warming blocks event loop for 60-90s, causing cascading timeouts

Provider auth pre-warming blocks the Node.js event loop for 60-90 seconds during gateway startup, causing all concurrent I/O operations (HTTP API calls, MCP server startup, WebSocket connections) to time out. Affects all tested remote model providers.

Environment

  • OS: Windows 10 Pro 22H2 (19045)
  • OpenClaw: 2026.5.22 (a374c3a)
  • Node.js: v24.14.1
  • Shell: bash (via VS Code)

Reproduction Steps

  1. Start the OpenClaw gateway with any remote model provider (e.g., DashScope, CTYun).
  2. Observe the logs during the first 2 minutes of startup.

Reproduction rate: 100% across multiple restarts and model switches.
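To confirm the stall independently of OpenClaw's own liveness logging, Node's built-in `node:perf_hooks` module exposes comparable metrics: `monitorEventLoopDelay()` (a delay histogram reported in nanoseconds) and `performance.eventLoopUtilization()`. The sketch below is a standalone illustration, not OpenClaw code: `busyWait` and `measureLoop` are hypothetical helpers, and mapping their output to the gateway's `eventLoopMax`, `eventLoopDelayP99Ms` and `eventLoopUtilization` fields is an assumption based on the field names.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Hypothetical stand-in for synchronous, CPU-bound startup work (not OpenClaw code).
// While it spins, no timer, socket, or other callback can run on the main thread.
function busyWait(ms: number): void {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

interface LoopStats {
  maxDelayMs: number; // assumed analogue of eventLoopMax in the gateway log
  p99DelayMs: number; // assumed analogue of eventLoopDelayP99Ms
  utilization: number; // 0..1, assumed analogue of eventLoopUtilization
}

// Hypothetical helper: samples loop delay and utilization while `work` runs.
async function measureLoop(work: () => void): Promise<LoopStats> {
  const delay = monitorEventLoopDelay({ resolution: 10 });
  delay.enable();
  // Let the sampler tick at least once; ELU also reads 0 until bootstrap has finished.
  await sleep(30);
  const eluStart = performance.eventLoopUtilization();
  work();
  // The sampler records the stall on its next (late) tick.
  await sleep(30);
  const elu = performance.eventLoopUtilization(eluStart);
  delay.disable();
  return {
    maxDelayMs: delay.max / 1e6, // histogram values are nanoseconds
    p99DelayMs: delay.percentile(99) / 1e6,
    utilization: elu.utilization,
  };
}

measureLoop(() => busyWait(500)).then((stats) => console.log(stats));
```

Under these assumptions, the 500 ms spin should appear as a maximum delay of roughly 500 ms and a utilization close to 1, the same pattern as the gateway's `eventLoopMax=70732.7ms` and `eventLoopUtilization=0.973`.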

Tested Providers (All Affected)

| Provider | Model | API Latency | Auth Pre-Warm | Event Loop Max |
|---|---|---|---|---|
| DashScope | qwen-plus | 0.9s | 86,014 ms | 67,109 ms |
| DashScope | qwen-vl-max | ~1s | 78,464 ms | 61,942 ms |
| CTYun (Tianyi) | GLM-5-Pro | 3.0s | 90,402 ms | 70,733 ms |

When called directly with curl or PowerShell, the model APIs respond in 0.2-1.0s, so the delay originates in OpenClaw's auth pre-warming rather than in the upstream APIs.
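A Node.js counterpart to the curl/PowerShell check can rule out Node's own networking stack: time a single request from a fresh process that has nothing else on its event loop. A minimal sketch, assuming Node 18+ for the global `fetch`; `timeRequest` is a hypothetical helper and `PROBE_URL` a hypothetical environment variable, to be pointed at the same endpoint (with the same auth headers) used with curl.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical helper: times one request from a fresh Node.js process, with nothing
// else competing for the event loop, for comparison with curl/PowerShell timings.
async function timeRequest(
  url: string,
  headers: Record<string, string> = {},
): Promise<{ status: number; ms: number }> {
  const start = performance.now();
  const res = await fetch(url, { headers });
  await res.arrayBuffer(); // include the body download in the measurement
  return { status: res.status, ms: performance.now() - start };
}

// PROBE_URL is a hypothetical variable: point it at the same endpoint you used with curl.
const probeUrl = process.env.PROBE_URL;
if (probeUrl) {
  timeRequest(probeUrl).then(({ status, ms }) => console.log(`HTTP ${status} in ${ms.toFixed(0)} ms`));
}
```

If this is also fast, that points at the gateway process rather than at Node's HTTP client.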

Cascade Failure Chain

Gateway Start
  |
  └─ Auth Pre-Warming (event loop blocked 60-90s)
       ├─ [TIMEOUT] Feishu tenant_access_token API (30s)
       ├─ [TIMEOUT] Feishu bot identity ping (30s, 5 retries over 15 min)
       ├─ [TIMEOUT] MCP server startup (30s, no tools for agent)
       ├─ [DELAY] Health check: 3-22s (normal <0.01s)
       ├─ [DELAY] Control UI: agents.list 14-16s, models.list 16-23s
       ├─ [DELAY] Message response: 80-126s (normal <10s)
       └─ [CPU] 97% utilization, 666MB memory for gateway process

Key Log Evidence

provider auth state pre-warmed in 90402ms eventLoopMax=70732.7ms
liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu 
   eventLoopDelayP99Ms=22968 eventLoopUtilization=0.973 cpuCoreRatio=0.967

feishu[default]: bot info probe timed out after 30000ms; continuing startup
   (repeats 5 times over 15 minutes before bot identity resolves)

failed to start server "windows-automation" (...): MCP server connection timed out after 30000ms

stalled session: age=162s classification=stalled_agent_run

Feishu APIs (tenant_access_token, bot ping) respond in ~0.2s when tested directly with curl, but time out after 30s when called from the Node.js process during auth pre-warming.
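These in-process timeouts are consistent with how Node schedules work: a request's socket, TLS and HTTP-parsing steps all need turns on the main thread, and even I/O that finishes elsewhere (here, a `stat()` call on libuv's thread pool) is only delivered when the loop runs again, at which point an already-expired timeout timer can fire first. The sketch below reproduces that effect in miniature; `busyWait`, `timeout` and `raceUnderBlock` are hypothetical helpers, and the 100 ms / 300 ms values are arbitrary scaled-down substitutes for the 30 s timeouts and the 60-90 s stall in the logs.

```typescript
import { stat } from "node:fs/promises";
import { performance } from "node:perf_hooks";

// Hypothetical stand-in for a synchronous, CPU-bound startup step.
function busyWait(ms: number): void {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin: timers and I/O callbacks cannot run here
  }
}

// Hypothetical helper: rejects after `ms`, like a client-side request timeout.
function timeout(ms: number): Promise<never> {
  return new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms),
  );
}

// Hypothetical helper: starts fast I/O guarded by a timeout, then blocks the main
// thread. stat() completes on the libuv thread pool during the block, but its
// result can only be delivered once the main thread returns to the event loop.
async function raceUnderBlock(timeoutMs: number, blockMs: number): Promise<string> {
  const io = stat(process.cwd()).then(() => "ok");
  const guarded = Promise.race([io, timeout(timeoutMs)]);
  busyWait(blockMs);
  try {
    return await guarded;
  } catch (err) {
    return (err as Error).message;
  }
}

raceUnderBlock(100, 300)
  .then((r) => {
    console.log("with 300 ms block:", r);
    return raceUnderBlock(100, 0);
  })
  .then((r) => console.log("without block:", r));
```

Under these assumptions the blocked call should report a timeout even though its `stat()` finished long before, while the unblocked call returns `ok`.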

Message Response Timeline (Real Example)

| Time | Event |
|---|---|
| 09:00:16 | Message received from Feishu |
| 09:01:24 | Core plugin tools loaded (+68s) |
| 09:01:57 | MCP server timeout (+33s, total 101s) |
| 09:02:08 | Stream ready (+11s, total 112s) |
| 09:02:22 | Reply sent (+14s, total 126s) |
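The step and cumulative figures in the timeline can be recomputed from the timestamps, which is handy for checking them or for breaking down other slow messages the same way. A small sketch with a hypothetical `breakdown` helper; it assumes all timestamps fall on the same day.

```typescript
// Converts "HH:MM:SS" to seconds since midnight (assumes all events are on the same day).
function toSeconds(hms: string): number {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

interface Step {
  time: string;
  event: string;
  stepSec: number; // seconds since the previous event
  totalSec: number; // seconds since the first event
}

// Hypothetical helper: per-step and cumulative durations from an ordered event list.
function breakdown(events: Array<[string, string]>): Step[] {
  const first = toSeconds(events[0][0]);
  let prev = first;
  return events.map(([time, event]) => {
    const t = toSeconds(time);
    const step: Step = { time, event, stepSec: t - prev, totalSec: t - first };
    prev = t;
    return step;
  });
}

const steps = breakdown([
  ["09:00:16", "Message received from Feishu"],
  ["09:01:24", "Core plugin tools loaded"],
  ["09:01:57", "MCP server timeout"],
  ["09:02:08", "Stream ready"],
  ["09:02:22", "Reply sent"],
]);
for (const s of steps) {
  console.log(`${s.time}  +${s.stepSec}s  total ${s.totalSec}s  ${s.event}`);
}
```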

Per-message overhead (after auth pre-warming has completed):

  • tool-policy: 2.3s
  • image-tool: 1.2s
  • plugin-tools: 1.6s
  • system-prompt: 3.3-5.1s
  • session-resource-loader: 4.1-7.2s

Total overhead before the model call: 15-17s
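Per-stage numbers like these can be collected with a small wrapper around each stage. A minimal sketch with a hypothetical `StageTimer` class; the stage names come from the list above, but the functions they wrap are placeholders. Wall-clock timings include any time a stage spends waiting for a busy event loop, so on their own they cannot tell a slow stage apart from a starved loop.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical helper: runs each stage (sync or async) and records its
// wall-clock duration under a label.
class StageTimer {
  readonly durations = new Map<string, number>();

  async time<T>(stage: string, fn: () => T | Promise<T>): Promise<T> {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      this.durations.set(stage, performance.now() - start);
    }
  }

  totalMs(): number {
    return Array.from(this.durations.values()).reduce((sum, ms) => sum + ms, 0);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Placeholder work standing in for two of the stages named in the report.
async function demo(): Promise<StageTimer> {
  const timer = new StageTimer();
  await timer.time("tool-policy", () => sleep(20));
  await timer.time("system-prompt", () => sleep(30));
  return timer;
}

demo().then((t) => console.log(Array.from(t.durations), `total=${t.totalMs().toFixed(1)}ms`));
```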

Workaround

  • Wait 2-3 minutes after gateway start before sending messages.
  • Feishu bot identity may need 5-15 minutes to recover via background retry.
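The fixed 2-3 minute wait can be replaced by polling until the gateway answers quickly again. A minimal sketch, assuming you have some cheap request to probe with (for example the health check, which the report says normally takes <0.01s); `waitUntilResponsive` and its default thresholds are hypothetical. This only detects that the event loop is responsive again; it does not tell you when the Feishu bot identity has resolved.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical helper: polls `probe` until one call completes within `fastMs`,
// instead of sleeping a fixed 2-3 minutes. Returns the number of attempts taken.
async function waitUntilResponsive(
  probe: () => Promise<unknown>,
  { fastMs = 100, intervalMs = 2_000, deadlineMs = 5 * 60_000 } = {},
): Promise<number> {
  const started = performance.now();
  for (let attempt = 1; ; attempt++) {
    const t0 = performance.now();
    try {
      await probe();
      if (performance.now() - t0 <= fastMs) return attempt;
    } catch {
      // A failed probe is treated like a slow one: wait and retry.
    }
    if (performance.now() - started > deadlineMs) {
      throw new Error(`still not responsive after ${deadlineMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(() => resolve(), intervalMs));
  }
}

// Hypothetical usage: the URL and path are placeholders for however you reach
// the gateway's health check.
// await waitUntilResponsive(() => fetch("http://127.0.0.1:<port>/<health-path>"));
```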

Expected Behavior

Auth pre-warming should be non-blocking (async) or use worker threads to avoid starving the event loop. A 60-90s synchronous block in the main event loop makes the gateway unusable during startup and degrades reliability permanently.
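If the pre-warming cost is CPU-bound (the 97% CPU and 0.973 utilization readings point that way, though the report does not confirm it), marking it `async` alone would not help, because synchronous work inside an async function still runs on the main thread. Moving it to `node:worker_threads` keeps the loop free. A minimal sketch; `runOffThread`, `timerLagDuring` and the busy-wait job are hypothetical, and how OpenClaw's auth state could be built off-thread is not known from the report.

```typescript
import { Worker } from "node:worker_threads";
import { performance } from "node:perf_hooks";

// Hypothetical helper: runs a CPU-bound job on a worker thread so the main event
// loop stays free. The job is a busy-wait standing in for pre-warming work; real
// auth state would need to be serializable to be passed back this way.
function runOffThread(busyMs: number): Promise<string> {
  const source = `
    const { parentPort, workerData } = require("node:worker_threads");
    const end = Date.now() + workerData;
    while (Date.now() < end) {}
    parentPort.postMessage("prewarmed");
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: busyMs });
    worker.once("message", (msg) => {
      resolve(String(msg));
      void worker.terminate();
    });
    worker.once("error", reject);
  });
}

// Hypothetical helper: how late a 10 ms timer fires on the main thread while `job` runs.
async function timerLagDuring(job: Promise<unknown>): Promise<number> {
  const t0 = performance.now();
  await new Promise<void>((resolve) => setTimeout(() => resolve(), 10));
  const lagMs = performance.now() - t0 - 10;
  await job;
  return lagMs;
}

timerLagDuring(runOffThread(500)).then((lag) =>
  console.log(`main-thread timer lag while the worker spins: ${lag.toFixed(1)} ms`),
);
```

The alternative the report mentions, keeping the work on the main thread but splitting it into chunks that yield between steps, avoids the serialization question but still competes with other callbacks for the same thread.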

Affected Components

| Component | Severity | Impact |
|---|---|---|
| Feishu | CRITICAL | Bot identity cannot resolve; messages undeliverable for 5-15 min |
| WeCom | Moderate | WebSocket-based, somewhat resilient, but messages delayed 47-99s |
| MCP Servers | CRITICAL | All MCP servers fail to start (30s timeout) |
| Control UI | Moderate | API calls delayed 14-25s |
| Health Check | Minor | Occasionally slow (3-22s vs. normal <0.01s) |

Additional Context

  • Issue persists across multiple restarts and different model providers.
  • No error logs from the model APIs themselves; the calls succeed once the event loop becomes free.
  • The problem appears to be architectural: the gateway's startup sequence does not follow Node.js non-blocking design principles.
