openclaw - [Bug]: Local model provider calls thread block gateway event loop on Windows beta; trivial infer run takes ~4 minutes [1 pull request]


On Windows with OpenClaw 2026.5.24-beta.1, local model calls appear to block or starve the Gateway event loop. Even a trivial fresh prompt like hi, how are you or:

openclaw infer model run --model llamacpp/qwen3.5-9b-instruct-Q5_K_M.gguf --prompt "hi" --json

takes around 3 minutes.

The underlying llama.cpp backend can generate quickly in isolation, but when invoked through OpenClaw the Gateway shows repeated event-loop starvation warnings, slow WebSocket RPCs, Telegram fetch timer delays, and stalled sessions with activeWorkKind=model_call.

This reproduces with both llama.cpp and Ollama backends, so it does not look specific to one local server implementation.

Fix Action

Fixed

Code Example

Relevant log excerpts:

[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=49s eventLoopDelayP99Ms=29813.1 eventLoopDelayMaxMs=29813.1 eventLoopUtilization=1 cpuCoreRatio=0.987 active=1 waiting=0 queued=0 work=[active=agent:main:main(processing/embedded_run,q=1,age=56s last=embedded_run:started)]

[fetch-timeout] fetch timeout after 9999ms (elapsed 18183ms) timer delayed 8184ms, likely event-loop starvation operation=fetchWithTimeout url=https://api.telegram.org/.../getMe

[agent/embedded] [trace:embedded-run] prep stages: runId=270498cf-d78a-4f58-ae81-f271e9ee4738 sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 phase=stream-ready totalMs=11071 stages=workspace-sandbox:2ms@2ms,skills:1ms@3ms,core-plugin-tools:2096ms@2099ms,bootstrap-context:18ms@2117ms,bundle-tools:338ms@2455ms,system-prompt:5976ms@8431ms,session-resource-loader:2604ms@11035ms,agent-session:5ms@11040ms,stream-setup:30ms@11070ms

[diagnostic] long-running session: sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 sessionKey=agent:main:main state=processing age=135s queueDepth=1 reason=queued_behind_active_work classification=long_running activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=87s recovery=none

[diagnostic] stalled session: sessionId=d432c2dd-b18c-4ae8-947a-1dc7b409f875 sessionKey=agent:main:main state=processing age=140s queueDepth=0 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started
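
The prep-stages trace above packs nine timings into one line, which makes it hard to see where the ~11s of pre-run setup goes. A minimal sketch for splitting that line apart and ranking the stages follows. The helper names (parsePrepStages, slowestStages) are hypothetical and not part of OpenClaw, and reading the value after @ as the cumulative elapsed time is an assumption inferred from the numbers in the log.

```typescript
// Hypothetical helpers for reading the "prep stages" trace line; not part of OpenClaw.
interface PrepStage {
  name: string;
  durationMs: number; // time spent in this stage
  elapsedMs: number; // assumed: cumulative time when the stage finished
}

function parsePrepStages(line: string): PrepStage[] {
  const match = /stages=(\S+)/.exec(line);
  if (!match) return [];
  const [, list = ""] = match;
  const stages: PrepStage[] = [];
  for (const entry of list.split(",")) {
    // Each entry looks like "system-prompt:5976ms@8431ms".
    const m = /^(.+):(\d+)ms@(\d+)ms$/.exec(entry);
    if (!m) continue;
    const [, name = "", duration = "0", elapsed = "0"] = m;
    stages.push({ name, durationMs: Number(duration), elapsedMs: Number(elapsed) });
  }
  return stages;
}

function slowestStages(stages: PrepStage[], count: number): PrepStage[] {
  // Copy before sorting so the caller's array keeps its original order.
  return [...stages].sort((a, b) => b.durationMs - a.durationMs).slice(0, count);
}

// Stage list copied from the trace line in this issue.
const traceLine =
  "phase=stream-ready totalMs=11071 stages=workspace-sandbox:2ms@2ms,skills:1ms@3ms," +
  "core-plugin-tools:2096ms@2099ms,bootstrap-context:18ms@2117ms,bundle-tools:338ms@2455ms," +
  "system-prompt:5976ms@8431ms,session-resource-loader:2604ms@11035ms," +
  "agent-session:5ms@11040ms,stream-setup:30ms@11070ms";

for (const stage of slowestStages(parsePrepStages(traceLine), 3)) {
  console.log(`${stage.name}: ${stage.durationMs}ms`);
}
```

For the line in this report, the three slowest stages are system-prompt, session-resource-loader, and core-plugin-tools, which together account for 10676ms of the 11071ms total.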

Slow RPC examples from the same window:

sessions.list 29482ms
chat.history 30201ms
sessions.list 20701ms
sessions.list 25379ms
models.list 29399ms
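
To compare the slow RPCs at a glance, the samples above can be grouped per method. This is a minimal sketch that assumes the "method durationms" shape shown in the list; summarizeRpcLatencies is a hypothetical name, not an OpenClaw API.

```typescript
// Hypothetical helper: groups "method 12345ms" samples by method name.
interface RpcStats {
  count: number;
  maxMs: number;
  meanMs: number;
}

function summarizeRpcLatencies(lines: string[]): Record<string, RpcStats> {
  const samples: Record<string, number[]> = {};
  for (const line of lines) {
    const m = /^(\S+)\s+(\d+)ms$/.exec(line.trim());
    if (!m) continue; // skip lines that do not match the assumed shape
    const [, method = "", ms = "0"] = m;
    const values = samples[method] ?? [];
    values.push(Number(ms));
    samples[method] = values;
  }
  const stats: Record<string, RpcStats> = {};
  for (const method of Object.keys(samples)) {
    const values = samples[method] ?? [];
    const total = values.reduce((sum, v) => sum + v, 0);
    stats[method] = {
      count: values.length,
      maxMs: Math.max(...values),
      meanMs: total / values.length,
    };
  }
  return stats;
}

// Samples copied from this issue.
const rpcSamples = [
  "sessions.list 29482ms",
  "chat.history 30201ms",
  "sessions.list 20701ms",
  "sessions.list 25379ms",
  "models.list 29399ms",
];

console.log(summarizeRpcLatencies(rpcSamples));
```

Every method in the sample, including the purely local sessions.list and models.list, sits in the 20-30s range, which is consistent with a blocked event loop rather than one slow handler.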

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Yes

Steps to reproduce

Fresh chat with a trivial prompt takes many minutes.
openclaw infer model run --prompt "hi" also takes ~3 minutes.
Gateway/control RPCs become very slow during the run.
Telegram health/fetch timers are delayed and report likely event-loop starvation.
Logs show model calls stuck with no progress.

Expected behavior

A trivial local model prompt should not starve the Gateway event loop. Even if the local backend/model is slow, Gateway timers, health checks, WebSocket RPCs, and channel polling should remain responsive or degrade gracefully.

Actual behavior

During local model calls, the Gateway event loop appears saturated:

eventLoopDelayP99Ms=20-29s
eventLoopUtilization=1
cpuCoreRatio≈0.98
activeWorkKind=model_call

This makes unrelated Gateway operations appear broken or delayed.
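
The "timer delayed ... likely event-loop starvation" message describes a timer that fired later than scheduled because synchronous work held the loop. The effect can be reproduced in isolation. This is a minimal sketch using plain timers; measureTimerDelay, busyWait, and demoBlockedTimer are hypothetical names, and nothing here is taken from the Gateway's own diagnostics code.

```typescript
// Resolves with how many milliseconds late a timer fired (0 if on time).
function measureTimerDelay(intervalMs: number): Promise<number> {
  const scheduledAt = Date.now();
  return new Promise((resolve) => {
    setTimeout(() => {
      const elapsed = Date.now() - scheduledAt;
      resolve(Math.max(0, elapsed - intervalMs));
    }, intervalMs);
  });
}

// Stand-in for CPU-heavy synchronous work on the event-loop thread.
function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin; no timers or I/O callbacks can run while this loop is active
  }
}

function demoBlockedTimer(): Promise<number> {
  const pending = measureTimerDelay(10); // should fire after about 10ms
  busyWait(200); // blocks the loop, so the timer cannot fire until this returns
  return pending;
}

demoBlockedTimer().then((delayMs) => {
  console.log(`timer fired ${delayMs}ms late`);
});
```

The same arithmetic appears in the fetch-timeout line of this report: a 9999ms timeout that elapsed after 18183ms was delayed by 8184ms. Node's perf_hooks module also provides monitorEventLoopDelay and performance.eventLoopUtilization for this kind of measurement; whether the Gateway's eventLoopDelayP99Ms and eventLoopUtilization fields come from them is an assumption.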

OpenClaw version

2026.5.24-beta.1

Operating system

Windows 11

Install method

npm

Model

Qwen 3.5 9B

Provider / routing chain

openclaw -> llama.cpp -> qwen

Additional provider/model setup details

openclaw-diagnostics-2026-05-25T18-08-09-809Z-6904.zip

Configs/backends tried

llama.cpp via OpenAI-compatible endpoint
Ollama backend
OpenAI Responses-style config
OpenAI Chat Completions-style config
Tool support enabled/disabled attempts
Fresh/simple prompts and fresh chats

The issue persists across local backend choices.

Diagnostics

I have an openclaw gateway diagnostics export zip generated while reproducing this. The export includes sanitized logs, gateway status, health, config shape, and stability data. I can attach it to this issue.

Impact and severity

Local model use is effectively unusable for even trivial prompts on this setup, despite the backend itself being capable of high token/sec throughput outside OpenClaw.

Additional information

The model provider invocation path for local providers on Windows may be doing CPU-heavy synchronous work or otherwise failing to isolate the local model request/stream processing from the Gateway event loop. The expensive pre-run prep is also visible (~11s), but the main failure appears after model_call:started, where the Gateway starts reporting starvation and stalled agent runs.
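
If the hypothesis above holds and the stream handling does CPU-heavy synchronous work, one common mitigation in a single-threaded event loop is to process the work in time-boxed slices and yield between them. The sketch below illustrates that general technique only. It is not the fix that was shipped (this page only records the issue as fixed), and processChunks, yieldToEventLoop, and the 10ms budget are hypothetical.

```typescript
// Lets pending timers and I/O callbacks run before the caller continues.
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Runs `handle` over every chunk, yielding whenever a slice exceeds budgetMs.
async function processChunks<T>(
  chunks: T[],
  handle: (chunk: T) => void,
  budgetMs: number = 10,
): Promise<void> {
  let sliceStart = Date.now();
  for (const chunk of chunks) {
    handle(chunk); // still synchronous, but bounded to one chunk at a time
    if (Date.now() - sliceStart >= budgetMs) {
      await yieldToEventLoop();
      sliceStart = Date.now();
    }
  }
}

// Usage: ten simulated chunks that each cost about 5ms of CPU time.
const simulatedChunks = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];
processChunks(simulatedChunks, () => {
  const end = Date.now() + 5;
  while (Date.now() < end) {
    // simulated CPU-bound parsing
  }
}).then(() => console.log("all chunks processed"));
```

Slicing keeps timers, health checks, and WebSocket RPCs serviceable between chunks, at the cost of slightly longer total processing time. Work that cannot be split this way is usually moved off the main thread instead, for example with Node's worker_threads module.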

Edit:

Possibly related: sessions.list stalls while local model call is active:

While the local model call is stalled, repeated Gateway WS RPCs also become very slow:

18:53:58 [ws] ⇄ res ✓ sessions.list 20736ms
18:54:19 [ws] ⇄ res ✓ sessions.list 20652ms
18:55:19 [ws] ⇄ res ✓ sessions.list 21084ms
18:56:06 [ws] ⇄ res ✓ sessions.list 25379ms
19:18:54 [ws] ⇄ res ✓ sessions.list 20005ms
19:19:14 [ws] ⇄ res ✓ sessions.list 20414ms

These occur near event-loop starvation / stalled model-call logs:

fetch timeout ... timer delayed ... likely event-loop starvation
stalled session ... activeWorkKind=model_call lastProgress=model_call:started

Expected: local status/session RPCs should remain responsive even if a model backend is slow.
Actual: simple local RPCs take ~20-25s while the model call is active.
