openclaw: Single-threaded agent model_call blocks all 83 agents; eventLoop utilization stays at 1.0 even after cleaning sessions and patching memory thresholds (not Telegram, not session leak, pure dispatch bottleneck)

This is a follow-up to #78808 (closed as implemented for Telegram), #78861, #84903, and #86745.

The fundamental problem has NOT been solved for non-Telegram deployments

#78808 was closed because Telegram polling was moved to a worker thread. But we are a Feishu (Lark) deployment with 83 agents — no Telegram, no polling bottlenecks. Our Gateway's single-threaded event loop is saturated by agent model_call contention, not channel I/O.

Real-world evidence (83 agents, v2026.5.24-beta.2)

Time   eventLoop util  delayMax  Active agents  Cause
10:33  0.983           14.7s     1              agent-architect model_call stuck
10:39  1.0             24.7s     1              agent-architect subagent model_call
10:50  1.0             13.1s     2              architect + butler model_calls
11:32  0.996           12.5s     2              trading agents model_calls

One agent's slow API response (GLM-5-Turbo taking 15-25s) blocks the ENTIRE Gateway. q=1 in the work queue confirms downstream agents are queued waiting.

This is not a model speed problem — DeepSeek V4 Pro also shows the same pattern. This is an architectural problem: all agent runs share one event loop, and model_call is synchronous-blocking within that loop.
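The eventLoop util and delayMax figures reported above can be reproduced outside the Gateway with Node's `perf_hooks` module. The following is a minimal sketch, not OpenClaw code: `sampleLoop`, `waitingCall`, and `busyCall` are hypothetical names, and the two stand-ins only imitate a call that waits on the network versus one that holds the thread.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

interface LoopSample {
  utilization: number; // 0..1: share of the window the loop was not idle
  delayMaxMs: number;  // longest interval seen by the delay histogram, in ms
}

const RESOLUTION_MS = 10; // histogram sampling interval (assumed value)

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `work` and reports how busy the event loop was while it ran.
async function sampleLoop(work: () => Promise<void>): Promise<LoopSample> {
  const histogram = monitorEventLoopDelay({ resolution: RESOLUTION_MS });
  histogram.enable();
  // Warm-up: ensures the loop has started and the histogram has ticked.
  await sleep(3 * RESOLUTION_MS);

  const before = performance.eventLoopUtilization();
  await work();
  const delta = performance.eventLoopUtilization(before);

  // Give the histogram timer one more chance to fire before stopping it.
  await sleep(2 * RESOLUTION_MS);
  histogram.disable();

  // histogram.max is reported in nanoseconds.
  return { utilization: delta.utilization, delayMaxMs: histogram.max / 1e6 };
}

// Hypothetical stand-in: a call that only waits (timer instead of network).
const waitingCall = (ms: number): Promise<void> => sleep(ms);

// Hypothetical stand-in: a call that keeps the thread busy for `ms`.
async function busyCall(ms: number): Promise<void> {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin: nothing else on the loop can run during this time
  }
}
```

`sampleLoop(() => busyCall(200))` should report utilization close to 1, while `sampleLoop(() => waitingCall(200))` should stay low. Wrapping a real model_call in the same measurement may help show which of the two patterns it follows.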

Agent model_call blocking the gateway

From diagnostic liveness logs:

work=[active=agent:agent-architect:feishu:group:...(processing/model_call,q=1,age=25s)]

q=1 means downstream requests are queued. One agent's model_call that takes 20 seconds blocks all other 82 agents for 20 seconds.
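For anyone collecting these liveness entries over time, a small parser can flag the stalled calls automatically. This is a sketch under assumptions: the pattern is inferred from the single sample line above and may not cover every form the Gateway emits, and `parseActiveWork`, `isStalledModelCall`, and the 10-second threshold are hypothetical.

```typescript
// Fields recovered from one `work=[active=...]` liveness entry.
interface ActiveWork {
  sessionKey: string; // e.g. "agent:agent-architect:feishu:group:..."
  state: string;      // e.g. "processing/model_call"
  queued: number;     // the q= value
  ageSeconds: number; // the age= value
}

// Assumed shape: active=<key>(<state>,q=<int>,age=<number>s)
const ACTIVE_WORK = /active=([^()\s]+)\(([^,()]+),q=(\d+),age=(\d+(?:\.\d+)?)s\)/;

function parseActiveWork(line: string): ActiveWork | null {
  const match = ACTIVE_WORK.exec(line);
  if (match === null) return null;
  const [, sessionKey = "", state = "", queued = "0", age = "0"] = match;
  return { sessionKey, state, queued: Number(queued), ageSeconds: Number(age) };
}

// True when a model_call has run past the threshold with work queued behind it.
function isStalledModelCall(work: ActiveWork, thresholdSeconds = 10): boolean {
  return (
    work.state.endsWith("model_call") &&
    work.queued > 0 &&
    work.ageSeconds >= thresholdSeconds
  );
}
```

Lines that do not match the assumed shape return `null` rather than throwing, so the parser can be run over a whole log file.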

After cleaning 796 stale subagent sessions

We cleaned 796 completed subagent sessions (see #86745 for the subagent session leak). After cleanup:

  • RSS dropped from 3.2GB to 2.1GB
  • session-locks dropped from 36s to 64ms
  • eventLoop utilization remained 0.85-1.0 — proving the bottleneck is NOT session accumulation, it's the single-threaded agent dispatch

Comparison: 18-agent deployment runs perfectly

A separate 18-agent deployment on identical hardware (31GB RAM, same Node.js version) has run for 77 days with:

  • Gateway RSS: 667MB
  • Zero memory pressure alerts
  • eventLoop utilization: normal
  • System load: 0.02

The 83-agent deployment has the exact same per-agent workload but 4.6x the agents → 4.6x the model_call contention → eventLoop saturation.

What we need

Agent model_call dispatch needs to be offloaded from the main event loop. Suggestions:

  1. Worker-thread pool for agent model_call — similar to how Telegram polling was isolated, but for the actual agent inference dispatch
  2. Per-agent event loop isolation — each agent gets its own worker thread or at minimum, model calls are dispatched to a thread pool
  3. Configurable max concurrent model_calls — if architecture can't change, at least prevent all 83 agents from competing for the event loop simultaneously
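Suggestion 3 can be sketched as a small FIFO limiter wrapped around each model call. This is an illustration only: `ModelCallLimiter` is a hypothetical name, and whether OpenClaw has a suitable place to hook it in is an assumption. It bounds how many calls are in flight at once; it does not move any work off the main event loop, so it is a mitigation rather than a substitute for suggestions 1 and 2.

```typescript
// Minimal FIFO concurrency limiter (hypothetical helper, not an OpenClaw API).
class ModelCallLimiter {
  private running = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
      throw new RangeError("maxConcurrent must be a positive integer");
    }
    this.maxConcurrent = maxConcurrent;
  }

  // Runs `task` once a slot is free; callers are served in arrival order.
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.running >= this.maxConcurrent) {
      // All slots taken: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.running++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) {
        next(); // pass the slot directly to the next waiter
      } else {
        this.running--; // nobody waiting: free the slot
      }
    }
  }
}
```

A caller would create one shared instance, for example `new ModelCallLimiter(8)`, and wrap each call as `limiter.run(() => callModel(request))`, where `callModel` stands for whatever function performs the request. The limit of 8 is arbitrary and would need tuning against the provider's latency.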

Environment

  • OpenClaw: 2026.5.24-beta.2
  • Node.js: v22.22.0
  • Agents: 83 (82 on GLM-5-Turbo, 1 on DeepSeek V4 Pro)
  • Channel: Feishu (Lark) WebSocket — NOT polling-based
  • NODE_OPTIONS=--max-old-space-size=8192 configured
  • Memory pressure thresholds already patched to 8GB/10GB
  • 796 stale subagent sessions already cleaned
  • OS: Linux (WSL2), 31GB RAM, RTX 3090
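
To confirm that the `NODE_OPTIONS=--max-old-space-size=8192` setting listed above actually reached the Gateway process, the V8 heap limit and the current RSS can be read from inside Node. A minimal sketch: `heapLimitMiB` and `rssMiB` are hypothetical helper names, while `v8.getHeapStatistics()` and `process.memoryUsage()` are standard Node.js APIs.

```typescript
import { getHeapStatistics } from "node:v8";

const MIB = 1024 * 1024;

// V8 heap size limit of the current process, in MiB.
function heapLimitMiB(): number {
  return Math.round(getHeapStatistics().heap_size_limit / MIB);
}

// Resident set size of the current process, in MiB.
function rssMiB(): number {
  return Math.round(process.memoryUsage().rss / MIB);
}

console.log(`heap limit: ${heapLimitMiB()} MiB, rss: ${rssMiB()} MiB`);
```

With the flag applied, the reported heap limit is typically close to, and often slightly above, the configured 8192 MiB. A much smaller value would suggest the option never reached the process.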
