openclaw - 💡(How to fix) Fix [Bug]: 4.29 dispatch prep stages take ~73s of synchronous CPU work, blocking event loop [12 comments, 9 participants]

GitHub stats
openclaw/openclaw#75999 · Fetched 2026-05-03 04:43:28
Comments: 12 · Participants: 9 · Timeline: 28 · Reactions: 4
Timeline (top): commented ×12, cross-referenced ×9, subscribed ×4, closed ×1

[Bug]: 4.29 dispatch prep stages take ~73s of synchronous CPU work, blocking event loop

Bug type

Performance regression (introduced in 4.29; not present in 4.27)

Summary

Upgrading from 4.24/4.27 → 4.29 caused every agent dispatch to take 2–5 minutes to first reply. The gateway log shows new prep stages instrumentation in 4.29 that reports each dispatch spending ~73 s of synchronous CPU work before the LLM is even called, with single operations blocking the Node.js event loop for over 30 seconds.

The same 13-agent workspace setup on 4.27 returns replies in <1 minute.

A separate Python-based agent runtime (Hermes) on the same machine, using the same Z.AI/MiniMax/DeepSeek API keys and same glm-5-turbo model, returns replies in <10 seconds — confirming the bottleneck is inside the OpenClaw runtime, not the LLM provider, network, or model.

Evidence

Stage breakdown from a real 4.29 dispatch (commander, glm-5-turbo, ~5 min total)

[trace:embedded-run] startup stages totalMs=28630
  workspace:1ms, runtime-plugins:3ms, hooks:0ms,
  model-resolution:6794ms, auth:12471ms,
  context-engine:0ms, attempt-dispatch:11612ms

[trace:embedded-run] prep stages totalMs=73394
  workspace-sandbox:610ms, skills:0ms,
  core-plugin-tools:8765ms, bootstrap-context:8821ms,
  bundle-tools:3532ms,
  system-prompt:23317ms,            ← largest contributor
  session-resource-loader:7546ms,
  agent-session:5ms,
  stream-setup:20798ms              ← second-largest

[diagnostic] liveness warning:
  eventLoopDelayMaxMs=34024.2 ← single 34-second event-loop block
  eventLoopUtilization=1
  cpuCoreRatio=1.013

The prep stages total ~73 s and the startup stages add another ~29 s, so each dispatch consumes ~100 seconds of CPU time before the model even starts streaming. With the CPU saturated, the fallback chain then trips fetch timeouts, which cascade for another 1–3 minutes.
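
The prep total can be checked directly against the trace. The sketch below parses stage entries in the name:NNNms shape shown in the log excerpt, sums them, and ranks the contributors. parseStages and rankStages are illustrative names for this example, not OpenClaw functions.

```javascript
// Parse "name:123ms" pairs from a stage trace and rank the contributors.
// parseStages and rankStages are hypothetical helper names for this sketch.
function parseStages(traceText) {
  const stages = [];
  const pattern = /([a-z][a-z-]*):(\d+)ms/g;
  for (const match of traceText.matchAll(pattern)) {
    stages.push({ name: match[1], ms: Number(match[2]) });
  }
  return stages;
}

function rankStages(stages) {
  const totalMs = stages.reduce((sum, stage) => sum + stage.ms, 0);
  const ranked = [...stages].sort((a, b) => b.ms - a.ms);
  return { totalMs, ranked };
}

// Prep-stage values copied from the trace above.
const prepTrace = `
  workspace-sandbox:610ms, skills:0ms,
  core-plugin-tools:8765ms, bootstrap-context:8821ms,
  bundle-tools:3532ms,
  system-prompt:23317ms,
  session-resource-loader:7546ms,
  agent-session:5ms,
  stream-setup:20798ms
`;

const prep = rankStages(parseStages(prepTrace));
console.log(prep.totalMs); // 73394, matching totalMs in the log
console.log(prep.ranked.slice(0, 2).map((stage) => stage.name));
```

The two largest stages, system-prompt and stream-setup, together account for 44115 ms, roughly 60% of the prep total.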

In a 2-hour window of typical use, 408 log lines reading "[fetch-timeout] fetch timeout reached" were observed.

4.27 vs 4.29 instrumentation diff

grep prepStages.mark returns:

  • 4.29 dist/selection-CwAy0mf2.js: 9 hits (workspace-sandbox, skills, core-plugin-tools, bootstrap-context, bundle-tools, system-prompt, session-resource-loader, agent-session, stream-setup)
  • 4.27 dist/selection-*.js: 0 hits

The new prep stages instrumentation is the most visible signal that dispatch flow was substantially reworked in 4.29.

Cross-runtime baseline (same machine, same provider, same model)

Runtime         | Reply latency | Notes
Hermes (Python) | <10 s         | Same glm-5-turbo, same Z.AI Coding Plan key
OpenClaw 4.27   | <60 s         | Production agents, 13 Telegram channels
OpenClaw 4.29   | 2–5 min       | Same workspace, same config

Reproduction steps

  1. Install openclaw 2026.4.29 with a non-trivial workspace (≥10 skills under workspace-*/skills/) and a Z.AI / MiniMax / DeepSeek primary model.
  2. Bind a Telegram channel to one of the agents.
  3. Send any short prompt (e.g. hi).
  4. Observe in journalctl --user -u openclaw-gateway:
    • prep stages totalMs >= 60000
    • eventLoopDelayMaxMs > 5000
    • Reply latency 2–5 minutes
  5. Downgrade to openclaw 2026.4.27 (set OPENCLAW_ALLOW_OLDER_BINARY_DESTRUCTIVE_ACTIONS=1), restart the gateway, and repeat step 3. The reply now arrives in <60 s.
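
The log check in step 4 can be automated. The following is a minimal sketch that scans gateway log text for the two numeric signals listed there. It assumes the log lines have the shape shown in the Evidence section; findSlowDispatches and the two constant names are hypothetical.

```javascript
// Flag gateway log lines that match the regression signature from step 4.
// Thresholds come from the reproduction steps; findSlowDispatches is a
// hypothetical name for this sketch.
const PREP_TOTAL_LIMIT_MS = 60000;
const LOOP_DELAY_LIMIT_MS = 5000;

function findSlowDispatches(logText) {
  const findings = [];
  for (const line of logText.split('\n')) {
    const prep = line.match(/prep stages totalMs=(\d+)/);
    if (prep && Number(prep[1]) >= PREP_TOTAL_LIMIT_MS) {
      findings.push({ kind: 'prep-stages', ms: Number(prep[1]) });
    }
    const delay = line.match(/eventLoopDelayMaxMs=([\d.]+)/);
    if (delay && Number(delay[1]) > LOOP_DELAY_LIMIT_MS) {
      findings.push({ kind: 'event-loop-delay', ms: Number(delay[1]) });
    }
  }
  return findings;
}

// Sample lines copied from the trace in the Evidence section.
const sampleLog = [
  '[trace:embedded-run] startup stages totalMs=28630',
  '[trace:embedded-run] prep stages totalMs=73394',
  '  eventLoopDelayMaxMs=34024.2',
].join('\n');

console.log(findSlowDispatches(sampleLog));
```

In practice the input would be the saved output of journalctl --user -u openclaw-gateway. An empty result on 4.27 and a non-empty one on 4.29 would match the regression described here.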

Suspected hot paths

dist/selection-CwAy0mf2.js regions between the new prep stage marks:

  • system-prompt stage (23 s): buildEmbeddedSystemPrompt → buildAgentSystemPrompt (in system-prompt-DZrkA5Mv.js:282-648) does large synchronous string concatenation, XML escaping, and conditional rendering of all skill metadata, with no cache keyed on (skills hash + workspace files hash). bootstrap-cache-CmO66T4a.js only caches per session and is invalidated on each dispatch.
  • stream-setup stage (21 s): covers selection-CwAy0mf2.js:6934-7148, including applyExtraParamsToAgent calls into provider runtime deps. (Not the new Google prompt cache path — isGooglePromptCacheEligible early-returns for non-Gemini models.)

Impact

  • Telegram bots become unusable (>2 min reply means users assume the bot is broken).
  • Per-dispatch CPU saturation cascades: gateway can only handle a single request at a time without queueing.
  • The log lines "[telegram] sendChatAction failed" and "typing TTL reached (2m); stopping typing indicator" appear consistently.

Workaround in production

Pinned to openclaw 2026.4.27 and disabled weekly-openclaw-update.timer to prevent auto-upgrade. This required:

  • Environment=OPENCLAW_ALLOW_OLDER_BINARY_DESTRUCTIVE_ACTIONS=1 systemd drop-in (since 4.27 refuses to start against a config last written by 4.29).
  • Stripping plugins.entries.active-memory.config (4.27 schema rejects it as additional properties).

Environment

  • openclaw 2026.4.29 (regression) vs 2026.4.27 (baseline working)
  • Node.js v22.22.2 (managed via nvm)
  • Ubuntu 25.10 (Linux 6.17.0-22-generic)
  • Gateway run via user systemd unit (systemctl --user)
  • 13 agents, average workspace skills/ size ~3 MB, several glm-5-turbo / MiniMax-M2.7 / deepseek-v4-flash models in fallback chains

Suggested fix direction

  1. Cache the built system prompt keyed on (skills SKILL.md hash + AGENTS.md/SOUL.md/IDENTITY.md/USER.md/MEMORY.md hashes); invalidate only when those files change. Skip buildEmbeddedSystemPrompt on cache hit.
  2. Move CPU-bound prep work off the main event loop (worker thread or chunked yield).
  3. Reduce per-dispatch work in stream-setup if possible (verify wrapper layers don't re-initialize per dispatch).
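
Item 2 can be illustrated with a chunked-yield loop. This is a minimal sketch under stated assumptions, not OpenClaw code: processInChunks, yieldToEventLoop, and renderSkill are hypothetical names, and the 10 ms budget is an arbitrary default. Yielding does not reduce total CPU work. It only caps how long any single slice blocks the event loop, so timers and I/O callbacks can run in between. A worker thread would be the alternative when the work itself has to leave the main thread.

```javascript
const { performance } = require('node:perf_hooks');

// Resolve on the next event-loop turn so pending timers and I/O can run.
function yieldToEventLoop() {
  return new Promise((resolve) => setImmediate(resolve));
}

// Process items in slices of roughly budgetMs, yielding between slices.
// processInChunks is a hypothetical helper name for this sketch.
async function processInChunks(items, handleItem, budgetMs = 10) {
  const results = [];
  let sliceStart = performance.now();
  for (const item of items) {
    results.push(handleItem(item));
    if (performance.now() - sliceStart >= budgetMs) {
      await yieldToEventLoop();
      sliceStart = performance.now();
    }
  }
  return results;
}

// Illustrative stand-in for per-skill rendering work.
const renderSkill = (skill) => `<skill>${skill.name}</skill>`;
const skills = Array.from({ length: 1000 }, (_, i) => ({ name: `skill-${i}` }));

processInChunks(skills, renderSkill).then((parts) => {
  console.log(parts.length);
});
```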

Happy to provide additional traces or test patches against affected files.

extent analysis

TL;DR

The most likely fix involves optimizing the prep stages in the OpenClaw runtime, specifically caching the built system prompt and moving CPU-bound work off the main event loop.

Guidance

  • Investigate the system-prompt stage, which takes approximately 23 seconds, and consider implementing a cache for the built system prompt to reduce the time spent on string concatenation and XML escaping.
  • Examine the stream-setup stage, which takes around 21 seconds, and look for opportunities to reduce per-dispatch work or optimize the applyExtraParamsToAgent calls.
  • Consider using worker threads or chunked yield to move CPU-bound prep work off the main event loop and prevent event loop delays.
  • Verify that the suggested fixes do not introduce any new issues or regressions.

Example

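// Pseudocode sketch: cache the built system prompt, keyed on content hashes.
// buildEmbeddedSystemPrompt is named in the issue, but its real signature is
// not shown there, so the promptInputs argument is an assumption.
const systemPromptCache = new Map();

function buildSystemPromptCached(skillsHash, workspaceFilesHash, promptInputs) {
  const cacheKey = `${skillsHash}:${workspaceFilesHash}`;
  const cached = systemPromptCache.get(cacheKey);
  if (cached !== undefined) {
    return cached; // cache hit: skip the expensive synchronous build
  }
  const prompt = buildEmbeddedSystemPrompt(promptInputs); // cache miss: build once
  systemPromptCache.set(cacheKey, prompt);
  return prompt;
}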
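
The example above leaves open how skillsHash and workspaceFilesHash are computed. One possible approach, sketched here as an assumption rather than as the project's method, is to hash the contents of the files named in the suggested fix direction (SKILL.md, AGENTS.md, SOUL.md, IDENTITY.md, USER.md, MEMORY.md) with Node's built-in crypto module. hashFiles is a hypothetical helper name.

```javascript
const { createHash } = require('node:crypto');
const { readFileSync, existsSync } = require('node:fs');

// Hash the contents of the given files into one hex digest.
// Paths are sorted so the digest does not depend on argument order, and a
// missing file contributes a marker so that creating it changes the digest.
// hashFiles is a hypothetical helper name for this sketch.
function hashFiles(paths) {
  const hash = createHash('sha256');
  for (const filePath of [...paths].sort()) {
    hash.update(filePath);
    hash.update('\0');
    hash.update(existsSync(filePath) ? readFileSync(filePath) : '<missing>');
    hash.update('\0');
  }
  return hash.digest('hex');
}

// File names taken from the suggested fix direction; the relative paths are
// illustrative and resolve against the current working directory.
const workspaceFilesHash = hashFiles([
  'AGENTS.md', 'SOUL.md', 'IDENTITY.md', 'USER.md', 'MEMORY.md',
]);
console.log(workspaceFilesHash.length); // 64 hex characters for SHA-256
```

Reading and hashing the files on every dispatch still costs some I/O and CPU. A cheaper variant could key on file size and modification time instead, at the price of missing edits that preserve both.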

Notes

The provided issue lacks information on the specific implementation details of the system-prompt and stream-setup stages, so the suggested fixes are based on the provided traces and may require additional investigation and testing.

Recommendation

Apply a workaround by caching the built system prompt and moving CPU-bound work off the main event loop, as this is likely to significantly reduce the dispatch time and prevent event loop delays.
