openclaw - 💡(How to fix) Fix [Bug]: 4.29 dispatch prep stages take ~73s of synchronous CPU work, blocking event loop [12 comments, 9 participants]

openclaw, 2026-05-02 08:16:22

openclaw/openclaw#75999 • Fetched 2026-05-03 04:43:28

Timeline: commented ×12, cross-referenced ×9, subscribed ×4, closed ×1

Upgrading from 4.24/4.27 → 4.29 caused every agent dispatch to take 2–5 minutes to first reply. The gateway log shows new prep stages instrumentation in 4.29 that reports each dispatch spending ~73 s of synchronous CPU work before the LLM is even called, with single operations blocking the Node.js event loop for over 30 seconds.

The same 13-agent workspace setup on 4.27 returns replies in <1 minute.

A separate Python-based agent runtime (Hermes) on the same machine, using the same Z.AI/MiniMax/DeepSeek API keys and same glm-5-turbo model, returns replies in <10 seconds — confirming the bottleneck is inside the OpenClaw runtime, not the LLM provider, network, or model.

Bug type

Performance regression (introduced in 4.29; not present in 4.27)

Evidence

Stage breakdown from a real 4.29 dispatch (commander, glm-5-turbo, ~5 min total)

[trace:embedded-run] startup stages totalMs=28630
  workspace:1ms, runtime-plugins:3ms, hooks:0ms,
  model-resolution:6794ms, auth:12471ms,
  context-engine:0ms, attempt-dispatch:11612ms

[trace:embedded-run] prep stages totalMs=73394
  workspace-sandbox:610ms, skills:0ms,
  core-plugin-tools:8765ms, bootstrap-context:8821ms,
  bundle-tools:3532ms,
  system-prompt:23317ms,            ← largest contributor
  session-resource-loader:7546ms,
  agent-session:5ms,
  stream-setup:20798ms              ← second-largest

[diagnostic] liveness warning:
  eventLoopDelayMaxMs=34024.2 ← single 34-second event-loop block
  eventLoopUtilization=1
  cpuCoreRatio=1.013

The prep stages total ~73 s and the startup stages add another ~29 s (totalMs=28630), so each dispatch consumes roughly 100 seconds of CPU time before the model even starts streaming. With the CPU saturated, requests in the fallback chain then hit fetch timeouts, which cascade for another 1–3 minutes.
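To make the arithmetic above easy to re-check against other dispatches, here is a minimal sketch that parses the `name:NNNms` pairs from a prep stages trace, sums them, and ranks the stages. The function names (`parseStages`, `summarize`) are illustrative and not part of OpenClaw; the trace text is copied from the log above with the annotations removed.

```javascript
// Parse "name:123ms" pairs from a trace block into { name, ms } records.
function parseStages(traceText) {
  const stages = [];
  const pattern = /([a-z][a-z-]*):(\d+)ms/g;
  for (const match of traceText.matchAll(pattern)) {
    stages.push({ name: match[1], ms: Number(match[2]) });
  }
  return stages;
}

// Sum the stage durations and rank stages by duration, largest first.
function summarize(stages) {
  const totalMs = stages.reduce((sum, s) => sum + s.ms, 0);
  const ranked = [...stages]
    .sort((a, b) => b.ms - a.ms)
    .map((s) => ({ ...s, share: s.ms / totalMs }));
  return { totalMs, ranked };
}

// Prep stages from the 4.29 dispatch quoted in this issue.
const prepTrace = `
  workspace-sandbox:610ms, skills:0ms,
  core-plugin-tools:8765ms, bootstrap-context:8821ms,
  bundle-tools:3532ms,
  system-prompt:23317ms,
  session-resource-loader:7546ms,
  agent-session:5ms,
  stream-setup:20798ms
`;

const prepSummary = summarize(parseStages(prepTrace));
console.log(`total: ${prepSummary.totalMs} ms`);
for (const s of prepSummary.ranked.slice(0, 3)) {
  console.log(`${s.name}: ${s.ms} ms (${(s.share * 100).toFixed(1)}%)`);
}
```

For this trace the per-stage values add up exactly to the reported totalMs=73394, and system-prompt plus stream-setup account for about 60% of it. The startup stages line is a different case: its listed values sum to more than its reported totalMs, so that total should not be treated as a plain sum of the stages.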

During a 2-hour window of typical use, 408 "[fetch-timeout] fetch timeout reached" log lines were observed.

4.27 vs 4.29 instrumentation diff

grep prepStages.mark returns:

4.29 dist/selection-CwAy0mf2.js: 9 hits (workspace-sandbox, skills, core-plugin-tools, bootstrap-context, bundle-tools, system-prompt, session-resource-loader, agent-session, stream-setup)
4.27 dist/selection-*.js: 0 hits

The new prep stages instrumentation is the most visible signal that dispatch flow was substantially reworked in 4.29.

Cross-runtime baseline (same machine, same provider, same model)

| Runtime | Reply latency | Notes |
|---|---|---|
| Hermes (Python) | <10 s | Same `glm-5-turbo`, same Z.AI Coding Plan key |
| OpenClaw 4.27 | <60 s | Production agents, 13 Telegram channels |
| OpenClaw 4.29 | 2–5 min | Same workspace, same config |

Reproduction steps

1. Install openclaw 2026.4.29 with a non-trivial workspace (≥10 skills under workspace-*/skills/) and a Z.AI / MiniMax / DeepSeek primary model.
2. Bind a Telegram channel to one of the agents.
3. Send any short prompt (e.g. hi).
4. Observe in journalctl --user -u openclaw-gateway:
   - prep stages totalMs >= 60000
   - eventLoopDelayMaxMs > 5000
   - Reply latency 2–5 minutes
5. Downgrade to openclaw 2026.4.27 (set OPENCLAW_ALLOW_OLDER_BINARY_DESTRUCTIVE_ACTIONS=1), restart the gateway, and repeat step 3. The reply now arrives in <60 s.
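The log check in the reproduction steps can be automated. The sketch below scans captured gateway log text for the two numeric signals and compares them with the thresholds listed above. `checkGatewayLog` is a hypothetical helper, not an OpenClaw API, and it inspects only the first occurrence of each field.

```javascript
// Thresholds taken from the reproduction steps.
const PREP_TOTAL_MS_LIMIT = 60000;
const EVENT_LOOP_DELAY_MS_LIMIT = 5000;

// Extract the first prep-stage total and event-loop delay from log text
// and flag whether each one crosses its threshold.
function checkGatewayLog(logText) {
  const prep = /prep stages totalMs=(\d+)/.exec(logText);
  const delay = /eventLoopDelayMaxMs=([\d.]+)/.exec(logText);
  const prepTotalMs = prep ? Number(prep[1]) : null;
  const eventLoopDelayMaxMs = delay ? Number(delay[1]) : null;
  return {
    prepTotalMs,
    eventLoopDelayMaxMs,
    slowPrep: prepTotalMs !== null && prepTotalMs >= PREP_TOTAL_MS_LIMIT,
    blockedLoop:
      eventLoopDelayMaxMs !== null &&
      eventLoopDelayMaxMs > EVENT_LOOP_DELAY_MS_LIMIT,
  };
}

// Sample lines copied from the trace in this issue.
const sampleLog = [
  '[trace:embedded-run] prep stages totalMs=73394',
  '[diagnostic] liveness warning:',
  '  eventLoopDelayMaxMs=34024.2',
].join('\n');

const logCheck = checkGatewayLog(sampleLog);
console.log(logCheck);
```

In practice the input could be the saved output of the journalctl command from step 4; a missing field is reported as null rather than as a pass.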

Suspected hot paths

dist/selection-CwAy0mf2.js regions between the new prep stage marks:

system-prompt stage (23 s): buildEmbeddedSystemPrompt → buildAgentSystemPrompt (in system-prompt-DZrkA5Mv.js:282-648) does large synchronous string concat + XML escaping + conditional rendering of all skill metadata, with no per-(skills hash + workspace files hash) cache. bootstrap-cache-CmO66T4a.js only caches per-session, invalidated each dispatch.
stream-setup stage (21 s): covers selection-CwAy0mf2.js:6934-7148, including applyExtraParamsToAgent calls into provider runtime deps. (Not the new Google prompt cache path — isGooglePromptCacheEligible early-returns for non-Gemini models.)

Impact

Telegram bots become unusable (>2 min reply means users assume the bot is broken).
Per-dispatch CPU saturation cascades: gateway can only handle a single request at a time without queueing.
The log lines "[telegram] sendChatAction failed" and "typing TTL reached (2m); stopping typing indicator" appear consistently.

Workaround in production

Pinned to openclaw 2026.4.27 and disabled weekly-openclaw-update.timer to prevent auto-upgrade. The downgrade required:

- A systemd drop-in setting Environment=OPENCLAW_ALLOW_OLDER_BINARY_DESTRUCTIVE_ACTIONS=1 (since 4.27 refuses to start against a config last written by 4.29).
- Stripping plugins.entries.active-memory.config (the 4.27 schema rejects it as additional properties).
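For reference, a drop-in of the kind described above might look like the fragment below. The unit name openclaw-gateway comes from the journalctl command in the reproduction steps; the file location (~/.config/systemd/user/openclaw-gateway.service.d/override.conf) is an assumption based on the usual layout for user units, not something stated in the issue.

```ini
# Assumed path: ~/.config/systemd/user/openclaw-gateway.service.d/override.conf
[Service]
Environment=OPENCLAW_ALLOW_OLDER_BINARY_DESTRUCTIVE_ACTIONS=1
```

After adding a drop-in, systemctl --user daemon-reload followed by a restart of the unit is normally needed for the new environment to take effect.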

Environment

openclaw 2026.4.29 (regression) vs 2026.4.27 (baseline working)
Node.js v22.22.2 (managed via nvm)
Ubuntu 25.10 (Linux 6.17.0-22-generic)
Gateway run via user systemd unit (systemctl --user)
13 agents, average workspace skills/ size ~3 MB, several glm-5-turbo / MiniMax-M2.7 / deepseek-v4-flash models in fallback chains

Suggested fix direction

Cache the built system prompt keyed on (skills SKILL.md hash + AGENTS.md/SOUL.md/IDENTITY.md/USER.md/MEMORY.md hashes); invalidate only when those files change. Skip buildEmbeddedSystemPrompt on cache hit.
Move CPU-bound prep work off the main event loop (worker thread or chunked yield).
Reduce per-dispatch work in stream-setup if possible (verify wrapper layers don't re-initialize per dispatch).
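The cache key in the first suggestion could be derived roughly as follows. This is a sketch under assumptions: `hashFiles` and `promptCacheKey` are hypothetical names, file contents are passed in as strings rather than read from disk, and nothing here reflects how OpenClaw actually stores skills or workspace files.

```javascript
const { createHash } = require('node:crypto');

// Hash a list of [fileName, fileText] pairs into one stable hex digest.
// Entries are sorted by name so the result does not depend on read order.
function hashFiles(entries) {
  const hash = createHash('sha256');
  const sorted = [...entries].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [name, text] of sorted) {
    // NUL separators keep "ab" + "c" distinct from "a" + "bc".
    hash.update(name).update('\0').update(text).update('\0');
  }
  return hash.digest('hex');
}

// Combine the skills hash and the workspace-files hash into one cache key.
function promptCacheKey(skillFiles, workspaceFiles) {
  return `${hashFiles(skillFiles)}:${hashFiles(workspaceFiles)}`;
}

// Usage with made-up file contents.
const skillFiles = [['search/SKILL.md', 'Search the web.']];
const workspaceFiles = [
  ['AGENTS.md', 'agent notes'],
  ['SOUL.md', 'persona'],
];
const exampleKey = promptCacheKey(skillFiles, workspaceFiles);
console.log(exampleKey);
```

Hashing still requires reading every file on each dispatch. If that read proves costly for workspaces in the ~3 MB range, a cheaper pre-check (for example file size and modification time) could decide whether rehashing is needed at all, at the cost of missing edits that preserve both.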

Happy to provide additional traces or test patches against affected files.

extent analysis

TL;DR

The most likely fix involves optimizing the prep stages in the OpenClaw runtime, specifically caching the built system prompt and moving CPU-bound work off the main event loop.

Guidance

Investigate the system-prompt stage, which takes approximately 23 seconds, and consider implementing a cache for the built system prompt to reduce the time spent on string concatenation and XML escaping.
Examine the stream-setup stage, which takes around 21 seconds, and look for opportunities to reduce per-dispatch work or optimize the applyExtraParamsToAgent calls.
Consider using worker threads or chunked yield to move CPU-bound prep work off the main event loop and prevent event loop delays.
Verify that the suggested fixes do not introduce any new issues or regressions.

Example

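// Sketch of caching the built system prompt, keyed on content hashes.
// buildEmbeddedSystemPrompt's real signature is not shown in the issue; here it
// is assumed to be in scope and to accept the inputs the hashes were computed from.
const systemPromptCache = new Map();

function buildSystemPrompt(skillsHash, workspaceFilesHash, promptInputs) {
  const cacheKey = `${skillsHash}:${workspaceFilesHash}`;
  const cached = systemPromptCache.get(cacheKey);
  if (cached !== undefined) {
    return cached; // cache hit: skip the expensive rebuild
  }
  // Cache miss: build once and remember the result for later dispatches.
  const prompt = buildEmbeddedSystemPrompt(promptInputs);
  systemPromptCache.set(cacheKey, prompt);
  return prompt;
}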
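The "chunked yield" option from the Guidance can be sketched in the same spirit. The idea is to split the CPU-bound rendering into batches and hand control back to the event loop between batches with setImmediate, so timers and I/O callbacks are not starved for tens of seconds. `chunks`, `renderInChunks`, and the skill objects are hypothetical; the real rendering code in OpenClaw is not shown in the issue.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function* chunks(items, size) {
  if (size < 1) throw new RangeError('size must be at least 1');
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

// Render items chunk by chunk, yielding to the event loop after each chunk
// so pending timers and I/O callbacks can run while the prompt is built.
async function renderInChunks(items, renderOne, chunkSize = 50) {
  const parts = [];
  for (const chunk of chunks(items, chunkSize)) {
    for (const item of chunk) {
      parts.push(renderOne(item));
    }
    await new Promise((resolve) => setImmediate(resolve));
  }
  return parts.join('');
}

// Usage with hypothetical skill metadata.
const skills = [{ name: 'search' }, { name: 'notes' }, { name: 'mail' }];
const rendered = renderInChunks(skills, (s) => `<skill>${s.name}</skill>`, 2);
rendered.then((text) => console.log(text));
```

Yielding this way does not reduce total CPU time; it only bounds how long any single block lasts. A worker thread would be the alternative when the work cannot be split into small pieces.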

Notes

The provided issue lacks information on the specific implementation details of the system-prompt and stream-setup stages, so the suggested fixes are based on the provided traces and may require additional investigation and testing.

Recommendation

Until a fix ships, the only workaround reported in the issue is pinning to 4.27. The likely code-level fix is to cache the built system prompt and move CPU-bound work off the main event loop, which should significantly reduce dispatch time and prevent long event-loop delays.
