openclaw - ✅ (Solved) [perf] system-prompt rebuild blocks event loop 16-29s per run; idle keeps 1 CPU core fully utilized [1 pull request, 1 comment, 2 participants]

GitHub stats

openclaw/openclaw#75887 (fetched 2026-05-02 05:28:28)
Comments: 1
Participants: 2
Timeline events: 3 (closed ×1, commented ×1, cross-referenced ×1)
Reactions: 2

Root Cause

Symptoms

  1. Every embedded-run prep stages trace shows system-prompt = 16,527-29,185 ms as the single largest stage.
  2. liveness warning reports eventLoopUtilization=1, cpuCoreRatio≈1.06-1.08 even when active=0 waiting=0 queued=0 — main thread is fully saturated by background work.
  3. eventLoopDelayMaxMs peaks (27,363 ms, 21,323 ms, 14,621 ms) coincide with prep-stage rebuilds, causing client-visible timeouts (embedded run failover decision: stage=assistant decision=surface_error reason=timeout).
  4. Reducing bootstrapTotalMaxChars from 150,000 to 60,000 helps but does not eliminate the stall, because the synchronous embedding search + MMR re-rank still runs on the main thread.
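
The symptoms above are reported through event-loop metrics. As a minimal sketch of how comparable numbers can be collected in plain Node.js, the helper below uses the built-in perf_hooks module. The names measureLoop and busyWait are hypothetical, and whether openclaw derives eventLoopUtilization and eventLoopDelayMaxMs this way is an assumption, not something the issue states.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";
import { setTimeout as sleep } from "node:timers/promises";

type LoopReport = { utilization: number; maxDelayMs: number };

// Hypothetical helper: reports how busy the event loop was while `work` ran.
async function measureLoop(work: () => void | Promise<void>): Promise<LoopReport> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  await sleep(50); // warm-up so the delay sampler has a baseline tick

  const before = performance.eventLoopUtilization();
  await work();
  await sleep(50); // give the delayed sampler tick a chance to fire

  const elu = performance.eventLoopUtilization(before); // delta since `before`
  histogram.disable();
  return {
    utilization: elu.utilization,    // 0..1, share of time the loop was not idle
    maxDelayMs: histogram.max / 1e6, // histogram values are in nanoseconds
  };
}

// Stand-in for a synchronous stage: spins the CPU for roughly `ms` milliseconds.
function busyWait(ms: number): void {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // intentionally empty: this is what a blocking stage looks like to the loop
  }
}

measureLoop(() => busyWait(300)).then((report) => {
  console.log(report);
});
```

A synchronous stage shows up here as a utilization close to 1 and a maximum delay close to the length of the stall, which is the same pattern the liveness warning describes at a much larger scale.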

Fix Action

Fix / Workaround

Proposed mitigations (in order of impact)

  1. Move embedding inference to a Worker thread (node-llama-cpp supports nThreads / off-main inference).
  2. Cache system-prompt by (workspaceState, memoryRevision, bootstrapBudget) hash so repeated runs reuse the rendered prompt.
  3. Stream MMR re-rank with setImmediate/yieldEvery N so it doesn't monopolize the loop.
  4. Optionally expose agents.defaults.memorySearch.workerThread = true.
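
Mitigation 3 can be sketched as follows. This is a generic MMR re-rank that hands control back to the event loop every yieldEvery similarity comparisons, not openclaw's implementation: the Candidate shape, the cosine similarity, the lambda default, and the name mmrRerank are all assumptions made for illustration.

```typescript
import { setImmediate as yieldToLoop } from "node:timers/promises";

// Hypothetical candidate shape: a relevance score plus an embedding vector.
type Candidate = { id: string; score: number; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy MMR: repeatedly pick the candidate maximizing
// lambda * relevance - (1 - lambda) * (max similarity to already selected items).
async function mmrRerank(
  candidates: Candidate[],
  k: number,
  lambda = 0.7,
  yieldEvery = 256,
): Promise<Candidate[]> {
  const remaining = candidates.slice();
  const selected: Candidate[] = [];
  let sinceYield = 0;

  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestValue = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      let maxSim = 0; // negative similarities are treated as "no redundancy"
      for (let j = 0; j < selected.length; j++) {
        maxSim = Math.max(maxSim, cosine(remaining[i].vector, selected[j].vector));
        if (++sinceYield >= yieldEvery) {
          sinceYield = 0;
          await yieldToLoop(); // let timers and I/O callbacks run
        }
      }
      const value = lambda * remaining[i].score - (1 - lambda) * maxSim;
      if (value > bestValue) {
        bestValue = value;
        bestIndex = i;
      }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

Yielding does not reduce the total CPU time; it only bounds how long any single stretch of work can hold the loop. A smaller yieldEvery gives lower latency for other callbacks at the cost of more scheduling overhead.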

Workaround applied locally

  • Reduced bootstrapMaxChars 20000 → 10000, bootstrapTotalMaxChars 150000 → 60000.
  • Effect: memory peak 5.0 GB → 0.75 GB, but cpuCoreRatio≈1.07 persists when idle.

PR fix notes

PR #75922: Fix plugin-only tool and registry latency regressions

Description (problem / solution / changelog)

Summary

  • Skip core coding tool construction when an explicit allowlist only requests plugin tools.
  • Keep the full workspace plugin registry cache separate from scoped plugin registry loads.
  • Add regressions for both latency paths.

Tests

  • OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/tool-policy.plugin-only-allowlist.test.ts src/agents/pi-tools.create-openclaw-coding-tools.test.ts src/plugins/plugin-lru-cache.test.ts src/plugins/loader.runtime-registry.test.ts src/plugins/loader.test.ts
  • pnpm exec oxfmt --check --threads=1 src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts src/plugins/loader.ts src/plugins/loader.runtime-registry.test.ts
  • git diff --check origin/main...HEAD

Fixes #75882 Fixes #75907 Fixes #75906 Fixes #75887 Fixes #75851

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts (modified, +36/-20)
  • src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts (added, +59/-0)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +53/-4)
  • src/agents/pi-tools.ts (modified, +190/-145)
  • src/plugins/loader.runtime-registry.test.ts (modified, +28/-1)
  • src/plugins/loader.ts (modified, +40/-20)

Code Example

prep stages: totalMs=57374 stages=
  workspace-sandbox:9ms@9ms,
  skills:1ms@10ms,
  core-plugin-tools:3602ms@3612ms,
  bootstrap-context:5ms@3617ms,
  bundle-tools:764ms@4381ms,
  system-prompt:19242ms@23623ms,             ← 19s on main thread
  session-resource-loader:12290ms@35913ms,
  agent-session:2ms@35915ms,
  stream-setup:21459ms@57374ms

liveness warning: reasons=event_loop_delay,event_loop_utilization
  interval=35s eventLoopDelayP99Ms=27363.6 eventLoopDelayMaxMs=27363.6
  eventLoopUtilization=1 cpuCoreRatio=0.254 active=1 waiting=0 queued=0
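
To compare stages across runs, the trace line can be parsed mechanically. The sketch below assumes each entry has the form name:<duration>ms@<cumulative>ms, which is inferred from the sample above rather than from any documented format; parsePrepStages and rankByDuration are hypothetical helper names.

```typescript
type Stage = { name: string; durationMs: number; endMs: number };

// Sample copied from the trace above.
const sampleTrace = `prep stages: totalMs=57374 stages=
  workspace-sandbox:9ms@9ms,
  skills:1ms@10ms,
  core-plugin-tools:3602ms@3612ms,
  bootstrap-context:5ms@3617ms,
  bundle-tools:764ms@4381ms,
  system-prompt:19242ms@23623ms,             ← 19s on main thread
  session-resource-loader:12290ms@35913ms,
  agent-session:2ms@35915ms,
  stream-setup:21459ms@57374ms`;

// Extracts every "name:<duration>ms@<cumulative>ms" entry; other text is ignored.
function parsePrepStages(trace: string): Stage[] {
  const pattern = /([a-z][a-z-]*):(\d+)ms@(\d+)ms/g;
  const stages: Stage[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(trace)) !== null) {
    stages.push({
      name: match[1],
      durationMs: Number(match[2]),
      endMs: Number(match[3]),
    });
  }
  return stages;
}

// Returns a copy sorted from slowest to fastest stage.
function rankByDuration(stages: Stage[]): Stage[] {
  return stages.slice().sort((a, b) => b.durationMs - a.durationMs);
}

console.log(rankByDuration(parsePrepStages(sampleTrace)).slice(0, 3));
```

For this particular sample the stage durations add up to the reported totalMs of 57,374, and the three slowest stages are stream-setup (21,459 ms), system-prompt (19,242 ms), and session-resource-loader (12,290 ms). So system-prompt is not the only expensive stage in this run, even though it is the one the issue focuses on.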
Environment

  • openclaw 2026.4.29 (npm: openclaw)
  • Node.js (system), Linux 6.17.0-20-generic
  • Profile: secretary (gateway port 18790)
  • Memory backend: embeddinggemma-300m-qat-Q8_0.gguf (sqlite-vec + FTS5 hybrid + MMR + temporalDecay)
  • DB: 16,464 chunks across 28 files

Root-cause hypothesis

  • node-llama-cpp embedding inference is invoked synchronously from the main event loop during system-prompt build.
  • memorySearch hybrid (vector + FTS) + MMR re-rank + temporal-decay sort over 16K chunks runs in-process.
  • No incremental cache: identical adjacent runs rebuild the prompt from scratch even when memory hasn't changed.
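
The missing cache described in the last bullet could look roughly like the sketch below, which keys the rendered prompt on a hash of (workspaceState, memoryRevision, bootstrapBudget), the tuple the issue proposes. The field types, the PromptCache class, and the small LRU limit are assumptions for illustration; how openclaw would actually expose a memory revision is not stated in the issue.

```typescript
import { createHash } from "node:crypto";

// Hypothetical inputs: any change to one of them must invalidate the prompt.
type PromptInputs = {
  workspaceState: string;
  memoryRevision: number;
  bootstrapBudget: number;
};

function promptCacheKey(inputs: PromptInputs): string {
  // A fixed field order keeps the serialization stable across calls.
  const payload = JSON.stringify([
    inputs.workspaceState,
    inputs.memoryRevision,
    inputs.bootstrapBudget,
  ]);
  return createHash("sha256").update(payload).digest("hex");
}

class PromptCache {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries = 8) {
    this.maxEntries = maxEntries;
  }

  async getOrBuild(inputs: PromptInputs, build: () => Promise<string>): Promise<string> {
    const key = promptCacheKey(inputs);
    const hit = this.entries.get(key);
    if (hit !== undefined) {
      // Re-insert so the Map's insertion order doubles as recency order.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit;
    }
    const prompt = await build(); // the expensive rebuild runs only on a miss
    this.entries.set(key, prompt);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return prompt;
  }
}
```

A cache like this only helps adjacent runs with unchanged inputs. The first run after any memory or workspace change still pays the full rebuild cost, so it complements moving the work off the main thread rather than replacing it.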

Repro

  1. Configure agents.defaults.contextInjection: "always" with bootstrapTotalMaxChars: 150000 and a populated memory DB (>10K chunks).
  2. Trigger any embedded-run and observe [trace:embedded-run] prep stages system-prompt ≥15s.
  3. With agent idle, observe [diagnostic] liveness warning showing eventLoopUtilization=1 while active=0.

extent analysis

TL;DR

Move embedding inference to a Worker thread to alleviate event loop blocking caused by synchronous system-prompt rebuilds.

Guidance

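  • Verify that the installed node-llama-cpp version can run embedding inference off the main thread, as the issue claims. A compute thread count such as nThreads is not the same as hosting the call in a Node.js Worker thread: if the call itself is synchronous, it still blocks the event loop regardless of how many native threads it uses.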
  • Consider implementing a cache for system-prompt based on (workspaceState, memoryRevision, bootstrapBudget) hash to avoid rebuilding the prompt from scratch on identical runs.
  • Review the proposed mitigations and prioritize them based on expected impact, starting with moving embedding inference to a Worker thread.
  • Monitor eventLoopUtilization and cpuCoreRatio after each change to confirm that the idle saturation is gone; the local workaround alone left cpuCoreRatio≈1.07.
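
The Worker-thread guidance can be illustrated with Node's built-in worker_threads module. This is a sketch under stated assumptions: fakeEmbed is a CPU-heavy placeholder standing in for real embedding inference, no node-llama-cpp API is used, and the EmbeddingWorker class and its message format are hypothetical.

```typescript
import { Worker } from "node:worker_threads";

// Worker body as plain JavaScript so the sketch stays in a single file.
// fakeEmbed is a placeholder for the real (assumed synchronous) inference call.
const workerSource = `
  const { parentPort } = require("node:worker_threads");
  function fakeEmbed(text) {
    const vec = new Array(8).fill(0);
    for (let round = 0; round < 500000; round++) {
      for (let i = 0; i < text.length; i++) {
        vec[i % 8] = (vec[i % 8] + text.charCodeAt(i) * (round + 1)) % 997;
      }
    }
    return vec;
  }
  parentPort.on("message", ({ id, text }) => {
    parentPort.postMessage({ id, vector: fakeEmbed(text) });
  });
`;

type Pending = { resolve: (vector: number[]) => void; reject: (err: Error) => void };

class EmbeddingWorker {
  private readonly worker: Worker;
  private readonly pending = new Map<number, Pending>();
  private nextId = 0;

  constructor() {
    this.worker = new Worker(workerSource, { eval: true });
    this.worker.on("message", (msg: { id: number; vector: number[] }) => {
      const entry = this.pending.get(msg.id);
      this.pending.delete(msg.id);
      if (entry) entry.resolve(msg.vector);
    });
    this.worker.on("error", (err: Error) => {
      // Fail every in-flight request if the worker crashes.
      this.pending.forEach((entry) => entry.reject(err));
      this.pending.clear();
    });
  }

  // The main thread only posts a message and awaits the reply.
  embed(text: string): Promise<number[]> {
    const id = this.nextId++;
    return new Promise<number[]>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ id, text });
    });
  }

  close(): Promise<number> {
    return this.worker.terminate();
  }
}
```

With this shape, the heavy loop runs on the worker's own thread, so timers and I/O on the main thread keep firing while an embedding is in flight. Requests are still processed one at a time inside the single worker; a pool would be needed for parallelism.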

Notes

The provided workaround of reducing bootstrapMaxChars and bootstrapTotalMaxChars helps reduce memory peak but does not eliminate the cpuCoreRatio≈1.07 issue when idle. A more comprehensive solution is required to address the root cause.

Recommendation

Apply the proposed mitigation of moving embedding inference to a Worker thread, as it is likely to have the most significant impact on resolving the issue.
