openclaw - ✅(Solved) Fix [perf] system-prompt rebuild blocks event loop 16-29s per run; idle keeps 1 CPU core fully utilized [1 pull request, 1 comment, 2 participants]

highfly-hi · 2026-05-02T01:48:44Z



GitHub stats

openclaw/openclaw#75887 • Fetched 2026-05-02 05:28:28

Author: highfly-hi
Participants: clawsweeper[bot], highfly-hi
Timeline (top): closed ×1, commented ×1, cross-referenced ×1

Root Cause

Symptoms

Every embedded-run prep stages trace shows system-prompt = 16,527-29,185 ms as the single largest stage.
liveness warning reports eventLoopUtilization=1, cpuCoreRatio≈1.06-1.08 even when active=0 waiting=0 queued=0 — main thread is fully saturated by background work.
eventLoopDelayMaxMs peaks (27,363 ms, 21,323 ms, 14,621 ms) coincide with prep-stage rebuilds, causing client-visible timeouts (embedded run failover decision: stage=assistant decision=surface_error reason=timeout).
Reducing bootstrapTotalMaxChars from 150,000 → 60,000 helps but cannot eliminate the stall, because the synchronous embedding search + MMR re-rank still runs on the main thread.
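
The liveness fields quoted above can be approximated with Node's built-in perf_hooks module. The following is a minimal sketch, not openclaw's diagnostic code: the names LivenessSample, isIdleButSaturated and startLivenessSampler are hypothetical, and the 0.9 threshold is an arbitrary assumption.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Hypothetical sample shape, modeled on the fields in the liveness warning.
interface LivenessSample {
  eventLoopUtilization: number; // 0..1 share of the interval the loop was busy
  eventLoopDelayMaxMs: number;
  eventLoopDelayP99Ms: number;
  cpuCoreRatio: number; // CPU time / wall time; about 1 means one full core
  active: number; // runs in flight, supplied by the caller
}

// True when the loop is saturated although no run is active (symptom 2).
function isIdleButSaturated(s: LivenessSample, threshold = 0.9): boolean {
  return s.active === 0 && s.eventLoopUtilization >= threshold;
}

// Emits one sample per interval; returns a function that stops sampling.
function startLivenessSampler(
  intervalMs: number,
  getActive: () => number,
  onSample: (s: LivenessSample) => void,
): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  let prevElu = performance.eventLoopUtilization();
  let prevCpu = process.cpuUsage();
  let prevTime = performance.now();

  const timer = setInterval(() => {
    const elu = performance.eventLoopUtilization();
    const delta = performance.eventLoopUtilization(elu, prevElu);
    const cpu = process.cpuUsage(prevCpu); // microseconds since prevCpu
    const now = performance.now();
    const wallMicros = (now - prevTime) * 1000;
    onSample({
      eventLoopUtilization: delta.utilization,
      eventLoopDelayMaxMs: histogram.max / 1e6, // histogram values are nanoseconds
      eventLoopDelayP99Ms: histogram.percentile(99) / 1e6,
      cpuCoreRatio: (cpu.user + cpu.system) / wallMicros,
      active: getActive(),
    });
    histogram.reset();
    prevElu = elu;
    prevCpu = process.cpuUsage();
    prevTime = now;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for diagnostics

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Because the sampler's own timer runs on the event loop, a long stall delays the sample itself; the stall then shows up as a large eventLoopDelayMaxMs in the next sample rather than as a missing one.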

Fix Action

Fix / Workaround

Proposed mitigations (in order of impact)

Move embedding inference to a Worker thread (node-llama-cpp supports nThreads / off-main inference).
Cache system-prompt by (workspaceState, memoryRevision, bootstrapBudget) hash so repeated runs reuse the rendered prompt.
Stream MMR re-rank with setImmediate/yieldEvery N so it doesn't monopolize the loop.
Optionally expose agents.defaults.memorySearch.workerThread = true.
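
Mitigation 2 can be sketched as a small keyed cache. This is an illustration under stated assumptions, not the project's implementation: PromptInputs, promptCacheKey and PromptCache are hypothetical names, and how workspaceState and memoryRevision would actually be derived is not specified in the issue.

```typescript
import { createHash } from "node:crypto";

// Hypothetical inputs; the three fields mirror the tuple named in the issue.
interface PromptInputs {
  workspaceState: string; // assumed: some digest of the workspace contents
  memoryRevision: number; // assumed: bumped whenever the memory DB changes
  bootstrapBudget: number; // e.g. bootstrapTotalMaxChars
}

// Stable key: identical inputs always hash to the same hex string.
function promptCacheKey(i: PromptInputs): string {
  return createHash("sha256")
    .update(JSON.stringify([i.workspaceState, i.memoryRevision, i.bootstrapBudget]))
    .digest("hex");
}

// Small LRU cache; Map iteration order doubles as recency order.
class PromptCache {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries = 8) {
    this.maxEntries = maxEntries;
  }

  getOrBuild(inputs: PromptInputs, build: () => string): string {
    const key = promptCacheKey(inputs);
    const hit = this.entries.get(key);
    if (hit !== undefined) {
      this.entries.delete(key); // re-insert to mark as most recently used
      this.entries.set(key, hit);
      return hit;
    }
    const prompt = build(); // the expensive rebuild happens only on a miss
    this.entries.set(key, prompt);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return prompt;
  }
}
```

A cache like this only helps when adjacent runs really share all three inputs; the first run after any memory change still pays the full rebuild cost.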

Workaround applied locally

Reduced bootstrapMaxChars 20000 → 10000, bootstrapTotalMaxChars 150000 → 60000.
Effect: memory peak 5.0 GB → 0.75 GB, but cpuCoreRatio≈1.07 persists when idle.

PR fix notes

PR #75922: Fix plugin-only tool and registry latency regressions

Repository: openclaw/openclaw
Author: obviyus
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/75922

Description (problem / solution / changelog)

Summary

Skip core coding tool construction when an explicit allowlist only requests plugin tools.
Keep the full workspace plugin registry cache separate from scoped plugin registry loads.
Add regressions for both latency paths.
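
The first summary item can be illustrated with a short sketch. It is written from the one-line PR summary only, not from the diff: CORE_TOOL_NAMES, isPluginOnlyAllowlist and buildTools are hypothetical, and the core tool names are placeholders rather than openclaw's real tool list.

```typescript
// Placeholder core tool names, assumed for illustration only.
const CORE_TOOL_NAMES: ReadonlySet<string> = new Set(["read", "write", "edit", "exec"]);

// True when an explicit allowlist exists and names no core tool.
function isPluginOnlyAllowlist(
  allow: readonly string[] | undefined,
  core: ReadonlySet<string> = CORE_TOOL_NAMES,
): boolean {
  if (!allow || allow.length === 0) return false; // no explicit allowlist: build everything
  return allow.every((name) => !core.has(name));
}

// Skips the (assumed expensive) core tool factory for plugin-only allowlists.
function buildTools(
  allow: readonly string[] | undefined,
  makeCoreTools: () => string[],
  makePluginTools: () => string[],
): string[] {
  const core = isPluginOnlyAllowlist(allow) ? [] : makeCoreTools();
  const all = [...core, ...makePluginTools()];
  return allow && allow.length > 0 ? all.filter((t) => allow.includes(t)) : all;
}
```

The sketch ignores wildcard or group entries in the allowlist; if the real policy supports them, the check would need to expand those first.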

Tests

OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/tool-policy.plugin-only-allowlist.test.ts src/agents/pi-tools.create-openclaw-coding-tools.test.ts src/plugins/plugin-lru-cache.test.ts src/plugins/loader.runtime-registry.test.ts src/plugins/loader.test.ts
pnpm exec oxfmt --check --threads=1 src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts src/plugins/loader.ts src/plugins/loader.runtime-registry.test.ts
git diff --check origin/main...HEAD

Fixes #75882 Fixes #75907 Fixes #75906 Fixes #75887 Fixes #75851

Changed files

CHANGELOG.md (modified, +1/-0)
src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts (modified, +36/-20)
src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts (added, +59/-0)
src/agents/pi-embedded-runner/run/attempt.ts (modified, +53/-4)
src/agents/pi-tools.ts (modified, +190/-145)
src/plugins/loader.runtime-registry.test.ts (modified, +28/-1)
src/plugins/loader.ts (modified, +40/-20)

Code Example

prep stages: totalMs=57374 stages=
  workspace-sandbox:9ms@9ms,
  skills:1ms@10ms,
  core-plugin-tools:3602ms@3612ms,
  bootstrap-context:5ms@3617ms,
  bundle-tools:764ms@4381ms,
  system-prompt:19242ms@23623ms,             ← 19s on main thread
  session-resource-loader:12290ms@35913ms,
  agent-session:2ms@35915ms,
  stream-setup:21459ms@57374ms

liveness warning: reasons=event_loop_delay,event_loop_utilization
  interval=35s eventLoopDelayP99Ms=27363.6 eventLoopDelayMaxMs=27363.6
  eventLoopUtilization=1 cpuCoreRatio=0.254 active=1 waiting=0 queued=0
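
The trace above can be checked mechanically. The sketch below assumes each stage is formatted as name:DURATIONms@CUMULATIVEms, which matches the sample; parsePrepStages and rankStages are hypothetical helper names. For this sample the stage durations sum to totalMs (57,374 ms), and stream-setup (21,459 ms) is slightly larger than system-prompt (19,242 ms), so the ranking is worth confirming per trace.

```typescript
interface Stage {
  name: string;
  durationMs: number;
  endMs: number; // cumulative offset at which the stage finished
}

// Extracts every "name:123ms@456ms" entry from a prep-stages log line.
function parsePrepStages(trace: string): Stage[] {
  const pattern = /([a-z-]+):(\d+)ms@(\d+)ms/g;
  const stages: Stage[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(trace)) !== null) {
    stages.push({ name: m[1], durationMs: Number(m[2]), endMs: Number(m[3]) });
  }
  return stages;
}

// Returns stages sorted by duration, longest first, with their share of the total.
function rankStages(stages: Stage[]): Array<Stage & { share: number }> {
  const total = stages.reduce((sum, s) => sum + s.durationMs, 0);
  return stages
    .map((s) => ({ ...s, share: total === 0 ? 0 : s.durationMs / total }))
    .sort((a, b) => b.durationMs - a.durationMs);
}

const sampleTrace =
  "prep stages: totalMs=57374 stages= workspace-sandbox:9ms@9ms, skills:1ms@10ms, " +
  "core-plugin-tools:3602ms@3612ms, bootstrap-context:5ms@3617ms, bundle-tools:764ms@4381ms, " +
  "system-prompt:19242ms@23623ms, session-resource-loader:12290ms@35913ms, " +
  "agent-session:2ms@35915ms, stream-setup:21459ms@57374ms";

for (const s of rankStages(parsePrepStages(sampleTrace)).slice(0, 3)) {
  console.log(`${s.name}: ${s.durationMs} ms (${(s.share * 100).toFixed(1)}%)`);
}
```

Stage duration alone does not show which stage blocks the event loop; that still has to be correlated with the eventLoopDelayMaxMs peaks in the liveness warnings.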


[perf] system-prompt rebuild blocks event loop 16-29s per run; idle keeps 1 CPU core fully utilized

Environment

openclaw 2026.4.29 (npm: openclaw)
Node.js (system), Linux 6.17.0-20-generic
Profile: secretary (gateway port 18790)
Memory backend: embeddinggemma-300m-qat-Q8_0.gguf (sqlite-vec + FTS5 hybrid + MMR + temporalDecay)
DB: 16,464 chunks across 28 files

Root-cause hypothesis

node-llama-cpp embedding inference is invoked synchronously from the main event loop during system-prompt build.
memorySearch hybrid (vector + FTS) + MMR re-rank + temporal-decay sort over 16K chunks runs in-process.
No incremental cache: identical adjacent runs rebuild the prompt from scratch even when memory hasn't changed.
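
If the MMR re-rank is one of the in-process costs, a cooperative variant could hand control back to the event loop between selections. The sketch below is a generic greedy MMR over cosine similarity, not openclaw's memorySearch code; Candidate, mmrSteps, mmrRerank, the lambda default and the yieldEvery default are all assumptions.

```typescript
type Vec = readonly number[];

interface Candidate {
  id: string;
  embedding: Vec;
}

function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy MMR as a generator: each step picks the candidate maximizing
// lambda * relevance - (1 - lambda) * (max similarity to already selected items).
function* mmrSteps(query: Vec, candidates: readonly Candidate[], k: number, lambda = 0.7): Generator<string> {
  const remaining = candidates.map((c) => ({
    c,
    relevance: cosine(query, c.embedding),
    maxSim: -Infinity, // updated incrementally after every selection
  }));
  const picks = Math.min(k, remaining.length);
  for (let picked = 0; picked < picks; picked++) {
    let best = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const r = remaining[i];
      const penalty = picked === 0 ? 0 : r.maxSim; // nothing selected yet: no penalty
      const score = lambda * r.relevance - (1 - lambda) * penalty;
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    }
    const [chosen] = remaining.splice(best, 1);
    for (const r of remaining) {
      r.maxSim = Math.max(r.maxSim, cosine(r.c.embedding, chosen.c.embedding));
    }
    yield chosen.c.id;
  }
}

// Async driver: yields to the event loop after every `yieldEvery` selections.
async function mmrRerank(query: Vec, candidates: readonly Candidate[], k: number, lambda = 0.7, yieldEvery = 8): Promise<string[]> {
  const selected: string[] = [];
  for (const id of mmrSteps(query, candidates, k, lambda)) {
    selected.push(id);
    if (selected.length % yieldEvery === 0) {
      await new Promise<void>((resolve) => setImmediate(resolve));
    }
  }
  return selected;
}
```

Yielding bounds the length of each blocking slice but does not reduce total CPU time, and it does nothing for a synchronous embedding call made before the re-rank starts.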

Repro

Configure agents.defaults.contextInjection: "always" with bootstrapTotalMaxChars: 150000 and a populated memory DB (>10K chunks).
Trigger any embedded-run and observe [trace:embedded-run] prep stages system-prompt ≥15s.
With agent idle, observe [diagnostic] liveness warning showing eventLoopUtilization=1 while active=0.
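
The repro settings and the local workaround can be written side by side as follows. Only the key names and values come from the issue; the nesting of the bootstrap keys under agents.defaults is an assumption, and the fragment is a TypeScript object literal for illustration rather than openclaw's actual config file format.

```typescript
// Assumed shape: key names and values are from the issue, nesting is a guess.
const reproConfig = {
  agents: {
    defaults: {
      contextInjection: "always",
      bootstrapMaxChars: 20000, // "before" value quoted in the workaround
      bootstrapTotalMaxChars: 150000,
    },
  },
};

// Local workaround from the issue: smaller bootstrap budgets, same injection mode.
const workaroundConfig = {
  agents: {
    defaults: {
      ...reproConfig.agents.defaults,
      bootstrapMaxChars: 10000,
      bootstrapTotalMaxChars: 60000,
    },
  },
};
```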

extent analysis

TL;DR

Move embedding inference to a Worker thread to alleviate event loop blocking caused by synchronous system-prompt rebuilds.
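
The general pattern can be sketched with Node's worker_threads module. This sketch does not call node-llama-cpp: fakeEmbed is a placeholder for whatever synchronous embedding call is being moved, and embedInWorker is a hypothetical name. The worker body is passed as a string with eval: true only to keep the example in one file.

```typescript
import { Worker } from "node:worker_threads";

// Worker body (plain JavaScript). fakeEmbed stands in for a CPU-heavy,
// synchronous embedding call; it is NOT a real model.
const workerSource = `
const { parentPort, workerData } = require("node:worker_threads");
function fakeEmbed(text) {
  const out = [0, 0, 0, 0];
  for (let i = 0; i < text.length; i++) out[i % 4] += text.charCodeAt(i);
  return out;
}
parentPort.postMessage(fakeEmbed(workerData));
`;

// Runs the embedding in a separate thread so the main event loop stays free.
function embedInWorker(text: string): Promise<number[]> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: text });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`embedding worker exited with code ${code}`));
    });
  });
}

embedInWorker("hello").then((vector) => {
  console.log("embedding length:", vector.length);
});
```

Spawning a worker per call keeps the sketch short; with a real model, a single long-lived worker that loads the model once and serves requests over messages would avoid repeated load cost.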

Guidance

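Verify whether your node-llama-cpp version can run embedding inference off the main thread. The issue cites nThreads, but a thread-count setting typically controls how many threads the native computation uses and does not by itself show that the call stops blocking the event loop, so compare prep-stage traces before and after the change.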
Consider implementing a cache for system-prompt based on (workspaceState, memoryRevision, bootstrapBudget) hash to avoid rebuilding the prompt from scratch on identical runs.
Review the proposed mitigations and prioritize them based on expected impact, starting with moving embedding inference to a Worker thread.
Monitor eventLoopUtilization and cpuCoreRatio after each mitigation, both during a run and while idle (active=0), to confirm the stall and the idle CPU usage are actually gone.

Example

No code snippet is provided as the issue does not contain sufficient information to generate a specific example.

Notes

The provided workaround of reducing bootstrapMaxChars and bootstrapTotalMaxChars helps reduce memory peak but does not eliminate the cpuCoreRatio≈1.07 issue when idle. A more comprehensive solution is required to address the root cause.

Recommendation

Apply the proposed mitigation of moving embedding inference to a Worker thread, as it is likely to have the most significant impact on resolving the issue.
