openclaw - ✅(Solved) Fix [Bug]: Bundled plugin runtime mirror runs synchronously on every pi-agent invocation, blocking the gateway main thread for tens of seconds (regression in 2026.4.22+) [2 pull requests, 4 comments, 4 participants]

GitHub stats
openclaw/openclaw#75069 · Fetched 2026-05-01 05:38:29
Comments: 4 · Participants: 4 · Timeline: 12 · Reactions: 2
Timeline (top): cross-referenced ×8, commented ×4

Each pi-agent invocation triggers a synchronous "mirror" walk over every bundled plugin, blocking the gateway main thread. On the first agent run after gateway start, four full sweeps stack up and produce ~80–90 seconds of contiguous main-thread blocking (cold fs page cache). On subsequent agent runs the block is shorter — roughly 15 s per sweep on warm fs cache — but it never stops happening because the per-plugin work is not memoized.

Root Cause

The agent-run path triggers loadOpenClawPlugins (a synchronous, per-plugin loop) four times during a single first send: once via resolveRuntimePluginRegistry for the runtime context, and three more times via the video/image/music tool factories where each tool spawns a runEmbeddedPiAgent that re-resolves the registry. Each of those four resolutions calls prepareBundledPluginRuntimeRoot once per plugin (~114 plugins on a default install).

Inside prepareBundledPluginRuntimeRoot, mirrorBundledRuntimeDistRootEntries walks the entire dist/ top level (~2760 .js files, ~24 MB on disk) every time. There is no "this install root has already been mirrored in this process" guard.

Specific issues we identified while reading the source and reproducing the slowness:

1. Every function in the chain is a plain function, not an async function, and all file I/O uses the *Sync variants:

  • readFileSync, lstatSync, realpathSync, linkSync, rmSync, mkdirSync, existsSync, readdirSync, readlinkSync, copyFileSync, symlinkSync, renameSync, writeFileSync, chmodSync, statSync.
  • Even the file-system lock waits use a hard thread block: Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms) (src/plugins/bundled-runtime-deps.ts:426).

A single prepareBundledPluginRuntimeRoot call therefore parks the event loop for the full wall-clock cost of all of its syscalls combined.

2. No process-level dedup across plugins for the dist-root walk. The same (installRoot, sourceDistRoot) pair gets walked once per plugin. With 114 plugins × 4 invocations per first send, that is ~456 full dist/ sweeps per first send, even though the walk is logically idempotent for a fixed source dist.

3. materializeBundledRuntimeMirrorDistFile's early-return is broken in the steady state. At src/plugins/bundled-runtime-deps.ts:171:

try {
  if (
    fs.realpathSync(sourcePath) === fs.realpathSync(targetPath) &&
    !fs.lstatSync(targetPath).isSymbolicLink()
  ) {
    return;
  }
} catch {}
fs.mkdirSync(path.dirname(targetPath), { recursive: true, mode: 0o755 });
fs.rmSync(targetPath, { recursive: true, force: true });
try {
  fs.linkSync(sourcePath, targetPath);
  ...
}

For a hardlink target, realpathSync(target) returns the target's own canonical path, not the source path — realpath does not "deduplicate" hardlinks. So the equality fails even when the target is already the correct hardlink that points at the same inode as the source. We always fall through to rmSync(target) + linkSync(source, target), rewriting the same hardlink with no functional change. On a default install this happens to ~462 dist-root files on every single sweep.

The intent is "skip if target already mirrors source"; the actual condition should compare (dev, ino) from lstatSync, since two paths that share (dev, ino) already point to the same on-disk content.
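
A minimal sketch of the (dev, ino) comparison described above, using only Node's fs module. The helper name isSameFileOnDisk is hypothetical and does not come from the OpenClaw source; the demo just shows why realpath equality fails for a hardlink while the inode comparison succeeds.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: true when both paths already name the same inode.
// lstatSync does not follow symlinks, so a symlinked target compares unequal
// to the file it points at and would still be re-materialized.
function isSameFileOnDisk(sourcePath: string, targetPath: string): boolean {
  try {
    const source = fs.lstatSync(sourcePath);
    const target = fs.lstatSync(targetPath);
    return source.dev === target.dev && source.ino === target.ino;
  } catch {
    return false; // a missing or unreadable path is never "already mirrored"
  }
}

// Demonstration in a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "mirror-demo-"));
const sourceFile = path.join(demoDir, "source.js");
const hardlink = path.join(demoDir, "hardlink.js");
fs.writeFileSync(sourceFile, "export {};\n");
fs.linkSync(sourceFile, hardlink);

// realpath returns each path's own canonical name, so equality fails here.
const realpathEqual = fs.realpathSync(sourceFile) === fs.realpathSync(hardlink);
// The inode comparison recognizes the existing hardlink.
const inodeEqual = isSameFileOnDisk(sourceFile, hardlink);
console.log({ realpathEqual, inodeEqual });
```

On filesystems with very large inode numbers, passing { bigint: true } to lstatSync avoids precision loss in the comparison.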

4. The shipped 2026.4.26 build has no shouldMaterializeBundledRuntimeMirrorDistFile cache. The main branch adds bundledRuntimeMirrorMaterializeCache keyed by stat signature, which would short-circuit subsequent calls in the same process. The shipped compiled module reads + regex-tests every file every time:

function shouldMaterializeBundledRuntimeMirrorDistFile(sourcePath) {
  if (!BUNDLED_RUNTIME_MIRROR_MATERIALIZED_EXTENSIONS.has(path.extname(sourcePath))) return false;
  try {
    return BUNDLED_RUNTIME_MIRROR_PLUGIN_REGION_RE.test(fs.readFileSync(sourcePath, "utf8"));
  } catch { return false; }
}

This alone accounts for ~24 MB of readFileSync per sweep, repeated per plugin. After the cache lands in a future release this cost goes away, but issues 1, 2, and 3 above still produce 462 unconditional unlink + linkSync per plugin per sweep.
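
For illustration, a cache keyed by stat signature could look roughly like the sketch below. This is not the bundledRuntimeMirrorMaterializeCache implementation from main, which is not shown in this report; the names, the signature fields, and the regex flags are assumptions, and the extension check from the shipped function is omitted for brevity. The pattern itself is copied from the CPU profile output.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical cache: path -> stat signature plus the cached decision.
const decisionCache = new Map<string, { signature: string; result: boolean }>();
let contentReads = 0; // counts how often a file body is actually read

// Pattern as it appears in the CPU profile; the flags are an assumption.
const PLUGIN_REGION_RE = /(?:^|\n)\/\/#region extensions\/[^/\s]+(?:\/|$)/;

function shouldMaterializeCached(sourcePath: string): boolean {
  let stat: fs.Stats;
  try {
    stat = fs.statSync(sourcePath);
  } catch {
    return false;
  }
  // One stat per call; the file is re-read only when this signature changes.
  const signature = `${stat.dev}:${stat.ino}:${stat.size}:${stat.mtimeMs}`;
  const cached = decisionCache.get(sourcePath);
  if (cached !== undefined && cached.signature === signature) return cached.result;

  let result = false;
  try {
    contentReads += 1;
    result = PLUGIN_REGION_RE.test(fs.readFileSync(sourcePath, "utf8"));
  } catch {
    result = false;
  }
  decisionCache.set(sourcePath, { signature, result });
  return result;
}

// Demonstration: two files, two passes, but only two content reads in total.
const cacheDemoDir = fs.mkdtempSync(path.join(os.tmpdir(), "materialize-cache-"));
const pluginChunk = path.join(cacheDemoDir, "plugin-chunk.js");
const coreChunk = path.join(cacheDemoDir, "core-chunk.js");
fs.writeFileSync(pluginChunk, "//#region extensions/example/index.ts\nexport {};\n");
fs.writeFileSync(coreChunk, "export const core = true;\n");

const firstPass = [shouldMaterializeCached(pluginChunk), shouldMaterializeCached(coreChunk)];
const secondPass = [shouldMaterializeCached(pluginChunk), shouldMaterializeCached(coreChunk)];
console.log({ firstPass, secondPass, contentReads });
```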

Fix Action

Fix / Workaround

  1. Land the existing bundledRuntimeMirrorMaterializeCache from main in a patch release, even ahead of the async migration. It cuts ~24 MB of readFileSync per sweep down to one stat per file in the steady state.

PR fix notes

PR #75183: fix: simplify bundled runtime dependency repair

Description (problem / solution / changelog)

Summary

This PR unifies bundled plugin runtime-dependency repair around the package-level plan and lets npm/pnpm own dependency convergence once OpenClaw has decided a repair is required.

What changed:

  • Build the active bundled plugin dependency plan before startup/runtime imports, then repair the selected package-level install root once instead of per-plugin ad hoc scans.
  • Treat an existing node_modules tree without complete generated materialization as incomplete, even when package sentinels such as node_modules/<dep>/package.json exist.
  • Force the package-manager repair once the planner has marked a tree incomplete, so writing the generated install manifest cannot accidentally turn the repair into a no-op.
  • Keep post-install verification on requested packages and declared entry files, so package-manager success is not trusted until the staged tree is actually usable.
  • Accept generated manifest supersets for narrower plugin loads so a complete package-level stage is reused rather than pruned/reinstalled.
  • Preserve config/doctor/hot-reload behavior: config determines the plugin plan; doctor/config edits enter plan mode; startup only repairs when the plan or materialization requires it.

Fixes / related reports

Fixes #75309.

Supersedes the implementation approach in #75310 by keeping the same narrow idea but closing the reviewed hole where a generated manifest plus a no-main package sentinel could still look materialized.

Also hardens the recovery side of the already-addressed reports #75296 and #75304:

  • #75296: post-install verification already fails closed when npm/pnpm reports success but requested packages are missing; this PR additionally makes later doctor/startup repair recover from the leftover partial tree.
  • #75304: mirrored root dependencies and manifest-superset reuse already address the json5/prune crash loop; this PR prevents interrupted runtime-deps stages from staying stuck behind false package sentinels.

Not claimed: #73520, #74948, #74963, #75071, and #75288 are nearby runtime-deps lifecycle issues with different root causes.

Verification

Local:

  • pnpm docs:list
  • pnpm exec oxfmt --check --threads=1 src/plugins/bundled-runtime-deps-install.ts src/plugins/bundled-runtime-deps.ts src/plugins/bundled-runtime-deps-materialization.ts src/plugins/bundled-runtime-deps.test.ts src/commands/doctor-bundled-plugin-runtime-deps.test.ts
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md
  • pnpm check:changelog-attributions
  • git diff --check
  • pnpm test src/plugins/bundled-runtime-deps.test.ts src/commands/doctor-bundled-plugin-runtime-deps.test.ts src/gateway/server-startup-plugins.test.ts src/gateway/server.reload.test.ts -- --reporter=verbose

Blacksmith/Testbox:

  • tbx_01kqgsehj3tf33k2dmzy35ds7j: pnpm test src/plugins/bundled-runtime-deps.test.ts src/commands/doctor-bundled-plugin-runtime-deps.test.ts src/gateway/server-startup-plugins.test.ts src/gateway/server.reload.test.ts -- --reporter=verbose passed, 3 Vitest shards / 177 tests.

Broad pnpm check:changed was attempted on tbx_01kqgrw1phqcm20c9w06n1w4wx, but the Testbox full-sync omitted the tracked-but-gitignored pnpm-lock.yaml; the workaround polluted remote node_modules into the changed-file scan and tsgolint was later SIGKILLed. I do not count that polluted run as product signal.

Changed files

  • CHANGELOG.md (modified, +4/-0)
  • docs/cli/channels.md (modified, +3/-0)
  • docs/cli/configure.md (modified, +1/-0)
  • docs/cli/gateway.md (modified, +1/-1)
  • docs/cli/onboard.md (modified, +2/-0)
  • docs/cli/plugins.md (modified, +6/-2)
  • docs/docs.json (modified, +1/-0)
  • docs/gateway/doctor.md (modified, +1/-1)
  • docs/plugins/dependency-resolution.md (added, +214/-0)
  • docs/tools/acp-agents.md (modified, +2/-1)
  • docs/tools/plugin.md (modified, +6/-3)
  • scripts/lib/bundled-runtime-deps-install.mjs (modified, +18/-3)
  • scripts/postinstall-bundled-plugins.mjs (modified, +4/-146)
  • scripts/release-check.ts (modified, +1/-1)
  • src/channels/plugins/bundled.shape-guard.test.ts (modified, +2/-2)
  • src/channels/plugins/read-only.test.ts (modified, +14/-14)
  • src/channels/plugins/read-only.ts (modified, +3/-3)
  • src/cli/command-bootstrap.test.ts (modified, +28/-3)
  • src/cli/command-bootstrap.ts (modified, +12/-5)
  • src/cli/command-catalog.ts (modified, +28/-4)
  • src/cli/command-execution-startup.test.ts (modified, +7/-0)
  • src/cli/command-execution-startup.ts (modified, +1/-0)
  • src/cli/command-path-policy.test.ts (modified, +59/-68)
  • src/cli/command-path-policy.ts (modified, +1/-0)
  • src/cli/command-startup-policy.test.ts (modified, +2/-0)
  • src/cli/command-startup-policy.ts (modified, +23/-7)
  • src/cli/plugin-registry-loader.test.ts (modified, +11/-7)
  • src/cli/plugin-registry-loader.ts (modified, +10/-7)
  • src/cli/plugins-cli.list.test.ts (modified, +41/-3)
  • src/cli/plugins-cli.ts (modified, +8/-456)
  • src/cli/plugins-command-helpers.ts (modified, +8/-0)
  • src/cli/plugins-deps-command.test.ts (modified, +47/-32)
  • src/cli/plugins-deps-command.ts (modified, +40/-42)
  • src/cli/plugins-inspect-command.ts (added, +361/-0)
  • src/cli/plugins-list-command.ts (added, +114/-0)
  • src/cli/program/preaction.test.ts (modified, +9/-3)
  • src/commands/agents.providers.test.ts (modified, +1/-1)
  • src/commands/agents.providers.ts (modified, +2/-2)
  • src/commands/channels.list.auth-profiles.test.ts (modified, +2/-2)
  • src/commands/channels.remove.test.ts (modified, +45/-20)
  • src/commands/channels.resolve.test.ts (modified, +30/-24)
  • src/commands/channels/capabilities.ts (modified, +1/-1)
  • src/commands/channels/list.ts (modified, +1/-1)
  • src/commands/channels/remove.ts (modified, +9/-2)
  • src/commands/channels/resolve.ts (modified, +6/-1)
  • src/commands/channels/status-config-format.ts (modified, +1/-1)
  • src/commands/configure.wizard.test.ts (modified, +31/-1)
  • src/commands/configure.wizard.ts (modified, +2/-0)
  • src/commands/doctor-bundled-plugin-runtime-deps.test.ts (modified, +50/-18)
  • src/commands/doctor-bundled-plugin-runtime-deps.ts (modified, +25/-24)
  • src/commands/doctor-security.test.ts (modified, +2/-2)
  • src/commands/doctor-security.ts (modified, +1/-1)
  • src/commands/doctor/shared/channel-doctor.test.ts (modified, +1/-1)
  • src/commands/doctor/shared/channel-doctor.ts (modified, +1/-1)
  • src/commands/health.snapshot.test.ts (modified, +2/-2)
  • src/commands/health.ts (modified, +2/-2)
  • src/commands/onboard-non-interactive.gateway.test.ts (modified, +15/-0)
  • src/commands/onboard-non-interactive/local.ts (modified, +2/-0)
  • src/commands/post-config-runtime-deps.test.ts (added, +164/-0)
  • src/commands/post-config-runtime-deps.ts (added, +133/-0)
  • src/commands/status-all/channels.ts (modified, +3/-3)
  • src/commands/status-runtime-shared.test.ts (modified, +1/-1)
  • src/commands/status-runtime-shared.ts (modified, +1/-1)
  • src/commands/status.link-channel.ts (modified, +1/-1)
  • src/commands/status.scan-overview.test.ts (modified, +2/-2)
  • src/commands/status.scan-overview.ts (modified, +1/-1)
  • src/commands/status.scan.test.ts (modified, +3/-3)
  • src/gateway/config-reload-plan.ts (modified, +29/-0)
  • src/gateway/config-reload.test.ts (modified, +6/-0)
  • src/gateway/config-reload.ts (modified, +1/-0)
  • src/gateway/server-aux-handlers.test.ts (modified, +1/-0)
  • src/gateway/server-plugin-bootstrap.ts (modified, +5/-1)
  • src/gateway/server-plugins.ts (modified, +5/-1)
  • src/gateway/server-reload-handlers.ts (modified, +52/-0)
  • src/gateway/server-runtime-state.test.ts (added, +66/-0)
  • src/gateway/server-runtime-state.ts (modified, +5/-2)
  • src/gateway/server-startup-plugins.test.ts (modified, +130/-55)
  • src/gateway/server-startup-plugins.ts (modified, +145/-89)
  • src/gateway/server-startup-post-attach.test.ts (modified, +56/-0)
  • src/gateway/server-startup-post-attach.ts (modified, +26/-6)
  • src/gateway/server.impl.ts (modified, +66/-15)
  • src/gateway/server.reload.test.ts (modified, +92/-0)
  • src/gateway/server/readiness.test.ts (modified, +18/-0)
  • src/gateway/server/readiness.ts (modified, +3/-1)
  • src/infra/channel-summary.ts (modified, +1/-1)
  • src/infra/npm-install-env.ts (modified, +4/-0)
  • src/infra/safe-package-install.test.ts (modified, +10/-0)
  • src/infra/safe-package-install.ts (modified, +5/-0)
  • src/plugin-sdk/facade-loader.test.ts (modified, +1/-1)
  • src/plugins/bundled-runtime-deps-activity.ts (modified, +1/-5)
  • src/plugins/bundled-runtime-deps-install.ts (modified, +9/-47)
  • src/plugins/bundled-runtime-deps-lock.ts (modified, +7/-0)
  • src/plugins/bundled-runtime-deps-materialization.ts (modified, +76/-31)
  • src/plugins/bundled-runtime-deps-package-manager.ts (modified, +7/-1)
  • src/plugins/bundled-runtime-deps-roots.ts (modified, +66/-47)
  • src/plugins/bundled-runtime-deps-selection.ts (modified, +5/-5)
  • src/plugins/bundled-runtime-deps.test.ts (modified, +683/-69)
  • src/plugins/bundled-runtime-deps.ts (modified, +249/-80)
  • src/plugins/bundled-runtime-root.test.ts (modified, +109/-3)
  • src/plugins/bundled-runtime-root.ts (modified, +141/-24)

PR #75325: perf: skip runtime-deps manifest scans when materialized

Description (problem / solution / changelog)

Summary

  • Problem: packaged plugin lazy loading can still enter the package-level runtime-deps planning path even when the requested plugin deps are already materialized.
  • Why it matters: that path scans bundled plugin manifests and contributes avoidable filesystem work in the gateway CPU/event-loop starvation cluster.
  • What changed: add a fast path that proves the requested plugin deps plus mirrored root deps are already materialized before scanning all bundled plugin manifests.
  • What did NOT change (scope boundary): no install-root selection changes, no package-manager behavior changes, no Telegram/channel behavior changes.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Related #72338
  • Related #73532
  • Related #75069
  • Related #75283
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the lazy plugin runtime-deps path only checked the full package-level plan after collecting all bundled plugin deps, so an already-materialized generated manifest still paid the all-plugin manifest scan cost.
  • Missing detection / guardrail: no regression test asserted that a materialized generated package-level manifest avoids reading unrelated plugin manifests.
  • Contributing context (if known): packaged installs with runtime-deps mirrors and channel/provider plugin loading make this path visible during gateway startup and control-plane operations.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/plugins/bundled-runtime-deps.test.ts
  • Scenario the test should lock in: generated package-level deps are already materialized, the requested plugin loads, and unrelated plugin manifests are not read.
  • Why this is the smallest reliable guardrail: it exercises the exact materialization helper path with filesystem fixtures and a read spy.
  • Existing test that already covers this (if any): adjacent runtime-deps materialization tests covered reinstall avoidance, but not scan avoidance.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

None directly. This is a performance/responsiveness optimization for packaged plugin runtime-deps loading.

Diagram (if applicable)

Before:
load plugin -> scan all bundled plugin manifests -> discover generated manifest is enough -> skip install

After:
load plugin -> prove requested deps already materialized -> skip all-plugin scan and install
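
The before/after flow can be sketched as follows. Everything here is hypothetical scaffolding (the Dep type, the installed map, and the callback names are not the PR's API); it only illustrates checking the requested deps first so the all-plugin manifest scan is skipped when they are already materialized.

```typescript
type Dep = { name: string; version: string };
type Outcome = "fast-path" | "full-plan";

// Hypothetical check: every dep is present at the expected version.
function isMaterialized(deps: Dep[], installed: Map<string, string>): boolean {
  return deps.every((dep) => installed.get(dep.name) === dep.version);
}

function ensureRuntimeDeps(
  requested: Dep[],
  installed: Map<string, string>,
  scanAllManifests: () => Dep[],
  install: (plan: Dep[]) => void,
): Outcome {
  // After the patch: prove the requested deps first and return early.
  if (isMaterialized(requested, installed)) return "fast-path";
  // Otherwise fall through to the existing package-level plan.
  const fullPlan = scanAllManifests();
  install(fullPlan);
  return "full-plan";
}

// Demonstration with counters standing in for filesystem work.
let manifestScans = 0;
let installs = 0;
const installedDeps = new Map<string, string>([["json5", "1.0.0"]]); // made-up version
const scan = (): Dep[] => {
  manifestScans += 1;
  return [
    { name: "json5", version: "1.0.0" },
    { name: "other-dep", version: "1.0.0" },
  ];
};
const doInstall = (plan: Dep[]): void => {
  installs += 1;
  for (const dep of plan) installedDeps.set(dep.name, dep.version);
};

const warm = ensureRuntimeDeps([{ name: "json5", version: "1.0.0" }], installedDeps, scan, doInstall);
const scansAfterWarm = manifestScans;
const cold = ensureRuntimeDeps([{ name: "other-dep", version: "1.0.0" }], installedDeps, scan, doInstall);
console.log({ warm, scansAfterWarm, cold, manifestScans, installs });
```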

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS local plus Blacksmith Linux Testbox
  • Runtime/container: Node/pnpm via repo wrappers
  • Model/provider: N/A
  • Integration/channel (if any): bundled plugin runtime-deps
  • Relevant config (redacted): OPENCLAW_PLUGIN_STAGE_DIR fixture

Steps

  1. Create a packaged OpenClaw fixture with two bundled plugins and an external runtime-deps install root.
  2. Materialize a generated package-level manifest containing both plugin deps.
  3. Load one plugin and assert the other plugin manifest is not read.

Expected

  • The lazy runtime-deps path returns without reinstalling and without scanning unrelated plugin manifests.

Actual

  • Before this patch, the path had to collect the package-level plan first. After this patch, the fast path returns before the all-plugin scan.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: focused runtime-deps test passed locally; combined changed gate passed on Blacksmith Testbox before PR split.
  • Edge cases checked: stale generated manifest and stale installed package paths remain covered by existing adjacent tests.
  • What you did not verify: live Linux npm-global Telegram repro/profile.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: an incomplete package-level generated manifest could hide missing deps.
    • Mitigation: the fast path checks only the requested plugin deps plus mirrored root deps with isRuntimeDepsPlanMaterialized; stale/incomplete manifests fall through to the existing full package-level plan.

Changed files

  • CHANGELOG.md (modified, +1/-1)
  • src/plugins/bundled-runtime-deps.test.ts (modified, +94/-0)
  • src/plugins/bundled-runtime-deps.ts (modified, +54/-1)

Code Example

gw.send agent
gw.recv accepted                       ← phase-1 ack 15ms
... (5.9s)   lag.spike +5903ms
dc.recv coclaw.sessions.getById        ← unrelated RPC slips through during a brief unblock
... (32.5s)  lag.spike +32487ms
... (50.0s)  lag.spike +49860ms
gw.recv event agent                    ← first model event finally arrives
lag.summary dur=89500ms ticks=5 max=49860ms over100=4

Bug type

Regression

Steps to reproduce

  1. Install OpenClaw 2026.4.26 globally via npm. The bundled plugin runtime install root lands at ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.26-<hash>/ with the default 114 bundled plugins on disk.
  2. Cold-start the gateway. Wait for it to be idle.
  3. Attach a CPU profile or a 200 ms tick main-thread lag probe inside the gateway process.
  4. Send the first agent RPC after gateway start.
  5. Observe that the lag probe reports ~89 s of cumulative main-thread blocking before the first model event arrives.

A self-contained reproduction that does not need a running gateway is described under "Reproduction" below.
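
For step 3, a tick-based lag probe could look roughly like the sketch below. This is not the probe used to produce the numbers in this report; the function name and the exact meaning of each summary field are assumptions that loosely mirror the lag.summary output.

```typescript
interface LagSummary {
  ticks: number; // timer callbacks observed
  maxLagMs: number; // worst delay beyond the scheduled tick
  over100: number; // ticks delayed by more than 100 ms
}

// Hypothetical probe: a repeating timer that records how late each tick fires.
// A blocked main thread cannot run timers, so the delay approximates the block.
function startLagProbe(intervalMs: number = 200): { stop: () => LagSummary } {
  const summary: LagSummary = { ticks: 0, maxLagMs: 0, over100: 0 };
  let expectedAt = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const now = Date.now();
    const lagMs = Math.max(0, now - expectedAt);
    summary.ticks += 1;
    summary.maxLagMs = Math.max(summary.maxLagMs, lagMs);
    if (lagMs > 100) summary.over100 += 1;
    expectedAt = now + intervalMs;
  }, intervalMs);
  return {
    stop: () => {
      clearInterval(timer);
      return { ...summary };
    },
  };
}
```

Call startLagProbe() before sending the first agent RPC and stop() once the first model event arrives; a long synchronous stretch shows up as very few ticks with a large maxLagMs.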

Expected behavior

Plugin runtime mirror preparation must not block the event loop for more than a few hundred milliseconds at a time, regardless of how many plugins are installed.

Actual behavior

Main thread is fully blocked. Lag probe summary from one observed first send:

lag.summary dur=89500ms ticks=5 max=49860ms over100=4

Three large spikes inside that 89.5 s: 5.9 s, 32.5 s, 49.9 s. CPU profile attribution (10 ms sampling, the 187 s..277 s window of a 9-minute capture, which is the lag window) places >99% of the time inside the mirror call chain:

TOP SELF time in 89.5s window:
  8177.7ms  (garbage collector)
  2666.8ms  readFileUtf8
  2625.2ms  readFileUtf8
  2513.1ms  readFileUtf8
  2466.5ms  readFileUtf8
  ...
  1488.4ms  lstat
  1356.8ms  lstat
  1336.9ms  readFileSync (node:fs:433)
  ...
  1064.0ms  existsSync
   954.3ms  existsSync
   ...
   415.5ms  RegExp: (?:^|\n)\/\/#region extensions\/[^/\s]+(?:\/|$)

TOP TOTAL time in 89.5s window:
 45576.6ms  runEmbeddedAttempt
 45262.5ms  createOpenClawCodingTools / createOpenClawTools
 14000.9ms  createVideoGenerateTool / resolveVideoGenerationModelConfigForTool
 13953.3ms  createImageGenerateTool / resolveImageGenerationModelConfigForTool
 12876.2ms  createMusicGenerateTool / resolveMusicGenerationModelConfigForTool
 12721.4ms  runWithModelFallback → runFallbackAttempt → runAgentAttempt → runEmbeddedPiAgent

Time inside os.networkInterfaces() in window: 0.0 ms (0.0%)

The 89.5 s splits into two contiguous synchronous runs:

  • 39.3 s starting at offset 187 s — root frame loadOpenClawPlugins (one full plugin-registry resolution).
  • 50.5 s starting at offset 226 s — root frame createOpenClawCodingTools. Three generate-tool factories each spin up a runEmbeddedPiAgent, and each pi-agent re-resolves the registry, which calls prepareBundledPluginRuntimeRoot once per plugin all over again.

Hot stack at the deepest sample:

... ← shouldMaterializeBundledRuntimeMirrorDistFile / materializeBundledRuntimeMirrorDistFile
    ← mirrorBundledRuntimeDistRootEntries
    ← prepareBundledPluginRuntimeDistMirror
    ← (anonymous)
    ← withBundledRuntimeDepsFilesystemLock
    ← mirrorBundledPluginRuntimeRoot
    ← prepareBundledPluginRuntimeRoot   (per plugin, ×114)
    ← loadOpenClawPlugins | createVideoGenerateTool / createImageGenerateTool / createMusicGenerateTool

Reproduction (independent bench)

Three Node ESM scripts, no gateway needed. They directly import the compiled prepareBundledPluginRuntimeRoot from the installed openclaw and run sweeps against the real install root. The mirror operation is idempotent, so the install root state after the bench is functionally identical to before it.

  • bench-mirror-real.mjs — restores the dist-root hardlinks back to symlinks (mimicking just-installed state) and runs prepareBundledPluginRuntimeRoot for several plugins in sequence. Reports per-call duration and a 200 ms tick lag probe summary.
  • bench-mirror-classes.mjs — runs four full 114-plugin sweeps in a row, mirroring the four sweeps observed in the CPU profile. Outputs per-round totals and per-plugin top-N timing.
  • bench-mirror-direct.mjs — independent reimplementation of the algorithm, tests sub-steps in isolation against a /tmp staging dir.

Steady-state warm-fs results on a WSL2 ext4 host (114 bundled plugins on disk):

| What | Time |
| --- | --- |
| One full 114-plugin sweep (steady state) | ~15 seconds |
| Average per-plugin prepareBundledPluginRuntimeRoot | ~130 ms |
| Four sweeps in a row (matches the four-sweep first-send pattern) | ~60 seconds |
| Hottest sub-step inside a sweep | mirrorBundledRuntimeDistRootEntries (462 unlink+linkSync + 2298 existsSync + per-call regex over 2760 files) |
| Slowest per-plugin call (steady state, plugin with the largest dependency closure) | ~290 ms |

Cold fs page cache (the production gateway scenario) inflates the first sweep from ~15 s to ~39 s (about +24 s) and adds ~8 s of GC pressure across the four-sweep run. Together, ~60 s warm + ~24 s + ~8 s is roughly 92 s, which is in line with the 89.5 s observed.

The bench scripts are short (~150 lines each) and self-contained; happy to attach as a gist if useful.

Suggested fix

Convert the entire chain to fs.promises.* + await, and add per-process memoization for the dist-root mirror operation. Concretely:

  1. Async-ify the hot path. Make prepareBundledPluginRuntimeRoot / mirrorBundledPluginRuntimeRoot / prepareBundledPluginRuntimeDistMirror / mirrorBundledRuntimeDistRootEntries / refreshBundledPluginRuntimeMirrorRoot / copyBundledPluginRuntimeRoot / fingerprintBundledRuntimeMirrorSourceRoot / hashBundledRuntimeMirrorDirectory async and replace every *Sync call with the promise variant.

  2. Replace the synchronous lock with an async-friendly one. withBundledRuntimeDepsFilesystemLock can wrap the existing fs.mkdirSync(lockDir) acquisition behind an in-process Mutex (e.g., async-mutex) and await the work inside. The lock-dir acquisition itself can stay sync; the long-running work inside it must not.

  3. Yield to the event loop at directory boundaries. Inside hashBundledRuntimeMirrorDirectory and copyBundledPluginRuntimeRoot, await new Promise(setImmediate) once per directory (or per N entries) so a single large plugin tree cannot starve the loop. SHA-256 hashing has no async fs primitive, so this pattern is what keeps the hot loop cooperative.

  4. Memoize the dist-root mirror result by (installRoot, sourceDistRoot, source dist mtime). Once the dist root is mirrored in the current process, all subsequent plugins on the same source dist skip the per-plugin re-walk entirely. This collapses ~456 dist-root walks per first send (4 sweeps × 114 plugins) to one.

  5. Fix materializeBundledRuntimeMirrorDistFile's early-return. Compare (dev, ino) from lstatSync(source) and lstatSync(target) instead of realpathSync equality. Two paths that share (dev, ino) are already the same file on disk; that is the actual condition the rewrite is trying to avoid.

  6. Land the existing bundledRuntimeMirrorMaterializeCache from main in a patch release, even ahead of the async migration. It cuts ~24 MB of readFileSync per sweep down to one stat per file in the steady state.
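
The memoization in item 4 could look roughly like the sketch below. It assumes the cache key is built from the install root, the source dist root, and the source directory's mtime; `distRootMirrorCache`, `mirrorDistRootOnce`, and the `mirrorFn` callback are hypothetical names, and the real mirror step may take different arguments.

```javascript
const fs = require("node:fs");

// Hypothetical per-process cache: key -> promise of the completed mirror.
const distRootMirrorCache = new Map();

// `mirrorFn` stands in for the real dist-root mirror step; its signature
// here is an assumption made for illustration.
async function mirrorDistRootOnce(installRoot, sourceDistRoot, mirrorFn) {
  const { mtimeMs } = await fs.promises.stat(sourceDistRoot);
  const key = JSON.stringify([installRoot, sourceDistRoot, mtimeMs]);

  // Check-and-set happens synchronously, so concurrent callers that reach
  // this point share one in-flight mirror instead of starting their own.
  let pending = distRootMirrorCache.get(key);
  if (!pending) {
    pending = Promise.resolve()
      .then(() => mirrorFn(installRoot, sourceDistRoot))
      .catch((err) => {
        distRootMirrorCache.delete(key); // do not cache failures
        throw err;
      });
    distRootMirrorCache.set(key, pending);
  }
  return pending;
}
```

Storing the promise rather than the finished result means the 114 per-plugin calls in one sweep all wait on the same mirror. One caveat: a directory's mtime changes when entries are added, removed, or renamed, but not when an existing file is rewritten in place, so a real key may need a stronger fingerprint.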

Items 4 and 5 alone collapse the 89.5 s observed block to roughly the cost of a single sweep (~15 s warm / ~39 s cold). Adding async + setImmediate yield (items 1–3) makes the remaining work non-blocking — events and other RPCs on the same gateway can interleave.
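
The yield pattern from item 3 can be sketched as an async directory hash that awaits `setImmediate` once per directory. This is an illustration under stated assumptions, not the behaviour of the real hashBundledRuntimeMirrorDirectory: `hashDirectory` and `yieldToEventLoop` are hypothetical names, and the choice to hash relative paths plus file contents is made up for the example.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");
const path = require("node:path");

// Resolves in the next check phase of the event loop, so pending I/O
// callbacks and timers get a turn before the walk continues.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical async directory hash: feeds relative paths and file contents
// into one SHA-256 and yields once per directory. Symlinks are skipped.
async function hashDirectory(root, dir = root, hash = crypto.createHash("sha256")) {
  const entries = await fs.promises.readdir(dir, { withFileTypes: true });
  // Sort by name so the digest does not depend on readdir order.
  entries.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));

  for (const entry of entries) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      await hashDirectory(root, fullPath, hash);
    } else if (entry.isFile()) {
      hash.update(path.relative(root, fullPath) + "\0");
      hash.update(await fs.promises.readFile(fullPath)); // sync CPU work
    }
  }

  await yieldToEventLoop(); // directory boundary
  // Only the top-level call returns the digest.
  return dir === root ? hash.digest("hex") : undefined;
}
```

The `hash.update` calls remain synchronous CPU work, which is why the explicit yield matters even after the file reads themselves are promise-based. For very large directories, yielding every N entries instead of once per directory bounds the longest uninterrupted stretch more tightly.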

Regression evidence

src/plugins/bundled-runtime-root.ts was added on 2026-04-22 in commit 9c733956c0 ("fix(plugins): repair bundled deps on activation"), and src/plugins/bundled-runtime-mirror.ts on 2026-04-27 in commit 6f09039b0c ("fix(plugins): reuse unchanged runtime mirrors"). Together with ~8 follow-up fix(plugins): ... commits over the next two days, this code path replaced a much lighter pre-existing approach.

Earlier OpenClaw versions blocked the main thread on gateway restart but not for 80+ seconds, and openclaw logs --follow could still attach during the block. Both regressed in 2026.4.22+: on 2026.4.26, openclaw logs --follow cannot attach for the full duration of the first agent-run block.

This issue is related to but distinct from #74325 (gateway restart 75 s block). #74325 is dominated by mDNS / os.networkInterfaces() polling during gateway startup; this issue covers the bundled-plugin-mirror path that fires per agent run, with os.networkInterfaces() accounting for 0% of the lag window measured here.

OpenClaw version

2026.4.26

Operating system

Ubuntu 22.04 on WSL2 (kernel 6.6.87.2-microsoft-standard-WSL2), Node.js 22.21.1, ext4. The cold-fs amplification is platform-dependent. (WSL2's seccomp_do_user_notification adds ~10 ms per os.networkInterfaces() call, but that has no impact on the mirror path measured here.) The warm-fs ~15 s/sweep baseline reproduces on any platform with a default 114-plugin install.

Model

N/A — reproduces before any model call; the lag is between phase-1 ack and the first model event.

Provider / routing chain

N/A

Install method

npm global (npm install -g openclaw)

Logs, screenshots, and evidence

Plugin-side lag probe summary from one observed first send (timestamps from CoClaw's RPC/main-thread tracing):

gw.send agent
gw.recv accepted                       ← phase-1 ack 15ms
... (5.9s)   lag.spike +5903ms
dc.recv coclaw.sessions.getById        ← unrelated RPC slips through during a brief unblock
... (32.5s)  lag.spike +32487ms
... (50.0s)  lag.spike +49860ms
gw.recv event agent                    ← first model event finally arrives
lag.summary dur=89500ms ticks=5 max=49860ms over100=4

CPU profile call attribution is in the "Actual behavior" section above. Full .cpuprofile file (10 ms sampling, ~6 MB) and the bench scripts available on request.

Additional information

Reproduction was done by directly importing the compiled prepareBundledPluginRuntimeRoot from ~/.nvm/.../openclaw/dist/bundled-runtime-root-DEMD7-O_.js — same code path the gateway runs, just isolated from network and pi-agent overhead. Steady-state warm-fs numbers (~15 s/sweep) are a hard lower bound; production blocking will always be at least this much per sweep.

We are happy to provide the cpuprofile, the bench scripts, and any further measurements that help. We would also be happy to test a candidate fix on our setup.


Reported by the CoClaw team. This issue was discovered while developing @coclaw/openclaw-coclaw, a CoClaw channel plugin for OpenClaw.

extent analysis

TL;DR

The most likely fix for the main thread blocking issue is to convert the entire chain to fs.promises.* + await, add per-process memoization for the dist-root mirror operation, and fix the early-return condition in materializeBundledRuntimeMirrorDistFile.

Guidance

  • Identify and replace all synchronous file IO calls (*Sync) with their promise-based counterparts (fs.promises.*) to allow for asynchronous execution.
  • Implement a memoization mechanism to store the results of the dist-root mirror operation, so that subsequent plugins can reuse the cached result instead of re-executing the expensive operation.
  • Fix the early-return condition in materializeBundledRuntimeMirrorDistFile to correctly compare the (dev, ino) values from lstatSync instead of relying on realpathSync equality.
  • Consider landing the existing bundledRuntimeMirrorMaterializeCache from the main branch in a patch release to reduce the number of readFileSync calls.
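
The (dev, ino) comparison from the guidance above can be sketched as a small helper. The name `isSameFileOnDisk` is hypothetical, and the sketch uses the promise-based `lstat` rather than `lstatSync` so that it fits the async migration; it is not the actual materializeBundledRuntimeMirrorDistFile logic.

```javascript
const fs = require("node:fs");

// Hypothetical helper: true when both paths already refer to the same
// on-disk file (for example, two hardlinks to one inode). lstat does not
// follow symlinks, so a symlink pointing at `source` is NOT reported as
// the same file.
async function isSameFileOnDisk(source, target) {
  let sourceStat;
  let targetStat;
  try {
    // bigint: true avoids precision loss on very large inode numbers.
    [sourceStat, targetStat] = await Promise.all([
      fs.promises.lstat(source, { bigint: true }),
      fs.promises.lstat(target, { bigint: true }),
    ]);
  } catch (err) {
    if (err.code === "ENOENT") return false; // a missing path cannot match
    throw err;
  }
  return sourceStat.dev === targetStat.dev && sourceStat.ino === targetStat.ino;
}
```

A caller would skip the unlink and re-link step when this returns true, and fall through to the existing materialization logic otherwise.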

Example

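// Before: the synchronous fs call blocks the event loop until it returns
function prepareBundledPluginRuntimeRoot() {
  // ...
  fs.mkdirSync(lockDir);
  // ...
}

// After: the promise-based call lets other events run while the I/O is pending
async function prepareBundledPluginRuntimeRoot() {
  // ...
  await fs.promises.mkdir(lockDir);
  // ...
}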
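
The issue's suggested fix also proposes an in-process mutex (it names async-mutex as one option) so that the lock-dir acquisition can stay synchronous while the long-running work inside it is awaited. The sketch below swaps in a dependency-free promise-chain mutex instead of that library. `InProcessMutex`, `withFilesystemLock`, `acquireLockDir`, and `releaseLockDir` are hypothetical names; the real withBundledRuntimeDepsFilesystemLock may be shaped differently.

```javascript
// Minimal promise-chain mutex: each caller waits for the previous holder.
class InProcessMutex {
  #tail = Promise.resolve();

  runExclusive(task) {
    const run = this.#tail.then(() => task());
    // Keep the chain usable even if `task` rejects.
    this.#tail = run.catch(() => {});
    return run;
  }
}

const runtimeDepsMutex = new InProcessMutex();

// Hypothetical wrapper: `acquireLockDir` and `releaseLockDir` stand in for
// the existing lock-dir logic, and `work` is the async mirror operation.
async function withFilesystemLock(acquireLockDir, releaseLockDir, work) {
  return runtimeDepsMutex.runExclusive(async () => {
    acquireLockDir(); // short, may stay synchronous
    try {
      return await work(); // long-running, must not block the event loop
    } finally {
      releaseLockDir();
    }
  });
}
```

Serializing callers in-process means a second caller in the same gateway waits on a promise rather than contending for the lock directory while the first caller's work is still in flight.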

Notes

The provided guidance focuses on the most critical aspects of the issue. However, a thorough review of the code and additional testing may be necessary to ensure a complete fix.

Recommendation

Apply the suggested fixes, starting with the conversion to fs.promises.* and the implementation of memoization, to address the main thread blocking issue.

FAQ

Expected behavior

Plugin runtime mirror preparation must not block the event loop for more than a few hundred milliseconds at a time, regardless of how many plugins are installed.
