openclaw - ✅(Solved) Fix ARM64 performance: redundant loadOpenClawPlugins calls on every request via web provider and capability provider resolution [2 pull requests, 1 comment, 2 participants]

GitHub stats: openclaw/openclaw#75513, fetched 2026-05-02 05:33:40
Comments: 1 · Participants: 2 · Timeline events: 3 (cross-referenced ×2, commented ×1) · Reactions: 2

Fix Action

Fixed

PR fix notes

PR #75514: fix: avoid redundant loadOpenClawPlugins calls in web provider and auto-enable resolution (ARM64 perf)

Description (problem / solution / changelog)

Fixes #75513

What

Two targeted fixes that eliminate redundant loadOpenClawPlugins calls on every request, reducing cold createOpenClawCodingTools time from ~47s to ~500ms on ARM64.

Fix 1: resolvePluginWebProviders — use active gateway registry directly

File: src/plugins/web-provider-runtime-shared.ts

resolvePluginWebProviders built load options with capability-specific onlyPluginIds. The resulting cacheKey never matched the active gateway registry (loaded at startup without onlyPluginIds), so resolveCompatibleRuntimePluginRegistry always returned null and loadOpenClawPlugins was called on every request.

The fix: when the active gateway registry is available, use it directly and let mapRegistryProviders filter by onlyPluginIds. The gateway registry was loaded at startup with the full plugin config.

Before (per-request on warm gateway, ARM64):

  • createWebSearchTool: ~8.3s
  • createImageGenerateTool: ~3.5s
  • createVideoGenerateTool: ~4.9s
  • createMusicGenerateTool: ~1.1s

After: <20ms each.
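
The cache-key mismatch behind Fix 1 can be illustrated with a small self-contained sketch. Everything here is a hypothetical stand-in (the types, `loadRegistry`, `cacheKey`, `resolveBefore`, `resolveAfter`, and the capability names); it is not OpenClaw's real loader. It only shows why a key derived from per-capability options never matches the key of a registry loaded at startup without those options, and how filtering the active registry avoids the reload.

```typescript
// Illustrative sketch only: all names are hypothetical stand-ins, not OpenClaw internals.
type Provider = { pluginId: string; capability: string };
type Registry = { providers: Provider[] };
type LoadOptions = { onlyPluginIds?: string[] };

let expensiveLoads = 0;
const ALL_PROVIDERS: Provider[] = [
  { pluginId: "openai", capability: "image" },
  { pluginId: "openrouter", capability: "web-search" },
];

// Stand-in for the expensive loader; counts how often it runs.
function loadRegistry(options: LoadOptions): Registry {
  expensiveLoads += 1;
  const only = options.onlyPluginIds;
  return { providers: ALL_PROVIDERS.filter((p) => !only || only.includes(p.pluginId)) };
}

// Cache key derived from the load options: any extra option yields a different key.
const cacheKey = (options: LoadOptions): string => JSON.stringify(options);

// Registry loaded once at startup with the full config (no onlyPluginIds).
const startupOptions: LoadOptions = {};
const active = { key: cacheKey(startupOptions), registry: loadRegistry(startupOptions) };

// Before: the per-capability key never equals the startup key, so every call reloads.
function resolveBefore(onlyPluginIds: string[]): Provider[] {
  const options: LoadOptions = { onlyPluginIds };
  const registry = cacheKey(options) === active.key ? active.registry : loadRegistry(options);
  return registry.providers;
}

// After: reuse the active registry and filter its providers instead of reloading.
function resolveAfter(onlyPluginIds: string[]): Provider[] {
  return active.registry.providers.filter((p) => onlyPluginIds.includes(p.pluginId));
}
```

In this sketch both functions return the same providers for the same `onlyPluginIds`, but only `resolveBefore` increments the load counter on each call.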

Fix 2: materializePluginAutoEnableCandidates — skip manifest load with empty candidates

File: src/config/plugin-auto-enable.apply.ts

applyPluginAutoEnable calls detectPluginAutoEnableCandidates (which loads the manifest registry) and then passes the result to materializePluginAutoEnableCandidates. When the candidate list is empty — common in production deployments with fully explicit plugin config — materializePluginAutoEnableCandidates loaded the manifest registry a second time unnecessarily.

The fix: early return when candidates is empty and no pre-loaded manifestRegistry was provided.
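
A minimal sketch of the guard, assuming simplified types: `Candidate`, `ManifestRegistry`, `Config`, `loadManifestRegistry` and `materialize` are hypothetical stand-ins rather than the real signatures in src/config/plugin-auto-enable.apply.ts. Only the shape of the early return is taken from the PR.

```typescript
// Hypothetical sketch: types and the loader are simplified stand-ins.
type Candidate = { pluginId: string; reason: string };
type ManifestRegistry = { pluginIds: string[] };
type Config = { enabled: string[] };
type Result = { config: Config; changes: string[]; autoEnabledReasons: Record<string, string[]> };

let manifestLoads = 0;
function loadManifestRegistry(): ManifestRegistry {
  manifestLoads += 1; // stands in for the expensive synchronous load
  return { pluginIds: ["telegram", "openrouter"] };
}

function materialize(params: {
  config: Config;
  candidates: Candidate[];
  manifestRegistry?: ManifestRegistry;
}): Result {
  const { config, candidates } = params;
  // Guard from the PR: nothing to materialize and no registry handed in.
  if (candidates.length === 0 && !params.manifestRegistry) {
    return { config, changes: [], autoEnabledReasons: {} };
  }
  // Only reached when there is work to do or a registry was already provided.
  const registry = params.manifestRegistry ?? loadManifestRegistry();
  const enabled = [...config.enabled];
  const changes: string[] = [];
  const autoEnabledReasons: Record<string, string[]> = {};
  for (const candidate of candidates) {
    const known = registry.pluginIds.includes(candidate.pluginId);
    if (!known || enabled.includes(candidate.pluginId)) continue;
    enabled.push(candidate.pluginId);
    changes.push(`enabled ${candidate.pluginId}`);
    autoEnabledReasons[candidate.pluginId] = [candidate.reason];
  }
  return { config: { enabled }, changes, autoEnabledReasons };
}
```

With an empty candidate list the sketch returns the input config unchanged and never touches the loader, which is the case the PR describes as common in deployments with fully explicit plugin config.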

Measurements (Raspberry Pi 4, ARM64, Node 22, gateway mode)

# Warm request timing (2nd request), before fix:
before-imageGenerate  t=2ms
before-videoGenerate  t=3547ms   ← 3.5s
before-musicGenerate  t=8463ms   ← 4.9s
before-webSearch      t=9544ms   ← 1.1s
before-webFetch       t=17856ms  ← 8.3s  (no JIT benefit — pure per-request overhead)
after-createTools     t=18334ms

# Warm request timing, after fix:
before-imageGenerate  t=1ms
before-videoGenerate  t=5ms
before-musicGenerate  t=11ms
before-webSearch      t=14ms
before-webFetch       t=14ms
after-createTools     t=671ms

Total 2nd request: ~47s → ~14s. The createOpenClawCodingTools overhead is the dominant factor on ARM64; the LLM call and active-memory plugin account for the remaining ~13s.
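
The `t=` values in the logs above are cumulative, so the cost of each step is the gap between consecutive checkpoints. The sketch below recomputes those gaps from the "before fix" warm-request log; the `Checkpoint` type and `stepDurations` helper are illustrative names, not part of the PR.

```typescript
// Checkpoint labels and cumulative timestamps copied from the "before fix" warm-request log.
type Checkpoint = { label: string; t: number };
type StepDuration = { from: string; to: string; ms: number };

const warmBefore: Checkpoint[] = [
  { label: "before-imageGenerate", t: 2 },
  { label: "before-videoGenerate", t: 3547 },
  { label: "before-musicGenerate", t: 8463 },
  { label: "before-webSearch", t: 9544 },
  { label: "before-webFetch", t: 17856 },
  { label: "after-createTools", t: 18334 },
];

// Each step's cost is the gap between its checkpoint and the next one.
function stepDurations(checkpoints: Checkpoint[]): StepDuration[] {
  const steps: StepDuration[] = [];
  let previous: Checkpoint | undefined;
  for (const current of checkpoints) {
    if (previous) {
      steps.push({ from: previous.label, to: current.label, ms: current.t - previous.t });
    }
    previous = current;
  }
  return steps;
}

for (const step of stepDurations(warmBefore)) {
  console.log(`${step.from} -> ${step.to}: ${step.ms}ms`);
}
```

For this log the gaps are 3545, 4916, 1081, 8312 and 478 ms, which matches the 3.5s, 4.9s, 1.1s and 8.3s annotations.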

Relation to prior work

Follows up on #73075 / PR #73076. The fixes in that PR addressed ensureOpenClawModelsJson caching. These two fixes address a separate hot path (createOpenClawCodingTools → tool provider resolution) that was not visible in the earlier investigation.

Note: resolvePluginCapabilityProviders had a related issue (fast path gated on !hasExplicitPluginConfig) that appears to already be resolved in the current main and ships in 2026.4.29.

Testing

Manually verified on ARM64 (Raspberry Pi 4). No behavioral change — the active gateway registry contains the same providers as a freshly loaded subset registry. The mapRegistryProviders filter by onlyPluginIds ensures only the relevant capability providers are returned.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/plugins/web-provider-runtime-shared.ts (modified, +21/-1)

PR #75022: fix(infer): load model catalog metadata-only for list/inspect/providers

Description (problem / solution / changelog)

Summary

  • Problem: openclaw infer model list, openclaw infer model inspect, and openclaw infer model providers hang indefinitely on 2026.4.27. The grandchild Node process spins at 100% CPU, opens zero TCP connections, and writes nothing to stdout/stderr until the timeout fires. Reported in #74986; --version, --help, and gateway status work.
  • Root cause (catalog-inspection slice): All three handlers funnel into loadModelCatalog(...) in src/agents/model-catalog.ts. Even on the read-only path, the function unconditionally calls augmentModelCatalogWithProviderPlugins(...) at the call site formerly on line 194, which threads through src/plugins/provider-runtime.ts:resolveProviderPluginsForCatalogHooks → src/plugins/provider-hook-runtime.ts:resolveProviderPluginsForHooks → src/plugins/providers.runtime.ts:resolvePluginProviders → src/plugins/loader.ts:resolveRuntimePluginRegistry. On a CLI cold start with no active registry, resolveRuntimePluginRegistry falls through to loadOpenClawPlugins(...), which since b7a1bfd2 ("fix(plugins): cache installed manifest registry") builds an installed-manifest cache key via safeFileSignature (fs.statSync per plugin) + hashJson(...) over the index. This is exactly the synchronous CPU work the reporter's lsof/ps evidence points at (100% CPU, zero TCP, zero stdout). The agents list 2026.4.29 fixes (8fe449c8, d5eae0d9) addressed the same regression class on a different code path by routing channel queries through listReadOnlyChannelPluginsForConfig, i.e. "no plugin runtime on read-only metadata paths."
  • Fix: Add an opt-in skipProviderPluginAugmentation?: boolean option to loadModelCatalog. When true, the function returns the catalog assembled from PI SDK static rows + manifest static rows + cfg.models.providers configured rows, and skips the augmentModelCatalogWithProviderPlugins(...) call entirely. The three CLI inspection commands — infer model list, infer model inspect, and infer model providers (buildModelProviders) — pass readOnly: true and skipProviderPluginAugmentation: true. The flag is opt-in so existing readOnly: true callers (the only one outside this PR is appendCatalogSupplementRows in src/commands/models/list.rows.ts:347, used by models list --all) keep their dynamic plugin-derived rows.
  • What changed:
    • src/agents/model-catalog.ts:
      • Add skipProviderPluginAugmentation?: boolean to loadModelCatalog's param type with a doc comment that names the contract.
      • Gate the augmentModelCatalogWithProviderPlugins(...) call behind the new flag; emit plugin-models-skipped instead of plugin-models-merged when bypassed.
      • Add a parallel readOnlyModelCatalogPromise cache slot for readOnly: true callers that do want augmentation, so long-running hosts (appendCatalogSupplementRows) don't rebuild from scratch on every call. Skip-augmentation callers deliberately stay uncached: their result is a strict subset of rows and must not be served the with-augmentation cache. useCache: false and resetModelCatalogCache() symmetrically invalidate both slots; the empty-result branch and the catch handler null the matching slot to avoid cache poisoning.
    • src/cli/capability-cli.ts: buildModelProviders (used by infer model providers), infer model list, and infer model inspect pass readOnly: true, skipProviderPluginAugmentation: true. Command --description text now points users to openclaw models list --all for live provider-discovered models.
    • src/cli/capability-cli.test.ts: three tests assert the flag combination per command.
    • src/agents/model-catalog.test.ts: five tests lock the new contract — augmentation actually skipped when the flag is set, read-only cache reuse for non-skip callers, metadata-only result must not be served as the with-augmentation cache, useCache: false paired with readOnly: true invalidates the read-only slot, and useCache: false without readOnly also invalidates the read-only slot (cross-slot freshness from the write-path caller in src/commands/auth-choice.model-check.ts).
  • What did NOT change (scope boundary):
    • CHANGELOG.md — left untouched; release-note wording is the maintainer's call.
    • Default behavior of loadModelCatalog: when skipProviderPluginAugmentation is omitted/false, the augmentation step still runs exactly as before, so models list --all (appendCatalogSupplementRows) and every other current readOnly: true caller keeps the same catalog contents.
    • ensureOpenClawModelsJson, buildShouldSuppressBuiltInModel, the manifest planner, and the model-catalog cache for the non-read-only slot: untouched.
    • infer model run (local + gateway), infer model auth, image/audio/tts/embedding subcommands: out of scope; they are write/run paths and do not go through the read-only catalog read.
    • No new exports, no plugin-SDK / public-surface contract changes, no any introduced.
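
The opt-in flag and the two cache slots described under "What changed" can be sketched roughly as follows. This is a simplified illustration, not the real src/agents/model-catalog.ts: `Row`, `loadCatalog`, `staticRows`, `augmentWithPlugins` and the `ollama/live-model` id are hypothetical, and the empty-result and error-handling branches mentioned in the PR are left out.

```typescript
// Hypothetical sketch of the flag and the two cache slots; not the real implementation.
type Row = { id: string; source: "static" | "plugin" };
type LoadParams = {
  readOnly?: boolean;
  useCache?: boolean;
  skipProviderPluginAugmentation?: boolean;
};

let augmentCalls = 0;
let catalogPromise: Promise<Row[]> | null = null; // non-read-only slot
let readOnlyCatalogPromise: Promise<Row[]> | null = null; // read-only slot, with augmentation

// Stand-in for the static rows (SDK, manifest and configured providers).
async function staticRows(): Promise<Row[]> {
  return [{ id: "openai/gpt-5.4", source: "static" }];
}

// Stand-in for the augmentation step that needs the plugin runtime.
async function augmentWithPlugins(rows: Row[]): Promise<Row[]> {
  augmentCalls += 1;
  return [...rows, { id: "ollama/live-model", source: "plugin" }];
}

function loadCatalog(params: LoadParams = {}): Promise<Row[]> {
  if (params.useCache === false) {
    // Invalidate both slots symmetrically.
    catalogPromise = null;
    readOnlyCatalogPromise = null;
  }
  if (params.skipProviderPluginAugmentation === true) {
    // Metadata-only result is a strict subset, so it is never cached.
    return staticRows();
  }
  if (params.readOnly) {
    const cached = readOnlyCatalogPromise ?? staticRows().then(augmentWithPlugins);
    readOnlyCatalogPromise = cached;
    return cached;
  }
  const cached = catalogPromise ?? staticRows().then(augmentWithPlugins);
  catalogPromise = cached;
  return cached;
}
```

Keeping the skip path uncached is what prevents a later caller that wants plugin-derived rows from being served the smaller metadata-only result.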

Cross-reference — already fixed upstream (no overlap with this PR)

Issue #74986 reported four hanging commands. Three are addressed on main or by this PR via separate code paths; the fourth (infer model run via the gateway path) is tracked separately under "Out of scope". This PR has no textual conflict with any of the upstream fixes:

| Command | Status | Files / functions touched |
| --- | --- | --- |
| agents list | Fixed on main by 8fe449c8, d5eae0d9 (2026-04-26) | src/commands/agents.commands.list.ts, src/commands/agents.providers.ts, src/commands/health-format.ts, src/commands/message-format.ts |
| infer model run --local | Fixed on main by 12ee7f69 (2026-04-29) | src/agents/pi-embedded-runner/model.ts, src/agents/simple-completion-runtime.ts, src/cli/capability-cli.ts:656 (one-line cfg, add inside runModelRun, far from this PR's buildModelProviders / registerCapabilityCli call sites) |
| infer model list / inspect / providers | Fixed by this PR | src/agents/model-catalog.ts, src/cli/capability-cli.ts (buildModelProviders, model.command("list"\|"inspect"\|"providers")) |
| infer model run (gateway path) | Not yet fixed | Tracked under "Out of scope"; needs a focused follow-up issue with profile evidence |

Reproduction

On 2026.4.27 (or current main before this PR), with a ~/.openclaw/config.yaml similar to the reporter's:

agents:
  defaults:
    llm: { idleTimeoutSeconds: 600 }
    model: { primary: ollama/qwen3.5:397b-cloud }
models:
  providers:
    ollama:
      baseUrl: http://winhost:11434
      apiKey: ollama-local
      api: ollama

openclaw gateway status                                  # works
openclaw infer model list                                # before fix: hangs at 100% CPU until timeout
                                                          # after fix: returns the catalog and exits
openclaw infer model inspect --model openai/gpt-5.4      # same
openclaw infer model providers --json                    # same

The hung process can be confirmed with ps -o pcpu,etimes,wchan,comm -p <pid> (CPU pegged at ~100, no progress) and lsof -p <pid> (only the std{out,err} pipes, zero TCP — i.e. work is happening before any provider network probe).

Risk / Mitigation

  • Risk 1 — different output for catalog list: Skipping augmentModelCatalogWithProviderPlugins means infer model list / inspect / providers no longer surface dynamic plugin-discovered models (e.g. live Ollama models from /api/tags). The output is now: PI SDK static rows + manifest-declared rows + cfg.models.providers configured rows.
    • Mitigation: For inspection commands this is the right trade-off — the user wants "what does the catalog know about" to return promptly, not "what does the live Ollama daemon currently expose"; the latter is what models scan / models list --all are for, both of which still go through the dynamic path (their loadModelCatalog({ readOnly: true }) call site does not pass the new flag). The hang the reporter sees is a strictly worse failure mode than slightly-less-fresh output. The new flag is opt-in, so no other call site changes. The updated command --description strings point users to models list --all for live discovery.
  • Risk 2 — test coverage: Need to lock the new metadata-only contract so a future refactor doesn't silently regress.
    • Mitigation: Three CLI tests assert the flag combination per command; five model-catalog tests verify (a) augmentation is genuinely not called when the flag is set, (b) the with-augmentation read-only cache is reused on repeat calls, (c) a metadata-only result is not served to a later non-skip caller, (d) useCache: false paired with readOnly: true invalidates the read-only slot, and (e) useCache: false without readOnly (the write-path direction used by src/commands/auth-choice.model-check.ts) also invalidates the read-only slot — locking the symmetric cross-slot freshness contract so a future revert of the guard relaxation cannot silently leave a stale read-only cache visible to inspection callers.
  • Risk 3 — typing/security: No any introduced; only a new optional parameter is added (skipProviderPluginAugmentation?: boolean) and consulted via a strict === true check. No change to data flow, secrets handling, plugin trust boundary, or external surface.

Out of scope (tracked separately)

This PR intentionally does not address:

  • loadOpenClawPlugins synchronous hot-path cost — the safeFileSignature (per-plugin fs.statSync) + hashJson(...) cache-key build introduced by b7a1bfd2. This is the underlying engine that any non-skip catalog refresh still hits, and it has separate user-visible symptoms beyond the catalog inspection commands. Tracked in #75512 (per-turn re-evaluation, BLOCKER), #75069 (synchronous mirror walk blocking gateway main thread), and #75513 (ARM64 redundant calls on every request).
  • infer model run hang via the gateway path (prepareSimpleCompletionModelForAgent → resolveModelAsync → prepareProviderRuntimeAuth). The --local variant is already fixed on main by 12ee7f69; the gateway variant warrants a focused issue with profile evidence.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • CLI
  • Agents/models
  • Tests

Linked Issue/PR

Refs #74986. Of the four hanging commands reported in that issue:

  • agents list — already fixed on main by 8fe449c8 / d5eae0d9 (2026-04-26).
  • infer model run --local — already fixed on main by 12ee7f69 (2026-04-29).
  • infer model list — fixed by this PR. The same fix is preventively extended to infer model inspect and infer model providers, which share the read-only catalog code path but were not individually exercised by the reporter.
  • infer model run via the gateway path — still open, intentionally out of scope here (see "Out of scope" section); needs a focused follow-up issue with profile evidence before it can be closed.

This PR therefore does not by itself fully address #74986; the issue should remain open until the gateway-run path is tracked and a separate fix lands.

Changed files

  • src/agents/model-catalog.test.ts (modified, +168/-0)
  • src/agents/model-catalog.ts (modified, +51/-13)
  • src/cli/capability-cli.test.ts (modified, +54/-0)
  • src/cli/capability-cli.ts (modified, +27/-6)


Follow-up to #73075 / PR #73076

After updating to 2026.4.27 (which includes the fixes from PR #73076), response times on ARM64 (Raspberry Pi 4, Node 22) are still severely degraded. Two new independent hot paths call `loadOpenClawPlugins` on every request.

Measured impact (ARM64, Raspberry Pi 4)

| Metric | 2026.4.27 without fix | 2026.4.27 with fix |
| --- | --- | --- |
| `createOpenClawCodingTools` (cold) | ~47s | ~500ms |
| `createOpenClawCodingTools` (warm) | ~18s | ~670ms |
| Total 1st request | ~120s | ~38s |
| Total 2nd request | ~47s | ~14s |

Individual tool constructor overhead (warm request, before fix):

  • `createImageGenerateTool`: ~3.5s per request
  • `createVideoGenerateTool`: ~4.9s per request
  • `createMusicGenerateTool`: ~1.1s per request
  • `createWebSearchTool`: ~8.3s per request (every request, no JIT improvement)

Root causes

Bug 1: resolvePluginWebProviders calls loadOpenClawPlugins on every request

File: src/plugins/web-provider-runtime-shared.ts

resolvePluginWebProviders builds loadOptions with capability-specific onlyPluginIds (e.g. only web-search provider plugin IDs). The cacheKey produced by these options never matches the active gateway registry (which was loaded at startup without onlyPluginIds), so resolveCompatibleRuntimePluginRegistry always returns null and loadOpenClawPlugins is called on every request.

On ARM64, `loadOpenClawPlugins` takes 8–17s per call due to synchronous module evaluation (V8 JIT). This affects every tool that uses `resolvePluginWebProviders`: web search, image generation, video generation, and music generation.

Fix: When the active gateway registry is available, use it directly and filter by onlyPluginIds afterward. The gateway registry was loaded at startup with the full plugin config, so its provider lists are already correct.

// Before expensive loadOptions path:
const gatewayRegistry = getActivePluginRegistry();
if (gatewayRegistry) {
  return deps.mapRegistryProviders({ registry: gatewayRegistry, onlyPluginIds: params.onlyPluginIds });
}

Bug 2: materializePluginAutoEnableCandidates loads manifest registry even with empty candidates

File: src/config/plugin-auto-enable.apply.ts

applyPluginAutoEnable calls detectPluginAutoEnableCandidates first — which already loads the manifest registry. If that returns an empty list (no auto-enable candidates, common in production deployments), materializePluginAutoEnableCandidates then loads it again unnecessarily.

Fix: Early return when candidates.length === 0 and no pre-loaded manifestRegistry was provided:

if (params.candidates.length === 0 && !params.manifestRegistry) {
  return { config, changes: [], autoEnabledReasons: {} };
}

Note: The third related issue (resolvePluginCapabilityProviders skipping the active registry when hasExplicitPluginConfig is true) appears to already be fixed in the current main and is included in 2026.4.29.

Diagnosis methodology

Timing instrumentation added to createOpenClawCodingTools isolated the hot path to individual tool constructors:

before-imageGenerate t=4ms
before-videoGenerate t=14725ms   ← 14.7s for createImageGenerateTool
before-musicGenerate t=32025ms   ← 17.3s for createVideoGenerateTool  
before-webSearch     t=39304ms   ← 7.3s  for createMusicGenerateTool
before-webFetch      t=47189ms   ← 8.5s  for createWebSearchTool

After the fix (same request):

before-imageGenerate t=4ms
before-videoGenerate t=8ms       ← <1ms each
before-musicGenerate t=12ms
before-webSearch     t=17ms
before-webFetch      t=17ms      ← 3ms for createWebSearchTool

The warm request improvement (8.3s → 3ms for webSearch) confirms the issue is per-request overhead, not JIT cold-start.
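
The issue does not show the instrumentation itself. A minimal sketch of checkpoint timing that produces lines in the same `label t=Nms` shape might look like this; `createCheckpointTimer`, `buildTools` and the injectable clock are hypothetical helpers, not code from createOpenClawCodingTools.

```typescript
// Hypothetical helper: the issue does not include the actual instrumentation code.
type Clock = () => number;
type ToolConstructor = { name: string; construct: () => unknown };

// Returns a function that logs the elapsed time since the timer was created.
function createCheckpointTimer(
  clock: Clock = Date.now,
  log: (line: string) => void = console.log,
): (label: string) => number {
  const start = clock();
  return (label: string): number => {
    const elapsed = Math.round(clock() - start);
    log(`${label} t=${elapsed}ms`);
    return elapsed;
  };
}

// Marks a checkpoint immediately before each constructor call, then once at the end.
function buildTools(constructors: ToolConstructor[], mark: (label: string) => number): unknown[] {
  const tools: unknown[] = [];
  for (const { name, construct } of constructors) {
    mark(`before-${name}`);
    tools.push(construct());
  }
  mark("after-createTools");
  return tools;
}
```

Because each checkpoint is logged before the named constructor runs, the gap between `before-imageGenerate` and `before-videoGenerate` is the time spent constructing the image tool, which is how the annotations in the logs above attribute time.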

Environment

  • Platform: Raspberry Pi 4, ARM64, Linux 6.12
  • Node: v22.22.0
  • OpenClaw: 2026.4.27 / 2026.4.29
  • Config: gateway mode with plugins.allow explicitly set (telegram, openrouter, active-memory, openclaw-honcho, openai, anthropic)

PR with fix: coming in a follow-up comment.

extent analysis

TL;DR

The degraded ARM64 response times are most likely fixed by applying the Bug 1 and Bug 2 changes to resolvePluginWebProviders and materializePluginAutoEnableCandidates, which remove the redundant per-request plugin and manifest loading.

Guidance

  • Apply the fix for Bug 1 by using the active gateway registry directly and filtering by onlyPluginIds afterward in the resolvePluginWebProviders function.
  • Apply the fix for Bug 2 by returning early in materializePluginAutoEnableCandidates when candidates.length === 0 and no manifestRegistry was passed in.
  • Verify the fix by measuring the response times after applying the changes and checking if the per-request overhead has been reduced.
  • Review the timing instrumentation added to createOpenClawCodingTools to isolate the hot path and confirm the issue is resolved.

Example

// Fix for Bug 1
const gatewayRegistry = getActivePluginRegistry();
if (gatewayRegistry) {
  return deps.mapRegistryProviders({ registry: gatewayRegistry, onlyPluginIds: params.onlyPluginIds });
}

// Fix for Bug 2
if (params.candidates.length === 0 && !params.manifestRegistry) {
  return { config, changes: [], autoEnabledReasons: {} };
}

Notes

The fixes are specific to resolvePluginWebProviders and materializePluginAutoEnableCandidates and may not apply to other parts of the codebase. The measurements come from ARM64 (Raspberry Pi 4) with Node 22; the issue reports no timings for other environments, so the benefit there is unmeasured.

Recommendation

Apply the fixes for Bug 1 and Bug 2. They address the root causes directly, and the issue's measurements show the per-request overhead dropping once they are in place.
