openclaw - ✅(Solved) Fix resolvePluginCapabilityProviders triggers redundant full plugin loads per message (image/video/music generation) [2 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#73729 · Fetched 2026-04-29 06:15:51
Comments: 1 · Participants: 2 · Timeline: 7 · Reactions: 0
Timeline (top): cross-referenced ×6, commented ×1

Every message dispatch triggers three independent full-registry loads (image / video / music generation providers), each re-registering all ~119 plugins. Together they cost ~113 seconds per message even when the user has never configured any image/video/music provider.

Root Cause

resolvePluginCapabilityProviders (in src/plugins/capability-provider-runtime.ts) calls resolveCapabilityProviderConfig({ key, cfg }) which wraps params.cfg with capability-specific bundled pluginIds via withBundledPluginAllowlistCompat / EnablementCompat / VitestCompat. Each capability (image/video/music) has a different set of bundled pluginIds, producing a different wrapped config, which produces a different cacheKey in buildCacheKey. The downstream LRU therefore misses and re-runs the full load.

Since the actual work is "load all 119 plugins" regardless of which capability key is requested, these three loads are effectively duplicates.


PR fix notes

PR #73847: fix(plugins): key web-provider snapshot cache on config-content fingerprint (#73730)

Description (problem / solution / changelog)

Fixes #73730. Refs #73729 and #73835 (sister/corroborating reports in the same lane).

Problem

`resolvePluginWebProviders` previously kept its snapshot cache as:

```ts
WeakMap<OpenClawConfig, WeakMap<NodeJS.ProcessEnv, Map<string, Entry>>>
```

The outer level was keyed on `OpenClawConfig` object identity. As reported in #73730 with full instrumentation, callers like `resolveWebSearchRuntimeConfig` and `resolveWebFetchRuntimeConfig` build a fresh `config` object per dispatch, so the outer `WeakMap.get(cacheOwnerConfig)` always missed even though the inner `cacheKey` string was identical (`load-miss key=0afb40389a fields={ws:".../workspace",scope:"b623e8",plg:"85d4c2",...}` repeated message after message). Every dispatch paid the full ~30s `loadOpenClawPlugins` cycle.
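The identity miss described above can be reproduced in isolation. The following is a minimal sketch, not OpenClaw code: `Config`, `Entry`, `buildConfig`, and `lookup` are hypothetical stand-ins, and the counter replaces the real plugin load.

```typescript
// Hypothetical stand-ins; not the real OpenClaw types.
type Config = { plugins: { entries: Record<string, { enabled: boolean }> } };
type Entry = { providers: string[] };

const identityCache = new WeakMap<Config, Map<string, Entry>>();
let loads = 0;

// Mimics a caller that rebuilds its config object on every dispatch.
function buildConfig(): Config {
  return { plugins: { entries: { brave: { enabled: true } } } };
}

function lookup(config: Config, cacheKey: string): Entry {
  let inner = identityCache.get(config);
  if (!inner) {
    inner = new Map<string, Entry>();
    identityCache.set(config, inner);
  }
  let entry = inner.get(cacheKey);
  if (!entry) {
    loads += 1; // stands in for the expensive full plugin load
    entry = { providers: ["brave"] };
    inner.set(cacheKey, entry);
  }
  return entry;
}

// Same string key, but a new object each time: the outer WeakMap misses twice.
lookup(buildConfig(), "0afb40389a");
lookup(buildConfig(), "0afb40389a");
const loadsWithFreshConfigs = loads;

// Same object reference: the second call hits.
const shared = buildConfig();
lookup(shared, "0afb40389a");
lookup(shared, "0afb40389a");
const loadsWithSharedConfig = loads - loadsWithFreshConfigs;
```

With fresh objects every call loads; with a shared reference only the first call does, which is why the inner string key never got a chance to match.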

Three users reported variants of the same root cause:

  • #73730 poolside-ventures: web-provider snapshot WeakMap miss (this PR)
  • #73729 poolside-ventures: capability-provider full reload per message (sister bug, same root cause shape)
  • #73835 brokemac79: idle gateway high CPU/RSS, CPU profile points to repeated `loadOpenClawPlugins` → `mirrorBundledPluginRuntimeRoot`

Fix

Switch the snapshot cache from an identity-keyed nested `WeakMap` to a flat `Map<string, Entry>` keyed entirely on `buildWebProviderSnapshotCacheKey`. The cache key is extended to include a stable content fingerprint of the resolution-relevant `config.plugins` subset (allowlist, entries enabled state, per-plugin config — exactly what `loadPluginManifestRegistryForPluginRegistry` and `loadInstalledWebProviderManifestRecords` actually consume).

Equal-content fresh config objects now produce the same cache key and hit. Genuinely different configs produce different keys and stay isolated — no false-positive collisions.

The fingerprint computation itself is memoized by config-object identity (`WeakMap<config, hashString>`), so callers that share a reference pay the hash cost only once. Callers that build a fresh config per dispatch (the original failure mode) still pay one `hashJson` per call, but `hashJson` runs in microseconds versus `loadOpenClawPlugins` running in seconds — net wall-clock win is the same ~30s saved per dispatch the issue measured.
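A minimal sketch of the content-keyed approach, under stated assumptions: the config shape is simplified, `stableStringify`, `fingerprintConfig`, and `resolveProviders` are hypothetical names, and the canonical JSON string is used directly as the fingerprint where the PR hashes it with `hashJson` to keep keys short.

```typescript
// Hypothetical, simplified shapes; the real OpenClawConfig is larger.
type PluginEntry = { enabled?: boolean };
type Config = { plugins?: { allow?: string[]; entries?: Record<string, PluginEntry> } };

// Serialize with sorted object keys so property order does not change the result.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return "[" + value.map(stableStringify).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

// Memoize by object identity so callers sharing a reference serialize only once.
const fingerprintMemo = new WeakMap<Config, string>();
function fingerprintConfig(config: Config): string {
  const memoized = fingerprintMemo.get(config);
  if (memoized !== undefined) return memoized;
  const fingerprint = stableStringify(config.plugins ?? {});
  fingerprintMemo.set(config, fingerprint);
  return fingerprint;
}

// Flat cache keyed entirely on strings, so object identity no longer matters.
const snapshotCache = new Map<string, string[]>();
let loads = 0;
function resolveProviders(config: Config, baseKey: string): string[] {
  const key = baseKey + ":" + fingerprintConfig(config);
  const hit = snapshotCache.get(key);
  if (hit) return hit;
  loads += 1; // stands in for the expensive plugin load
  const entries = config.plugins?.entries ?? {};
  const providers = Object.keys(entries).filter((id) => entries[id]?.enabled !== false);
  snapshotCache.set(key, providers);
  return providers;
}

const enabled = (): Config => ({ plugins: { entries: { brave: { enabled: true } } } });
resolveProviders(enabled(), "web-search");
resolveProviders(enabled(), "web-search"); // fresh object, equal content: hit
const loadsAfterEqualConfigs = loads;
resolveProviders({ plugins: { entries: { brave: { enabled: false } } } }, "web-search");
const loadsAfterDifferentConfig = loads;
```

Using the full canonical string rules out collisions entirely at the cost of longer keys; hashing trades that for a small collision risk.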

What changed

| File | Change |
| --- | --- |
| `web-provider-resolution-shared.ts` | Added `fingerprintWebProviderResolutionConfig` helper + extended `buildWebProviderSnapshotCacheKey` to include the fingerprint |
| `web-provider-runtime-shared.ts` | Changed `WebProviderSnapshotCache` type from `WeakMap<config, WeakMap<env, Map<key, Entry>>>` to `Map<string, Entry>`, simplified the lookup/store sites accordingly. Dropped the no-longer-needed `OpenClawConfig` type import. |
| `web-provider-runtime-shared.test.ts` | Two new regression tests |
| `CHANGELOG.md` | Unreleased Fixes line citing #73730 + #73729 + #73835 |

Tests

```
pnpm vitest run src/plugins/web-provider-runtime-shared.test.ts
→ 5 passed (3 existing + 2 new)

pnpm vitest run src/plugins/web-provider-runtime-shared.test.ts \
  src/plugins/web-provider-resolution-shared.test.ts \
  src/plugins/web-fetch-providers.runtime.test.ts \
  src/plugins/web-search-providers.runtime.test.ts
→ 32 passed (30 existing + 2 new, no regressions across the four files)
```

The two new regression tests:

  1. Fresh-but-equal-content configs hit the cache — exercises the exact #73730 path: build a new `config` object reference per call with identical content; assert `loadOpenClawPlugins` is invoked once across two calls (pre-fix: twice).
  2. Content-different configs miss the cache — invariant guard: `{ plugins: { entries: { brave: { enabled: true } } } }` and `{ plugins: { entries: { brave: { enabled: false } } } }` produce different fingerprints and both calls miss the cache.

Why this shape over the alternatives in my earlier triage comment

In #73730 I proposed two shapes: (1) identity-intern the resolved config, (2) hash config content into the cache key. This PR implements (2) because:

  • Interning would fight `OpenClawConfig` mutation, which several gateway paths perform (config reloads, cron edits)
  • The fingerprint approach has a clean invalidation story: edit any `config.plugins.entries[*]` field → hash differs → next call misses → cache repopulates with the new state
  • TTL eviction (`resolvePluginSnapshotCacheTtlMs`) was already part of the contract, so the `Map` size is already bounded

If the maintainer prefers shape (1) instead, happy to rebase. The diff for shape (1) would be smaller but trickier to invalidate.
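The point about TTL eviction bounding a flat `Map` can be illustrated with a small sketch. It assumes expired entries are pruned on write; whether the real cache prunes this way is not stated here, and `createTtlCache` is a hypothetical helper, not the code behind `resolvePluginSnapshotCacheTtlMs`.

```typescript
type TimedEntry<T> = { value: T; expiresAt: number };

// Hypothetical helper: a string-keyed cache whose entries expire after ttlMs.
function createTtlCache<T>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, TimedEntry<T>>();

  // Drop everything that has already expired, so stale fingerprints do not pile up.
  function pruneExpired(): void {
    const current = now();
    entries.forEach((entry, key) => {
      if (entry.expiresAt <= current) entries.delete(key);
    });
  }

  return {
    get(key: string): T | undefined {
      const entry = entries.get(key);
      if (!entry || entry.expiresAt <= now()) return undefined;
      return entry.value;
    },
    set(key: string, value: T): void {
      pruneExpired();
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    size(): number {
      return entries.size;
    },
  };
}

// A controllable clock keeps the example deterministic.
let clock = 0;
const cache = createTtlCache<string[]>(1000, () => clock);
cache.set("fingerprint-a", ["brave"]);
clock = 500;
cache.set("fingerprint-b", ["brave"]); // config edited: new key, old one still live
const sizeBeforeExpiry = cache.size();
clock = 1200; // fingerprint-a expired at 1000, fingerprint-b lives until 1500
cache.set("fingerprint-c", ["brave"]);
const sizeAfterPrune = cache.size();
```

Each config edit adds one key, and keys older than the TTL disappear on the next write, so the map holds at most the fingerprints seen within one TTL window.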

Why this is one-shot for #73730 only

#73729 (capability-provider) and #73835 (gateway prewarm) share the same root-cause shape but live in different files. Folding all three into one PR would cross 400 LOC and 3+ separable concerns. This PR fixes #73730 with the smallest possible scope; the same content-fingerprint pattern can be applied to the capability-provider cache (`capability-provider-runtime.ts`) and the gateway prewarm path as follow-ups if the maintainer agrees with the direction here.

🦞 lobster-biscuit


Sign-Off: hclsys

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/plugins/web-provider-resolution-shared.test.ts (modified, +51/-0)
  • src/plugins/web-provider-resolution-shared.ts (modified, +63/-0)
  • src/plugins/web-provider-runtime-shared.test.ts (modified, +166/-0)
  • src/plugins/web-provider-runtime-shared.ts (modified, +66/-30)

PR #73853: [AI-assisted] fix(plugins): reduce startup provider registry reloads

Description (problem / solution / changelog)

Fixes #73835. Fixes #73729. Refs #73730. Refs #73847.

Summary

This follows @hclsys's #73847, which covers the #73730 web-provider snapshot cache path. This PR intentionally leaves that web-provider work out and focuses on the remaining repeated plugin-registry load surfaces reported in #73835 and #73729.

  • Keep gateway startup primary-model prewarm on provider-discovery entries only, with the active workspace passed through so startup metadata snapshots can be reused instead of falling through to full plugin runtime loads.
  • Thread the entry-only provider discovery mode through models.json planning and fingerprinting so cache entries remain distinct from full discovery.
  • Scope capability-provider fallback registry loads to the manifest-derived bundled owner plugins, avoiding broad image/video/music snapshot loads during tool setup.
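The third bullet, scoping a fallback load to the plugins that own a capability, might look roughly like the sketch below. Everything here is an assumption for illustration: the `Manifest` shape, the capability names, and the plugin ids are hypothetical and do not reflect the real OpenClaw manifest schema.

```typescript
// Hypothetical manifest shape and capability names.
type Capability = "image" | "video" | "music";
type Manifest = { id: string; bundled: boolean; provides: Capability[] };

const manifests: Manifest[] = [
  { id: "image-gen-a", bundled: true, provides: ["image"] },
  { id: "video-gen-a", bundled: true, provides: ["video"] },
  { id: "music-gen-a", bundled: true, provides: ["music"] },
  { id: "chat-channel", bundled: true, provides: [] },
  { id: "user-plugin", bundled: false, provides: ["image"] },
];

// Derive the bundled plugins that declare the requested capability.
function bundledOwnerIds(all: Manifest[], capability: Capability): string[] {
  return all
    .filter((m) => m.bundled && m.provides.indexOf(capability) !== -1)
    .map((m) => m.id);
}

// onlyPluginIds === null means full scope, as in the issue's instrumentation.
function loadPlugins(all: Manifest[], onlyPluginIds: string[] | null): string[] {
  return all
    .filter((m) => onlyPluginIds === null || onlyPluginIds.indexOf(m.id) !== -1)
    .map((m) => m.id);
}

const fullLoad = loadPlugins(manifests, null);
const scopedLoad = loadPlugins(manifests, bundledOwnerIds(manifests, "image"));
```

The scoped call touches one plugin instead of all five, which is the same reduction the PR aims for against the ~119-plugin registry.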

Issue Context

#73835's CPU profile points at startup/model prewarm repeatedly reaching loadOpenClawPlugins and bundled runtime mirror refresh work. #73729 reports the related capability-provider path where image, video, and music provider listing can trigger repeated full registry loads. #73730 is covered by @hclsys's #73847, and this PR is meant to complement that work rather than duplicate it.

AI Assistance

AI-assisted with Codex. The implementation was driven from #73835, the reporter-provided Discord guidance, and the linked #73729/#73730 discussion.

Tests

  • node scripts/test-projects.mjs src/gateway/server-startup.test.ts src/gateway/server-startup-post-attach.test.ts src/agents/models-config.providers.implicit.discovery-scope.test.ts src/plugins/provider-discovery.runtime.test.ts src/plugins/capability-provider-runtime.test.ts src/plugins/web-provider-runtime-shared.test.ts src/plugins/web-search-providers.runtime.test.ts src/plugins/web-fetch-providers.runtime.test.ts -> passed 3 Vitest shards, 8 files, 72 tests
  • corepack pnpm test:contracts:plugins -> 56 files, 751 tests passed
  • node .\node_modules\@typescript\native-preview\bin\tsgo.js -p tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test-cpu-prewarm.tsbuildinfo -> passed
  • git diff --check -> passed

Contributor Checklist Notes

Update: addressed Greptile P2 in 66e9f93f.

  • Backend/runtime change; no screenshots applicable.
  • I did not run full pnpm build && pnpm check && pnpm test; instead I ran the focused affected shards, plugin contract lane, and TypeScript test project check above.
  • Attempted local Codex review per CONTRIBUTING.md, but the Windows app execution alias failed with Access is denied for both codex review --base origin/main and codex review --uncommitted.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/models-config.plan.ts (modified, +7/-0)
  • src/agents/models-config.providers.implicit.discovery-scope.test.ts (modified, +16/-0)
  • src/agents/models-config.providers.implicit.ts (modified, +2/-0)
  • src/agents/models-config.ts (modified, +10/-0)
  • src/gateway/server-startup-post-attach.test.ts (modified, +10/-0)
  • src/gateway/server-startup-post-attach.ts (modified, +11/-0)
  • src/gateway/server-startup.test.ts (modified, +9/-0)
  • src/plugins/capability-provider-runtime.test.ts (modified, +77/-0)
  • src/plugins/capability-provider-runtime.ts (modified, +32/-3)
  • src/plugins/provider-discovery.runtime.test.ts (modified, +11/-0)

Raw issue text

Environment

  • openclaw 2026.4.26
  • Windows 11, Node.js (bundled)
  • feishu channel plugin, ~7 user plugins + bundled providers

Summary

Every message dispatch triggers three independent full-registry loads (image / video / music generation providers), each re-registering all ~119 plugins. Together they cost ~113 seconds per message even when the user has never configured any image/video/music provider.

Reproduction

  1. Fresh gateway restart with feishu channel configured
  2. Send a plain-text message to the bot (e.g. "hello")
  3. Observe dispatch time: ~5 minutes

Evidence

Instrumenting loadOpenClawPlugins to log cache-miss events (cacheKey short-hash + onlyPluginIds + stack frames) shows three separate loads per dispatch, each with onlyPluginIds: null (i.e. full scope):

```
load-miss key=49bf11d083 fields={ws:null,scope:"2be88c",plg:"5ec1c8",actMeta:"dbc6e2",...}
  from: at resolvePluginCapabilityProviders (capability-provider-runtime:198)
        <- at resolvePluginImageGenerationProviders (provider-registry:27)
        <- at listImageGenerationProviders (...:53)
load-done key=49bf11d083 elapsedMs=39211 pluginsLoaded=119

load-miss key=76705525ad fields={ws:null,scope:"2be88c",plg:"057e03",actMeta:"43a091",...}
  from: at resolvePluginCapabilityProviders
        <- at resolvePluginVideoGenerationProviders
        <- at listVideoGenerationProviders
load-done key=76705525ad elapsedMs=47425 pluginsLoaded=119

load-miss key=c9bc9a5997 fields={ws:null,scope:"2be88c",plg:"276d72",actMeta:"a15618",...}
  from: at resolvePluginCapabilityProviders
        <- at resolvePluginMusicGenerationProviders
load-done key=c9bc9a5997 elapsedMs=26324 pluginsLoaded=119
```

Note the three cache keys differ only in the plg (plugins config hash) and actMeta fields, while onlyPluginIds, activate/loadMods/etc. are identical. All three return "all 119 plugins" because onlyPluginIds is null — the compat config only affects allowlist merging, not load scope.
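The ~113 second figure in the summary is the sum of the three `elapsedMs` values in the log. A short sketch of that arithmetic, using the `load-done` lines verbatim; `totalElapsedMs` is a hypothetical helper written for this example:

```typescript
const loadDoneLines: string[] = [
  "load-done key=49bf11d083 elapsedMs=39211 pluginsLoaded=119",
  "load-done key=76705525ad elapsedMs=47425 pluginsLoaded=119",
  "load-done key=c9bc9a5997 elapsedMs=26324 pluginsLoaded=119",
];

// Sum the elapsedMs field across all lines that carry one.
function totalElapsedMs(lines: string[]): number {
  let total = 0;
  for (const line of lines) {
    const match = /elapsedMs=(\d+)/.exec(line);
    if (match) total += Number(match[1]);
  }
  return total;
}

const totalMs = totalElapsedMs(loadDoneLines); // 39211 + 47425 + 26324 = 112960
const totalSeconds = Math.round(totalMs / 1000); // 113
```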

Root Cause

resolvePluginCapabilityProviders (in src/plugins/capability-provider-runtime.ts) calls resolveCapabilityProviderConfig({ key, cfg }) which wraps params.cfg with capability-specific bundled pluginIds via withBundledPluginAllowlistCompat / EnablementCompat / VitestCompat. Each capability (image/video/music) has a different set of bundled pluginIds, producing a different wrapped config, which produces a different cacheKey in buildCacheKey. The downstream LRU therefore misses and re-runs the full load.

Since the actual work is "load all 119 plugins" regardless of which capability key is requested, these three loads are effectively duplicates.
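The key divergence can be sketched as follows. This is an illustration under assumptions, not the real `buildCacheKey`: `withBundledAllowlist`, `buildKey`, the config shape, and the bundled plugin ids are hypothetical, and the key is a raw JSON string rather than the short hash seen in the logs.

```typescript
type Capability = "image" | "video" | "music";
type Config = { plugins: { allow: string[] } };

// Hypothetical bundled plugin ids per capability.
const bundledIds: Record<Capability, string[]> = {
  image: ["image-gen-a"],
  video: ["video-gen-a"],
  music: ["music-gen-a"],
};

// Stand-in for the compat wrapping: merges capability-specific ids into the allowlist.
function withBundledAllowlist(cfg: Config, capability: Capability): Config {
  return { plugins: { allow: [...cfg.plugins.allow, ...bundledIds[capability]] } };
}

// Stand-in for the cache key: derived from the wrapped config plus the load scope.
function buildKey(cfg: Config, onlyPluginIds: string[] | null): string {
  return JSON.stringify({ plugins: cfg.plugins, onlyPluginIds });
}

const userCfg: Config = { plugins: { allow: ["feishu"] } };
const capabilities: Capability[] = ["image", "video", "music"];

const requests = capabilities.map((capability) => {
  const onlyPluginIds: string[] | null = null; // full scope for every capability
  return { capability, onlyPluginIds, key: buildKey(withBundledAllowlist(userCfg, capability), onlyPluginIds) };
});

const distinctKeys = new Set(requests.map((r) => r.key)).size;
const allFullScope = requests.every((r) => r.onlyPluginIds === null);
```

Three distinct keys for three identical full-scope loads is the mismatch the issue describes: the key varies with the allowlist, while the work done does not.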

Suggested Fix

Add a per-capability result cache keyed by stable inputs (not by compatConfig object identity). A minimal fix:

```ts
// Cached results: outer WeakMap keyed by the raw config object, inner Map keyed by capability.
const capResultCache = new WeakMap<object, Map<CapabilityKey, { providers: Provider[]; expiresAt: number }>>();
const NO_CFG_ANCHOR = {}; // shared anchor for calls made without a config
const TTL_MS = 5 * 60 * 1000; // 5 minutes

function resolvePluginCapabilityProviders(params) {
  const anchor = params.cfg ?? NO_CFG_ANCHOR;
  let bucket = capResultCache.get(anchor);
  if (!bucket) {
    bucket = new Map();
    capResultCache.set(anchor, bucket);
  }
  // Return the cached providers while the entry is still fresh.
  const cached = bucket.get(params.key);
  if (cached && cached.expiresAt > Date.now()) return cached.providers;
  const providers = /* existing implementation */;
  bucket.set(params.key, { providers, expiresAt: Date.now() + TTL_MS });
  return providers;
}
```
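A self-contained variant of the same idea, written so it can be run and tested. It is a sketch under assumptions: `createCachedResolver` is a hypothetical wrapper, the loader and clock are injected, and `CapabilityKey` and `Provider` are simplified stand-ins for the real types.

```typescript
// Simplified stand-ins for the real types.
type CapabilityKey = "image" | "video" | "music";
type Provider = { id: string };
type Params = { key: CapabilityKey; cfg?: object };
type Cached = { providers: Provider[]; expiresAt: number };

function createCachedResolver(
  loadUncached: (params: Params) => Provider[],
  ttlMs: number,
  now: () => number = Date.now,
): (params: Params) => Provider[] {
  const cache = new WeakMap<object, Map<CapabilityKey, Cached>>();
  const noCfgAnchor = {}; // shared anchor for calls made without a config

  return (params: Params): Provider[] => {
    const anchor = params.cfg ?? noCfgAnchor;
    let bucket = cache.get(anchor);
    if (!bucket) {
      bucket = new Map<CapabilityKey, Cached>();
      cache.set(anchor, bucket);
    }
    const cached = bucket.get(params.key);
    if (cached && cached.expiresAt > now()) return cached.providers;
    const providers = loadUncached(params); // the expensive full load
    bucket.set(params.key, { providers, expiresAt: now() + ttlMs });
    return providers;
  };
}

let fullLoads = 0;
let clock = 0;
const TTL_MS = 5 * 60 * 1000;
const resolve = createCachedResolver(
  (params) => {
    fullLoads += 1;
    return [{ id: params.key + "-provider" }];
  },
  TTL_MS,
  () => clock,
);

const cfg = {};
const keys: CapabilityKey[] = ["image", "video", "music"];
for (const key of keys) resolve({ key, cfg }); // first dispatch: one load per capability
const loadsAfterFirstDispatch = fullLoads;
for (const key of keys) resolve({ key, cfg }); // second dispatch: all cache hits
const loadsAfterSecondDispatch = fullLoads;
clock += TTL_MS; // entry has expired
resolve({ key: "image", cfg });
const loadsAfterExpiry = fullLoads;
```

Because the outer key is the identity of `params.cfg`, this only helps when callers pass the same config object across dispatches; a caller that rebuilds its config every time would still miss, which is the failure mode PR #73847 describes for the web-provider cache.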

Workaround

Applied the above patch locally to dist/capability-provider-runtime-*.js and observed second-message dispatch drop from 5m26s to 45s. The three redundant loads logged above account for ~113s of the original time (39.2s + 47.4s + 26.3s).

Related: #73728

extent analysis

TL;DR

Implement a per-capability result cache to avoid duplicate loads of all 119 plugins for each message dispatch.

Guidance

  • Identify the resolvePluginCapabilityProviders function in src/plugins/capability-provider-runtime.ts as the root cause of the issue.
  • Apply the suggested fix by adding a per-capability result cache: a WeakMap keyed by the config object, whose values are Maps from capability key to the cached providers and their expiry time.
  • Verify the fix by checking the dispatch time after sending a plain-text message to the bot, expecting a significant reduction in time.
  • Consider applying the workaround by patching the dist/capability-provider-runtime-*.js file locally to observe the performance improvement.

Example

The code snippet in the issue body sketches a minimal fix that caches providers per config object and per capability key. Its central declaration is:

const capResultCache = new WeakMap<object, Map<CapabilityKey, {providers: Provider[], expiresAt: number}>>();

This line only declares the cache; the full function in the issue body adds the lookup, the expiry check, and the store around the existing implementation.

Notes

The suggested fix assumes that the CapabilityKey and Provider types are defined elsewhere in the codebase. Additionally, the TTL_MS value (5 minutes) may need to be adjusted based on the specific requirements of the application.

Recommendation

Apply the suggested fix by implementing the per-capability result cache, as it addresses the root cause of the issue and provides a significant performance improvement.
