openclaw - ✅(Solved) Fix Gateway re-scans plugin metadata during model normalization [2 pull requests, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#78461, fetched 2026-05-07 03:36:42
Comments: 2 · Participants: 2 · Timeline: 7 · Reactions: 2
Timeline (top): referenced ×3, commented ×2, cross-referenced ×2

Gateway-owned plugin metadata snapshots are not always reused during model alias / manifest model-id normalization. In a workspace-scoped gateway runtime, resolveDefaultModelForAgent() can fall back to loadPluginMetadataSnapshot() during every embedded agent attempt, causing repeated plugin index / manifest discovery in the Node gateway process.

Root Cause

The gateway sets the current plugin metadata snapshot without the gateway workspace:

setCurrentPluginMetadataSnapshot(pluginLookUpTable, { config: gatewayPluginConfigAtStart });

But manifest model-id normalization resolves the active registry workspace:

const workspaceDir = params.workspaceDir ?? getActivePluginRegistryWorkspaceDirFromState();
const current = getCurrentPluginMetadataSnapshot({ config, env, workspaceDir });

That can make the current snapshot lookup miss even though the gateway already has a valid metadata snapshot for the running plugin set. When it misses, the normalization path reloads plugin metadata instead of reusing the gateway-owned snapshot.
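
The mismatch can be illustrated with a toy single-slot model. This is a minimal sketch, not the OpenClaw implementation: the `setCurrent` / `getCurrent` helpers, the strict workspace-equality rule, and the `/workspaces/gateway` path are all assumptions made for illustration.

```typescript
// Toy model of a single-slot "current snapshot" handoff.
// NOT the OpenClaw code: the matching rule below is an assumption.
type Snapshot = { plugins: string[] };
type Slot = { snapshot: Snapshot; workspaceDir: string | undefined };

let slot: Slot | undefined;

// Publish a snapshot, optionally recording the workspace it belongs to.
function setCurrent(snapshot: Snapshot, opts: { workspaceDir?: string } = {}): void {
  slot = { snapshot, workspaceDir: opts.workspaceDir };
}

// Assumed rule: a lookup hits only when the requested workspace equals the
// workspace recorded at publish time.
function getCurrent(opts: { workspaceDir?: string }): Snapshot | undefined {
  const current = slot;
  if (!current) return undefined;
  return current.workspaceDir === opts.workspaceDir ? current.snapshot : undefined;
}

// The gateway publishes without a workspace...
setCurrent({ plugins: ["demo"] });
// ...but normalization asks with the active registry workspace, so it misses.
const missed = getCurrent({ workspaceDir: "/workspaces/gateway" });

// Publishing with the same workspace makes the same lookup hit.
setCurrent({ plugins: ["demo"] }, { workspaceDir: "/workspaces/gateway" });
const reused = getCurrent({ workspaceDir: "/workspaces/gateway" });

console.log(missed === undefined, reused !== undefined);
```

Under this assumed rule the first lookup returns nothing and the second returns the published snapshot, which is the shape of the miss described above.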

PR fix notes

PR #78480: fix(plugins): reuse workspace snapshot for model normalization

Description (problem / solution / changelog)

Problem

Manifest model-id normalization can miss the Gateway-owned plugin metadata snapshot when the caller does not pass an explicit workspace and the active runtime workspace has not been set yet. In that case resolveMetadataSnapshotForPolicies() calls getCurrentPluginMetadataSnapshot() without allowWorkspaceScopedSnapshot, so a valid workspace-scoped current snapshot is rejected and the path falls back to loadPluginMetadataSnapshot().

That matches the source-level slow path reported in #78461: model normalization can rebuild plugin metadata / installed plugin indexes during embedded agent prep even though Gateway already prepared a compatible metadata snapshot.

Fixes #78461.

Fix

  • Normalize omitted config to {} before snapshot compatibility checks, matching adjacent manifest readers.
  • Pass allowWorkspaceScopedSnapshot: true for unscoped callers, while still honoring an explicit workspaceDir or active runtime workspace when present.
  • Avoid passing workspaceDir: undefined into the fallback loader.
  • Add regression coverage for a workspace-scoped current snapshot with no active runtime workspace.
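
The first three bullets can be sketched as option-building logic. This is a hedged illustration only: `buildLookupOptions`, the `LookupOptions` / `LoaderOptions` shapes, and the `activeWorkspaceDir` parameter are hypothetical names, and only the three behaviours themselves come from the PR description.

```typescript
// Hypothetical helper and types; only the behaviours come from the PR text.
type Config = Record<string, unknown>;
type LookupOptions = {
  config: Config;
  workspaceDir?: string;
  allowWorkspaceScopedSnapshot?: boolean;
};
type LoaderOptions = { config: Config; workspaceDir?: string };

function buildLookupOptions(params: {
  config?: Config;
  workspaceDir?: string;
  activeWorkspaceDir?: string;
}): { lookup: LookupOptions; loader: LoaderOptions } {
  // Omitted config becomes {} before any compatibility check.
  const config = params.config ?? {};
  // An explicit workspace wins, then the active runtime workspace.
  const workspaceDir = params.workspaceDir ?? params.activeWorkspaceDir;
  if (workspaceDir !== undefined) {
    // Scoped caller: honor the workspace in both lookup and fallback loader.
    return { lookup: { config, workspaceDir }, loader: { config, workspaceDir } };
  }
  // Unscoped caller: accept a workspace-scoped snapshot, and omit the
  // workspaceDir key from the loader options instead of setting undefined.
  return { lookup: { config, allowWorkspaceScopedSnapshot: true }, loader: { config } };
}

const unscoped = buildLookupOptions({});
const scoped = buildLookupOptions({ activeWorkspaceDir: "/workspaces/gateway" });
console.log(JSON.stringify(unscoped));
console.log(JSON.stringify(scoped));
```

Building the loader options without the key, rather than with `workspaceDir: undefined`, keeps `"workspaceDir" in options` false for unscoped callers, which matters if downstream code distinguishes a missing key from an undefined value.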

Commit shape

This is intentionally split into two commits:

  1. fix(plugins): reuse workspace snapshot for model normalization
  2. test(plugins): cover unscoped model normalization snapshot reuse

Scope boundary

What changed:

  • src/plugins/manifest-model-id-normalization.ts
  • src/plugins/manifest-model-id-normalization.test.ts

What did not change:

  • provider selection semantics
  • plugin install/index discovery policy
  • current snapshot compatibility checks
  • Gateway lifecycle / startup ordering
  • provider runtime cache behavior from #77948

Audits

  • Existing-helper check: reused existing getCurrentPluginMetadataSnapshot(... allowWorkspaceScopedSnapshot) behavior already used by model catalog and manifest-contract readers.
  • Shared-helper caller check: did not change getCurrentPluginMetadataSnapshot contract; only changed this caller.
  • Rival scan: #77948 is adjacent provider-runtime snapshot reuse, but does not touch src/plugins/manifest-model-id-normalization.ts; #78474 is package-state probe resolution for #78462 and does not cover this path.

Tests

pnpm test src/plugins/manifest-model-id-normalization.test.ts src/plugins/current-plugin-metadata-snapshot.test.ts src/agents/pi-embedded-runner/run/attempt-stage-timing.test.ts
# passed 2 Vitest shards; 18 tests passed

pnpm -s tsgo:core
# exit 0

pnpm -s build
# exit 0

git diff --check origin/main..HEAD
# exit 0

Real behavior proof

Behavior or issue addressed: Unscoped manifest model-id normalization now reuses the Gateway-owned workspace-scoped plugin metadata snapshot instead of falling back to a cold loadPluginMetadataSnapshot() path when the active runtime workspace is still unset. This addresses the #78461 model-normalization snapshot miss.

Real environment tested: Local OpenClaw source checkout /home/chenglunhu/code/openclaw, branch fix/manifest-model-normalization-snapshot-78461, built with pnpm -s build, OpenClaw 2026.5.6, Node v24.13.0, Linux/WSL host. Runtime env for the smoke used an empty temporary OPENCLAW_STATE_DIR and OPENCLAW_DISABLE_BUNDLED_PLUGINS=1 so the alpha/demo-model result can only come from the current metadata snapshot, not from a fallback manifest scan.

Exact steps or command run after this patch:

pnpm -s build
OPENCLAW_DISABLE_BUNDLED_PLUGINS=1 OPENCLAW_STATE_DIR="$(mktemp -d)" node --input-type=module <<'EOF'
import { t as normalizeProviderModelIdWithManifest } from './dist/manifest-model-id-normalization-BsZfNFDu.js';
import { r as setCurrentPluginMetadataSnapshot, t as clearCurrentPluginMetadataSnapshot } from './dist/current-plugin-metadata-snapshot-vavLhBTt.js';
import { _ as resolveInstalledPluginIndexPolicyHash } from './dist/installed-plugin-index-store-BekK6vjO.js';

const policyHash = resolveInstalledPluginIndexPolicyHash({});
const workspaceDir = '/tmp/openclaw-real-workspace-78461';
const index = {
  version: 1,
  hostContractVersion: 'real-proof',
  compatRegistryVersion: 'real-proof',
  migrationVersion: 1,
  policyHash,
  generatedAtMs: Date.now(),
  installRecords: {},
  plugins: [],
  diagnostics: [],
};
const snapshot = {
  policyHash,
  workspaceDir,
  index,
  plugins: [
    {
      id: 'normalizer-real-proof',
      modelIdNormalization: {
        providers: {
          demo: {
            prefixWhenBare: 'alpha',
          },
        },
      },
    },
  ],
};

clearCurrentPluginMetadataSnapshot();
setCurrentPluginMetadataSnapshot(snapshot, { config: {}, env: process.env });
const normalized = normalizeProviderModelIdWithManifest({
  provider: 'demo',
  context: { provider: 'demo', modelId: 'demo-model' },
});
console.log(`openclawVersion=2026.5.6`);
console.log(`command=node --input-type=module dist manifest-normalization smoke`);
console.log(`activeRuntimeWorkspace=unset`);
console.log(`storedSnapshotWorkspace=${workspaceDir}`);
console.log(`normalized=${normalized}`);
console.log(`workspaceScopedSnapshotReused=${normalized === 'alpha/demo-model'}`);
EOF

Evidence after fix: copied terminal output from the built OpenClaw dist smoke:

openclawVersion=2026.5.6
command=node --input-type=module dist manifest-normalization smoke
activeRuntimeWorkspace=unset
storedSnapshotWorkspace=/tmp/openclaw-real-workspace-78461
normalized=alpha/demo-model
workspaceScopedSnapshotReused=true

Observed result after fix: With no active runtime workspace and an empty plugin state dir, normalizeProviderModelIdWithManifest() returned alpha/demo-model from the stored workspace-scoped current snapshot. That verifies the unscoped caller reused the current snapshot path after this patch.

What was not tested: I did not run a live Gateway CPU profile in this PR. The live performance evidence is in #78461; this PR verifies the exact source-level snapshot miss with a built-dist smoke and regression test.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/plugins/manifest-model-id-normalization.test.ts (modified, +14/-0)
  • src/plugins/manifest-model-id-normalization.ts (modified, +6/-4)

PR #78589: perf(plugins): re-publish loaded snapshot to current-snapshot slot in gateway flows

Description (problem / solution / changelog)

What

Re-publish freshly loaded plugin metadata snapshots into the gateway's single-slot current-plugin-metadata-snapshot handoff so the model-resolution hot path stops walking the installed plugin index from disk on every transcript-event broadcast / heartbeat / model-resolution iteration.

The chain: every transcript-event broadcast → buildGatewaySessionRow → resolveSessionModelRef → ... → resolveMetadataSnapshotForPolicies. That function is meant to reuse the workspace-scoped plugin-metadata snapshot the gateway publishes at boot. In a running gateway the cached slot can be empty (it gets cleared on every writePersistedInstalledPluginIndex, and pre-fix was never re-published until the next gateway restart). Once the slot is empty, every call falls back to loadPluginMetadataSnapshot, which walks the installed plugin index from disk on every event broadcast.

Why

Local CPU profiles of the deployed gateway showed this fallback path consuming ~3.75% inclusive of main-thread time on a quiet system, with buildInstalledPluginIndex appearing in the inclusive top of every heartbeat / model-resolution iteration.

How

When resolveMetadataSnapshotForPolicies has to load fresh because the slot is empty, re-publish the freshly loaded snapshot into the same single-slot handoff (current-plugin-metadata-snapshot) the gateway already uses at boot. The publish only fires when:

  1. getActivePluginRegistryWorkspaceDirFromState() returns a workspace (we're inside a gateway flow, not a CLI/test).
  2. The loaded snapshot already carries a workspaceDir.

This is the architecturally-sanctioned producer mechanism, not a new cache:

  • src/plugins/AGENTS.md still forbids persistent metadata caches.
  • The slot is invalidated by every writePersistedInstalledPluginIndex exactly as before.
  • We just lazily refill it the same way gateway boot does.

CLI flows and tests with no published workspace continue to do per-call fingerprint-driven freshness, preserving the manifest-edit-detection contract covered by the existing reflects-manifest-edits test.
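
The lazy refill can be sketched with a toy slot and a load counter. This is a simplified model under stated assumptions, not the code in the PR: `resolveSnapshot`, `loadFromDisk`, `invalidateSlot`, and `countLoads` are hypothetical names, and the counter tracks whole loads rather than individual installs.json opens.

```typescript
type Snapshot = { workspaceDir: string | undefined; plugins: string[] };

let currentSlot: Snapshot | undefined; // single-slot handoff
let diskLoads = 0; // stands in for the installed-index walk on disk

function loadFromDisk(workspaceDir?: string): Snapshot {
  diskLoads += 1;
  return { workspaceDir, plugins: ["demo"] };
}

// Models the clear that happens on every persisted-index write.
function invalidateSlot(): void {
  currentSlot = undefined;
}

function resolveSnapshot(activeWorkspaceDir?: string): Snapshot {
  if (currentSlot) return currentSlot; // warm slot: no disk work
  const loaded = loadFromDisk(activeWorkspaceDir);
  // Re-publish only inside a gateway flow (an active workspace exists) and
  // only when the loaded snapshot itself carries a workspaceDir.
  if (activeWorkspaceDir !== undefined && loaded.workspaceDir !== undefined) {
    currentSlot = loaded;
  }
  return loaded;
}

// Start from a cold slot and count how many loads `calls` resolutions cost.
function countLoads(calls: number, activeWorkspaceDir?: string): number {
  invalidateSlot();
  diskLoads = 0;
  for (let i = 0; i < calls; i += 1) resolveSnapshot(activeWorkspaceDir);
  return diskLoads;
}

const gatewayLoads = countLoads(5, "/workspaces/gateway"); // 1 load, then reuse
const cliLoads = countLoads(5); // 5 loads, slot never refilled
console.log({ gatewayLoads, cliLoads });
```

In this model the gateway flow pays for one load after each invalidation, while the CLI flow keeps loading on every call, so edits to manifests are still observed between invocations.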

Real behavior proof

  • Behavior or issue addressed: The model-resolution hot path (every transcript-event broadcast → buildGatewaySessionRow → resolveSessionModelRef → ... → resolveMetadataSnapshotForPolicies) was walking the installed plugin index from disk on every iteration whenever the current-plugin-metadata-snapshot slot was empty (the post-writePersistedInstalledPluginIndex state). After the fix the slot stays warm between persisted-index writes the same way it does at boot.

  • Real environment tested: Local OpenClaw topic branch perf/manifest-model-id-lazy-publish at f7972b51b8, run from a real worktree at ~/projects/worktrees/openclaw/perf-manifest-model-id-lazy-publish against real on-disk plugin install index + manifest in temp dirs, real setActivePluginRegistry runtime state, real loadPluginMetadataSnapshot disk walk, real setCurrentPluginMetadataSnapshot single-slot publish, real normalizeProviderModelIdWithManifest public surface. No vitest mocks of the seam under test.

  • Exact steps or command run after this patch: pnpm tsx scripts/proof-78589-manifest-model-id-lazy-publish.ts. The script instruments fs.openSync to count opens of installs.json (the fs-safe tryReadJsonSync path uses openSync + readSync, not readFileSync, so wrapping openSync is the right hook), runs N=5 normalize calls per scenario, and self-asserts the perf invariant (calls 2..N must add 0 opens once the slot is warm). To verify the proof catches the bug, I also ran git checkout origin/main -- src/plugins/manifest-model-id-normalization.ts && pnpm tsx scripts/proof-78589-manifest-model-id-lazy-publish.ts, observed the proof fail with cold-slot post-call snapshot: expected defined, got undefined, then restored.

  • Evidence after fix (terminal output, copied live runtime output, saved log artifact): full captured runtime output is saved at ~/reports/proof-78589-manifest-model-id-lazy-publish/run-output.txt and reproduced in the fenced block below — actual stdout from running the proof script against the patched topic branch on a real OpenClaw checkout.

    Copied live output (terminal capture from pnpm tsx scripts/proof-78589-manifest-model-id-lazy-publish.ts):

    [proof-manifest] Real-runtime behavior proof for manifest-model-id lazy re-publish.
    [proof-manifest] Production code paths: normalizeProviderModelIdWithManifest + loadPluginMetadataSnapshot
    [proof-manifest]                            + setCurrentPluginMetadataSnapshot + setActivePluginRegistry.
    
    [proof-manifest] Scenario 1: gateway flow, cold slot (the bug fix).
    [proof-manifest]   call 1 (cold) -> 3 installs.json open(s).
    [proof-manifest]   calls 2..5 (slot warmed by fix) -> 0 installs.json open(s).
    
    [proof-manifest] Scenario 2: gateway flow, warm slot (pre-existing reuse).
    [proof-manifest]   5 normalize calls -> 0 installs.json open(s).
    
    [proof-manifest] Scenario 3: CLI flow, no active workspace (refresh contract).
    [proof-manifest]   5 normalize calls -> 15 installs.json open(s) (must scale with calls).
    
    [proof-manifest] All runtime assertions passed.
  • Observed result after fix: Across 5 normalize calls in the gateway-flow scenario, the cold load opened installs.json 3 times once, then calls 2..5 added 0 opens — the slot is warm and the disk walk is gone. The warm-slot scenario performs 0 opens across all 5 calls (pre-existing reuse pinned). The CLI-flow scenario performs 15 opens across 5 calls (3× per cold load × N), confirming the slot is still NOT refilled in CLI flows so manifest-edit detection between CLI invocations stays intact.

  • What was not tested: Steady-state CPU profile of the deployed gateway after the fix landed (the original ~3.75% inclusive main-thread time observation was pre-fix only; that profile is what motivated the change but was not re-captured for this PR — the proof script measures the underlying perf invariant the CPU spend was tracking). I also did not measure persisted-index-write invalidation timing under load.

  • Before evidence (optional): Same proof script after git checkout origin/main -- src/plugins/manifest-model-id-normalization.ts produced [proof-manifest] FAILED: ... cold-slot post-call snapshot: expected defined, got undefined, confirming the proof's assertion fires on the pre-fix code.

Proof script details (mechanism, supplemental to evidence above)

scripts/proof-78589-manifest-model-id-lazy-publish.ts is a self-checking real-runtime harness that drives the production normalize hot path against real production singletons (no vitest mocks of the seam under test):

  • real on-disk plugin install index + manifest in temp dirs
  • real setActivePluginRegistry runtime state
  • real loadPluginMetadataSnapshot disk walk
  • real setCurrentPluginMetadataSnapshot single-slot publish
  • real normalizeProviderModelIdWithManifest public surface

It instruments fs.openSync to count opens of installs.json (the fs-safe tryReadJsonSync path the production code uses opens the file with openSync + readSync, so wrapping openSync is the right hook). Across N=5 normalize calls per scenario, it asserts:

  1. Gateway, cold slot (the fix): cold load opens the index ≥1 time, then calls 2..N add 0 opens (the perf invariant). Slot is observably refilled after call 1.
  2. Gateway, warm slot (pre-existing reuse): 0 opens across N calls.
  3. CLI flow, no active workspace: opens scale with N — the slot must stay empty so CLI surfaces still observe manifest edits between invocations.

The proof exits non-zero on any invariant violation. I reverted the topic's manifest-model-id-normalization.ts change locally and confirmed the proof fails (cold-slot post-call snapshot: expected defined, got undefined), then restored.
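
The counting hook can be sketched as a small wrapper. This is a minimal sketch that deliberately uses a stand-in object instead of the real node:fs module, so it runs without touching disk: `FsLike`, `countOpens`, `fakeFs`, and the `/state/plugins/...` paths are hypothetical, and the real proof script may be structured differently.

```typescript
// Stand-in for the part of the fs API used here (hypothetical type).
type FsLike = { openSync(path: string, flags: string): number };

// Replace fsLike.openSync with a wrapper that counts calls whose path ends
// with `suffix`. Returns a handle to read the count and undo the patch.
function countOpens(fsLike: FsLike, suffix: string) {
  const original = fsLike.openSync;
  let count = 0;
  fsLike.openSync = function (path: string, flags: string): number {
    if (path.endsWith(suffix)) count += 1;
    return original.call(fsLike, path, flags);
  };
  return {
    get count() {
      return count;
    },
    restore() {
      fsLike.openSync = original;
    },
  };
}

// Fake file system that hands out increasing descriptors.
let nextFd = 3;
const fakeFs: FsLike = { openSync: () => nextFd++ };

const probe = countOpens(fakeFs, "installs.json");
fakeFs.openSync("/state/plugins/installs.json", "r");
fakeFs.openSync("/state/plugins/manifest.json", "r");
fakeFs.openSync("/state/plugins/installs.json", "r");
const counted = probe.count; // two matching opens so far
probe.restore();
fakeFs.openSync("/state/plugins/installs.json", "r"); // no longer counted
console.log(counted, probe.count);
```

Restoring the original function after each scenario keeps counts from leaking between scenarios, which is what lets a per-scenario assertion such as "calls 2..N add 0 opens" be checked in isolation.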

(Captured runtime output reproduced under Evidence after fix above.)

Validation gate

  • pnpm tsgo:core: clean
  • pnpm vitest run src/plugins/manifest-model-id-normalization.test.ts: 6/6 passed (includes the existing reflects-manifest-edits test plus two new cases pinning the gateway-flow re-publish and the CLI-flow non-publish behavior)
  • pnpm oxlint scripts/proof-manifest-model-id-lazy-publish.ts src/plugins/manifest-model-id-normalization.ts: 0 warnings, 0 errors
  • pnpm plugin-sdk:api:check: OK
  • pnpm tsx scripts/proof-manifest-model-id-lazy-publish.ts: all runtime assertions passed (output above)
  • git diff --check: clean

Tests

Unit tests added in the original commit (bc5748143b):

  • src/plugins/manifest-model-id-normalization.test.ts — two new cases:
    • re-publishes a freshly loaded snapshot to the current-snapshot slot when a gateway-style plugin-registry workspace is active
    • does not re-publish when no active plugin-registry workspace is set (CLI flow keeps fingerprint-driven freshness)

The new proof script is the real-runtime supplement for these unit tests.

CHANGELOG entry: included under ## Unreleased → ### Changes.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • scripts/proof-78589-manifest-model-id-lazy-publish.ts (added, +420/-0)
  • src/plugins/manifest-model-id-normalization.test.ts (modified, +46/-0)
  • src/plugins/manifest-model-id-normalization.ts (modified, +24/-8)

Issue text (openclaw/openclaw#78461)

Observed impact

On a live gateway running OpenClaw 2026.5.4 (70d92b5), embedded agent prep repeatedly logged:

prep stages ... totalMs=15513
system-prompt:openclaw-reference-paths:8788ms

An inspector CPU profile showed the time was not actually reference-path lookup; the marker included the preceding model resolution path:

runEmbeddedAttempt
resolveDefaultModelForAgent
resolveConfiguredModelRef
buildModelAliasIndex
parseModelRefWithCompatAlias
parseModelRef
normalizeProviderModelId
loadManifestModelIdNormalizationPolicies
resolveMetadataSnapshotForPolicies
loadPluginMetadataSnapshot
loadPluginRegistrySnapshotWithMetadata
loadInstalledPluginIndex
discoverOpenClawPlugins

This made every agent turn pay a large fixed cost before the stream was ready, and contributed to gateway event-loop pressure.

Suggested fix

Two small changes fixed the live repro locally:

  1. Include defaultWorkspaceDir when handing off the current metadata snapshot at gateway startup and reload:

setCurrentPluginMetadataSnapshot(pluginLookUpTable, {
  config: gatewayPluginConfigAtStart,
  workspaceDir: defaultWorkspaceDir,
});

  2. In manifest model-id normalization, when the caller did not explicitly request a workspace, allow reuse of a workspace-scoped current snapshot:

const workspaceScopedCurrent = getCurrentPluginMetadataSnapshot({
  config: params.config,
  env,
  allowWorkspaceScopedSnapshot: true,
});

Validation from local patch

After applying the above locally:

  • The per-turn prep stages totalMs=15513-16467 warnings stopped appearing.
  • A shira embedded agent smoke completed successfully with OK.
  • Targeted tests passed:

node scripts/test-projects.mjs \
  src/plugins/current-plugin-metadata-snapshot.test.ts \
  src/plugins/manifest-model-id-normalization.test.ts \
  src/agents/pi-embedded-runner/run/attempt-stage-timing.test.ts

Test Files 3 passed
Tests 18 passed

Also passed:

pnpm tsgo:core
pnpm build

Version notes

I also inspected the published openclaw@2026.5.5 tarball. The dist bundle still appears to call:

setCurrentPluginMetadataSnapshot(pluginLookUpTable, { config: gatewayPluginConfigAtStart });
setCurrentPluginMetadataSnapshot(nextPluginLookUpTable, { config: params.nextConfig });

So this does not appear fixed in 2026.5.5.
