openclaw - [Bug]: status / sessions.list / models.list block event loop ~9s per call due to plugin manifest cache miss on every session row



Bug type

Behavior bug

Summary

RPC handlers that iterate session rows (status, sessions.list, models.list, topics.list) block the event loop for ~9 seconds per call by rebuilding the plugin manifest normalization snapshot once per session row. With 70+ sessions accumulated in one agent, every first-screen RPC fan-out turns into a guaranteed 10-second freeze, even after the gateway has fully started.

Steps to reproduce

  1. Install OpenClaw 2026.5.7 globally (npm install -g openclaw).
  2. Accumulate 70+ sessions in one agent (~/.openclaw/agents/<id>/sessions/sessions.json).
  3. Run time openclaw status --json — observe ~11 seconds wall clock.
  4. Re-run — still ~11 seconds (cache never warms up).

Expected behavior

After the gateway has finished startup-time plugin bootstrap, status / sessions.list / models.list should return well under a second, and subsequent calls should reuse the already-built plugin manifest snapshot.

Actual behavior

Every invocation takes ~10 seconds with no improvement on repeated calls. The gateway log reports eventLoopDelayMaxMs ~ 9445, and other RPCs that happen to share the fan-out batch get queued behind this serial work.

OpenClaw version

2026.5.7 (also reproducible on origin/main HEAD — call chain is unchanged)

Operating system

Ubuntu 22.04 on WSL2 (Linux 6.6.87.2-microsoft-standard-WSL2)

Model

N/A

Provider / routing chain

N/A

Install method

npm global

Logs, screenshots, and evidence

Profiled with process.hrtime.bigint() instrumentation injected into the bundled getStatusSummary and normalizeProviderModelId:

[STATUS-PROF] +707ms   after-runtime-module-import
[STATUS-PROF] +60ms    after-resolveModelAndContextDefaults
[STATUS-PROF] +3957ms  byAgent[main]  buildRows=3956ms n=71
[STATUS-PROF] +53ms    byAgent[tester]  buildRows=53ms n=1
[STATUS-PROF] +53ms    byAgent[xiaoquan] buildRows=53ms n=1
[STATUS-PROF] +3933ms  after-allSessions n=73  (allSessions reruns buildSessionRows over the same store paths)
[STATUS-PROF-ROWS] calls=146 resolveModelRef=7906ms resolveCtxTokens=0.5ms resolveRuntime=4.6ms
[STATUS-PROF-NPM] calls=147 tStatic=7260ms tLoadRT=0ms tNormRT=0ms

tStatic=7260ms is the time spent inside normalizeStaticProviderModelId. Spread over 147 calls, that is ~49 ms per call, so the manifest normalization path runs in full for every row resolution (146 row resolutions × 49 ms ≈ 7.2 s) instead of hitting any cache.
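The arithmetic behind that estimate can be written out as a short sketch. The input figures are copied from the profile lines above; the variable names are illustrative only.

```typescript
// Figures copied from the [STATUS-PROF-NPM] line in the profile output.
const staticNormalizationMs = 7260; // tStatic
const normalizationCalls = 147; // calls

// Average cost of one uncached normalization call.
const perCallMs = staticNormalizationMs / normalizationCalls;

// 73 sessions (71 + 1 + 1), each resolved twice: once for byAgent, once for allSessions.
const sessions = 71 + 1 + 1;
const rowResolutions = sessions * 2;

// Estimated total, using the rounded per-call figure quoted in the text.
const estimatedTotalMs = rowResolutions * Math.round(perCallMs);

console.log(perCallMs.toFixed(1), rowResolutions, estimatedTotalMs);
```

The estimate lands within about 100 ms of the measured tStatic value, which is consistent with nearly all of the time going to uncached normalization.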

Root cause

  1. buildSessionRows in src/commands/status.summary.ts calls resolveSessionModelRef(cfg, entry, agentId) once per session.
  2. resolveSessionModelRef in src/commands/status.summary.runtime.ts falls through to parseModelRef → normalizeModelRef → normalizeProviderModelId → normalizeStaticProviderModelId → normalizeProviderModelIdWithManifest → resolveManifestModelIdNormalizationPolicy → loadManifestModelIdNormalizationPolicies.
  3. The parseModelRef / normalizeProviderModelId parameter shapes only expose allowPluginNormalization and manifestPlugins. There is no path to forward config or workspaceDir, so by the time loadManifestModelIdNormalizationPolicies({}) is reached, both are undefined.
  4. That routes through resolveMetadataSnapshotForPolicies({}) → getCurrentPluginMetadataSnapshot({config: undefined, workspaceDir: undefined}).
  5. In src/plugins/current-plugin-metadata-snapshot.ts, the snapshot is rejected by this guard: if (snapshot.workspaceDir !== undefined && requestedWorkspaceDir === undefined) return undefined. The gateway-side state was populated with a non-undefined workspaceDir by setCurrentPluginMetadataSnapshot(pluginLookUpTable, { config: gatewayPluginConfigAtStart }) in src/gateway/server.impl.ts:666 (which falls back to snapshot.workspaceDir when options.workspaceDir is missing). The cache is therefore a permanent miss for this call chain.
  6. The fallback path loadPluginMetadataSnapshot({config: {}, env, workspaceDir}) runs in full per row, and resolveMetadataSnapshotForPolicies returns cacheable: false, so subsequent rows never populate cachedPolicies either. Final cost: ~49 ms × 73 sessions × 2 (because byAgent and allSessions both invoke buildSessionRows over the same store paths) ≈ 7.2 s of redundant manifest rebuilding.
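The guard in step 5 can be modeled in isolation to show why the row resolvers always miss. This is a minimal sketch, not the real OpenClaw code: SnapshotModel, setCurrentSnapshot, and getCurrentSnapshot are hypothetical stand-ins, and the workspace-mismatch check is an assumption added for completeness.

```typescript
// Simplified stand-in for the plugin metadata snapshot (hypothetical shape).
interface SnapshotModel {
  workspaceDir?: string;
  plugins: string[];
}

let currentSnapshot: SnapshotModel | undefined;

function setCurrentSnapshot(snapshot: SnapshotModel): void {
  currentSnapshot = snapshot;
}

function getCurrentSnapshot(requestedWorkspaceDir?: string): SnapshotModel | undefined {
  const snapshot = currentSnapshot;
  if (!snapshot) return undefined;
  // The guard quoted in the issue: a snapshot bound to a workspace is not
  // handed to a caller that did not name a workspace.
  if (snapshot.workspaceDir !== undefined && requestedWorkspaceDir === undefined) {
    return undefined;
  }
  // Assumption for this sketch: a different workspace is also a miss.
  if (requestedWorkspaceDir !== undefined && snapshot.workspaceDir !== requestedWorkspaceDir) {
    return undefined;
  }
  return snapshot;
}

// Gateway startup stores a snapshot with a concrete workspaceDir.
setCurrentSnapshot({ workspaceDir: "/home/user/workspace", plugins: ["example-plugin"] });

// The status row resolver arrives without a workspaceDir, so it misses every time.
const missFromRowResolver = getCurrentSnapshot(undefined);

// A caller that forwards the same workspaceDir gets the cached snapshot.
const hitWithWorkspace = getCurrentSnapshot("/home/user/workspace");
```

In this model the miss is deterministic rather than a warm-up effect, which matches the observation that repeated calls never get faster.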

Suggested fix

Option A (minimal patch): in loadManifestModelIdNormalizationPolicies (src/plugins/manifest-model-id-normalization.ts), add a process-wide fallback cache for the params.config === undefined case, since none of the call paths from buildSessionRows carry config down. A short TTL (e.g., 60 seconds, matching the existing cache's lifecycle on policy changes) bounds how long stale policies can be served after a plugin update. A local patch of this exact shape on the bundled dist reduces openclaw status --json from 11.27 s to 3.8 s with no functional regression: the JSON output is identical except for natural age deltas.

Concrete patch shape applied locally to verify:

 let cachedPolicies: ManifestModelIdNormalizationPolicyCache | undefined;
+
+// Process-wide fallback cache for the "no config" call path
+// (e.g. status / sessions.list row resolvers that cannot forward config/workspaceDir).
+// 60s TTL matches the staleness window of the existing fingerprint cache.
+let fallbackCachedPolicies: Map<string, PluginManifestModelIdNormalizationProvider> | null = null;
+let fallbackCachedAtMs = 0;
+const FALLBACK_CACHE_TTL_MS = 60_000;

 function loadManifestModelIdNormalizationPolicies(
   params: ManifestModelIdNormalizationLookupParams = {},
 ): Map<string, PluginManifestModelIdNormalizationProvider> {
   if (params.plugins) {
     return collectManifestModelIdNormalizationPolicies(params.plugins);
   }
+  if (!params.config) {
+    const now = Date.now();
+    if (fallbackCachedPolicies && now - fallbackCachedAtMs < FALLBACK_CACHE_TTL_MS) {
+      return fallbackCachedPolicies;
+    }
+  }
   const { snapshot, cacheable } = resolveMetadataSnapshotForPolicies(params);
   const configFingerprint = snapshot.configFingerprint;
   if (cacheable && configFingerprint && cachedPolicies?.configFingerprint === configFingerprint) {
     return cachedPolicies.policies;
   }
   const policies = collectManifestModelIdNormalizationPolicies(snapshot.plugins);
   if (cacheable && configFingerprint) {
     cachedPolicies = { configFingerprint, policies };
   }
+  if (!params.config) {
+    fallbackCachedPolicies = policies;
+    fallbackCachedAtMs = Date.now();
+  }
   return policies;
 }

To validate without rebuilding, the same change can be applied directly to the bundled dist/manifest-model-id-normalization-<hash>.js (the file that defines loadManifestModelIdNormalizationPolicies). Then:

openclaw gateway restart
time openclaw status --json > /dev/null     # ~3.8 s (was ~11 s)
time openclaw status --json > /dev/null     # ~3.8 s (cache hits, no further degradation)

Gateway-side fan-out (UI cold start, 9 RPCs in flight): the four heavy RPCs (status / models.list / sessions.list / topics.list) drop from ~9.6 s to ~2.4–2.9 s, with the rest of the batch unblocked.
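The caching behavior Option A relies on can be checked in isolation with a small TTL cache that takes an injectable clock. This is a generic sketch of the technique, not the patch itself: createTtlCache and the fake clock are hypothetical helpers introduced for illustration.

```typescript
// Wraps an expensive loader so it runs at most once per TTL window.
// `now` is injectable so expiry can be exercised without waiting.
function createTtlCache<T>(load: () => T, ttlMs: number, now: () => number = Date.now) {
  let value: T | undefined;
  let cachedAtMs = 0;
  let hasValue = false;
  return (): T => {
    const t = now();
    if (hasValue && t - cachedAtMs < ttlMs) {
      return value as T;
    }
    value = load();
    cachedAtMs = t;
    hasValue = true;
    return value;
  };
}

let fakeNowMs = 1_000;
let loadCount = 0;

const getPolicies = createTtlCache(
  () => {
    loadCount += 1; // stands in for the ~49 ms manifest rebuild
    return new Map<string, string>([["example-provider", "example-policy"]]);
  },
  60_000,
  () => fakeNowMs,
);

// 146 row resolutions inside one TTL window trigger a single load.
for (let row = 0; row < 146; row += 1) {
  getPolicies();
}
const loadsWithinTtl = loadCount;

// Once the TTL has elapsed, the next call reloads.
fakeNowMs += 60_000;
getPolicies();
const loadsAfterExpiry = loadCount;
```

One trade-off of this shape is that a plugin update can go unnoticed for up to one TTL window on the no-config path.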

Option B (cleaner): thread cfg and workspaceDir through resolveSessionModelRef → parseModelRef → normalizeProviderModelId → normalizeStaticProviderModelId → normalizeProviderModelIdWithManifest so getCurrentPluginMetadataSnapshot receives the same workspaceDir the gateway used at startup. This makes the existing cache logic work as intended and matches how loadManifestModelCatalog already routes those parameters.
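A rough model of Option B: an optional lookup context is threaded through each layer so the innermost lookup can match the snapshot built at startup. The function names echo the call chain above, but the signatures, the LookupContext type, and the rebuild counter are simplified assumptions, not the real OpenClaw API.

```typescript
// Hypothetical context object carried down the call chain.
interface LookupContext {
  workspaceDir?: string;
}

type Normalizer = (modelId: string) => string;

const gatewayWorkspaceDir = "/home/user/workspace";
const startupPolicies = new Map<string, Normalizer>([
  ["example", (id) => id.toLowerCase()],
]);
let snapshotRebuilds = 0;

function loadPolicies(ctx: LookupContext): Map<string, Normalizer> {
  // Hit: the caller named the workspace the startup snapshot was built for.
  if (ctx.workspaceDir === gatewayWorkspaceDir) return startupPolicies;
  // Miss: model the expensive per-call rebuild.
  snapshotRebuilds += 1;
  return new Map(startupPolicies);
}

function normalizeProviderModelId(provider: string, modelId: string, ctx: LookupContext = {}): string {
  const normalize = loadPolicies(ctx).get(provider);
  return normalize ? normalize(modelId) : modelId;
}

function parseModelRef(raw: string, ctx: LookupContext = {}): { provider: string; model: string } {
  const slash = raw.indexOf("/");
  const provider = slash === -1 ? "" : raw.slice(0, slash);
  const model = slash === -1 ? raw : raw.slice(slash + 1);
  return { provider, model: normalizeProviderModelId(provider, model, ctx) };
}

const rawRefs = Array.from({ length: 73 }, () => "example/Model-A");

// Without a forwarded context, every row pays for a rebuild.
rawRefs.forEach((raw) => parseModelRef(raw));
const rebuildsWithoutContext = snapshotRebuilds;

// With the context forwarded, every row reuses the startup snapshot.
snapshotRebuilds = 0;
rawRefs.forEach((raw) => parseModelRef(raw, { workspaceDir: gatewayWorkspaceDir }));
const rebuildsWithContext = snapshotRebuilds;
```

Making the context parameter optional with a default keeps existing callers compiling while the row resolvers opt in.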

Option C (orthogonal): buildSessionRows is invoked twice over the same store paths inside getStatusSummary (once via byAgent, once via allSessions). Even after the cache fix, every row is computed twice. Restructuring to compute rows once and slice them into both views would halve the remaining resolve* work.
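Option C could look roughly like the following: build every row once, then derive the per-agent view by grouping. SessionEntry, SessionRow, buildRow, and summarize are hypothetical names for this sketch and do not reflect the real getStatusSummary structure.

```typescript
interface SessionEntry {
  agentId: string;
  sessionId: string;
}

interface SessionRow extends SessionEntry {
  model: string;
}

let rowBuilds = 0;

function buildRow(entry: SessionEntry): SessionRow {
  rowBuilds += 1; // stands in for the per-row resolve* work
  return { ...entry, model: "example/model" };
}

function summarize(entries: SessionEntry[]) {
  // Single pass: each row is built exactly once.
  const allSessions = entries.map(buildRow);
  // The per-agent view reuses the same row objects.
  const byAgent = new Map<string, SessionRow[]>();
  for (const row of allSessions) {
    const bucket = byAgent.get(row.agentId) ?? [];
    bucket.push(row);
    byAgent.set(row.agentId, bucket);
  }
  return { allSessions, byAgent };
}

// Same session distribution as the profile: 71 + 1 + 1.
const entries: SessionEntry[] = [
  ...Array.from({ length: 71 }, (_, i) => ({ agentId: "main", sessionId: `main-${i}` })),
  { agentId: "tester", sessionId: "tester-0" },
  { agentId: "xiaoquan", sessionId: "xiaoquan-0" },
];

const summary = summarize(entries);
```

With this layout the row builder runs 73 times instead of 146 for the profiled store.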

Additional information


Reported by the CoClaw team. This issue was discovered while developing @coclaw/openclaw-coclaw, a CoClaw channel plugin for OpenClaw.
