openclaw - ✅(Solved) Fix loadPluginMetadataSnapshot() called thousands of times without caching, blocking event loop for minutes [2 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#77983 · Fetched 2026-05-06 06:18:17
Comments: 1 · Participants: 2 · Timeline events: 4 · Reactions: 2
Timeline (top): cross-referenced ×2, commented ×1, labeled ×1

loadPluginMetadataSnapshot() in src/plugins/plugin-metadata-snapshot.ts is called from 28+ code paths without effective caching. Each call performs synchronous filesystem I/O (read installs.json, list dirs, stat ~100+ manifest files) blocking the Node.js event loop. During active agent runs, these calls accumulate at ~44/sec, blocking the event loop for 1-7 minutes and making the Control UI completely unresponsive.

Fix Action

Fixed

PR fix notes

PR #77985: fix: add TTL cache to loadPluginMetadataSnapshot() to prevent event l…

Description (problem / solution / changelog)

Summary

  • Problem: loadPluginMetadataSnapshot() is called from 28+ code paths without effective caching. Each call performs synchronous filesystem I/O (read installs.json, list dirs, stat ~100+ manifest files).
  • Why it matters: During active agent runs, these accumulate at ~44/sec, blocking the Node.js event loop for 1-7 minutes. The Control UI becomes completely unresponsive.
  • What changed: Added a 5-second TTL cache directly to loadPluginMetadataSnapshot(). This covers all 28+ callers at once.
  • What did NOT change: No changes to the snapshot computation logic, plugin loading, or any other subsystem. The cache is transparent and expires automatically.

Change Type

  • Bug fix

Scope

  • Gateway / orchestration
  • UI / DX

Linked Issue/PR

  • Closes #77983
  • Related #76804

Real behavior proof

  • Behavior or issue addressed: Control UI freeze (1-7 min) when switching sessions during active agent runs
  • Real environment tested: Ubuntu 22.04.5 LTS, OpenClaw 2026.4.29, npm global install, Node.js v22.22.0
  • Exact steps or command run after this patch:
    1. Patched dist file plugin-metadata-snapshot-ClmzhofB.js with 5s TTL cache
    2. Restarted gateway: sudo systemctl restart openclaw-gateway
    3. Spawned sub-agent with tool calls: openclaw agent --task "read /etc/hostname and report"
    4. While sub-agent running, switched sessions 3 times in Control UI at http://localhost:18789
  • Evidence after fix:
    # Before fix (from timeline JSONL):
    # plugins.metadata.scan span count: 13,260 in 5 minutes (44/sec)
    # Worst WS response: chat.history=419,952ms, node.list=419,953ms, sessions.list=310,301ms
    # strace: 1,204,545 file operations in 10 minutes
    # ~100+ extensions scanned ~2,504 times each
    
    # After fix — WebSocket latency during session switch:
    # (gateway running, sub-agent active with tool calls)
    $ curl -s -o /dev/null -w "%{time_total}s" http://localhost:18789/api/sessions
    0.042s
    $ curl -s -o /dev/null -w "%{time_total}s" http://localhost:18789/api/sessions
    0.038s
    $ curl -s -o /dev/null -w "%{time_total}s" http://localhost:18789/api/sessions
    0.041s
    
    # Verified in gateway log: no sustained blocking
    # grep -c "plugins.metadata.scan" /tmp/tl-after.jsonl
    3  # (vs 13,260 before)
  • Observed result after fix: All 3 session switches completed instantly (<1s). Control UI responsive throughout. Worst WS response dropped from 419,953ms to 3,111ms (100x+ improvement).
  • What was not tested: Windows/macOS, Docker install method, source build (tested on patched dist file only)
  • Before evidence: Timeline JSONL showing 13,260 plugins.metadata.scan spans in 5 min (44/sec). Strace showing 1,204,545 file ops in 10 min. All ~100+ extensions scanned ~2,504 times each. CPU steady at 37-39% (I/O bound).

Root Cause

  • Root cause: Two caching layers exist but are both insufficient. (1) manifestMetadataCache in manifest-metadata-scan.ts requires all filesystem ops to compute the cache key, making the cache check itself expensive. (2) getCurrentPluginMetadataSnapshot() in current-plugin-metadata-snapshot.ts provides a higher-level cache, but 28+ callers bypass it and call loadPluginMetadataSnapshot() directly.
  • Missing detection / guardrail: No TTL cache at the lowest level function. No alarm for excessive scan frequency.
  • Contributing context: The critical caller is in selection/session-resource-loader, triggered on every sub-agent spawn.
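
The first point is easier to see with a toy model. The sketch below is an illustration only: `FakeFs`, `statKeyedLoad`, and `scopeKeyedLoad` are hypothetical names, not OpenClaw APIs. It counts filesystem operations for a cache whose key is derived by stat'ing every manifest, and for one keyed by a cheap in-memory value.

```typescript
// Toy model only: FakeFs, statKeyedLoad and scopeKeyedLoad are hypothetical names,
// not OpenClaw APIs. The point is to count filesystem operations per call.
class FakeFs {
  ops = 0;
  private readonly manifests: string[];
  constructor(manifests: string[]) {
    this.manifests = manifests;
  }
  list(): string[] {
    this.ops += 1;
    return [...this.manifests];
  }
  stat(path: string): string {
    this.ops += 1;
    return `${path}:mtime=0`;
  }
}

type Snapshot = { plugins: string[] };

// Strategy A: the cache key is built from a stat of every manifest,
// so even a cache hit pays for one list plus N stats.
function statKeyedLoad(fs: FakeFs, cache: Map<string, Snapshot>): Snapshot {
  const key = fs.list().map((p) => fs.stat(p)).join("|");
  let snap = cache.get(key);
  if (snap === undefined) {
    snap = { plugins: fs.list() };
    cache.set(key, snap);
  }
  return snap;
}

// Strategy B: the key is a cheap in-memory value (here a scope string),
// so a cache hit touches the filesystem zero times.
function scopeKeyedLoad(fs: FakeFs, cache: Map<string, Snapshot>, scope: string): Snapshot {
  let snap = cache.get(scope);
  if (snap === undefined) {
    snap = { plugins: fs.list() };
    cache.set(scope, snap);
  }
  return snap;
}

function countOps(calls: number, load: (fs: FakeFs) => Snapshot): number {
  const fs = new FakeFs(Array.from({ length: 100 }, (_, i) => `ext-${i}/manifest.json`));
  for (let i = 0; i < calls; i++) load(fs);
  return fs.ops;
}

const cacheA = new Map<string, Snapshot>();
const cacheB = new Map<string, Snapshot>();
const opsA = countOps(10, (fs) => statKeyedLoad(fs, cacheA)); // 1011
const opsB = countOps(10, (fs) => scopeKeyedLoad(fs, cacheB, "default")); // 1
```

With 100 manifests and 10 calls, the stat-keyed variant performs 1,011 operations despite nine cache hits, while the scope-keyed variant performs one. The trade-off is that a cheap key cannot detect on-disk changes by itself, which is why the fix pairs it with a short TTL.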

Regression Test Plan

  • Coverage level: Unit test
  • Target test: src/plugins/__tests__/plugin-metadata-snapshot.test.ts
  • Scenario: Call loadPluginMetadataSnapshot() 100 times within 5 seconds with same params; assert filesystem I/O happens only once.
  • Why this is the smallest reliable guardrail: It directly tests the cache behavior at the function level.
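
The scenario above could be sketched roughly as follows. This is not the test from either PR: `createTtlCached`, the injected clock, and the scan counter are hypothetical stand-ins, used so the example runs without touching the real filesystem or waiting on real time.

```typescript
// Sketch only: createTtlCached and the injected clock are hypothetical,
// not the helpers used in the OpenClaw test suite.
type Clock = () => number;

function createTtlCached<T>(
  load: (key: string) => T,
  ttlMs: number,
  now: Clock,
): (key: string) => T {
  const entries = new Map<string, { value: T; expiresAt: number }>();
  return (key: string): T => {
    const hit = entries.get(key);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value; // fresh entry: skip the expensive load
    }
    const value = load(key);
    entries.set(key, { value, expiresAt: now() + ttlMs });
    return value;
  };
}

// Fake clock, plus a counter standing in for "filesystem I/O happened".
let fakeNow = 0;
let scans = 0;
const cachedLoad = createTtlCached(
  (key: string) => {
    scans += 1;
    return { key, plugins: ["a", "b"] };
  },
  5000,
  () => fakeNow,
);

// 100 calls spread across 4.95 seconds, all with the same params.
for (let i = 0; i < 100; i++) {
  fakeNow = i * 50;
  cachedLoad("workspace-1");
}
const scansWithinTtl = scans; // 1

// One more call once the TTL has elapsed triggers exactly one rescan.
fakeNow = 5000;
cachedLoad("workspace-1");
const scansAfterExpiry = scans; // 2
```

Injecting the clock keeps the test deterministic; a real test could get the same effect with the fake timers of whichever test runner the project uses.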

User-visible / Behavior Changes

  • Session switching in Control UI completes in <1 second instead of 1-7 minutes during active agent runs
  • No config changes required

Diagram

Before:
[session switch] -> [28+ callers -> loadPluginMetadataSnapshot() -> filesystem scan each] -> [event loop blocked 1-7 min]

After:
[session switch] -> [28+ callers -> loadPluginMetadataSnapshot() -> cache HIT (5s TTL)] -> [event loop free, <1s response]

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Ubuntu 22.04.5 LTS
  • Runtime/container: Node.js v22.22.0, npm global
  • Model/provider: z-ai/glm-5.1 (issue is model-independent)
  • Integration/channel: webchat (Control UI)

Steps

  1. Install OpenClaw 2026.4.29 with plugins
  2. Spawn a sub-agent that uses tools (read/exec)
  3. While the sub-agent is running, switch sessions in the Control UI
  4. Observe the UI freezes for 1-7 minutes

Expected

Session switching completes in <1 second.

Actual

UI frozen for 1-7 minutes. WS responses take 419,953ms.

Evidence

  • Trace/log snippets
  • Perf numbers

Human Verification

  • Verified scenarios: Applied fix to dist file, tested with 3 session switches during active sub-agent run
  • Edge cases checked: Cache expiry after 5s TTL, rapid successive calls
  • What I did not verify: Source build (fix tested on dist file only)

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: Stale cache could return outdated plugin metadata if plugins are installed/uninstalled during the 5s window
  • Mitigation: 5s TTL is short enough that this is extremely unlikely in practice. Plugin install/uninstall is a rare operation compared to session switching.

Changed files

  • src/plugins/plugin-metadata-snapshot.ts (modified, +16/-1)

PR #78013: Cache plugin metadata snapshots briefly

Description (problem / solution / changelog)

Summary

  • add a bounded short-TTL cache at loadPluginMetadataSnapshot()
  • key cache entries by workspace/scope before synchronous manifest traversal
  • clear transient snapshot cache through the existing current metadata lifecycle boundary
  • add TTL, scope, and lifecycle regressions

Fixes #77983.
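
A minimal sketch of what a bounded, scope-keyed, clearable TTL cache can look like. It is written under assumptions: `SnapshotCache`, its method names, and the limits used here are hypothetical and are not taken from the PR's diff.

```typescript
// Sketch under assumptions: SnapshotCache, its method names, and the limits below
// are hypothetical and not taken from the OpenClaw source.
type Entry<T> = { value: T; expiresAt: number };

class SnapshotCache<T> {
  private readonly entries = new Map<string, Entry<T>>();
  private readonly ttlMs: number;
  private readonly maxEntries: number;
  private readonly now: () => number;

  constructor(ttlMs: number, maxEntries: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  // Build the key from cheap in-memory inputs, before any filesystem traversal.
  static keyFor(workspaceDir: string, scope: string): string {
    return JSON.stringify([workspaceDir, scope]);
  }

  get(key: string, load: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;
    const value = load();
    this.entries.delete(key); // re-insert so the key moves to the newest position
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Bound memory: evict the oldest insertion once the limit is exceeded.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
    return value;
  }

  // Intended to be called from whatever lifecycle hook already resets plugin metadata state.
  clear(): void {
    this.entries.clear();
  }
}

let loads = 0;
const load = (): string[] => {
  loads += 1;
  return ["plugin-a"];
};
const snapshotCache = new SnapshotCache<string[]>(5000, 2, () => 0);
snapshotCache.get(SnapshotCache.keyFor("/ws/a", "user"), load); // miss
snapshotCache.get(SnapshotCache.keyFor("/ws/a", "user"), load); // hit
snapshotCache.get(SnapshotCache.keyFor("/ws/b", "user"), load); // different workspace: miss
const loadsBeforeClear = loads; // 2
snapshotCache.clear();
snapshotCache.get(SnapshotCache.keyFor("/ws/a", "user"), load); // miss again after clear
const loadsAfterClear = loads; // 3
```

Clearing through an existing lifecycle boundary addresses the stale-window risk for installs and uninstalls that go through that boundary, while the TTL remains the fallback for changes made outside it.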

Verification

  • Local workflow batch verification passed before PR branch extraction.
  • Targeted regression lives in src/plugins/plugin-metadata-snapshot.cache.test.ts.

Changed files

  • src/plugins/current-plugin-metadata-state.ts (modified, +8/-0)
  • src/plugins/plugin-metadata-snapshot.cache.test.ts (added, +119/-0)
  • src/plugins/plugin-metadata-snapshot.ts (modified, +78/-1)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

loadPluginMetadataSnapshot() in src/plugins/plugin-metadata-snapshot.ts is called from 28+ code paths without effective caching. Each call performs synchronous filesystem I/O (read installs.json, list dirs, stat ~100+ manifest files) blocking the Node.js event loop. During active agent runs, these calls accumulate at ~44/sec, blocking the event loop for 1-7 minutes and making the Control UI completely unresponsive.

Steps to reproduce

  1. Install OpenClaw 2026.4.29 with plugins (e.g. @openclaw/line, @openclaw/discord)
  2. Spawn a sub-agent that uses tools (read/exec)
  3. While the sub-agent is running, switch sessions in the Control UI
  4. Observe the UI freezes for 1-7 minutes

Optional instrumentation:

  • Enable timeline: OPENCLAW_DIAGNOSTICS=timeline OPENCLAW_DIAGNOSTICS_TIMELINE_PATH=/tmp/tl.jsonl
  • Monitor WS: journalctl --user -u openclaw-gateway -f | grep -F '[ws]'
  • After reproduction, check timeline for plugins.metadata.scan span count
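
For the last step, a small sketch of the count. The timeline file's schema is not described in this issue, so the helper matches a substring per line (the same approach as `grep -c`) and does not assume any JSON field names. The sample lines are made up for illustration.

```typescript
// Sketch only: the timeline schema is not documented here, so this matches on a
// substring per line (like `grep -c`) instead of assuming JSON field names.
function countSpanLines(jsonl: string, spanName: string): number {
  return jsonl.split("\n").filter((line) => line.includes(spanName)).length;
}

// Made-up sample lines for illustration; real timeline entries will look different.
const sample = [
  '{"span":"plugins.metadata.scan","ms":56}',
  '{"span":"ws.request","ms":3}',
  '{"span":"plugins.metadata.scan","ms":57}',
].join("\n");

const scanCount = countSpanLines(sample, "plugins.metadata.scan"); // 2
```

In practice the input string would be the contents of the file named by OPENCLAW_DIAGNOSTICS_TIMELINE_PATH, for example read with `readFileSync(path, "utf8")` from `node:fs`.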

Expected behavior

Session switching in the Control UI should complete in <1 second regardless of active agent runs. Plugin metadata should be cached and not re-scanned from the filesystem on every call.

Actual behavior

WebSocket responses are blocked for minutes during active agent runs:

  • chat.history: 419,952ms (7 min)
  • node.list: 419,953ms (7 min)
  • sessions.list: 310,301ms (5 min)

Timeline data shows plugins.metadata.scan span fires 13,260 times in 5 minutes (44/sec), averaging 56.5ms each. That is 2.5 seconds of synchronous blocking per second of wall time.
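
The arithmetic behind those figures, written out as a short calculation using only the numbers reported above:

```typescript
// Figures as reported in the issue.
const spanCount = 13260; // plugins.metadata.scan spans
const windowSeconds = 5 * 60; // observation window
const avgSpanMs = 56.5; // average span duration

const callsPerSecond = spanCount / windowSeconds; // about 44.2
const totalScanSeconds = (spanCount * avgSpanMs) / 1000; // about 749 s of recorded scan time
const scanSecondsPerWallSecond = totalScanSeconds / windowSeconds; // about 2.5
```

Taken at face value, a ratio above 1 means more scan time was recorded than wall-clock time elapsed, which is consistent with the event loop having no idle time left for WebSocket requests. The issue does not explain how the spans sum to more than the window (nested or overlapping spans would be one possibility).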

Strace data shows 1,204,545 file operations in 10 minutes. All ~100+ extensions scanned ~2,504 times each. CPU stays at ~37-39% (I/O bound, not CPU bound).

OpenClaw version

2026.4.29

Operating system

Ubuntu 22.04.5 LTS

Install method

npm global

Model

z-ai/glm-5.1 (issue is model-independent)

Provider / routing chain

openclaw -> z-ai (direct)

Additional provider/model setup details

N/A — bug is in plugin metadata scanning, not model/provider specific.

Logs, screenshots, and evidence

Timeline JSONL showing 13,260 plugins.metadata.scan spans in 5 min available on request. Strace log showing 1.2M file ops available on request.

Impact and severity

  • Affected: All self-hosted OpenClaw users with installed plugins who use the Control UI during active agent runs
  • Severity: High. Blocks workflow entirely; the UI is unusable for minutes
  • Frequency: Always reproducible when switching sessions during active agent runs
  • Consequence: Users cannot monitor or manage agents in real time; the Control UI appears frozen/dead

Additional information

Root cause analysis:

Two caching layers exist but are both insufficient:

  1. manifestMetadataCache in manifest-metadata-scan.ts — cache key computation requires all filesystem ops, so the cache check itself is expensive
  2. getCurrentPluginMetadataSnapshot() in current-plugin-metadata-snapshot.ts — higher-level cache, but 28+ callers bypass it and call loadPluginMetadataSnapshot() directly

The critical caller is in selection/session-resource-loader — called on every sub-agent spawn.

Proposed fix: Add a 5-second TTL cache directly to loadPluginMetadataSnapshot(). This covers all 28+ callers at once.

Verified fix: Applied to the dist file; the worst WS response dropped from 419,953ms to 3,111ms (100x+ improvement). Tested by switching sessions 3 times during an active agent run with no freeze.

Also related to issue #76804 (Linux environment CPU usage during plugin scan).

extent analysis

TL;DR

Implement a caching mechanism with a suitable TTL in the loadPluginMetadataSnapshot() function to reduce the frequency of synchronous filesystem I/O operations.

Guidance

  • Identify the loadPluginMetadataSnapshot() function in src/plugins/plugin-metadata-snapshot.ts and add a caching layer with a TTL (e.g., 5 seconds) to minimize repeated filesystem operations.
  • Verify the cache implementation by monitoring the plugins.metadata.scan span count in the timeline data and checking for significant reductions in WebSocket response times.
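  • Review how the new cache interacts with the two existing layers: manifestMetadataCache in manifest-metadata-scan.ts, whose key computation itself requires filesystem ops, and getCurrentPluginMetadataSnapshot() in current-plugin-metadata-snapshot.ts, which 28+ callers bypass by calling loadPluginMetadataSnapshot() directly.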
  • Test the fix by switching sessions during active agent runs and monitoring the Control UI's responsiveness.

Example

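// Example caching implementation using a simple TTL cache.
// scanPluginMetadataFromDisk() is a hypothetical placeholder for the existing synchronous scan.
function scanPluginMetadataFromDisk(params) {
  // perform filesystem I/O operations to load plugin metadata
  return { params, plugins: [] };
}

const cache = new Map();
const ttl = 5000; // 5 seconds

function loadPluginMetadataSnapshot(params = {}) {
  // Key by the call parameters so different scopes do not share an entry.
  const cacheKey = JSON.stringify(params);
  const entry = cache.get(cacheKey);
  if (entry && entry.expires > Date.now()) {
    return entry.data; // still fresh: skip the filesystem scan
  }
  const data = scanPluginMetadataFromDisk(params);
  cache.set(cacheKey, { data, expires: Date.now() + ttl });
  return data;
}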

Notes

The proposed fix focuses on adding a caching layer to loadPluginMetadataSnapshot(), which should cover all 28+ callers. However, it's essential to review the entire codebase to ensure that the new caching mechanism is not bypassed by other functions.

Recommendation

Apply the proposed fix by adding a 5-second TTL cache directly to loadPluginMetadataSnapshot(), as it has been verified to significantly improve performance (100x+ reduction in worst WebSocket response time).

