openclaw - โœ…(Solved) Fix Plugin registry LRU cache evictions from cron scope-key proliferation cause 22-35s synchronous event loop blocks [1 pull requests, 1 comments, 2 participants]

Official PRs (โ€ฆ)
ON THIS PAGE

Recommended Tools

ร—6

Utilities matched from this issueโ€™s tags and category โ€” try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful ยท Quick feedback

Loadingโ€ฆ
GitHub stats
openclaw/openclaw#75851โ€ขFetched 2026-05-02 05:29:00
View on GitHub
Comments
1
Participants
2
Timeline
4
Reactions
2
Timeline (top)
cross-referenced ร—2closed ร—1commented ร—1

On deployments with many enabled cron jobs, the plugin registry LRU cache fills with scope-key variants, causing periodic 22โ€“35s synchronous event loop blocks when a workspace entry is evicted and must cold-reload.

Root Cause

pluginLoaderCacheState in loader-CLyHx60E.js is a 128-slot LRU keyed by:

workspace :: global :: stock :: JSON(plugins + installs + loadPaths) :: scopeKey :: ...

The scopeKey component varies per invocation based on onlyPluginIds. On a deployment with 64 enabled cron jobs, each cron session can request a different plugin subset, generating a distinct (workspace ร— scope) cache key. With enough cron jobs firing in a short window, the 128 slots fill with scope variants and LRU-evict the main workspace's warm entry.

The cold reload involves synchronous jiti TypeScript compilation + V8 JIT for the full workspace plugin set โ€” 22โ€“35s that blocks the event loop entirely, causing P99 spikes visible to all concurrent WebSocket connections.

Fix Action

Workaround

Disable cron jobs that run on auxiliary agents (e.g., haiku-utility) that aren't strictly necessary โ€” reduces LRU churn frequency. Does not eliminate the root cause.

Related: #67000 (session reuse for embedded agents) touches adjacent territory but focuses on session lifecycle rather than the plugin registry cache specifically.

PR fix notes

PR #75922: Fix plugin-only tool and registry latency regressions

Description (problem / solution / changelog)

Summary

  • Skip core coding tool construction when an explicit allowlist only requests plugin tools.
  • Keep the full workspace plugin registry cache separate from scoped plugin registry loads.
  • Add regressions for both latency paths.

Tests

  • OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/tool-policy.plugin-only-allowlist.test.ts src/agents/pi-tools.create-openclaw-coding-tools.test.ts src/plugins/plugin-lru-cache.test.ts src/plugins/loader.runtime-registry.test.ts src/plugins/loader.test.ts
  • pnpm exec oxfmt --check --threads=1 src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts src/plugins/loader.ts src/plugins/loader.runtime-registry.test.ts
  • git diff --check origin/main...HEAD

Fixes #75882 Fixes #75907 Fixes #75906 Fixes #75887 Fixes #75851

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts (modified, +36/-20)
  • src/agents/pi-embedded-runner/run/attempt.tools-allow-regression.test.ts (added, +59/-0)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +53/-4)
  • src/agents/pi-tools.ts (modified, +190/-145)
  • src/plugins/loader.runtime-registry.test.ts (modified, +28/-1)
  • src/plugins/loader.ts (modified, +40/-20)

Code Example

workspace :: global :: stock :: JSON(plugins + installs + loadPaths) :: scopeKey :: ...

---

22:42:14  runtime-plugins:5ms    โ† warm (just loaded at startup)
22:45:02  runtime-plugins:3ms    โ† warm
22:49:38  runtime-plugins:3ms    โ† warm
22:54:58  runtime-plugins:3ms    โ† warm  (session f89b801e)
23:01:03  runtime-plugins:27299ms โ† COLD LOAD after cron burst
23:02:41  runtime-plugins:3ms    โ† warm again
23:06:47  runtime-plugins:35059ms โ† COLD LOAD โ€” same session f89b801e that was warm at 22:54

---

eventLoopDelayP99Ms=16064  eventLoopUtilization=1  active=1
RAW_BUFFERClick to expand / collapse

Summary

On deployments with many enabled cron jobs, the plugin registry LRU cache fills with scope-key variants, causing periodic 22โ€“35s synchronous event loop blocks when a workspace entry is evicted and must cold-reload.

Root cause

pluginLoaderCacheState in loader-CLyHx60E.js is a 128-slot LRU keyed by:

workspace :: global :: stock :: JSON(plugins + installs + loadPaths) :: scopeKey :: ...

The scopeKey component varies per invocation based on onlyPluginIds. On a deployment with 64 enabled cron jobs, each cron session can request a different plugin subset, generating a distinct (workspace ร— scope) cache key. With enough cron jobs firing in a short window, the 128 slots fill with scope variants and LRU-evict the main workspace's warm entry.

The cold reload involves synchronous jiti TypeScript compilation + V8 JIT for the full workspace plugin set โ€” 22โ€“35s that blocks the event loop entirely, causing P99 spikes visible to all concurrent WebSocket connections.

Evidence

Production trace (OpenClaw 2026.4.29, 64 enabled cron jobs, Linux VPS):

22:42:14  runtime-plugins:5ms    โ† warm (just loaded at startup)
22:45:02  runtime-plugins:3ms    โ† warm
22:49:38  runtime-plugins:3ms    โ† warm
22:54:58  runtime-plugins:3ms    โ† warm  (session f89b801e)
23:01:03  runtime-plugins:27299ms โ† COLD LOAD after cron burst
23:02:41  runtime-plugins:3ms    โ† warm again
23:06:47  runtime-plugins:35059ms โ† COLD LOAD โ€” same session f89b801e that was warm at 22:54

The 23:01 and 23:06 cold loads happen after the scheduler fires multiple crons in the same window (SMS poller every 3min, reply-triage every 5min, watchdogs every 30min, etc.). The resulting scope-key churn rotates the main workspace entry out of the LRU.

The P99 event loop delay spike during a cold load:

eventLoopDelayP99Ms=16064  eventLoopUtilization=1  active=1

Followed by a command lane timeout on the in-flight session.

Why it's painful

  • The cold reload is synchronous โ€” it blocks the entire Node.js event loop for 22โ€“35s, affecting all concurrent connections, not just the affected cron session.
  • It recurs unpredictably โ€” any run that happens to follow a cron burst that filled the LRU triggers it.
  • There is no signal in the logs that an LRU eviction occurred; the only symptom is a sudden runtime-plugins spike on a session that was previously warm.

Proposed fixes

Option A (preferred): Pre-warm configured agent workspaces at gateway startup

On startup, iterate all configured agent IDs, resolve their workspace dirs, and eagerly populate the LRU with the full-scope registry for each. This ensures main + any auxiliary agent workspaces are always in cache regardless of cron scope churn. Cost: ~25s at startup (acceptable, already happens for the first cron anyway). Benefit: eliminates all mid-session cold loads for configured agents.

Option B: Make the cold load async / off the main thread

Move jiti compilation and V8 JIT warmup to a worker thread or break it into async microtasks so the event loop remains responsive during the 25s load period. Harder to implement but would eliminate the event loop blocking entirely regardless of cache state.

Option C: Scope-insensitive warming for known agents

When a request comes in for a workspace that is in the LRU under a different scope key, serve the full-scope registry (warm) rather than doing a cold load for the requested scope. This requires checking whether a superset-scoped entry is available before declaring a cache miss.

Workaround

Disable cron jobs that run on auxiliary agents (e.g., haiku-utility) that aren't strictly necessary โ€” reduces LRU churn frequency. Does not eliminate the root cause.

Related: #67000 (session reuse for embedded agents) touches adjacent territory but focuses on session lifecycle rather than the plugin registry cache specifically.

extent analysis

TL;DR

Pre-warm configured agent workspaces at gateway startup to eliminate mid-session cold loads and event loop blocks.

Guidance

  • Implement Option A: Pre-warm configured agent workspaces at gateway startup to populate the LRU cache with full-scope registries for each workspace, ensuring they remain in cache despite cron scope churn.
  • Consider Option B: Make the cold load async / off the main thread as a more complex but effective solution to eliminate event loop blocking.
  • Evaluate Option C: Scope-insensitive warming for known agents as a potential optimization to serve warm registries for workspaces with superset-scoped entries.
  • As a temporary workaround, disable non-essential cron jobs on auxiliary agents to reduce LRU churn frequency.

Example

No code snippet is provided as the issue does not require a specific code example to understand the proposed solutions.

Notes

The proposed fixes assume that the plugin registry LRU cache is the primary cause of the event loop blocks. Implementing these solutions may require additional testing and validation to ensure they effectively address the issue.

Recommendation

Apply Option A: Pre-warm configured agent workspaces at gateway startup as the preferred solution, as it is the most straightforward and effective way to eliminate mid-session cold loads and event loop blocks, with an acceptable startup cost of ~25s.

Vote matrix ยท Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loadingโ€ฆ

Still need to ship something?

ร—6

Another batch ranked right after the header list โ€” different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - โœ…(Solved) Fix Plugin registry LRU cache evictions from cron scope-key proliferation cause 22-35s synchronous event loop blocks [1 pull requests, 1 comments, 2 participants]