openclaw - 💡(How to fix) Fix [Bug]: runtime-deps cache hit-rate ~13% on Pi 400 — staging blocks event-loop on every embedded-run (regression in 2026.4.29) [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#76200Fetched 2026-05-03 04:40:56
View on GitHub
Comments
1
Participants
2
Timeline
5
Reactions
2
Author
Timeline (top)
labeled ×2closed ×1commented ×1unsubscribed ×1

After upgrading from 2026.4.27 to 2026.4.29 (with 2026.4.28 not present in CHANGELOG, possibly retracted), plugin-runtime-deps cache hit-rate dropped to ~13% on a Pi 400 deployment. Every embedded-run spawn re-installs npm-deps for multiple bundled plugins (acpx, openai, runway), blocking the event loop for 12-21 sec each. On Pi-class hardware this cascades into Bot Framework 15s timeouts, multi-second TUI keystroke lag, and ~5 min response times for trivial DMs. After 6 days uptime three unscheduled crashes were needed within ~1 hour.

Root Cause

After upgrading from 2026.4.27 to 2026.4.29 (with 2026.4.28 not present in CHANGELOG, possibly retracted), plugin-runtime-deps cache hit-rate dropped to ~13% on a Pi 400 deployment. Every embedded-run spawn re-installs npm-deps for multiple bundled plugins (acpx, openai, runway), blocking the event loop for 12-21 sec each. On Pi-class hardware this cascades into Bot Framework 15s timeouts, multi-second TUI keystroke lag, and ~5 min response times for trivial DMs. After 6 days uptime three unscheduled crashes were needed within ~1 hour.

Fix Action

Fix / Workaround

Frequency: always — every embedded-run spawn (heartbeat 30 min default, plus per-DM, plus per-dispatch). Observed 23+ staging events in ~12h on a single agent.

Local mitigations attempted:

  • installRuntimeDeps: false — not viable: triggers forceSetupOnlyChannelPlugins which breaks embedded-run spawning
  • systemd-Drop-in MemoryHigh=1.5G MemoryMax=2G CPUQuota=300% Nice=10 RestartSec=60 — limits crash-cascade but does not address structural miss-rate
  • Heartbeat-interval 30 → 60 min (planned) — reduces frequency of staging-storms but does not fix per-spawn cost
  • Hardware migration to Pi 500+ (in transit) — expected to reduce per-miss-time ~3-5× via faster CPU + dedicated NVMe; structural cache issue persists on all hardware classes

Happy to provide raw [plugins] event dump, full eventLoopDelay histogram, or test patches against bundled-runtime-deps-Dj2QXhNg.js cache-validation logic.

Code Example

// ~/.openclaw/openclaw.json (relevant sections, redacted)
{
  "plugins": {
    "entries": {
      "duckduckgo": { "enabled": true },
      "msteams": { "enabled": true },
      "openai": { "enabled": true },
      "memory-core": { "enabled": true },
      "anthropic": { "enabled": true },
      "searxng": { "enabled": true },
      "memory-wiki": { "enabled": true },
      "diffs": { "enabled": true }
    }
  },
  "channels": {
    "msteams": { "blockStreaming": true }
  },
  "agents": { "list": [{ "id": "main", "default": true }] },
  "commitments": { "enabled": false }
}

---

Auto-update version trace (10 days, captured from `/tmp/openclaw/auto-update-<date>.log`):


27.04: 4.244.24    | 1 plugin install (chokidar)
28.04: 4.244.25    | 1 plugin install (chokidar)
29.04: 4.254.26    | gateway version mismatch
30.04: 4.264.27    | 1 plugin install (chokidar)
01.05: 4.274.29    | 49 plugin installs FRESH  ← regression-trigger version
02.05: 4.294.29    | 49 fresh AGAIN (cache miss)


Sample plugin-staging events (2026-05-02, redacted from `/tmp/openclaw/openclaw-2026-05-02.log`):


02:02:08 [plugins] acpx installed bundled runtime deps in 21ms     [HIT]
03:24:22 [plugins] acpx installed bundled runtime deps in 7438ms   [MISS]
03:29:11 [plugins] acpx installed bundled runtime deps in 21ms     [HIT]
11:08:32 [plugins] acpx installed bundled runtime deps in 10663ms  [MISS]
13:47:02 [plugins] acpx installed bundled runtime deps in 10746ms  [MISS]
13:49:07 [plugins] acpx installed bundled runtime deps in 12767ms  [MISS]


Plus event-loop measurements during plugin-staging:


13:13:52 liveness warning: eventLoopDelayMaxMs=97576 eventLoopUtilization=1.0 cpuCoreRatio=0.246
14:01:20 liveness warning: eventLoopDelayMaxMs=32665 eventLoopUtilization=1.0 cpuCoreRatio=1.044
17:00:51 liveness warning: eventLoopDelayMaxMs=47479 eventLoopUtilization=0.999 cpuCoreRatio=1.011


CHANGELOG-references (suspected change-points in 2026.4.29):


- Plugins/runtime-deps: replace stale symlinked mirror target roots before
  writing runtime-mirror temp files and skip rewriting already materialized
  hardlinks. Fixes #75108; refs #75069.
- Plugins/runtime-deps: keep bundled provider policy config loading from
  staging plugin runtime dependencies. Fixes #74971.
- Plugins/runtime-deps: always write a dependency map in generated runtime-deps
  install manifests. Fixes #74949.


Hypothesis (not yet ground-truth-verified): the new mirror-target-validation logic from #75108 fails freshness check across embedded-run spawns, falsely triggering `npm install` instead of cache-hit + symlink.

---

const specsHash = crypto.createHash('sha256')
    .update(JSON.stringify(RUNTIME_SPECS))
    .digest('hex').slice(0, 12);
const sharedDir = `${pluginsRoot}/.shared-runtime/${specsHash}`;
if (!await exists(sharedDir)) {
    await npmInstall(sharedDir, RUNTIME_SPECS);
}
await symlink(sharedDir, `${pluginsRoot}/${pluginName}/runtime`);
RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After upgrading from 2026.4.27 to 2026.4.29 (with 2026.4.28 not present in CHANGELOG, possibly retracted), plugin-runtime-deps cache hit-rate dropped to ~13% on a Pi 400 deployment. Every embedded-run spawn re-installs npm-deps for multiple bundled plugins (acpx, openai, runway), blocking the event loop for 12-21 sec each. On Pi-class hardware this cascades into Bot Framework 15s timeouts, multi-second TUI keystroke lag, and ~5 min response times for trivial DMs. After 6 days uptime three unscheduled crashes were needed within ~1 hour.

Steps to reproduce

  1. Install OpenClaw 2026.4.29 on a Raspberry Pi 400 (or comparable resource-constrained ARM host) with bundled plugins acpx + openai + runway enabled
  2. Have an agent configured with heartbeat interval 30 min and a Microsoft Teams channel attached
  3. Spawn 2-3 back-to-back embedded-runs (heartbeat-driven or via inbound DM)
  4. Tail /tmp/openclaw/openclaw-<date>.log | grep '\[plugins\]'
  5. Observe: each plugin staging event reports installed in 5000-21000ms rather than sub-100ms cache-hit

Cache-miss-rate exceeds 80% within first 3 spawn cycles. Observed deterministically across 23 events in ~12h on 2026-05-02.

Expected behavior

Per pre-2026.4.29 behavior on identical Pi 400 hardware (2026.4.27 from 2026-04-30): plugin-runtime-deps were installed on first spawn and reliably cache-hit (sub-100ms) on subsequent spawns. Staging events appeared rarely (1-2 per day, only when manifest changed) rather than per embedded-run. CHANGELOG 2026.4.29 entry "Plugins/runtime-deps: replace stale symlinked mirror target roots ... so cross-version container upgrades no longer crash-loop on read-only image-layer paths while warm mirrors do less churn" (Fixes #75108; refs #75069) suggests cache should still be honored for warm mirrors after the refactor.

Actual behavior

Cache-validation appears to fail across embedded-run spawns. Plugin-staging triggers fresh npm install on nearly every spawn. Measured 2026-05-02 (Pi 400 deployment):

PluginEventsCache-hits (<100ms)Cache-misses (>5s)Avg miss-time
acpx83 (37%)59.6s
openai50516.7s
runway50512.8s
memory-core30312.0s
browser10131.2s
diffs10118.9s
file-transfer101n/a

Total cache-hit-rate: 3/23 = 13%. All three hits are acpx events (21-24ms). Every other plugin shows 100% miss with stable miss-times across the day — no warming.

Per-plugin trend (first vs last event of the day):

  • acpx: first=8539ms, last=12767ms → STABLE-MISS
  • memory-core: first=14363ms, last=14941ms → STABLE-MISS
  • openai: first=12629ms, last=15240ms → STABLE-MISS
  • runway: first=12731ms, last=12132ms → STABLE-MISS

User-visible cascade: while staging-loop runs (60-90 sec on Pi 400), the gateway event-loop is fully utilized; the msteams-Plugin handler cannot process inbound POSTs within the Bot Framework 15s timeout, causing Microsoft to mark the bot unresponsive.

OpenClaw version

2026.4.29 (installed 2026-05-02 03:11 CEST via auto-update from 2026.4.27; 2026.4.28 not present in CHANGELOG, presumed retracted)

Operating system

Raspberry Pi OS (Debian 12 bookworm, arm64, kernel 6.12.75+rpt-rpi-v8) on Raspberry Pi 400 hardware (BCM2711, A72 @ 1.8 GHz, 4 GB LPDDR4-3200, USB-3 SSD shared bus with USB-Ethernet dongle)

Install method

npm install -g openclaw via Node.js v24.x. Auto-updated nightly via wrapper script ~/.openclaw/workspace/scripts/openclaw-update.sh which calls openclaw update followed by service-restart via systemctl --user restart openclaw-gateway.service.

Model

openai-codex/gpt-5.5 as primary; fallback chain: claude-cli/claude-sonnet-4-6, codex-cli/gpt-5.5, openai-codex/gpt-5.5, claude-cli/claude-haiku-4-5. Heartbeat-triggered embedded-runs invoke spawned acpx/openai/runway plugins regardless of effective model.

Provider / routing chain

Pi → local OpenClaw gateway (port 18789, loopback only) → Tailscale Funnel (https://tva-office-workstation-01.tail3d2c4c.ts.net/api/messages) → Microsoft Bot Service → Microsoft Teams. Gateway-to-CLI: per embedded-run, gateway spawns acpx/runway/openai child via plugin-runtime-deps staging.

Additional provider/model setup details

// ~/.openclaw/openclaw.json (relevant sections, redacted)
{
  "plugins": {
    "entries": {
      "duckduckgo": { "enabled": true },
      "msteams": { "enabled": true },
      "openai": { "enabled": true },
      "memory-core": { "enabled": true },
      "anthropic": { "enabled": true },
      "searxng": { "enabled": true },
      "memory-wiki": { "enabled": true },
      "diffs": { "enabled": true }
    }
  },
  "channels": {
    "msteams": { "blockStreaming": true }
  },
  "agents": { "list": [{ "id": "main", "default": true }] },
  "commitments": { "enabled": false }
}

acpx, runway, browser, file-transfer plugins are bundled-internal (not in user plugins.entries) but still trigger runtime-deps staging on each embedded-run spawn.

Logs, screenshots, and evidence

Auto-update version trace (10 days, captured from `/tmp/openclaw/auto-update-<date>.log`):


27.04: 4.244.24    | 1 plugin install (chokidar)
28.04: 4.244.25    | 1 plugin install (chokidar)
29.04: 4.254.26    | gateway version mismatch
30.04: 4.264.27    | 1 plugin install (chokidar)
01.05: 4.274.29    | 49 plugin installs FRESH  ← regression-trigger version
02.05: 4.294.29    | 49 fresh AGAIN (cache miss)


Sample plugin-staging events (2026-05-02, redacted from `/tmp/openclaw/openclaw-2026-05-02.log`):


02:02:08 [plugins] acpx installed bundled runtime deps in 21ms     [HIT]
03:24:22 [plugins] acpx installed bundled runtime deps in 7438ms   [MISS]
03:29:11 [plugins] acpx installed bundled runtime deps in 21ms     [HIT]
11:08:32 [plugins] acpx installed bundled runtime deps in 10663ms  [MISS]
13:47:02 [plugins] acpx installed bundled runtime deps in 10746ms  [MISS]
13:49:07 [plugins] acpx installed bundled runtime deps in 12767ms  [MISS]


Plus event-loop measurements during plugin-staging:


13:13:52 liveness warning: eventLoopDelayMaxMs=97576 eventLoopUtilization=1.0 cpuCoreRatio=0.246
14:01:20 liveness warning: eventLoopDelayMaxMs=32665 eventLoopUtilization=1.0 cpuCoreRatio=1.044
17:00:51 liveness warning: eventLoopDelayMaxMs=47479 eventLoopUtilization=0.999 cpuCoreRatio=1.011


CHANGELOG-references (suspected change-points in 2026.4.29):


- Plugins/runtime-deps: replace stale symlinked mirror target roots before
  writing runtime-mirror temp files and skip rewriting already materialized
  hardlinks. Fixes #75108; refs #75069.
- Plugins/runtime-deps: keep bundled provider policy config loading from
  staging plugin runtime dependencies. Fixes #74971.
- Plugins/runtime-deps: always write a dependency map in generated runtime-deps
  install manifests. Fixes #74949.


Hypothesis (not yet ground-truth-verified): the new mirror-target-validation logic from #75108 fails freshness check across embedded-run spawns, falsely triggering `npm install` instead of cache-hit + symlink.

Impact and severity

Affected users/systems/channels: all OpenClaw deployments running 2026.4.29 with bundled plugins (acpx/openai/runway/etc.) on resource-constrained ARM hardware (Pi 400, similar single-board-computers). Likely also impacts low-spec x86 deployments to a lesser degree (faster CPU shortens per-miss time but does not fix structural miss-rate).

Severity: blocks workflow. On Pi 400: makes Microsoft Teams channel effectively unusable for interactive DMs (~5 min response time for trivial messages, with empty typing-Activity placeholders accumulating in chat). System destabilization (3 unscheduled hard reboots within ~1 hour required on 2026-05-02 to recover from compounded plugin-staging-storm + memory pressure).

Frequency: always — every embedded-run spawn (heartbeat 30 min default, plus per-DM, plus per-dispatch). Observed 23+ staging events in ~12h on a single agent.

Consequence: missed Microsoft Teams DMs (Bot Service times out at 15s, marks bot unhealthy), TUI keystroke lag making interactive use infeasible, multiple gateway crashes requiring power-cycle (warm reboot left sshd unstartable, only cold power-cycle recovered), and significant drag on user productivity across the day. Customer-facing work blocked on the affected day.

Additional information

Last known good version: 2026.4.27 (installed 2026-04-30, no measurable issue on identical Pi 400 hardware). First known bad version: 2026.4.29 (installed 2026-05-02, regression first observed within hours). 2026.4.28: NOT_ENOUGH_INFO — version not present in CHANGELOG.md, presumed retracted; not tested on this deployment.

Local mitigations attempted:

  • installRuntimeDeps: false — not viable: triggers forceSetupOnlyChannelPlugins which breaks embedded-run spawning
  • systemd-Drop-in MemoryHigh=1.5G MemoryMax=2G CPUQuota=300% Nice=10 RestartSec=60 — limits crash-cascade but does not address structural miss-rate
  • Heartbeat-interval 30 → 60 min (planned) — reduces frequency of staging-storms but does not fix per-spawn cost
  • Hardware migration to Pi 500+ (in transit) — expected to reduce per-miss-time ~3-5× via faster CPU + dedicated NVMe; structural cache issue persists on all hardware classes

Proposed fix (RFC): persistent cache by specs-hash, shared via symlink/hardlink:

const specsHash = crypto.createHash('sha256')
    .update(JSON.stringify(RUNTIME_SPECS))
    .digest('hex').slice(0, 12);
const sharedDir = `${pluginsRoot}/.shared-runtime/${specsHash}`;
if (!await exists(sharedDir)) {
    await npmInstall(sharedDir, RUNTIME_SPECS);
}
await symlink(sharedDir, `${pluginsRoot}/${pluginName}/runtime`);

First install: ~21s (one-time). All subsequent spawns with same spec-set: <100ms (symlink only). LRU-eviction when pool > X GB. Plus per-stage telemetry (cache-hit / cache-miss / install-time-ms) so cache-pathology becomes visible without log-grep.

Three other 2026.4.29 regressions observed in parallel are tracked separately; happy to file each as a distinct upstream issue if useful (stuck-recovery overshoot from Refs #73581 et al., resume-fallback-chain re-evaluation of provider health, empty typing-Activity in Teams under blockStreaming=true). Internal write-up at docs/research/2026-05-02-openclaw-upstream-bugs.md.

Happy to provide raw [plugins] event dump, full eventLoopDelay histogram, or test patches against bundled-runtime-deps-Dj2QXhNg.js cache-validation logic.

extent analysis

TL;DR

The most likely fix is to implement a persistent cache by specs-hash, shared via symlink/hardlink, to address the high cache-miss rate and subsequent performance issues.

Guidance

  • Investigate the Plugins/runtime-deps changes in the 2026.4.29 release, specifically the new mirror-target-validation logic, to determine if it's causing the cache freshness check to fail across embedded-run spawns.
  • Verify that the proposed fix, using a persistent cache by specs-hash, shared via symlink/hardlink, resolves the issue by reducing the cache-miss rate and improving performance.
  • Consider implementing per-stage telemetry to monitor cache-hit/miss rates and install times to better understand cache pathology.
  • Test patches against the bundled-runtime-deps cache-validation logic to ensure the fix is effective.

Example

The proposed fix uses a specs-hash to create a shared directory for runtime dependencies, which can be symlinked to the plugin's runtime directory:

const specsHash = crypto.createHash('sha256')
    .update(JSON.stringify(RUNTIME_SPECS))
    .digest('hex').slice(0, 12);
const sharedDir = `${pluginsRoot}/.shared-runtime/${specsHash}`;
if (!await exists(sharedDir)) {
    await npmInstall(sharedDir, RUNTIME_SPECS);
}
await symlink(sharedDir, `${pluginsRoot}/${pluginName}/runtime`);

This approach allows for a one-time installation of dependencies and subsequent spawns to use the cached version, reducing the cache-miss rate and improving performance.

Notes

The issue is specific to the 2026.4.29 release and affects OpenClaw deployments with bundled plugins on resource-constrained ARM hardware. The proposed fix may need to be adapted for other hardware configurations or OpenClaw versions.

Recommendation

Apply the proposed fix, implementing a persistent cache by specs-hash, shared via symlink/hardlink, to address the high cache-m

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

Per pre-2026.4.29 behavior on identical Pi 400 hardware (2026.4.27 from 2026-04-30): plugin-runtime-deps were installed on first spawn and reliably cache-hit (sub-100ms) on subsequent spawns. Staging events appeared rarely (1-2 per day, only when manifest changed) rather than per embedded-run. CHANGELOG 2026.4.29 entry "Plugins/runtime-deps: replace stale symlinked mirror target roots ... so cross-version container upgrades no longer crash-loop on read-only image-layer paths while warm mirrors do less churn" (Fixes #75108; refs #75069) suggests cache should still be honored for warm mirrors after the refactor.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: runtime-deps cache hit-rate ~13% on Pi 400 — staging blocks event-loop on every embedded-run (regression in 2026.4.29) [1 comments, 2 participants]