openclaw - ✅(Solved) Fix Gateway RSS regression on 2026.4.15 — fresh cold-start baseline 700MB+ on macOS ARM64, steady climb regardless of workload [1 pull requests, 4 comments, 3 participants]

openclaw/openclaw#70717 (fetched 2026-04-24 05:54:22)

After upgrading from 2026.4.1 to 2026.4.15, gateway RSS baseline climbed from ~400MB fresh-start to 700MB+ fresh-start, with no change in configuration, workload, or agent definitions. The leak is steady and observable regardless of subagent activity.

This appears related to #63526 (regression reported on 2026.4.9) and #13758 (long-running gateway memory accumulation), but the specific baseline jump on 2026.4.15 has not been quantified with cold-start data.

Fix Action

Workaround

Currently considering rollback to 2026.4.1. Would appreciate confirmation of whether this regression is planned to be addressed, or whether 4.1 is the stable long-term line for memory-constrained hardware.

PR fix notes

PR #70730: fix: mitigate gateway RSS baseline regression in v2026.4.15 (#70717)

Description (problem / solution / changelog)

Summary

  • Problem: Gateway RSS baseline increased from ~400MB to 700MB+ on cold start due to excessive plugin discovery and eager loading.
  • Why it matters: Causes significant memory pressure on constrained hardware (e.g., Mac Mini M4), reducing stability and limiting concurrent workloads.
  • What changed: Restored plugin discovery order so scanDir is treated as a fallback instead of a primary source, preventing unintended plugin loading.
  • What did NOT change (scope boundary): No changes to plugin runtime logic, execution model, or lazy-loading behavior—only discovery priority.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway / orchestration

Linked Issue/PR

  • Closes #70717
  • This PR fixes a bug or regression

Root Cause

  • Root cause: Commit b2974da33a changed plugin discovery priority to prefer scanDir overrides, causing unintended plugin directories to be scanned first.
  • Missing detection / guardrail: No constraint to prevent excessive plugin discovery from override paths.
  • Contributing context: Plugin loader eagerly initialized discovered plugins, amplifying memory usage when discovery scope expanded.

Technical Breakdown

  • Previous (regressed behavior):

    • Custom directories (scanDir) were scanned first
    • Result: 50+ additional plugins discovered and eagerly loaded
  • Fixed behavior (restored):

    • dist/extensions → extensions → scanDir
    • Default bundled plugins prioritized
    • scanDir used only as fallback
  • Changes applied in:

    • src/plugins/bundled-plugin-metadata.ts
      • Reordered baseDirs to move scanDir to lowest priority
    • src/plugins/bundled-plugin-metadata.test.ts
      • Updated expectations to reflect fallback behavior
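
The restored ordering can be sketched as follows. This is a minimal illustration, not the contents of `src/plugins/bundled-plugin-metadata.ts`: the names `DiscoveryOptions`, `resolveBaseDirs`, and `pickPluginDir` are hypothetical, and "first existing directory wins" is an assumption about what "fallback" means here.

```typescript
// Hypothetical sketch; names and shapes are illustrative assumptions.
interface DiscoveryOptions {
  rootDir: string;   // package root that contains the bundled plugins
  scanDir?: string;  // optional override directory
}

/** Candidate directories in priority order: bundled locations first, scanDir last. */
function resolveBaseDirs(opts: DiscoveryOptions): string[] {
  const baseDirs = [
    `${opts.rootDir}/dist/extensions`,
    `${opts.rootDir}/extensions`,
  ];
  if (opts.scanDir !== undefined) {
    baseDirs.push(opts.scanDir); // lowest priority: used only as a fallback
  }
  return baseDirs;
}

/**
 * Return the first candidate directory that exists. The `exists` predicate is
 * injected so the sketch stays independent of the filesystem.
 */
function pickPluginDir(
  opts: DiscoveryOptions,
  exists: (dir: string) => boolean,
): string | undefined {
  for (const dir of resolveBaseDirs(opts)) {
    if (exists(dir)) {
      return dir;
    }
  }
  return undefined;
}
```

With this ordering, a `scanDir` that exists alongside a bundled directory never displaces the bundled one. It is consulted only when neither bundled location is present.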

Regression Test Plan

  • Coverage level that should have caught this:

    • Seam / integration test
  • Target test or file:

    • src/plugins/bundled-plugin-metadata.test.ts
  • Scenario the test should lock in:

    • Plugin discovery prioritizes bundled directories over override paths
  • Why this is the smallest reliable guardrail:

    • Issue occurs at plugin discovery boundary, not individual plugin logic
  • Existing test that already covers this (if any):

    • Updated metadata tests now validate correct order

User-visible / Behavior Changes

  • Reduced gateway memory usage on startup (~700MB → ~400MB baseline)
  • Fewer plugins loaded by default
  • Improved stability on memory-constrained systems

Diagram

Before:
Startup → scanDir prioritized → 59 plugins loaded → high RSS (~700MB)

After:
Startup → bundled dirs prioritized → ~5 plugins loaded → lower RSS (~400MB)
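
As a rough back-of-the-envelope check on the diagram, the average cost per extra plugin can be estimated from the two data points. The figures are the approximate ones quoted in this PR (the post-fix plugin count is only given as "~5"), and the assumption that RSS scales linearly with plugin count is ours, not the PR's.

```typescript
// Approximate figures taken from the before/after diagram in this PR.
interface StartupSnapshot {
  plugins: number; // plugins loaded at cold start
  rssMb: number;   // resident set size in MB
}

const regressed: StartupSnapshot = { plugins: 59, rssMb: 700 };
const restored: StartupSnapshot = { plugins: 5, rssMb: 400 };

/** Average RSS attributable to each additional eagerly loaded plugin. */
function avgMbPerPlugin(a: StartupSnapshot, b: StartupSnapshot): number {
  return (a.rssMb - b.rssMb) / (a.plugins - b.plugins);
}

// 300 MB spread over 54 extra plugins: roughly 5.6 MB per plugin.
const perPluginMb = avgMbPerPlugin(regressed, restored);
```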

## Security Impact (required)

- New permissions/capabilities? No  
- Secrets/tokens handling changed? No  
- New/changed network calls? No  
- Command/tool execution surface changed? No  
- Data access scope changed? No  

---

## Repro + Verification

### Environment

- OS: Arch Linux  
- Runtime/container: Node v22.22.2  
- Model/provider: N/A  
- Integration/channel (if any): Gateway  
- Relevant config (redacted): default setup with `scanDir` present  

### Steps

1. Cold start gateway  
2. Observe plugin load count in logs  
3. Measure RSS memory usage  

### Expected

- Limited plugin load (~5 plugins)  
- RSS baseline ~400MB  

### Actual (before fix)

- ~59 plugins loaded  
- RSS baseline 700MB+  

---

## Evidence

- [x] Plugin load count reduced (59 → ~5)  
- [x] RSS baseline reduced (~700MB → ~400MB)  
- [x] Gateway logs confirm expected plugin set  

---

## Human Verification (required)

- Verified scenarios:
  - Cold start memory baseline reduced  
  - Plugin load count matches expected minimal set  

- Edge cases checked:
  - Presence of `scanDir` override paths  
  - Mixed plugin directory configurations  

- What you did **not** verify:
  - All plugin combinations across environments  
  - Long-running memory accumulation (separate issue)  

---

## Review Conversations

- [x] I replied to or resolved every bot review conversation I addressed in this PR.  
- [x] I left unresolved only the conversations that still need reviewer or maintainer judgment.  

---

## Compatibility / Migration

- Backward compatible? Yes  
- Config/env changes? No  
- Migration needed? No  

---

## Risks and Mitigations

- Risk:
  - Custom plugins in `scanDir` may not be discovered as early as before  

- Mitigation:
  - `scanDir` remains supported as fallback  
  - Behavior aligns with expected plugin isolation and avoids unintended eager loading

## Changed files

- `src/plugins/bundled-plugin-metadata.test.ts` (modified, +4/-4)
- `src/plugins/bundled-plugin-metadata.ts` (modified, +1/-1)

Raw issue text

title: "Gateway RSS regression on 2026.4.15 — fresh cold-start baseline 700MB+ on macOS ARM64, steady climb regardless of workload"
labels: bug, regression, gateway, memory

Summary

After upgrading from 2026.4.1 to 2026.4.15, gateway RSS baseline climbed from ~400MB fresh-start to 700MB+ fresh-start, with no change in configuration, workload, or agent definitions. The leak is steady and observable regardless of subagent activity.

This appears related to #63526 (regression reported on 2026.4.9) and #13758 (long-running gateway memory accumulation), but the specific baseline jump on 2026.4.15 has not been quantified with cold-start data.

Environment

  • OpenClaw: 2026.4.15 (041266a)
  • Host: Mac Mini M4, macOS 26.3
  • Node: v22.22.2 (via homebrew)
  • Install: /opt/homebrew/lib/node_modules/openclaw
  • Service: LaunchAgent (~/Library/LaunchAgents/ai.openclaw.gateway.plist), loopback bind
  • Agents configured: 5 (main, deployer, browser, receipts, nyc311)
  • Loaded plugins: 59 of 98
  • Active Memory plugin: disabled (ships with 2026.4.15 but status=disabled in this config)
  • Cron jobs: 14 OpenClaw + 5 system crontab

Reproducer

  1. Cold reboot macOS
  2. LaunchAgent auto-starts gateway on boot
  3. Measure RSS of gateway process within 2 minutes of login, before any Telegram activity, before any cron fires
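
The measurement in step 3 can be turned into a log line like the ones shown in this issue. The monitoring script itself is not included here, so the sketch below is only an assumption about its shape: it takes the raw output of `ps -o rss= -p <pid>` (which reports resident set size in 1024-byte units on macOS and Linux) and formats it. `formatRssSample` is a hypothetical helper name.

```typescript
/**
 * Convert the output of `ps -o rss= -p <pid>` (RSS in KiB) into a log line
 * of the form "YYYY-MM-DD HH:MM:SS | pid=N | rss=NMB".
 */
function formatRssSample(timestamp: string, pid: number, psOutput: string): string {
  const rssKib = parseInt(psOutput.trim(), 10);
  if (isNaN(rssKib)) {
    throw new Error(`unexpected ps output: "${psOutput}"`);
  }
  const rssMb = Math.round(rssKib / 1024); // KiB to MB, rounded to a whole number
  return `${timestamp} | pid=${pid} | rss=${rssMb}MB`;
}
```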

Observed

  • Post-nightly-bounce RSS baseline: 594 MB (measured 3:45am, no workload, PID 95676)
  • Mid-session restart baseline: 723 MB (measured 8:00am after restart, PID 19265)

Gateway memory log excerpt from the past 24 hours (measured every 15 min via launchd, script at ~/scripts/gateway-memory-monitor.sh):

2026-04-23 03:30:00 | pid=95676 | rss=593MB
2026-04-23 03:45:00 | pid=95676 | rss=594MB
2026-04-23 04:00:00 | pid=95676 | rss=592MB
2026-04-23 05:00:00 | pid=95676 | rss=589MB
2026-04-23 05:45:00 | pid=95676 | rss=586MB
2026-04-23 06:00:00 | pid=95676 | rss=597MB
2026-04-23 06:15:00 | pid=95676 | rss=618MB
2026-04-23 07:00:00 | pid=95676 | rss=610MB
2026-04-23 08:00:00 | pid=19265 | rss=723MB  ← restart, fresh baseline 723MB
2026-04-23 08:45:00 | pid=95676 | rss=651MB
2026-04-23 09:00:00 | pid=95676 | rss=650MB
2026-04-23 10:00:00 | pid=95676 | rss=642MB
2026-04-23 11:00:00 | pid=95676 | rss=646MB
2026-04-23 12:00:00 | pid=95676 | rss=650MB
2026-04-23 12:30:00 | pid=95676 | rss=673MB
2026-04-23 13:00:00 | pid=95676 | rss=714MB
2026-04-23 13:15:00 | pid=95676 | rss=734MB
2026-04-23 13:30:00 | pid=95676 | rss=745MB
2026-04-23 13:45:01 | pid=459   | rss=758MB  ← reboot, fresh baseline 758MB

Yesterday's peak: 826 MB (2026-04-22 17:15, PID 56795) under normal daytime workload (digest crons + Telegram session + subagents).

Observed climb rates:

  • Morning cron window (5:45am-11:00am, 8 digest crons): ~11 MB/hr
  • Heavy interactive workload (12:00-13:45, multiple subagent spawns + Whisper): ~62 MB/hr
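
For anyone re-deriving these rates from the log, here is a minimal sketch that parses two log lines and computes the slope between them. The issue does not say exactly which samples were used for each window, so the sample choice in the comments is an assumption. `parseSample` and `climbRateMbPerHour` are hypothetical helper names.

```typescript
interface RssSample {
  time: number;  // epoch milliseconds, parsed as UTC (only differences matter)
  pid: number;
  rssMb: number;
}

/** Parse one line of the monitor log; returns undefined for non-matching lines. */
function parseSample(line: string): RssSample | undefined {
  const m = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}) \| pid=(\d+)\s*\| rss=(\d+)MB/.exec(line);
  if (m === null) {
    return undefined;
  }
  const n = m.slice(1).map(Number);
  return {
    time: Date.UTC(n[0], n[1] - 1, n[2], n[3], n[4], n[5]),
    pid: n[6],
    rssMb: n[7],
  };
}

/** Average RSS growth between two samples, in MB per hour. */
function climbRateMbPerHour(first: RssSample, last: RssSample): number {
  const hours = (last.time - first.time) / (3600 * 1000);
  return (last.rssMb - first.rssMb) / hours;
}

// Assumed windows: 05:45 (586 MB) to 11:00 (646 MB) gives 60 MB / 5.25 h, about 11.4 MB/hr.
// 12:00 (650 MB) to 13:30 (745 MB) gives 95 MB / 1.5 h, about 63.3 MB/hr.
```

A slope between two samples is only meaningful when both come from the same PID. Samples taken across a restart compare two different processes.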

Nightly bounce (3:30am via pkill -f Chrome && launchctl kickstart -k) brought RSS from 826 MB down to 594 MB, not the ~400 MB that was the baseline on 2026.4.1. This suggests the regression is in baseline allocation, not just an accumulated leak.

Whisper transcription (not gateway-related, but relevant): macOS OOM killer terminated a whisper subprocess mid-transcription with ~2.4GB resident while gateway was at ~740MB and normal userland services were running. This suggests total system memory pressure from the gateway baseline is limiting headroom for legitimate coexistent workloads on this hardware (16GB Mac Mini M4).

Expected

Baseline on 2026.4.1 for this same configuration was ~400 MB fresh cold-start. Climb rate under the same morning cron load was ~14 MB/hr (per my own monitoring over the 2026.4.1 period).

Data points across versions

| Version | Post-bounce baseline | Restart baseline | Climb rate (cron) | Climb rate (heavy) |
|---|---|---|---|---|
| 2026.4.1 | ~400 MB | ~400 MB | ~14 MB/hr | not measured |
| 2026.4.15 | 594 MB | 723 MB | ~11 MB/hr | ~62 MB/hr |
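
To put the baseline figures in relative terms, the increase over 2026.4.1 can be computed directly. This is a small sketch that takes the approximate ~400 MB baseline at face value.

```typescript
const baselineMb = 400; // approximate 2026.4.1 cold-start baseline ("~400 MB")

/** Relative increase of an observed RSS over the baseline, in percent. */
function regressionPercent(baseline: number, observed: number): number {
  return ((observed - baseline) / baseline) * 100;
}

const postBouncePct = regressionPercent(baselineMb, 594); // about 48.5 %
const restartPct = regressionPercent(baselineMb, 723);    // about 80.75 %
```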

Hypothesis

Although the Active Memory plugin is disabled in this configuration, the baseline jump from ~400 MB to 594-723 MB suggests something else in 2026.4.15 is allocating significantly more at startup. Given that 59 of 98 plugins are loaded (vs. likely fewer on 2026.4.1), plugin initialization overhead is a candidate. Given that #68825 shows qmd update chains hanging at 120s on 2026.4.15, there may also be retained promise chains or uncollected async state.

Happy to disable plugins selectively and re-measure if that would be useful for bisection.
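
One way such a bisection could be structured is sketched below. It assumes a single plugin dominates the extra memory. If the overhead is spread evenly across many plugins, halving will not isolate anything useful. `measureRssMb` is a hypothetical callback that would restart the gateway with only the given plugins enabled and return the cold-start RSS. No such API is described in this issue, and a real version would be asynchronous.

```typescript
/**
 * Narrow a plugin list down to one suspect by repeatedly measuring each half
 * and keeping the heavier one. Assumes one plugin dominates the RSS increase.
 */
function bisectDominantPlugin(
  plugins: string[],
  measureRssMb: (enabled: string[]) => number, // hypothetical measurement hook
): string | undefined {
  let candidates = plugins.slice();
  while (candidates.length > 1) {
    const mid = Math.floor(candidates.length / 2);
    const left = candidates.slice(0, mid);
    const right = candidates.slice(mid);
    // Keep whichever half produces the higher cold-start RSS.
    candidates = measureRssMb(left) >= measureRssMb(right) ? left : right;
  }
  return candidates[0];
}
```

For 59 loaded plugins this needs about six rounds of two cold starts each, far fewer than disabling plugins one at a time.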

Mitigations attempted

  • Nightly bounce (3:30am via pkill -f Chrome && launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway): works but masks the underlying regression
  • SOUL.md triage from 26K to 16K chars: reduced per-session context load, did not reduce baseline
  • File-first exec pattern (all long commands written to temp files first): unrelated to baseline

Workaround

Currently considering rollback to 2026.4.1. Would appreciate confirmation of whether this regression is planned to be addressed, or whether 4.1 is the stable long-term line for memory-constrained hardware.

Related

  • #63526 (2026.4.9 regression, still open)
  • #13758 (long-running accumulation, still open)
  • #11257 (orphaned Chrome processes, still open)
  • #68825 (Active Memory qmd timeouts on 2026.4.15)

extent analysis

TL;DR

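PR #70730 addresses the regression by restoring the plugin discovery order so that scanDir is used only as a fallback. Until a build containing that change is installed, rolling back to 2026.4.1 is the available workaround, as 2026.4.15 has a significantly higher memory baseline.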

Guidance

  • Verify the memory usage by measuring the RSS of the gateway process within 2 minutes of login, before any workload, to confirm the baseline jump.
  • Disable plugins selectively and re-measure the memory usage to identify potential causes of the regression.
  • Consider reducing the number of loaded plugins (currently 59 of 98) to mitigate the plugin initialization overhead.
  • Monitor the system memory pressure and adjust the workload accordingly to prevent OOM killer termination of coexistent processes.

Example

No specific code snippet is provided, but the issue suggests modifying the plugin loading mechanism or adjusting the system configuration to reduce memory usage.

Notes

The issue itself does not identify which change in 2026.4.15 caused the regression. PR #70730 attributes it to commit b2974da33a, which changed plugin discovery priority to prefer scanDir overrides.

Recommendation

Apply workaround: Roll back to version 2026.4.1, which has a known stable memory baseline, until the regression is addressed in a future version.

FAIL-SAFE

If the issue persists, consider reducing the system workload or upgrading the hardware to increase the available memory.
