openclaw - ✅(Solved) Fix Idle gateway high CPU/RSS on VPS; CPU profile points to repeated plugin/model registry + bundled runtime mirror work [2 pull requests, 4 comments, 1 participant]

GitHub stats: openclaw/openclaw#73835 (fetched 2026-04-29 06:14:26)
Comments: 4 · Participants: 1 · Timeline events: 10 · Reactions: 0
Timeline (top): commented ×4, cross-referenced ×4, mentioned ×1, subscribed ×1

My OpenClaw gateway becomes almost unusable on a VPS because openclaw-gateway idles at roughly one CPU core and RSS grows to around 1.3–1.6GB. RPC/control operations become slow; for example node.list calls were taking ~7.8s.

A CPU profile points at repeated model/plugin registry work:

prewarmConfiguredPrimaryModel
→ resolveModel / resolveModelWithRegistry
→ provider/plugin discovery
→ loadOpenClawPlugins
→ resolveRuntimePluginRegistry
→ mirrorBundledPluginRuntimeRoot
→ copyFile / readFile / JSON5 manifest parsing

This looks like repeated plugin/provider/model registry rebuilding or bundled runtime mirror work during/after startup.


Fix Action

Fixed

PR fix notes

PR #73847: fix(plugins): key web-provider snapshot cache on config-content fingerprint (#73730)

Description (problem / solution / changelog)

Fixes #73730. Refs #73729 and #73835 (sister/corroborating reports in the same lane).

Problem

`resolvePluginWebProviders` previously kept its snapshot cache as:

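```ts
// Pre-fix shape of WebProviderSnapshotCache: the outer key is the config object's identity
type WebProviderSnapshotCache = WeakMap<OpenClawConfig, WeakMap<NodeJS.ProcessEnv, Map<string, Entry>>>;
```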

The outer level is keyed on `OpenClawConfig` object identity. As reported in #73730 with full instrumentation, callers like `resolveWebSearchRuntimeConfig` and `resolveWebFetchRuntimeConfig` build a fresh `config` object per dispatch, so the outer `WeakMap.get(cacheOwnerConfig)` always missed even though the inner `cacheKey` string was identical (`load-miss key=0afb40389a fields={ws:".../workspace",scope:"b623e8",plg:"85d4c2",...}` repeated on every message). Every dispatch paid the full ~30s `loadOpenClawPlugins` cycle.
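A minimal TypeScript sketch of this failure mode, assuming simplified stand-in types (`OpenClawConfig`, `Entry`, and the `expensiveLoad` counter here are hypothetical, and the inner `NodeJS.ProcessEnv` level is omitted). Because a `WeakMap` compares keys by object identity, a content-identical but freshly built config never finds the earlier entry:

```typescript
// Hypothetical stand-ins: the real OpenClawConfig and Entry types are richer.
type OpenClawConfig = { plugins?: { entries?: Record<string, { enabled?: boolean }> } };
type Entry = { providers: string[] };

// Pre-fix shape, simplified: the outer key is the config object itself.
const cache = new WeakMap<OpenClawConfig, Map<string, Entry>>();
let loads = 0;

// Stand-in for the expensive loadOpenClawPlugins() cycle.
function expensiveLoad(): Entry {
  loads += 1;
  return { providers: ["brave"] };
}

function resolveWithIdentityKey(config: OpenClawConfig, cacheKey: string): Entry {
  let inner = cache.get(config); // undefined for every new config object
  if (!inner) {
    inner = new Map<string, Entry>();
    cache.set(config, inner);
  }
  let entry = inner.get(cacheKey);
  if (!entry) {
    entry = expensiveLoad();
    inner.set(cacheKey, entry);
  }
  return entry;
}

// Each dispatch builds a fresh, content-identical config object.
const buildConfig = (): OpenClawConfig => ({
  plugins: { entries: { brave: { enabled: true } } },
});

for (let i = 0; i < 3; i++) {
  resolveWithIdentityKey(buildConfig(), "0afb40389a");
}
console.log(loads); // 3: every call missed despite the identical inner key
```

The inner string key never gets a chance to match, which is consistent with the `load-miss` log showing the same key on every message.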

Three users reported variants of the same root cause:

  • #73730 poolside-ventures: web-provider snapshot WeakMap miss (this PR)
  • #73729 poolside-ventures: capability-provider full reload per message (sister bug, same root cause shape)
  • #73835 brokemac79: idle gateway high CPU/RSS, CPU profile points to repeated `loadOpenClawPlugins` → `mirrorBundledPluginRuntimeRoot`

Fix

Switch the snapshot cache from an identity-keyed nested `WeakMap` to a flat `Map<string, Entry>` keyed entirely on `buildWebProviderSnapshotCacheKey`. The cache key is extended to include a stable content fingerprint of the resolution-relevant `config.plugins` subset (allowlist, entries enabled state, per-plugin config — exactly what `loadPluginManifestRegistryForPluginRegistry` and `loadInstalledWebProviderManifestRecords` actually consume).

Equal-content fresh config objects now produce the same cache key and hit. Genuinely different configs produce different keys and stay isolated — no false-positive collisions.

The fingerprint computation itself is memoized by config-object identity (`WeakMap<config, hashString>`), so callers that share a reference pay the hash cost only once. Callers that build a fresh config per dispatch (the original failure mode) still pay one `hashJson` per call, but `hashJson` runs in microseconds versus `loadOpenClawPlugins` running in seconds — net wall-clock win is the same ~30s saved per dispatch the issue measured.
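The fix can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `stableStringify`, `fingerprintPlugins`, and `resolveProviders` are hypothetical names, and a truncated SHA-256 from `node:crypto` stands in for OpenClaw's `hashJson`. The cache key is built from strings only, and the fingerprint is memoized per config object so shared references hash once:

```typescript
import { createHash } from "node:crypto";

// Hypothetical simplified config shape; only `plugins` feeds the key here.
type PluginEntry = { enabled?: boolean; config?: unknown };
type OpenClawConfig = {
  plugins?: { allow?: string[]; entries?: Record<string, PluginEntry> };
};

// Deterministic JSON: object keys are sorted so key order does not change the hash.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).filter((k) => obj[k] !== undefined).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

// Memoize per object identity so callers sharing a reference hash only once.
const fingerprintMemo = new WeakMap<OpenClawConfig, string>();

function fingerprintPlugins(config: OpenClawConfig): string {
  const memo = fingerprintMemo.get(config);
  if (memo !== undefined) return memo;
  const hash = createHash("sha256")
    .update(stableStringify(config.plugins ?? {}))
    .digest("hex")
    .slice(0, 16);
  fingerprintMemo.set(config, hash);
  return hash;
}

// Flat cache keyed only on strings, so equal-content configs share entries.
const snapshotCache = new Map<string, string[]>();
let loads = 0;

function resolveProviders(config: OpenClawConfig, workspace: string): string[] {
  const key = `${workspace}|${fingerprintPlugins(config)}`;
  const hit = snapshotCache.get(key);
  if (hit) return hit;
  loads += 1; // stand-in for the expensive plugin load
  const providers = Object.entries(config.plugins?.entries ?? {})
    .filter(([, entry]) => entry.enabled !== false)
    .map(([id]) => id);
  snapshotCache.set(key, providers);
  return providers;
}

const make = (enabled: boolean): OpenClawConfig => ({
  plugins: { entries: { brave: { enabled } } },
});

resolveProviders(make(true), "/workspace");
resolveProviders(make(true), "/workspace"); // fresh object, same content: hit
resolveProviders(make(false), "/workspace"); // different content: miss
console.log(loads); // 2
```

Sorting object keys before hashing is the design point: two configs that differ only in key order map to the same entry, while flipping `enabled` produces a different key, mirroring the two regression tests described below.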

What changed

| File | Change |
| --- | --- |
| `web-provider-resolution-shared.ts` | Added `fingerprintWebProviderResolutionConfig` helper and extended `buildWebProviderSnapshotCacheKey` to include the fingerprint |
| `web-provider-runtime-shared.ts` | Changed `WebProviderSnapshotCache` type from `WeakMap<config, WeakMap<env, Map<key, Entry>>>` to `Map<string, Entry>` and simplified the lookup/store sites accordingly; dropped the no-longer-needed `OpenClawConfig` type import |
| `web-provider-runtime-shared.test.ts` | Two new regression tests |
| `CHANGELOG.md` | Unreleased Fixes line citing #73730, #73729, and #73835 |

Tests

```
pnpm vitest run src/plugins/web-provider-runtime-shared.test.ts
→ 5 passed (3 existing + 2 new)

pnpm vitest run src/plugins/web-provider-runtime-shared.test.ts \
  src/plugins/web-provider-resolution-shared.test.ts \
  src/plugins/web-fetch-providers.runtime.test.ts \
  src/plugins/web-search-providers.runtime.test.ts
→ 32 passed (30 existing + 2 new, no regressions across the four files)
```

The two new regression tests:

  1. Fresh-but-equal-content configs hit the cache — exercises the exact #73730 path: build a new `config` object reference per call with identical content; assert `loadOpenClawPlugins` is invoked once across two calls (pre-fix: twice).
  2. Content-different configs miss the cache — invariant guard: `{ plugins: { entries: { brave: { enabled: true } } } }` and `{ plugins: { entries: { brave: { enabled: false } } } }` produce different fingerprints and both calls miss the cache.

Why this shape over the alternatives in my earlier triage comment

In #73730 I proposed two shapes: (1) identity-intern the resolved config, (2) hash config content into the cache key. This PR implements (2) because:

  • Interning would fight `OpenClawConfig` mutation, which several gateway paths perform (config reloads, cron edits)
  • The fingerprint approach has a clean invalidation story: edit any `config.plugins.entries[*]` field → hash differs → next call misses → cache repopulates with the new state
  • TTL eviction (`resolvePluginSnapshotCacheTtlMs`) was already part of the contract, so the `Map` size is already bounded
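A TTL only bounds a `Map` if expired entries are actually deleted, not merely ignored on read. A minimal sketch of a TTL map that sweeps on write, assuming a fixed per-entry TTL and an injectable clock for testing (the `TtlCache` class is hypothetical; `resolvePluginSnapshotCacheTtlMs` is not modeled):

```typescript
// Hypothetical TTL map: each entry expires ttlMs after it was stored.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // drop a stale entry when it is read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.sweep(); // purge expired entries so the map cannot grow without bound
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }

  private sweep(): void {
    const t = this.now();
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= t) this.entries.delete(key); // safe during Map iteration
    }
  }
}

let clock = 0;
const cache = new TtlCache<string>(1000, () => clock);
cache.set("a", "snapshot-a");
clock = 500;
console.log(cache.get("a")); // snapshot-a
clock = 1500;
cache.set("b", "snapshot-b"); // the sweep removes the expired "a"
console.log(cache.size); // 1
```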

If the maintainer prefers shape (1) instead, I'm happy to rebase. The diff for shape (1) would be smaller but trickier to invalidate.

Why this PR is scoped to #73730 only

#73729 (capability-provider) and #73835 (gateway prewarm) share the same root-cause shape but live in different files. Folding all three into one PR would exceed 400 LOC and mix 3+ separable concerns. This PR fixes #73730 with the smallest possible scope; the same content-fingerprint pattern can be applied to the capability-provider cache (`capability-provider-runtime.ts`) and the gateway prewarm path as follow-ups if the maintainer agrees with the direction here.

🦞 lobster-biscuit


Sign-Off: hclsys

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/plugins/web-provider-resolution-shared.test.ts (modified, +51/-0)
  • src/plugins/web-provider-resolution-shared.ts (modified, +63/-0)
  • src/plugins/web-provider-runtime-shared.test.ts (modified, +166/-0)
  • src/plugins/web-provider-runtime-shared.ts (modified, +66/-30)

PR #73853: [AI-assisted] fix(plugins): reduce startup provider registry reloads

Description (problem / solution / changelog)

Fixes #73835. Fixes #73729. Refs #73730. Refs #73847.

Summary

This follows @hclsys's #73847, which covers the #73730 web-provider snapshot cache path. This PR intentionally leaves that web-provider work out and focuses on the remaining repeated plugin-registry load surfaces reported in #73835 and #73729.

  • Keep gateway startup primary-model prewarm on provider-discovery entries only, with the active workspace passed through so startup metadata snapshots can be reused instead of falling through to full plugin runtime loads.
  • Thread the entry-only provider discovery mode through models.json planning and fingerprinting so cache entries remain distinct from full discovery.
  • Scope capability-provider fallback registry loads to the manifest-derived bundled owner plugins, avoiding broad image/video/music snapshot loads during tool setup.
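A sketch of the second bullet's idea, with hypothetical names (`DiscoveryMode`, `planCacheKey`, `planModels`) and made-up provider lists: putting the discovery mode into the cache key keeps entry-only results separate from full-discovery results, so the cheap startup path can be reused without being served to a caller that needs the full plugin runtime.

```typescript
// Hypothetical discovery modes; the real mode names in OpenClaw may differ.
type DiscoveryMode = "entries-only" | "full";

interface PlanInput {
  workspace: string;
  pluginsFingerprint: string;
  mode: DiscoveryMode;
}

// Including `mode` in the key stops entry-only results from being reused for
// full discovery (and vice versa).
function planCacheKey(input: PlanInput): string {
  return [input.mode, input.workspace, input.pluginsFingerprint].join("|");
}

const plans = new Map<string, string[]>();

function planModels(input: PlanInput, discover: (mode: DiscoveryMode) => string[]): string[] {
  const key = planCacheKey(input);
  const cached = plans.get(key);
  if (cached) return cached;
  const result = discover(input.mode);
  plans.set(key, result);
  return result;
}

let fullLoads = 0;
const discover = (mode: DiscoveryMode): string[] => {
  if (mode === "full") fullLoads += 1; // stand-in for a full plugin runtime load
  return mode === "full" ? ["openai-codex", "google", "ollama"] : ["openai-codex"];
};

const base = { workspace: "/workspace", pluginsFingerprint: "85d4c2" };
planModels({ ...base, mode: "entries-only" }, discover); // startup prewarm path
planModels({ ...base, mode: "entries-only" }, discover); // cache hit
planModels({ ...base, mode: "full" }, discover); // distinct key, full discovery
console.log(fullLoads, plans.size); // 1 2
```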

Issue Context

#73835's CPU profile points at startup/model prewarm repeatedly reaching loadOpenClawPlugins and bundled runtime mirror refresh work. #73729 reports the related capability-provider path where image, video, and music provider listing can trigger repeated full registry loads. #73730 is covered by @hclsys's #73847, and this PR is meant to complement that work rather than duplicate it.

AI Assistance

AI-assisted with Codex. The implementation was driven from #73835, the reporter-provided Discord guidance, and the linked #73729/#73730 discussion.

Tests

  • node scripts/test-projects.mjs src/gateway/server-startup.test.ts src/gateway/server-startup-post-attach.test.ts src/agents/models-config.providers.implicit.discovery-scope.test.ts src/plugins/provider-discovery.runtime.test.ts src/plugins/capability-provider-runtime.test.ts src/plugins/web-provider-runtime-shared.test.ts src/plugins/web-search-providers.runtime.test.ts src/plugins/web-fetch-providers.runtime.test.ts -> passed 3 Vitest shards, 8 files, 72 tests
  • corepack pnpm test:contracts:plugins -> 56 files, 751 tests passed
  • node .\node_modules\@typescript\native-preview\bin\tsgo.js -p tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test-cpu-prewarm.tsbuildinfo -> passed
  • git diff --check -> passed

Contributor Checklist Notes

Update: addressed Greptile P2 in 66e9f93f.

  • Backend/runtime change; no screenshots applicable.
  • I did not run full pnpm build && pnpm check && pnpm test; instead I ran the focused affected shards, plugin contract lane, and TypeScript test project check above.
  • Attempted local Codex review per CONTRIBUTING.md, but the Windows app execution alias failed with Access is denied for both codex review --base origin/main and codex review --uncommitted.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/models-config.plan.ts (modified, +7/-0)
  • src/agents/models-config.providers.implicit.discovery-scope.test.ts (modified, +16/-0)
  • src/agents/models-config.providers.implicit.ts (modified, +2/-0)
  • src/agents/models-config.ts (modified, +10/-0)
  • src/gateway/server-startup-post-attach.test.ts (modified, +10/-0)
  • src/gateway/server-startup-post-attach.ts (modified, +11/-0)
  • src/gateway/server-startup.test.ts (modified, +9/-0)
  • src/plugins/capability-provider-runtime.test.ts (modified, +77/-0)
  • src/plugins/capability-provider-runtime.ts (modified, +32/-3)
  • src/plugins/provider-discovery.runtime.test.ts (modified, +11/-0)

Original issue report


Environment

  • OpenClaw: 2026.4.26 (be8c246)
  • Host: Ubuntu VPS, systemd user service
  • Runtime: Node v22.22.0
  • Gateway command after cleanup:
    /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789
  • Gateway bind: loopback, port 18789
  • Primary model:
    openai-codex/gpt-5.5
  • Normal fallbacks:
    google/gemini-2.5-flash
    ollama/qwen3.5:9b
    ollama/gemma4:e4b
  • Ollama endpoint:
    http://100.96.199.51:11434
    via Tailscale
  • Enabled plugins observed during normal operation:
    • anthropic
    • brave
    • google
    • memory-core
    • memory-wiki
    • ollama
    • telegram
    • tokenjuice
    • openclaw-web-search

Symptoms

ps -C openclaw-gateway -o pid,stat,etime,%cpu,%mem,rss,cmd repeatedly shows the gateway using most/all of one CPU core while apparently idle.

Examples:

  • Initially: ~100–115% CPU, RSS around 1.1–1.5GB
  • After multiple restarts/tests: still ~100–107% CPU
  • node.list remained slow, around ~7.8s
  • Memory climbed during tests, e.g. ~751MB → ~1.3GB → ~1.58GB
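To reproduce these CPU numbers without `ps` or `pidstat`, CPU% can be computed from two samples of `/proc/<pid>/stat` on Linux. This sketch assumes a clock-tick rate (`USER_HZ`) of 100, the common Linux default (check with `getconf CLK_TCK`); the helper names and the sample stat line are hypothetical:

```typescript
import { readFileSync } from "node:fs";

// Assumption: USER_HZ is 100. Node has no sysconf binding to read it directly.
const CLK_TCK = 100;

// Sum utime + stime (fields 14 and 15 of /proc/<pid>/stat, in clock ticks).
// The comm field is in parentheses and may contain spaces, so split after the last ")".
function parseCpuTicks(stat: string): number {
  const rest = stat.slice(stat.lastIndexOf(")") + 2).split(" ");
  // rest[0] is field 3 (state), so field N is rest[N - 3].
  const utime = Number(rest[14 - 3]);
  const stime = Number(rest[15 - 3]);
  return utime + stime;
}

// CPU% over an interval; 100 means one full core, the scale ps and pidstat use.
function cpuPercent(ticksBefore: number, ticksAfter: number, elapsedMs: number): number {
  const cpuSeconds = (ticksAfter - ticksBefore) / CLK_TCK;
  return (cpuSeconds / (elapsedMs / 1000)) * 100;
}

// Live sampling (Linux only): read the counters, wait, read again.
function sampleCpuPercent(pid: number, intervalMs: number): Promise<number> {
  const read = () => parseCpuTicks(readFileSync(`/proc/${pid}/stat`, "utf8"));
  const before = read();
  const start = Date.now();
  return new Promise((resolve) => {
    setTimeout(() => resolve(cpuPercent(before, read(), Date.now() - start)), intervalMs);
  });
}

// Made-up stat line: utime = 250, stime = 50.
const sample = "4242 (openclaw-gateway) R 1 4242 4242 0 -1 4194304 100 0 0 0 250 50 0 0 20 0 11 0";
console.log(parseCpuTicks(sample)); // 300
console.log(cpuPercent(300, 400, 1000)); // 100
```

For example, `sampleCpuPercent(pid, 10_000)` with the gateway's PID averages over 10 seconds; a result near 100 would match the one-core burn described above.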

Things ruled out / tested

Split install / stale service

There was initially stale/split plugin registry state pointing at a removed NVM install. After openclaw plugins registry --refresh, registry state changed from stale/missing NVM paths to fresh /usr/lib/node_modules/openclaw paths:

before refresh: manifestPaths 117, missing 115, nvm 115, usr 0
after refresh:  manifestPaths 117, missing 0,   nvm 0,   usr 115
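Counts like these can be reproduced with a small classifier, assuming you already have the list of manifest paths recorded in the plugin registry (the report does not show how it was extracted). The classification rules and the sample paths below are hypothetical:

```typescript
import { existsSync } from "node:fs";

interface ManifestPathSummary {
  manifestPaths: number;
  missing: number;
  nvm: number;
  usr: number;
}

// Hypothetical rules: "nvm" = under an ~/.nvm install,
// "usr" = under the system-wide /usr/lib/node_modules/openclaw install.
function summarizeManifestPaths(
  paths: string[],
  exists: (p: string) => boolean = existsSync,
): ManifestPathSummary {
  const summary: ManifestPathSummary = { manifestPaths: paths.length, missing: 0, nvm: 0, usr: 0 };
  for (const p of paths) {
    if (!exists(p)) summary.missing += 1;
    if (p.includes("/.nvm/")) summary.nvm += 1;
    if (p.startsWith("/usr/lib/node_modules/openclaw/")) summary.usr += 1;
  }
  return summary;
}

// Made-up paths for illustration only.
const samplePaths = [
  "/home/ubuntu/.nvm/versions/node/v22/lib/node_modules/openclaw/extensions/example-a/manifest.json",
  "/usr/lib/node_modules/openclaw/extensions/example-b/manifest.json",
];
// Simulate the removed NVM install: nothing under ~/.nvm exists any more.
const summary = summarizeManifestPaths(samplePaths, (p) => !p.includes("/.nvm/"));
console.log(summary); // { manifestPaths: 2, missing: 1, nvm: 1, usr: 1 }
```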

The live service currently runs from:

/usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789

and the live process env has a minimal PATH:

PATH=/usr/bin:/bin

This did not fix the high CPU.

Telegram

Tested disabling Telegram:

channels.telegram.enabled=false

After restart and waiting, CPU stayed around ~105–107%. So Telegram does not appear to be the main cause.

Mission Control / external dashboard

I also suspected a separate Mission Control dashboard. I stopped mission-control.service temporarily. Gateway socket count dropped, but openclaw-gateway CPU stayed pinned around ~103–104%. Port 5188 standalone Mission Control was not running/listening. So this does not appear to be caused by that dashboard.

Plugin discovery cache env

Tried setting these in systemd so they are visible in /proc/<pid>/environ:

OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=3600000
OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=3600000

The variables were present in the process environment, but after restart CPU still stayed around ~104–111%. So cache TTL alone did not solve it.

I also could not find these env var names referenced in the installed OpenClaw dist, but did notice a hard-coded derived snapshot cache value:

const DERIVED_SNAPSHOT_CACHE_MS = 1e3;

in:

/usr/lib/node_modules/openclaw/dist/plugin-registry-CZ8QXP5l.js

Model fallbacks / Ollama fallback

Temporarily removed all model fallbacks:

agents.defaults.model.fallbacks=[]

After restart and 120s:

  • CPU stayed around ~105%
  • pidstat average around ~104.45%
  • RSS continued rising
  • node.list still around ~7.8s

So model fallbacks are not the main cause.

Ollama plugin

Temporarily disabled only the Ollama plugin:

plugins.entries.ollama.enabled=false

After restart:

  • Gateway still around ~100% CPU
  • pidstat average around ~104.35%

So the Ollama plugin alone does not appear to be the main cause.

CPU profile evidence

A valid CPU profile was captured at:

/home/ubuntu/CPU.20260428.203112.1779279.0.001.cpuprofile

The hot path observed:

prewarmConfiguredPrimaryModel
→ resolveModel / resolveModelWithRegistry
→ provider/plugin discovery
→ loadOpenClawPlugins
→ resolveRuntimePluginRegistry
→ mirrorBundledPluginRuntimeRoot
→ copyFile / readFile / JSON5 manifest parsing

Please see attached zipped .cpuprofile.
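The profile file name appears to follow Node's default `--cpu-prof` naming (`CPU.<date>.<time>.<pid>.<tid>.<seq>.cpuprofile`), and such files are JSON in the V8 CPU profile format. A sketch that ranks functions by inclusive sample count, which is how a call chain like the one above surfaces; only the `nodes`, `callFrame`, `hitCount`, and `children` fields are read, and the helper names are hypothetical:

```typescript
import { readFileSync } from "node:fs";

// Subset of the V8 .cpuprofile format used here.
interface ProfileNode {
  id: number;
  callFrame: { functionName: string; url: string };
  hitCount?: number;
  children?: number[];
}
interface CpuProfile {
  nodes: ProfileNode[];
}

// Credit each sample once to every distinct function on its stack,
// so recursive frames are not double-counted.
function inclusiveSamplesByFunction(profile: CpuProfile): Map<string, number> {
  const parent = new Map<number, number>();
  const byId = new Map<number, ProfileNode>();
  for (const node of profile.nodes) {
    byId.set(node.id, node);
    for (const child of node.children ?? []) parent.set(child, node.id);
  }
  const totals = new Map<string, number>();
  for (const node of profile.nodes) {
    const hits = node.hitCount ?? 0;
    if (hits === 0) continue;
    const seen = new Set<string>();
    for (let id: number | undefined = node.id; id !== undefined; id = parent.get(id)) {
      seen.add(byId.get(id)?.callFrame.functionName || "(anonymous)");
    }
    for (const name of seen) totals.set(name, (totals.get(name) ?? 0) + hits);
  }
  return totals;
}

function topFunctions(path: string, limit = 15): Array<[string, number]> {
  const profile = JSON.parse(readFileSync(path, "utf8")) as CpuProfile;
  return [...inclusiveSamplesByFunction(profile).entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}

// Tiny synthetic profile shaped like the reported hot path.
const demo: CpuProfile = {
  nodes: [
    { id: 1, callFrame: { functionName: "(root)", url: "" }, hitCount: 0, children: [2] },
    { id: 2, callFrame: { functionName: "prewarmConfiguredPrimaryModel", url: "" }, hitCount: 1, children: [3] },
    { id: 3, callFrame: { functionName: "loadOpenClawPlugins", url: "" }, hitCount: 2, children: [4] },
    { id: 4, callFrame: { functionName: "mirrorBundledPluginRuntimeRoot", url: "" }, hitCount: 5 },
  ],
};
const totals = inclusiveSamplesByFunction(demo);
console.log(totals.get("loadOpenClawPlugins")); // 7
```

Running `topFunctions` on the captured profile should list `(root)` first; hot-path functions with counts close to the root's would indicate where the idle CPU goes.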

Expected behavior

An idle gateway should not continuously burn ~100% of one CPU core or grow RSS toward ~1.5GB.

Plugin/model/provider registry discovery should either complete once, cache effectively, or fail cheaply.
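One generic way to get "complete once, cache effectively, or fail cheaply" is a single-flight memoizer with a failure cooldown. This is a sketch of that general pattern under stated assumptions, not what OpenClaw or the linked PRs implement; `onceWithCooldown` is a hypothetical helper:

```typescript
// Concurrent callers share one in-flight promise, success is cached for the
// process lifetime, and a failure is cached for failureTtlMs so retries stay cheap.
function onceWithCooldown<T>(
  load: () => Promise<T>,
  failureTtlMs: number,
  now: () => number = Date.now,
): () => Promise<T> {
  let inflight: Promise<T> | undefined;
  let failedAt: number | undefined;
  let lastError: unknown;

  const start = (): Promise<T> => {
    failedAt = undefined;
    const p = load().catch((err: unknown) => {
      failedAt = now();
      lastError = err;
      inflight = undefined; // permit a retry once the cooldown has passed
      throw err;
    });
    inflight = p;
    return p;
  };

  return () => {
    if (failedAt !== undefined && now() - failedAt < failureTtlMs) {
      return Promise.reject(lastError); // fail cheaply during the cooldown
    }
    return inflight ?? start();
  };
}

let calls = 0;
const prewarm = onceWithCooldown(async () => {
  calls += 1; // stand-in for an expensive registry load
  return "registry";
}, 30_000);

async function main(): Promise<void> {
  await Promise.all([prewarm(), prewarm(), prewarm()]);
  await prewarm();
  console.log(calls); // 1
}
void main();
```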

Actual behavior

Gateway repeatedly consumes ~100% CPU while idle, with profile evidence pointing at plugin/model registry and bundled runtime mirror work.

Extra notes

This happened before, and independently of, 2026.4.26; rolling back previously did not resolve it.

This may be a config-triggered bug rather than a universal regression, but the gateway behavior still seems bug-shaped because an edge config should not cause repeated expensive plugin registry/runtime mirror work.

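Extent analysis

TL;DR

The OpenClaw gateway's idle CPU burn and memory growth come from repeated plugin/model registry rebuilding and bundled runtime mirror work. The linked PRs attribute this to snapshot caches that missed on every call with fresh-but-equal config objects, and to startup prewarm falling through to full plugin runtime loads. Setting the plugin discovery and manifest cache TTL env vars did not help in the reporter's tests.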

Guidance

  • Review the CPU profile to understand the specific hot paths and identify potential optimization opportunities.
  • Do not rely on the OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS and OPENCLAW_PLUGIN_MANIFEST_CACHE_MS environment variables: the reporter found no references to them in the installed dist, and setting them left CPU at ~104–111%.
  • DERIVED_SNAPSHOT_CACHE_MS is hard-coded to 1e3 (1 s) in the bundled dist. Raising it would require patching the installed file and was not tested in the report, so treat it as an unverified experiment rather than a fix.
  • Verify that the gateway's configuration is correct and not triggering unnecessary plugin/model registry discovery or runtime mirror work.

Example

The reporter's config toggles (Telegram, model fallbacks, the Ollama plugin) each left CPU near 100%, which points at code-level caching rather than configuration.

Notes

The issue may be specific to the user's configuration, and rolling back to a previous version did not resolve it. The gateway's behavior seems bug-shaped, but an edge config might be causing the repeated expensive plugin registry/runtime mirror work.

Recommendation

Run a build that includes the fixes from PR #73847 (web-provider snapshot cache keyed on config content) and PR #73853 (startup prewarm limited to provider-discovery entries, scoped capability-provider loads), then re-measure idle CPU and RSS. The reporter's tests show that cache TTL env vars, disabling Telegram, removing model fallbacks, and disabling the Ollama plugin each left CPU near 100%, so those settings are not a substitute for the code fix.
