openclaw - ✅(Solved) Fix [Bug]: openclaw doctor hangs at 100% CPU after Plugins step with large agents.list containing per-agent model overrides [2 pull requests, 1 participant]

Q: Expected behavior

`openclaw doctor` completes all steps and exits within a reasonable time regardless of the number of configured agents.

openclaw · 2026-04-13 20:21:16

GitHub stats

openclaw/openclaw#66159 • Fetched 2026-04-14 05:39:00

Author

lmagitem

Participants

lmagitem

Timeline (top)

referenced ×6, cross-referenced ×3, labeled ×2, closed ×1

resolveExternalCatalogPreferOver performs uncached synchronous disk reads (3 files per call) inside an O(N^2) loop over plugin auto-enable candidates, causing openclaw doctor to hang indefinitely at 100% CPU when agents.list contains ~50+ agents with per-agent model overrides.

Root Cause

Root cause traced to `plugin-auto-enable-rMc8VJBA.js`:
- `materializePluginAutoEnableCandidatesInternal` iterates all candidates (line ~587)
- For each candidate, `shouldSkipPreferredPluginAutoEnable` iterates all *other* candidates (line ~235)
- For each pair, `resolvePreferredOverIds` calls `resolveExternalCatalogPreferOver` (line ~232)
- `resolveExternalCatalogPreferOver` performs 3 synchronous `fs.readFileSync` + `fs.existsSync` calls with no caching (lines 207-216)
- Total file reads: O(N^2 * 3) where N = number of auto-enable candidates derived from model refs
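
To get a feel for the cost model described above, the following sketch counts simulated file reads for a given number of candidates. It is not OpenClaw code: `countCatalogReads` and `READS_PER_CALL` are hypothetical names, and the model assumes that every ordered pair of distinct candidates triggers exactly one uncached lookup touching three files.

```javascript
// Hypothetical model of the nested loop described in the report; not OpenClaw source.
const READS_PER_CALL = 3; // three catalog files per uncached lookup, per the report

function countCatalogReads(candidateCount) {
  let reads = 0;
  for (let i = 0; i < candidateCount; i++) {      // outer pass over all candidates
    for (let j = 0; j < candidateCount; j++) {    // inner pass over the other candidates
      if (i === j) continue;
      reads += READS_PER_CALL;                    // one uncached lookup per pair
    }
  }
  return reads;
}

for (const n of [10, 50, 130]) {
  console.log(`${n} candidates -> ${countCatalogReads(n)} simulated file reads`);
}
```

Under the same assumptions, a memo keyed by channel id would cap the work at three reads per distinct id, however many pairs are compared.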

Fix Action

Fix / Workaround

Confirmed workaround: adding a Map-based memo cache on channelId to resolveExternalCatalogPreferOver resolves the hang completely. Doctor completes in seconds with the full 130-agent config.


Suggested fix: memoise `resolveExternalCatalogPreferOver` by `channelId`. The external catalog files do not change during a single process invocation, so caching is safe. The reporter applied the patch locally and confirmed that it works.

PR fix notes

PR #66244: fix: cache external plugin catalog lookups in auto-enable

Repository: openclaw/openclaw
Author: yfge
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/66244

Description (problem / solution / changelog)

Summary

Cache preferOver metadata lookups during a single plugin auto-enable pass.
Prevent repeated external channel catalog loads when many configured candidates are compared.

Changes

Added a per-run memoized resolver for resolvePreferredOverIds in src/config/plugin-auto-enable.ts.
Added a focused regression test that verifies repeated external catalog lookups are collapsed to a single lookup per plugin id.
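
A per-run memoized resolver and the kind of check such a regression test might make can be sketched as follows. This is an illustration only, not the PR's code: `createMemoizedResolver`, `slowLookup`, and the plugin ids are hypothetical, and a counting stub stands in for the external catalog lookup.

```javascript
// Hypothetical sketch; names below are illustrative, not taken from the PR.
function createMemoizedResolver(resolve) {
  const cache = new Map(); // lives only as long as this resolver (one run)
  return (pluginId) => {
    if (!cache.has(pluginId)) cache.set(pluginId, resolve(pluginId));
    return cache.get(pluginId);
  };
}

// Counting stub standing in for the expensive external catalog lookup.
let lookups = 0;
const slowLookup = (pluginId) => {
  lookups += 1;
  return pluginId === "telegram" ? ["legacy-telegram"] : [];
};

const resolvePreferOver = createMemoizedResolver(slowLookup);
const ids = ["telegram", "discord", "telegram", "discord", "telegram"];
const results = ids.map((id) => resolvePreferOver(id));

console.log(lookups); // 2: one lookup per distinct plugin id
```

Creating the resolver once per run keeps the cache from outliving the pass that needs it.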

Testing

pnpm exec vitest run --config vitest.unit.config.ts src/config/plugin-auto-enable.test.ts

Fixes openclaw/openclaw#66159

Changed files

src/commands/gateway-status.ts (modified, +4/-2)
src/commands/gateway-status/helpers.ts (modified, +26/-14)
src/commands/status.command.ts (modified, +38/-2)
src/commands/status.scan.ts (modified, +11/-0)
src/config/plugin-auto-enable.test.ts (modified, +75/-1)
src/config/plugin-auto-enable.ts (modified, +20/-3)
src/cron/delivery.test.ts (modified, +16/-0)
src/cron/delivery.ts (modified, +18/-2)
src/gateway/probe-auth.ts (modified, +28/-1)
src/gateway/probe.ts (modified, +43/-29)

PR #66246: fix: cache external plugin catalog lookups in auto-enable

Repository: openclaw/openclaw
Author: yfge
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/66246

Description (problem / solution / changelog)

Summary

Cache external preferOver metadata lookups across a single plugin auto-enable/materialization pass.
Avoid repeated synchronous catalog file reads when many channel candidates are compared.

Changes

Threaded a shared preferOver cache through plugin auto-enable materialization.
Reused cached preferOver results in the prefer-over skip logic instead of re-reading external catalogs for each comparison.
Added a focused regression test covering repeated external catalog candidates in one pass.
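
Threading one shared cache through a single pass might look roughly like the sketch below. All function names, the candidate shape, and the skip rule (a candidate is skipped when another candidate lists it in its `preferOver`) are assumptions made for illustration; the merged implementation may differ.

```javascript
// Hypothetical sketch of threading one cache through a single pass.
function resolvePreferOverCached(channelId, loadPreferOver, cache) {
  if (!cache.has(channelId)) cache.set(channelId, loadPreferOver(channelId));
  return cache.get(channelId);
}

function shouldSkip(candidate, allCandidates, loadPreferOver, cache) {
  // Assumed rule: skip a candidate when another candidate is preferred over it.
  return allCandidates.some(
    (other) =>
      other.id !== candidate.id &&
      resolvePreferOverCached(other.id, loadPreferOver, cache).includes(candidate.id)
  );
}

function materialize(candidates, loadPreferOver) {
  const cache = new Map(); // created once per pass, discarded afterwards
  return candidates.filter((c) => !shouldSkip(c, candidates, loadPreferOver, cache));
}

// Counting stub in place of the synchronous catalog reads.
let loads = 0;
const loadPreferOver = (id) => {
  loads += 1;
  return id === "new-chat" ? ["old-chat"] : [];
};

const kept = materialize(
  [{ id: "old-chat" }, { id: "new-chat" }, { id: "discord" }],
  loadPreferOver
);
console.log(kept.map((c) => c.id), loads);
```

Passing the cache as an argument, rather than keeping it at module level, means each pass starts from a clean state.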

Testing

pnpm exec vitest run src/config/plugin-auto-enable.channels.test.ts

Fixes openclaw/openclaw#66159

Changed files

CHANGELOG.md (modified, +1/-0)
src/config/plugin-auto-enable.channels.test.ts (modified, +78/-1)
src/config/plugin-auto-enable.prefer-over.ts (modified, +17/-3)
src/config/plugin-auto-enable.shared.ts (modified, +3/-0)

Bug type

Crash (process/app exits or hangs)

Beta release blocker

Summary

Steps to reproduce

1. Configure `openclaw.json` with:
   - Multiple auth profiles (synthetic, openrouter, together, zai, minimax)
   - Corresponding `models.providers` entries
   - `agents.list` with ~130 agents, each declaring `model: { primary: "provider/model", fallbacks: [...] }`
   - `plugins.entries` enabling telegram, discord, and provider plugins
   - `channels` with telegram and discord enabled
2. Run `openclaw doctor`.
3. Observe: process hangs after displaying the "Plugins" box, consuming 100% CPU indefinitely.

Minimal repro: any config with ~50+ agents.list entries each declaring model overrides using 3+ different provider prefixes (e.g. synthetic/, openrouter/, together/) should trigger the issue. Removing all agents.list[].model fields resolves the hang.
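
A config fragment for the minimal repro could be generated with a short script like the one below. The field names `agents.list`, `model.primary`, and `model.fallbacks` come from the report; the `id` field, the agent ids, and the `example-model` name are placeholders, and a real config would also need the auth profiles, `models.providers`, and `plugins.entries` sections described in the steps.

```javascript
// Hypothetical repro generator; ids and model names are made up for illustration.
const PROVIDERS = ["synthetic", "openrouter", "together"];

function buildReproConfig(agentCount) {
  const list = [];
  for (let i = 0; i < agentCount; i++) {
    const primary = PROVIDERS[i % PROVIDERS.length];
    const fallback = PROVIDERS[(i + 1) % PROVIDERS.length];
    list.push({
      id: `agent-${i}`, // assumed field name
      model: {
        primary: `${primary}/example-model`,
        fallbacks: [`${fallback}/example-model`],
      },
    });
  }
  return { agents: { list } };
}

const config = buildReproConfig(50);
// Print only the first few lines; the full fragment is long.
console.log(JSON.stringify(config, null, 2).split("\n").slice(0, 12).join("\n"));
```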

Expected behavior

openclaw doctor completes all steps and exits within a reasonable time regardless of the number of configured agents.

Actual behavior

Process hangs after the Plugins step. strace shows a tight loop of synchronous reads of ~/.openclaw/mpm/plugins.json, ~/.openclaw/mpm/catalog.json, and ~/.openclaw/plugins/catalog.json repeating indefinitely. Process must be killed manually. On a 16 GB machine, the process reached 682 MB RSS before being killed, suggesting a possible memory leak in the loop as well.

OpenClaw version

2026.4.11

Operating system

Linux Mint 22.3 (x86_64), Node v22.22.0

Install method

npm global

Model

N/A (bug is in config resolution, not model calls)

Provider / routing chain

N/A (hang occurs before any provider calls)

Additional provider/model setup details

Config uses 5 custom provider prefixes across agents:

- synthetic (OpenAI-compat, api.synthetic.new)
- openrouter
- together
- zai (OpenAI-compat, api.z.ai)
- minimax (Anthropic-compat, api.minimax.io)

Each of ~130 agents declares model.primary + 1-2 fallbacks using these providers. agents.defaults.model also declares a primary + fallbacks. plugins.entries explicitly enables telegram, discord, minimax, synthetic, openrouter, together, zai.

Logs, screenshots, and evidence

strace output showing the tight read loop:

```
access("/home/user/.openclaw/mpm/plugins.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/mpm/plugins.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "[]\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
access("/home/user/.openclaw/mpm/catalog.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/mpm/catalog.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "{}\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
access("/home/user/.openclaw/plugins/catalog.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/plugins/catalog.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "{}\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
[repeats indefinitely]
```
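
When a captured trace is long, tallying `openat` calls per path makes the repetition easy to see. The helper below is a minimal sketch, not a general strace parser: `countOpensByPath` is a hypothetical name, and it only recognizes lines of the form shown in the excerpt.

```javascript
// Tally openat() calls per path in strace text shaped like the excerpt above.
function countOpensByPath(straceText) {
  const counts = new Map();
  const pattern = /\bopenat\([^,]+,\s*"([^"]+)"/; // captures the quoted path argument
  for (const line of straceText.split("\n")) {
    const match = pattern.exec(line);
    if (!match) continue;
    counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
  }
  return counts;
}

const sample = [
  'access("/home/user/.openclaw/mpm/plugins.json", F_OK) = 0',
  'openat(AT_FDCWD, "/home/user/.openclaw/mpm/plugins.json", O_RDONLY|O_CLOEXEC) = 24',
  'openat(AT_FDCWD, "/home/user/.openclaw/mpm/catalog.json", O_RDONLY|O_CLOEXEC) = 24',
  'openat(AT_FDCWD, "/home/user/.openclaw/plugins/catalog.json", O_RDONLY|O_CLOEXEC) = 24',
  'openat(AT_FDCWD, "/home/user/.openclaw/mpm/plugins.json", O_RDONLY|O_CLOEXEC) = 24',
].join("\n");

for (const [file, count] of countOpensByPath(sample)) {
  console.log(count, file);
}
```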


Root cause traced to `plugin-auto-enable-rMc8VJBA.js`:
- `materializePluginAutoEnableCandidatesInternal` iterates all candidates (line ~587)
- For each candidate, `shouldSkipPreferredPluginAutoEnable` iterates all *other* candidates (line ~235)
- For each pair, `resolvePreferredOverIds` calls `resolveExternalCatalogPreferOver` (line ~232)
- `resolveExternalCatalogPreferOver` performs 3 synchronous `fs.readFileSync` + `fs.existsSync` calls with no caching (lines 207-216)
- Total file reads: O(N^2 * 3) where N = number of auto-enable candidates derived from model refs

Confirmed workaround: adding a `Map`-based memo cache on `channelId` to `resolveExternalCatalogPreferOver` resolves the hang completely. Doctor completes in seconds with the full 130-agent config.

Impact and severity

Affected: Any user with a large multi-agent config using per-agent model overrides across multiple providers.
Severity: Blocks workflow. openclaw doctor and openclaw gateway restart both hang, making the system unusable until model overrides are removed.
Frequency: 100% reproducible with ~50+ agents declaring model overrides.
Consequence: Unable to start or restart the gateway, run doctor, or use the system at all without stripping per-agent model config.

Additional information

Suggested fix: memoise `resolveExternalCatalogPreferOver` by `channelId`. The external catalog files do not change during a single process invocation, so caching is safe. Patch applied locally and confirmed working:

```javascript
// Patch excerpt: `fs` and the helpers called below are assumed to be in scope in the bundled file.
// The cache is keyed by channelId only, so it assumes `env` is the same for every call.
const _externalCatalogPreferOverCache = new Map();
function resolveExternalCatalogPreferOver(channelId, env) {
    if (_externalCatalogPreferOverCache.has(channelId)) {
        return _externalCatalogPreferOverCache.get(channelId);
    }
    for (const rawPath of resolveExternalCatalogPaths(env)) {
        const resolved = resolveUserPath(rawPath, env);
        if (!fs.existsSync(resolved)) continue;
        try {
            const entries = parseExternalCatalogChannelEntries(JSON.parse(fs.readFileSync(resolved, "utf-8")));
            const channel = entries.find((entry) => entry.id === channelId);
            if (channel) {
                _externalCatalogPreferOverCache.set(channelId, channel.preferOver);
                return channel.preferOver;
            }
        } catch {
            // Skip unreadable or malformed catalog files and try the next path.
        }
    }
    // Cache misses too, so unknown channels do not rescan the disk.
    const result = [];
    _externalCatalogPreferOverCache.set(channelId, result);
    return result;
}
```

An additional improvement would be to also cache resolveExternalCatalogPaths and the parsed file contents, since those are invariant within a process run and currently re-read for every unique channelId.
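
One way to approach that further improvement is to read every catalog file once and build an index from channel id to `preferOver`. The sketch below is only an illustration: `buildPreferOverIndex` is a hypothetical name, and the catalog shape (`{ channels: [{ id, preferOver }] }`) is an assumption, since the report does not show the real format or what `parseExternalCatalogChannelEntries` accepts.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical sketch: read each catalog file once and index entries by id.
// The catalog shape ({ channels: [{ id, preferOver }] }) is an assumption.
function buildPreferOverIndex(catalogPaths) {
  const index = new Map();
  for (const file of catalogPaths) {
    if (!fs.existsSync(file)) continue;
    let parsed;
    try {
      parsed = JSON.parse(fs.readFileSync(file, "utf-8"));
    } catch {
      continue; // ignore unreadable or malformed catalogs
    }
    for (const entry of parsed?.channels ?? []) {
      // The first catalog that mentions an id wins.
      if (!index.has(entry.id)) index.set(entry.id, entry.preferOver ?? []);
    }
  }
  return index;
}

// Usage with a throwaway catalog file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "catalog-"));
const file = path.join(dir, "catalog.json");
fs.writeFileSync(
  file,
  JSON.stringify({ channels: [{ id: "new-chat", preferOver: ["old-chat"] }] })
);
const index = buildPreferOverIndex([file, path.join(dir, "missing.json")]);
console.log(index.get("new-chat"), index.get("unknown") ?? []);
```

With an index like this, each lookup becomes a `Map` read and the file count no longer depends on the number of distinct channel ids.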

Bisection results confirming the trigger:

- `plugins.enabled = false` -> doctor completes (auto-enable skipped entirely)
- `agents.list = []` + `del(agents.defaults.model)` -> doctor completes
- `agents.defaults.model` alone (no `agents.list` models) -> doctor completes
- `agents.defaults.models` catalog alone (2 entries) -> doctor completes
- Full `agents.list` with per-agent models -> hangs

extent analysis

TL;DR

The most likely fix is to memoize the resolveExternalCatalogPreferOver function by channelId to prevent excessive synchronous disk reads.

Guidance

- Implement memoization: Cache the results of `resolveExternalCatalogPreferOver` by `channelId` to avoid redundant computations and disk reads.
- Verify the fix: Run `openclaw doctor` with the full 130-agent config after applying the memoization patch to ensure it completes within a reasonable time.
- Optimize further: Consider caching `resolveExternalCatalogPaths` and parsed file contents as well, since they are invariant within a process run.
- Test edge cases: Validate the fix with different configurations, including various numbers of agents and model overrides, to ensure the solution is robust.
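
For the verification step, a small harness can run a command under a wall-clock limit so that a hang shows up as a timeout instead of blocking the shell. This is a sketch under assumptions: `runWithTimeout` is a hypothetical helper, the 60-second limit is arbitrary, and the report does not say which exit status `openclaw doctor` returns on success.

```javascript
const { spawnSync } = require("node:child_process");

// Run a command with a wall-clock limit and report whether it finished in time.
function runWithTimeout(command, args, timeoutMs) {
  const started = Date.now();
  const result = spawnSync(command, args, { timeout: timeoutMs, encoding: "utf-8" });
  // spawnSync reports an expired timeout through result.error.
  const timedOut = result.error?.code === "ETIMEDOUT";
  return { timedOut, status: result.status, elapsedMs: Date.now() - started };
}

// Example (commented out so the sketch runs without OpenClaw installed):
// console.log(runWithTimeout("openclaw", ["doctor"], 60_000));

// Self-check using the Node binary itself as a stand-in command.
console.log(runWithTimeout(process.execPath, ["-e", ""], 5_000));
```

If the command is not installed, `timedOut` stays `false` and `status` is `null`, so a caller should check both fields.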

Example

The provided patch demonstrates how to memoize `resolveExternalCatalogPreferOver` using a `Map`:

```javascript
const _externalCatalogPreferOverCache = new Map();
function resolveExternalCatalogPreferOver(channelId, env) {
    if (_externalCatalogPreferOverCache.has(channelId)) return _externalCatalogPreferOverCache.get(channelId);
    // ...
}
```

This example shows how to cache the results of the function by `channelId` and return the cached value if it exists.

Notes

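The fix assumes that the external catalog files do not change during a single process invocation, which makes caching safe. A process-wide cache would serve stale results if the files were edited while a long-running process was still alive. The merged PR #66246 limits this exposure by scoping the cache to a single auto-enable/materialization pass.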
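
If catalogs ever had to be re-read after edits, one option is to key the cache on each file's modification time. The sketch below is not part of the reported patch or of either PR; `readJsonCached` and `parsedCache` are hypothetical names, and the approach still costs one `stat` call per lookup.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical sketch: cache parsed JSON per file and re-read only when the
// file's modification time changes.
const parsedCache = new Map(); // path -> { mtimeMs, value }

function readJsonCached(file) {
  const { mtimeMs } = fs.statSync(file); // throws if the file is missing
  const hit = parsedCache.get(file);
  if (hit && hit.mtimeMs === mtimeMs) return hit.value;
  const value = JSON.parse(fs.readFileSync(file, "utf-8"));
  parsedCache.set(file, { mtimeMs, value });
  return value;
}

// Usage with a throwaway file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "catalog-"));
const file = path.join(dir, "catalog.json");
fs.writeFileSync(file, JSON.stringify({ version: 1 }));
const first = readJsonCached(file);
const second = readJsonCached(file); // served from the cache, same object

fs.writeFileSync(file, JSON.stringify({ version: 2 }));
fs.utimesSync(file, new Date(), new Date(Date.now() + 60_000)); // force a new mtime
const third = readJsonCached(file);
console.log(first === second, third.version);
```

A per-pass cache avoids even the `stat` calls, which is probably why it is the simpler choice when the files are known not to change mid-run.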

Recommendation

Apply the workaround by memoizing resolveExternalCatalogPreferOver to prevent the hang and ensure openclaw doctor completes within a reasonable time. This fix is chosen because it directly addresses the root cause of the issue, which is the excessive synchronous disk reads.
