openclaw - ✅(Solved) Fix [Bug]: openclaw doctor hangs at 100% CPU after Plugins step with large agents.list containing per-agent model overrides [2 pull requests, 1 participant]

GitHub stats

openclaw/openclaw#66159, fetched 2026-04-14 05:39:00

  • Comments: 0
  • Participants: 1
  • Timeline: 12
  • Reactions: 0
  • Timeline (top): referenced ×6, cross-referenced ×3, labeled ×2, closed ×1

resolveExternalCatalogPreferOver performs uncached synchronous disk reads (3 files per call) inside an O(N^2) loop over plugin auto-enable candidates, causing openclaw doctor to hang indefinitely at 100% CPU when agents.list contains ~50+ agents with per-agent model overrides.

Root Cause

Root cause traced to plugin-auto-enable-rMc8VJBA.js:

  • materializePluginAutoEnableCandidatesInternal iterates all candidates (line ~587)
  • For each candidate, shouldSkipPreferredPluginAutoEnable iterates all other candidates (line ~235)
  • For each pair, resolvePreferredOverIds calls resolveExternalCatalogPreferOver (line ~232)
  • resolveExternalCatalogPreferOver performs 3 synchronous fs.readFileSync + fs.existsSync calls with no caching (lines 207-216)
  • Total file reads: O(N^2 * 3) where N = number of auto-enable candidates derived from model refs
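
The read count in the last bullet can be illustrated with a small simulation. This is a sketch, not the openclaw code: `countReads`, `fakeResolve`, and the candidate ids are hypothetical, and it assumes exactly one resolver call per ordered candidate pair and three file reads per uncached call.

```javascript
// Hypothetical simulation of the call pattern described above.
// Assumes one resolver call per ordered pair and 3 file reads per uncached call.
const READS_PER_CALL = 3;

function countReads(candidateIds, { memoize }) {
  let reads = 0;
  const cache = new Map();

  // Stand-in for resolveExternalCatalogPreferOver: it only counts "disk reads".
  function fakeResolve(id) {
    if (memoize && cache.has(id)) return cache.get(id);
    reads += READS_PER_CALL;
    const result = [];
    if (memoize) cache.set(id, result);
    return result;
  }

  // Outer loop over candidates, inner loop over all *other* candidates.
  for (const candidate of candidateIds) {
    for (const other of candidateIds) {
      if (other === candidate) continue;
      fakeResolve(other);
    }
  }
  return reads;
}

const ids = Array.from({ length: 50 }, (_, i) => `plugin-${i}`);
console.log(countReads(ids, { memoize: false })); // 3 * 50 * 49 = 7350
console.log(countReads(ids, { memoize: true }));  // 3 * 50 = 150
```

With memoization the read count grows linearly in the number of distinct ids instead of quadratically in the number of candidates.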

Fix Action

Fix / Workaround

Confirmed workaround: adding a Map-based memo cache on channelId to resolveExternalCatalogPreferOver resolves the hang completely. Doctor completes in seconds with the full 130-agent config.


Suggested fix: memoise `resolveExternalCatalogPreferOver` by `channelId`. The external catalog files do not change during a single process invocation, so caching is safe. The patch was applied locally and confirmed working; it is reproduced under Code Example.

PR fix notes

PR #66244: fix: cache external plugin catalog lookups in auto-enable

Description (problem / solution / changelog)

Summary

  • Cache preferOver metadata lookups during a single plugin auto-enable pass.
  • Prevent repeated external channel catalog loads when many configured candidates are compared.

Changes

  • Added a per-run memoized resolver for resolvePreferredOverIds in src/config/plugin-auto-enable.ts.
  • Added a focused regression test that verifies repeated external catalog lookups are collapsed to a single lookup per plugin id.
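
A per-run memoized resolver of the kind this list describes might look roughly like the following. The sketch is not the PR's code: `createMemoizedResolver` and the lookup callback are hypothetical names, and the final lines imitate the regression check by counting underlying lookups per plugin id.

```javascript
// Hypothetical sketch: wrap an expensive lookup so each plugin id is resolved
// at most once per run. A new resolver (and cache) is created for every pass.
function createMemoizedResolver(lookupPreferOver) {
  const cache = new Map();
  return function resolve(pluginId) {
    if (!cache.has(pluginId)) {
      cache.set(pluginId, lookupPreferOver(pluginId));
    }
    return cache.get(pluginId);
  };
}

// Regression-style check: count how often the underlying lookup runs.
const calls = new Map();
const resolve = createMemoizedResolver((pluginId) => {
  calls.set(pluginId, (calls.get(pluginId) ?? 0) + 1);
  return []; // stand-in for the preferOver list
});

for (let i = 0; i < 1000; i++) {
  resolve("telegram");
  resolve("discord");
}
console.log(calls.get("telegram"), calls.get("discord")); // 1 1
```

Because the cache lives in the closure, it is discarded when the pass ends, so nothing stale survives into a later run.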

Testing

  • pnpm exec vitest run --config vitest.unit.config.ts src/config/plugin-auto-enable.test.ts

Fixes openclaw/openclaw#66159

Changed files

  • src/commands/gateway-status.ts (modified, +4/-2)
  • src/commands/gateway-status/helpers.ts (modified, +26/-14)
  • src/commands/status.command.ts (modified, +38/-2)
  • src/commands/status.scan.ts (modified, +11/-0)
  • src/config/plugin-auto-enable.test.ts (modified, +75/-1)
  • src/config/plugin-auto-enable.ts (modified, +20/-3)
  • src/cron/delivery.test.ts (modified, +16/-0)
  • src/cron/delivery.ts (modified, +18/-2)
  • src/gateway/probe-auth.ts (modified, +28/-1)
  • src/gateway/probe.ts (modified, +43/-29)

PR #66246: fix: cache external plugin catalog lookups in auto-enable

Description (problem / solution / changelog)

Summary

  • Cache external preferOver metadata lookups across a single plugin auto-enable/materialization pass.
  • Avoid repeated synchronous catalog file reads when many channel candidates are compared.

Changes

  • Threaded a shared preferOver cache through plugin auto-enable materialization.
  • Reused cached preferOver results in the prefer-over skip logic instead of re-reading external catalogs for each comparison.
  • Added a focused regression test covering repeated external catalog candidates in one pass.
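
The "threaded cache" approach in this list can be sketched as follows. All names here (`getPreferOver`, `shouldSkip`, `materialize`, `ctx`) are hypothetical, and the skip rule (a candidate is dropped when another candidate lists it in preferOver) is an assumption made for illustration rather than something the PR description spells out.

```javascript
// Hypothetical sketch: the caller owns one cache per pass and passes it down,
// so every comparison in that pass shares the same lookups.
function getPreferOver(pluginId, ctx) {
  if (!ctx.preferOverCache.has(pluginId)) {
    ctx.preferOverCache.set(pluginId, ctx.loadPreferOver(pluginId));
  }
  return ctx.preferOverCache.get(pluginId);
}

// Assumed rule: skip a candidate when another candidate declares it in preferOver.
function shouldSkip(candidateId, allIds, ctx) {
  return allIds.some(
    (otherId) => otherId !== candidateId && getPreferOver(otherId, ctx).includes(candidateId)
  );
}

function materialize(allIds, loadPreferOver) {
  const ctx = { preferOverCache: new Map(), loadPreferOver }; // fresh cache per pass
  return allIds.filter((id) => !shouldSkip(id, allIds, ctx));
}

// Example catalog: "new-chat" is preferred over "old-chat".
let loads = 0;
const catalog = { "new-chat": ["old-chat"] };
const kept = materialize(["old-chat", "new-chat", "other"], (id) => {
  loads += 1;
  return catalog[id] ?? [];
});
console.log(kept, loads); // [ 'new-chat', 'other' ] 3
```

Compared with a module-level Map, passing the cache explicitly keeps its lifetime tied to one materialization pass and makes it easy to supply a fresh one in tests.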

Testing

  • pnpm exec vitest run src/config/plugin-auto-enable.channels.test.ts

Fixes openclaw/openclaw#66159

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/config/plugin-auto-enable.channels.test.ts (modified, +78/-1)
  • src/config/plugin-auto-enable.prefer-over.ts (modified, +17/-3)
  • src/config/plugin-auto-enable.shared.ts (modified, +3/-0)

Code Example

strace output showing the tight read loop:

access("/home/user/.openclaw/mpm/plugins.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/mpm/plugins.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "[]\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
access("/home/user/.openclaw/mpm/catalog.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/mpm/catalog.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "{}\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
access("/home/user/.openclaw/plugins/catalog.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/plugins/catalog.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "{}\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
[repeats indefinitely]



// Module-level memo: channelId -> preferOver list, kept for the life of the process.
const _externalCatalogPreferOverCache = new Map();

function resolveExternalCatalogPreferOver(channelId, env) {
    // Cache hit: skip every fs call for a channelId that was already resolved.
    if (_externalCatalogPreferOverCache.has(channelId)) {
        return _externalCatalogPreferOverCache.get(channelId);
    }
    for (const rawPath of resolveExternalCatalogPaths(env)) {
        const resolved = resolveUserPath(rawPath, env);
        if (!fs.existsSync(resolved)) continue;
        try {
            const entries = parseExternalCatalogChannelEntries(JSON.parse(fs.readFileSync(resolved, "utf-8")));
            const channel = entries.find((entry) => entry.id === channelId);
            if (channel) {
                _externalCatalogPreferOverCache.set(channelId, channel.preferOver);
                return channel.preferOver;
            }
        } catch {} // unreadable or malformed catalog: fall through to the next path
    }
    // Cache misses too, so unknown channelIds do not trigger re-reads.
    const _result = [];
    _externalCatalogPreferOverCache.set(channelId, _result);
    return _result;
}

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

resolveExternalCatalogPreferOver performs uncached synchronous disk reads (3 files per call) inside an O(N^2) loop over plugin auto-enable candidates, causing openclaw doctor to hang indefinitely at 100% CPU when agents.list contains ~50+ agents with per-agent model overrides.

Steps to reproduce

  1. Configure openclaw.json with:

    • Multiple auth profiles (synthetic, openrouter, together, zai, minimax)
    • Corresponding models.providers entries
    • agents.list with ~130 agents, each declaring model: { primary: "provider/model", fallbacks: [...] }
    • plugins.entries enabling telegram, discord, and provider plugins
    • channels with telegram and discord enabled
  2. Run openclaw doctor

  3. Observe: process hangs after displaying the "Plugins" box, consuming 100% CPU indefinitely.

Minimal repro: any config with ~50+ agents.list entries each declaring model overrides using 3+ different provider prefixes (e.g. synthetic/, openrouter/, together/) should trigger the issue. Removing all agents.list[].model fields resolves the hang.
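
For experimenting with the minimal repro, a throwaway generator along these lines can produce a large agents.list. It is a sketch under assumptions: the `id` field, the agent names, and the model names after each provider prefix are hypothetical placeholders; only the agents.list[].model shape with primary and fallbacks comes from the report. The output would still need to be merged into a real openclaw.json by hand.

```javascript
// Hypothetical generator for a large agents.list fragment.
// Provider prefixes are from the report; model names and `id` are placeholders.
const PROVIDERS = ["synthetic", "openrouter", "together"];

function buildAgentsList(count) {
  return Array.from({ length: count }, (_, i) => {
    const primary = PROVIDERS[i % PROVIDERS.length];
    const fallback = PROVIDERS[(i + 1) % PROVIDERS.length];
    return {
      id: `agent-${i}`, // assumed field name
      model: {
        primary: `${primary}/example-model`,
        fallbacks: [`${fallback}/example-model`],
      },
    };
  });
}

const fragment = { agents: { list: buildAgentsList(50) } };
console.log(JSON.stringify(fragment, null, 2));
```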

Expected behavior

openclaw doctor completes all steps and exits within a reasonable time regardless of the number of configured agents.

Actual behavior

Process hangs after the Plugins step. strace shows a tight loop of synchronous reads of ~/.openclaw/mpm/plugins.json, ~/.openclaw/mpm/catalog.json, and ~/.openclaw/plugins/catalog.json repeating indefinitely. Process must be killed manually. On a 16 GB machine, the process reached 682 MB RSS before being killed, suggesting a possible memory leak in the loop as well.

OpenClaw version

2026.4.11

Operating system

Linux Mint 22.3 (x86_64), Node v22.22.0

Install method

npm global

Model

N/A (bug is in config resolution, not model calls)

Provider / routing chain

N/A (hang occurs before any provider calls)

Additional provider/model setup details

Config uses 5 custom provider prefixes across agents:

  • synthetic (OpenAI-compat, api.synthetic.new)
  • openrouter
  • together
  • zai (OpenAI-compat, api.z.ai)
  • minimax (Anthropic-compat, api.minimax.io)

Each of ~130 agents declares model.primary + 1-2 fallbacks using these providers. agents.defaults.model also declares a primary + fallbacks. plugins.entries explicitly enables telegram, discord, minimax, synthetic, openrouter, together, zai.

Logs, screenshots, and evidence

strace output showing the tight read loop:

access("/home/user/.openclaw/mpm/plugins.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/mpm/plugins.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "[]\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
access("/home/user/.openclaw/mpm/catalog.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/mpm/catalog.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "{}\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
access("/home/user/.openclaw/plugins/catalog.json", F_OK) = 0
openat(AT_FDCWD, "/home/user/.openclaw/plugins/catalog.json", O_RDONLY|O_CLOEXEC) = 24
read(24, "{}\n", 8192)                  = 3
read(24, "", 8192)                      = 0
close(24)                               = 0
[repeats indefinitely]


Root cause traced to `plugin-auto-enable-rMc8VJBA.js`:
- `materializePluginAutoEnableCandidatesInternal` iterates all candidates (line ~587)
- For each candidate, `shouldSkipPreferredPluginAutoEnable` iterates all *other* candidates (line ~235)
- For each pair, `resolvePreferredOverIds` calls `resolveExternalCatalogPreferOver` (line ~232)
- `resolveExternalCatalogPreferOver` performs 3 synchronous `fs.readFileSync` + `fs.existsSync` calls with no caching (lines 207-216)
- Total file reads: O(N^2 * 3) where N = number of auto-enable candidates derived from model refs

Confirmed workaround: adding a `Map`-based memo cache on `channelId` to `resolveExternalCatalogPreferOver` resolves the hang completely. Doctor completes in seconds with the full 130-agent config.

Impact and severity

  • Affected: Any user with a large multi-agent config using per-agent model overrides across multiple providers.
  • Severity: Blocks workflow. openclaw doctor and openclaw gateway restart both hang, making the system unusable until model overrides are removed.
  • Frequency: 100% reproducible with ~50+ agents declaring model overrides.
  • Consequence: Unable to start or restart the gateway, run doctor, or use the system at all without stripping per-agent model config.

Additional information

Suggested fix: memoise resolveExternalCatalogPreferOver by channelId. The external catalog files do not change during a single process invocation, so caching is safe. Patch applied locally and confirmed working:

// Module-level memo: channelId -> preferOver list, kept for the life of the process.
const _externalCatalogPreferOverCache = new Map();

function resolveExternalCatalogPreferOver(channelId, env) {
    // Cache hit: skip every fs call for a channelId that was already resolved.
    if (_externalCatalogPreferOverCache.has(channelId)) {
        return _externalCatalogPreferOverCache.get(channelId);
    }
    for (const rawPath of resolveExternalCatalogPaths(env)) {
        const resolved = resolveUserPath(rawPath, env);
        if (!fs.existsSync(resolved)) continue;
        try {
            const entries = parseExternalCatalogChannelEntries(JSON.parse(fs.readFileSync(resolved, "utf-8")));
            const channel = entries.find((entry) => entry.id === channelId);
            if (channel) {
                _externalCatalogPreferOverCache.set(channelId, channel.preferOver);
                return channel.preferOver;
            }
        } catch {} // unreadable or malformed catalog: fall through to the next path
    }
    // Cache misses too, so unknown channelIds do not trigger re-reads.
    const _result = [];
    _externalCatalogPreferOverCache.set(channelId, _result);
    return _result;
}

An additional improvement would be to also cache resolveExternalCatalogPaths and the parsed file contents, since those are invariant within a process run and currently re-read for every unique channelId.
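
Caching the parsed file contents, as suggested above, could be sketched like this. `readCatalogOnce` and `parsedCatalogCache` are hypothetical names, and the temporary file only stands in for a real catalog; the sketch assumes files are invariant for the life of the process, as the report states.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical sketch: parse each catalog file at most once per process.
// Missing or malformed files are cached as null so they are not retried.
const parsedCatalogCache = new Map();

function readCatalogOnce(filePath) {
  if (parsedCatalogCache.has(filePath)) return parsedCatalogCache.get(filePath);
  let parsed = null;
  try {
    parsed = JSON.parse(fs.readFileSync(filePath, "utf-8"));
  } catch {
    // leave parsed as null
  }
  parsedCatalogCache.set(filePath, parsed);
  return parsed;
}

// Usage with a temporary file standing in for a catalog.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "catalog-"));
const file = path.join(dir, "catalog.json");
fs.writeFileSync(file, "{}\n");
const first = readCatalogOnce(file);
const second = readCatalogOnce(file); // served from the cache, no disk access
console.log(first === second); // true
```

With the parsed contents cached per path, a lookup for a new channelId only has to search in-memory data rather than re-read all three files.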

Bisection results confirming the trigger:

  • plugins.enabled = false -> doctor completes (auto-enable skipped entirely)
  • agents.list = [] + del(agents.defaults.model) -> doctor completes
  • agents.defaults.model alone (no agents.list models) -> doctor completes
  • agents.defaults.models catalog alone (2 entries) -> doctor completes
  • Full agents.list with per-agent models -> hangs

extent analysis

TL;DR

The most likely fix is to memoize the resolveExternalCatalogPreferOver function by channelId to prevent excessive synchronous disk reads.

Guidance

  1. Implement memoization: Cache the results of resolveExternalCatalogPreferOver by channelId to avoid redundant computations and disk reads.
  2. Verify the fix: Run openclaw doctor with the full 130-agent config after applying the memoization patch to ensure it completes within a reasonable time.
  3. Optimize further: Consider caching resolveExternalCatalogPaths and parsed file contents as well, since they are invariant within a process run.
  4. Test edge cases: Validate the fix with different configurations, including various numbers of agents and model overrides, to ensure the solution is robust.
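
Step 2 can be automated with a small timeout harness. This is a sketch: `finishesWithin` is a hypothetical helper, the 60-second budget in the comment is arbitrary, and the intended invocation assumes `openclaw` is on PATH. The runnable demo uses the current Node binary so it works without openclaw installed.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: run a command and report whether it finished in time.
function finishesWithin(command, args, timeoutMs) {
  const started = Date.now();
  const result = spawnSync(command, args, { timeout: timeoutMs, stdio: "ignore" });
  // spawnSync reports an expired timeout as an error with code ETIMEDOUT.
  const timedOut = result.error?.code === "ETIMEDOUT";
  return { timedOut, status: result.status, elapsedMs: Date.now() - started };
}

// Intended use (assumes `openclaw` is on PATH):
//   finishesWithin("openclaw", ["doctor"], 60_000)
// Self-contained demo using the current Node binary instead:
console.log(finishesWithin(process.execPath, ["-e", "0"], 10_000));
```

A `status` of null means the command was killed or could not be started, so a passing check should require both `timedOut === false` and `status === 0`.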

Example

The provided patch demonstrates how to memoize resolveExternalCatalogPreferOver using a Map:

const _externalCatalogPreferOverCache = new Map();
function resolveExternalCatalogPreferOver(channelId, env) {
    if (_externalCatalogPreferOverCache.has(channelId)) return _externalCatalogPreferOverCache.get(channelId);
    // ...
}

This example shows how to cache the results of the function by channelId and return the cached value if it exists.

Notes

The fix assumes that the external catalog files do not change during a single process invocation, making caching safe. However, if the files can change, additional considerations may be necessary.
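
If the files can change while the process is running, one option is to validate each cache entry against the file's modification time and size. This is an alternative sketch rather than part of the reported fix; `readJsonIfChanged` is a hypothetical name.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical sketch: reuse a parsed file only while its mtime and size are unchanged.
const cache = new Map(); // filePath -> { mtimeMs, size, parsed }

function readJsonIfChanged(filePath) {
  const stat = fs.statSync(filePath); // throws if the file is missing
  const hit = cache.get(filePath);
  if (hit && hit.mtimeMs === stat.mtimeMs && hit.size === stat.size) {
    return hit.parsed;
  }
  const parsed = JSON.parse(fs.readFileSync(filePath, "utf-8"));
  cache.set(filePath, { mtimeMs: stat.mtimeMs, size: stat.size, parsed });
  return parsed;
}

const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "catalog-")), "catalog.json");
fs.writeFileSync(file, "{}");
const before = readJsonIfChanged(file);
fs.writeFileSync(file, '{"entries":[]}'); // size changes, so the cache entry is stale
const after = readJsonIfChanged(file);
console.log(before === after); // false
```

This still costs one stat call per lookup, so inside the pairwise comparison loop it would likely need to be combined with a per-pass cache rather than replace it.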

Recommendation

Apply the workaround by memoizing resolveExternalCatalogPreferOver to prevent the hang and ensure openclaw doctor completes within a reasonable time. This fix is chosen because it directly addresses the root cause of the issue, which is the excessive synchronous disk reads.
