openclaw - ✅(Solved) Fix [Bug]: openclaw infer hangs indefinitely on 2026.4.27 — openclaw-infer child spins at 100% CPU with zero network I/O [1 pull requests, 3 comments, 3 participants]

Q: Expected behavior

`openclaw infer model list` returns a JSON list of available models. `openclaw infer model run --local --model ollama/qwen3.5:397b-cloud --prompt "..."` returns the model response.

openclaw · 2026-04-30 08:50:57

openclaw/openclaw#74986•Fetched 2026-05-01 05:39:14

openclaw infer model run hangs indefinitely on OpenClaw 2026.4.27. The grandchild openclaw-infer Node.js process consumes 100% CPU but makes zero network connections and produces no output, eventually getting killed by the process timeout. The CLI never reaches the gateway and no request is logged.

This reproduces on both local Ollama models and remote API models, suggesting a pre-execution initialization regression.

Root Cause

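Per PR #75022, the catalog-listing commands (infer model list, infer model inspect, infer model providers) all funnel into loadModelCatalog(...), which synchronously loads every provider plugin's runtime module so it can call the plugin's augmentModelCatalog hook. After the 2026.4.27 manifest-driven catalog refactors, this is CPU-bound work that runs before any network I/O, which matches the observed 100% CPU with zero TCP connections. The infer model run hang goes through a separate code path and is not covered by that PR.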

Fix Action

Fixed

Fixed by PR: fix(infer): load model catalog metadata-only for list/inspect/providers (https://github.com/openclaw/openclaw/pull/75022)

PR fix notes

PR #75022: fix(infer): load model catalog metadata-only for list/inspect/providers

Repository: openclaw/openclaw
Author: openperf
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/75022

Description (problem / solution / changelog)

Summary

Problem: openclaw infer model list, openclaw infer model inspect, and openclaw infer model providers hang indefinitely on 2026.4.27, with the child Node process spinning at 100% CPU, no TCP connections, and no output until timeout. Reported in #74986; the user verified --version, --help, and gateway status still work, while the catalog-listing commands wedge before any I/O lands.
Root Cause: All three handlers funnel into loadModelCatalog(...) in src/agents/model-catalog.ts. Even with readOnly: true (added in this PR's first revision, which skips the ensureOpenClawModelsJson mutation path at lines 142–145), the function still synchronously enters augmentModelCatalogWithProviderPlugins at line 194. That path goes through src/plugins/provider-runtime.ts:resolveProviderPluginsForCatalogHooks → resolveProviderPluginsForHooks(...), which loads each provider plugin's runtime module so it can invoke the plugin's augmentModelCatalog hook.
With the 2026.4.27 manifest-driven catalog/auth refactors (commits 8a06db084, 13757465b, 20c7a98fb, b7a1bfd2d, 947aae5a9, d014b3634), that path now also fans out into the cached installed-manifest registry (b7a1bfd2d's synchronous fs.statSync per plugin + hashJson(...)) and the bundled-plugin runtime imports per provider. This is where the user's run hot-spins on configs that pin a custom provider (the reporter's models.providers.ollama block). The exact symptom (100% CPU, zero TCP, zero stdout) matches synchronous CPU work inside provider-plugin runtime resolution rather than any network probe.
The same regression class was already fixed for agents list / status in 2026.4.29 by replacing getChannelPlugin(...) with listReadOnlyChannelPluginsForConfig (8fe449c88, d5eae0d95). The established remediation pattern is "avoid loading any plugin runtime on read-only metadata paths," not "skip one mutation function." The first revision of this PR (commit 6abe657) addressed only the mutation path; this PR's second commit closes the remaining plugin-runtime path so the read-only contract is actually metadata-only.
Fix: Add a new orthogonal option skipProviderPluginAugmentation?: boolean to loadModelCatalog. When true, the function returns the catalog assembled from PI SDK static rows + manifest static rows + cfg.models.providers configured rows (the same data sources that already work today) and skips the augmentModelCatalogWithProviderPlugins(...) call at line 194. Three CLI inspection commands — infer model list, infer model inspect, and infer model providers (buildModelProviders) — pass readOnly: true and skipProviderPluginAugmentation: true, mirroring the 2026.4.29 agents list "no plugin runtime on read-only paths" pattern. The option is opt-in to preserve existing readOnly: true callers (models list --all via appendCatalogSupplementRows, cli-auth-epoch.ts, etc.) which still want dynamic plugin-derived rows.
What changed:
- src/agents/model-catalog.ts: add skipProviderPluginAugmentation?: boolean to loadModelCatalog's param type with a doc comment that names #74986 and explains the contract; gate the augmentModelCatalogWithProviderPlugins(...) call behind the new flag. No change to type signatures of any other export.
- src/cli/capability-cli.ts: buildModelProviders (used by infer model providers), infer model list, and infer model inspect pass readOnly: true, skipProviderPluginAugmentation: true. Comment cites #74986 and the agents list 2026.4.29 fix pattern.
- src/cli/capability-cli.test.ts: the three existing #74986 cases now assert that loadModelCatalog is called with both readOnly: true and skipProviderPluginAugmentation: true (not just readOnly: true).
- src/agents/model-catalog.test.ts: one new it(...) case in the existing describe("loadModelCatalog", ...) block that primes the augmentModelCatalogWithProviderPlugins mock to return a synthetic ollama-live-only row, calls loadModelCatalog({ readOnly: true, skipProviderPluginAugmentation: true }), and asserts the synthetic row is absent and the augmentation mock was never called. Reuses the existing harness; no new fixtures.
What did NOT change (scope boundary):
- CHANGELOG.md — left untouched; release-note wording is the maintainer's call.
- Default behavior of loadModelCatalog: when skipProviderPluginAugmentation is omitted/false, the augmentation step still runs exactly as before, so models list --all (src/commands/models/list.rows.ts:347) and every other current readOnly: true caller keeps the same catalog contents.
- ensureOpenClawModelsJson, buildShouldSuppressBuiltInModel (manifest registry resolver), the manifest planner, and the model-catalog cache: untouched.
- infer model run (local + gateway), infer model auth, image/audio/tts/embedding subcommands: out of scope; they are write/run paths, do not go through the read-only catalog read, and any hang there needs a separate fix.
- No new exports, no plugin-SDK / public-surface contract changes, no any introduced.
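
The gating described above can be pictured with a small sketch. This is not the code from the PR: the types, the `CatalogSources` interface, and `loadCatalogSketch` are hypothetical stand-ins, and the sketch is synchronous only to keep it short. The one name taken from the PR is the `skipProviderPluginAugmentation` option and its strict `=== true` check.

```typescript
// Hypothetical, simplified shapes; the real types live in src/agents/model-catalog.ts.
type CatalogRow = { provider: string; id: string };

interface LoadOptions {
  readOnly?: boolean;
  // Opt-in flag named in the PR description.
  skipProviderPluginAugmentation?: boolean;
}

interface CatalogSources {
  // Stand-in for PI SDK static rows + manifest rows + configured provider rows.
  staticRows: () => CatalogRow[];
  // Stand-in for the step that loads provider plugin runtimes.
  augmentWithPlugins: (rows: CatalogRow[]) => CatalogRow[];
}

function loadCatalogSketch(opts: LoadOptions, sources: CatalogSources): CatalogRow[] {
  const rows = sources.staticRows();
  // Strict check: omitted or false keeps the previous behavior.
  if (opts.skipProviderPluginAugmentation === true) {
    return rows;
  }
  return sources.augmentWithPlugins(rows);
}

// Usage: count how often the expensive step runs.
let augmentCalls = 0;
const sources: CatalogSources = {
  staticRows: () => [{ provider: "ollama", id: "qwen3.5:397b-cloud" }],
  augmentWithPlugins: (rows) => {
    augmentCalls += 1;
    return [...rows, { provider: "ollama", id: "ollama-live-only" }];
  },
};

// Inspection-style call: metadata only, the plugin step is never entered.
const inspected = loadCatalogSketch(
  { readOnly: true, skipProviderPluginAugmentation: true },
  sources,
);
// Existing readOnly caller: still receives the plugin-derived row.
const full = loadCatalogSketch({ readOnly: true }, sources);
```

The call counter mirrors what the new unit test is described as checking: with the flag set, the synthetic plugin row is absent and the augmentation step is never invoked.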

Reproduction

On 2026.4.27 (or current main), with a ~/.openclaw/config.yaml similar to the reporter's:

agents:
  defaults:
    llm: { idleTimeoutSeconds: 600 }
    model: { primary: ollama/qwen3.5:397b-cloud }
models:
  providers:
    ollama:
      baseUrl: http://winhost:11434
      apiKey: ollama-local
      api: ollama

openclaw gateway status                                  # works
openclaw infer model list                                # before fix: hangs at 100% CPU until timeout
                                                          # after fix: returns the catalog and exits
openclaw infer model inspect --model openai/gpt-5.4      # same
openclaw infer model providers --json                    # same

The hung process can be confirmed with ps -o pcpu,etimes,wchan,comm -p <pid> (CPU pegged at ~100, no progress) and lsof -p <pid> (only the std{out,err} pipes, zero TCP — i.e., the work is happening before any provider network probe).
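
When scripting the reproduction, it helps to bound each command with a timeout so a hang shows up as a result rather than a stuck terminal. The sketch below uses Node's `child_process.spawnSync` with its `timeout` and `killSignal` options. The helper name `runWithTimeout` and the `ProbeResult` shape are illustrative, not part of OpenClaw.

```typescript
import { spawnSync } from "node:child_process";

interface ProbeResult {
  timedOut: boolean;
  status: number | null;
  signal: string | null;
}

// Run a command synchronously and kill it if it exceeds timeoutMs.
function runWithTimeout(cmd: string, args: string[], timeoutMs: number): ProbeResult {
  const result = spawnSync(cmd, args, {
    timeout: timeoutMs,
    killSignal: "SIGKILL",
    encoding: "utf8",
  });
  // On timeout, spawnSync reports an error whose code is "ETIMEDOUT".
  const code = (result.error as NodeJS.ErrnoException | undefined)?.code;
  return { timedOut: code === "ETIMEDOUT", status: result.status, signal: result.signal };
}

// Example call (not executed here), assuming openclaw is on PATH:
//   runWithTimeout("openclaw", ["infer", "model", "list"], 20_000)
```

A `timedOut: true` result only says that the command did not finish in time; the ps and lsof checks above are still what distinguish a CPU spin from a blocked network call.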

Risk / Mitigation

Risk 1 (different output for catalog list): Skipping augmentModelCatalogWithProviderPlugins means infer model list / inspect / providers no longer surface dynamic plugin-discovered models (e.g., live Ollama models from /api/tags). The output is now: PI SDK static rows + manifest-declared rows + cfg.models.providers configured rows.
- Mitigation: For inspection commands this is the right trade-off. The user wants "what does the catalog know about" to return promptly, not "what does the live Ollama daemon currently expose"; the latter is what models scan / models list --all are for, both of which still go through the dynamic path (their loadModelCatalog({ readOnly: true }) call sites do not pass the new flag). The hang the reporter sees is a strictly worse failure mode than slightly-less-fresh output. The new flag is opt-in, so no other call site changes.
Risk 2 (test coverage): The new metadata-only contract needs to be locked in so that a future refactor doesn't silently regress it.
- Mitigation: Three CLI tests assert the flag combination per command; one model-catalog unit test verifies that augmentModelCatalogWithProviderPlugins is genuinely not called when the flag is set, even when its mock would have produced a row. All four reuse existing harness/mocks; no new fixtures.
Risk 3 (typing/security review): No any introduced; the only addition is a new optional parameter (skipProviderPluginAugmentation?: boolean), consulted via a strict === true check. No change to data flow, secrets handling, plugin trust boundary, or external surface.

Update — incremental commit on this PR

The first revision of this PR (6abe657) only added readOnly: true, which skips the ensureOpenClawModelsJson mutation branch. After re-review I confirmed that loadModelCatalog's remaining call into augmentModelCatalogWithProviderPlugins (line 194 of model-catalog.ts) still synchronously loads provider plugin runtime — i.e., the very class of work the reporter's lsof/ps evidence points at. The agents-list 2026.4.29 fix (8fe449c88, d5eae0d95) addressed the same regression class by routing channel queries through listReadOnlyChannelPluginsForConfig instead of getChannelPlugin(...); this commit applies the same "no plugin runtime on read-only paths" pattern to the model catalog by making readOnly truly metadata-only via the new opt-in skipProviderPluginAugmentation flag, gated to only the three infer inspection commands.

Update 2 — read-only catalog cache

A reviewer noted that the read-only path of loadModelCatalog had no cache reuse — every call rebuilt the catalog from scratch because only the non-readOnly slot (modelCatalogPromise) was ever populated. For one-shot CLI invocations (the issue scenario) this is harmless, but long-running hosts that hit the read-only path repeatedly (cli-auth-epoch.ts:171 refresh, appendCatalogSupplementRows for models list --all) would redo the PI SDK import / registry load / manifest suppression resolver / augmentModelCatalogWithProviderPlugins every time. This commit adds a parallel readOnlyModelCatalogPromise that caches the with-augmentation read-only result, mirroring every invariant of the original cache:

- useCache: false invalidates both slots up-front.
- Empty results clear the matching slot so the next call retries (existing comment kept).
- catch handlers null out the matching slot so transient dynamic-import / filesystem failures don't poison the cache.
- resetModelCatalogCache() and the test reset both clear both slots.

skipProviderPluginAugmentation callers (the #74986 inspection commands) deliberately stay uncached: their result is a strict subset of the with-augmentation result, so caching it would let a later non-skip caller silently receive the smaller set. Their rebuild is cheap because the heavy provider-runtime fan-out is bypassed. Two new unit tests in model-catalog.test.ts lock this contract: ① two consecutive readOnly: true (without skipProviderPluginAugmentation) calls reuse the cache (registry.getAll() runs once, second result === first), and ② a readOnly: true, skipProviderPluginAugmentation: true call followed by a non-skip readOnly: true call rebuilds and includes the augmentation row.
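
The two-slot cache and its invariants can be sketched as follows. This is a simplified illustration, not the PR's implementation. `modelCatalogPromise` and `readOnlyModelCatalogPromise` are the slot names from the description; `buildCatalog`, `loadCatalogCached`, `resetCatalogCacheSketch`, and the row shape are hypothetical.

```typescript
type Row = { id: string };

interface Opts {
  readOnly?: boolean;
  skipProviderPluginAugmentation?: boolean;
  useCache?: boolean;
}

let modelCatalogPromise: Promise<Row[]> | null = null; // non-readOnly slot
let readOnlyModelCatalogPromise: Promise<Row[]> | null = null; // readOnly, with augmentation

let builds = 0;

// Hypothetical builder standing in for the real catalog assembly.
async function buildCatalog(opts: Opts): Promise<Row[]> {
  builds += 1;
  const rows: Row[] = [{ id: "static-row" }];
  if (opts.skipProviderPluginAugmentation !== true) rows.push({ id: "plugin-row" });
  return rows;
}

function resetCatalogCacheSketch(): void {
  modelCatalogPromise = null;
  readOnlyModelCatalogPromise = null;
}

function loadCatalogCached(opts: Opts = {}): Promise<Row[]> {
  // useCache: false invalidates both slots up-front.
  if (opts.useCache === false) resetCatalogCacheSketch();

  // Skip callers receive a subset, so they neither read from nor write to a slot.
  if (opts.skipProviderPluginAugmentation === true) return buildCatalog(opts);

  const readOnly = opts.readOnly === true;
  const cached = readOnly ? readOnlyModelCatalogPromise : modelCatalogPromise;
  if (cached) return cached;

  // Clear only the slot that still holds this exact promise.
  const clearSlot = (p: Promise<Row[]>): void => {
    if (readOnly) {
      if (readOnlyModelCatalogPromise === p) readOnlyModelCatalogPromise = null;
    } else if (modelCatalogPromise === p) {
      modelCatalogPromise = null;
    }
  };

  const pending: Promise<Row[]> = buildCatalog(opts).then(
    (rows) => {
      if (rows.length === 0) clearSlot(pending); // empty result: retry next time
      return rows;
    },
    (err) => {
      clearSlot(pending); // a transient failure must not poison the cache
      throw err;
    },
  );

  if (readOnly) readOnlyModelCatalogPromise = pending;
  else modelCatalogPromise = pending;
  return pending;
}
```

In this sketch, two consecutive `loadCatalogCached({ readOnly: true })` calls return the same promise. A call with `skipProviderPluginAugmentation: true` always rebuilds and leaves both slots untouched, so a later non-skip caller cannot receive the smaller set.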

Out of scope (tracked separately)

Reviewer also flagged two further items that this PR intentionally does not address:

- loadOpenClawPlugins performance regression in 2026.4.27: the synchronous fs.statSync per plugin + hashJson(...) introduced by b7a1bfd2d. This is the underlying engine that any non-skip catalog refresh still hits. It is worth a focused follow-up with profiler data, because the symptom-to-root-cause mapping ("hot-spin" vs "slow") is not directly observable from the reporter's ps/lsof evidence alone.
- infer model run hang: a separate code path (prepareSimpleCompletionModelForAgent → resolveModelAsync); even with skipPiDiscovery: true it can still enter prepareProviderRuntimeAuth. This should be a dedicated issue.

Change Type (select all)

Bug fix

Scope (select all touched areas)

CLI
Agents/models
Tests

Linked Issue/PR

Fixes #74986

Changed files

src/agents/model-catalog.test.ts (modified, +133/-0)
src/agents/model-catalog.ts (modified, +51/-13)
src/cli/capability-cli.test.ts (modified, +54/-0)
src/cli/capability-cli.ts (modified, +27/-6)

Bug type

Crash (process/app exits or hangs)

Beta release blocker

Summary

The hang reproduces with both local Ollama models and remote API models, which suggests a regression in pre-execution initialization.

Environment

OpenClaw: 2026.4.27 (installed via npm global — /home/mlaih/.npm-global/lib/node_modules/openclaw/)
Node.js: v24.14.1
Deployment: WSL2 (Linux 6.6.87.2-microsoft-standard-WSL2 on Windows)
Gateway: running at ws://127.0.0.1:18789 (healthy, responding to probe)
Ollama: http://winhost:11434 (healthy — direct curl to /api/chat works fine)

Steps to reproduce

1. Ensure the gateway is running: openclaw gateway status (confirms healthy).
2. Run any openclaw infer command, e.g.:

   openclaw infer model list
   openclaw infer model run --local --model ollama/qwen3.5:397b-cloud --prompt "Reply with: ok"

3. Observe: the command hangs indefinitely and is eventually killed by the internal timeout.

Expected behavior

openclaw infer model list returns a JSON list of available models. openclaw infer model run --local --model ollama/qwen3.5:397b-cloud --prompt "..." returns the model response.

Actual behavior

Both commands hang. Process inspection during the hang reveals:

UID        PID  PPID  CMD
mlaih   253345 253344  timeout 15 node .../openclaw.mjs infer model run ...
mlaih   253347 253345  openclaw
mlaih   253354 253347  99  openclaw-infer   <-- 100% CPU, 0 network connections

The grandchild openclaw-infer process:

- Has only 2 file descriptors open (stdout socket + stderr socket)
- Makes zero TCP connections (verified via /proc/<pid>/net/tcp)
- Produces no output to either fd
- Spins at 100% CPU until killed

Gateway log shows zero requests from the infer CLI — it never connects.
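
One caveat when repeating this check: /proc/<pid>/net/tcp lists every TCP socket in the process's network namespace, not only the sockets that process owns. A per-process answer comes from matching the socket inodes behind /proc/<pid>/fd against the inode column of that table. The sketch below does the matching on plain strings so it can be tried without a live process. The function names and sample values are illustrative.

```typescript
// Extract socket inodes from the readlink targets of /proc/<pid>/fd/*,
// which look like "socket:[12345]" for sockets.
function socketInodes(fdTargets: string[]): Set<string> {
  const inodes = new Set<string>();
  for (const target of fdTargets) {
    const match = /^socket:\[(\d+)\]$/.exec(target);
    const inode = match ? match[1] : undefined;
    if (inode !== undefined) inodes.add(inode);
  }
  return inodes;
}

// Return "local -> remote" (hex address:port) for each TCP row owned by the given inodes.
function ownedTcpSockets(procNetTcp: string, inodes: Set<string>): string[] {
  const owned: string[] = [];
  const rows = procNetTcp.split("\n").slice(1); // skip the header line
  for (const row of rows) {
    const fields = row.trim().split(/\s+/);
    // Columns: sl, local_address, rem_address, st, tx:rx queues, tr:when,
    // retrnsmt, uid, timeout, inode, ...
    const inode = fields[9];
    if (inode !== undefined && inodes.has(inode)) {
      owned.push(`${fields[1]} -> ${fields[2]}`);
    }
  }
  return owned;
}

// Made-up sample: a listener on 127.0.0.1:18789 and one client connection to it.
const sampleTable = [
  "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode",
  "   0: 0100007F:4965 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 40001 1 0000000000000000 100 0 0 10 0",
  "   1: 0100007F:C350 0100007F:4965 01 00000000:00000000 00:00000000 00000000  1000        0 40002 1 0000000000000000 20 4 30 10 -1",
].join("\n");

// A process whose only descriptors are two non-TCP sockets owns no TCP rows.
const hungProcess = ownedTcpSockets(sampleTable, socketInodes(["socket:[50001]", "socket:[50002]"]));
// A process holding inode 40002 owns the client connection.
const connectedProcess = ownedTcpSockets(sampleTable, socketInodes(["socket:[40002]", "/dev/null"]));
```

On a live system the inputs would come from reading the link targets under /proc/<pid>/fd and the contents of /proc/<pid>/net/tcp; IPv6 sockets are listed separately in /proc/<pid>/net/tcp6.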

Commands tested and their results

| Command | Result |
| --- | --- |
| `openclaw --version` | Works |
| `openclaw --help` | Works |
| `openclaw gateway status` | Works |
| `openclaw infer model list` | Hangs (SIGKILL after ~20s) |
| `openclaw infer model run --local --model ollama/qwen3.5:397b-cloud --prompt "..."` | Hangs (SIGKILL after ~20s) |
| `openclaw infer model run --model minimax/MiniMax-M2.7 --prompt "..."` | Hangs (SIGKILL after ~20s) |
| `openclaw agents list` | Hangs (SIGKILL after ~10s) |
| `curl -s http://winhost:11434/api/chat -d '{"model":"qwen3.5:397b-cloud",...}'` | Works (~6s response) |
| OpenClaw `image` tool (via gateway) | Works |
| Gateway WebSocket probe | Works |

Relevant config

{
  "agents": {
    "defaults": {
      "llm": { "idleTimeoutSeconds": 600 },
      "model": { "primary": "ollama/qwen3.5:397b-cloud" }
    }
  },
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://winhost:11434",
        "apiKey": "ollama-local",
        "api": "ollama"
      }
    }
  }
}
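
The PR discussion ties the hot-spin to configs that pin a custom provider under models.providers. A quick way to see whether a given config falls into that category is sketched below. The interface covers only the keys shown in this issue and is an assumption about the config shape, not OpenClaw's actual schema; the helper names are hypothetical.

```typescript
// Hypothetical, minimal view of the config: only keys that appear in this issue.
interface ProviderConfig {
  baseUrl?: string;
  apiKey?: string;
  api?: string;
}

interface OpenClawConfigSketch {
  agents?: { defaults?: { model?: { primary?: string } } };
  models?: { providers?: Record<string, ProviderConfig> };
}

// Names of providers configured explicitly under models.providers.
function pinnedProviders(config: OpenClawConfigSketch): string[] {
  return Object.keys(config.models?.providers ?? {});
}

// Provider prefix of the default model, e.g. "ollama" for "ollama/qwen3.5:397b-cloud".
function primaryProvider(config: OpenClawConfigSketch): string | undefined {
  const primary = config.agents?.defaults?.model?.primary;
  if (!primary) return undefined;
  const slash = primary.indexOf("/");
  return slash === -1 ? undefined : primary.slice(0, slash);
}

const reporterConfig: OpenClawConfigSketch = JSON.parse(`{
  "agents": {
    "defaults": {
      "llm": { "idleTimeoutSeconds": 600 },
      "model": { "primary": "ollama/qwen3.5:397b-cloud" }
    }
  },
  "models": {
    "providers": {
      "ollama": { "baseUrl": "http://winhost:11434", "apiKey": "ollama-local", "api": "ollama" }
    }
  }
}`);

const pinned = pinnedProviders(reporterConfig);
const primary = primaryProvider(reporterConfig);
```

For the reporter's config both helpers point at ollama, which is the models.providers.ollama block named in the root-cause analysis.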

Related issues

- #72851 ("[Bug]: Ollama provider hangs on infer model run --local across 2026.4.20 / 2026.4.24 / 2026.4.25") was reported as fixed in 2026.4.26 by a commit adding a "lean provider path" for local Ollama probes. This system is on 2026.4.27 and still exhibits the hang, suggesting either an incomplete fix or a regression introduced by subsequent model catalog/provider index refactors in 2026.4.27.
- The 2026.4.27 CHANGELOG shows large-scale model catalog refactors (moving Fireworks, Together AI, Qianfan, Xiaomi, NVIDIA, Cerebras, Mistral, Chutes, Kilo, OpenAI, OpenCode Go to plugin manifest modelCatalog rows) which may have touched the code path for CLI model probes.
- The CHANGELOG entry for 2026.4.27 says: "CLI/models: keep default-model and allowlist pickers on explicit models.providers.*.models entries when models.mode is replace instead of loading the full built-in catalog. Fixes #64950." This may be related.

Notes

- Direct curl to the Ollama API works reliably, so Ollama itself is healthy.
- The image tool works correctly (it routes through the gateway's internal Ollama integration, not the broken openclaw-infer CLI).
- This suggests the bug is in the openclaw-infer CLI entry path, not in the Ollama provider or gateway integration.
- The openclaw-infer binary is not a standalone file; it is spawned as a Node.js child process of the openclaw CLI wrapper.

TODO

- Confirm whether this reproduces on a clean 2026.4.27 install
- Check whether downgrading to 2026.4.26 resolves the issue
- Identify which 2026.4.27 change introduced the regression

extent analysis

TL;DR

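The hang appears to be a regression introduced in 2026.4.27. Downgrading to 2026.4.26 is a plausible but unverified workaround; the fix for the catalog-listing commands (infer model list / inspect / providers) is proposed in PR #75022, while the infer model run hang follows a separate code path.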

Guidance

- Verify the issue on a clean 2026.4.27 install: Confirm whether the problem reproduces on a fresh installation of OpenClaw 2026.4.27 to rule out environmental or configuration issues.
- Downgrade to 2026.4.26: Attempt to resolve the issue by downgrading OpenClaw to version 2026.4.26, as the problem may have been introduced in the 2026.4.27 update.
- Investigate the 2026.4.27 changelog: Examine the changes made in the 2026.4.27 release, particularly the model catalog refactors, to identify the potential cause of the regression.
- Check for related issues: Review related issues, such as #72851, to see if they provide any insight into the problem or potential solutions.

Example

No code snippet is provided, as the issue seems to be related to a specific version of OpenClaw and its internal workings.

Notes

The issue appears to be specific to the openclaw-infer CLI binary entry path and not related to the Ollama provider or gateway integration. The fact that direct curl requests to the Ollama API work reliably and the image tool functions correctly suggests that the problem lies within the OpenClaw CLI.

Recommendation

As a stopgap, try downgrading to OpenClaw 2026.4.26; the reporter had not yet confirmed that this resolves the hang. The fix for the catalog-listing commands is proposed in PR #75022, and the infer model run hang is to be tracked as a separate issue.
