openclaw - 💡(How to fix) Fix openclaw infer model run (without --gateway) hangs indefinitely on 4.26; same prompt completes immediately in 4.25 [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#73881Fetched 2026-04-29 06:13:41
View on GitHub
Comments
1
Participants
2
Timeline
2
Reactions
0
Timeline (top)
closed ×1commented ×1

On 4.26, openclaw infer model run --model <provider/model> --prompt '...' hangs indefinitely after invocation. The only termination is via external timeout — the parent process kills the hung child with SIGKILL. The OS does not kill the process; the kill comes from the caller's exec timeout policy.

Error Message

  • Not session-disk-budget enforcement: session.maintenance.mode: "warn" (no enforcement); session-disk-budget warnings firing during the same window are correlated noise, not causal
  • Not the modelRun:true allowlist precheck bug (filed separately — that bug produces an immediate fatal error, not a hang; both --gateway and bare CLI exhibit the hang on 4.26)

Root Cause

We did not retest with --gateway on 4.26 because the bare CLI was already hanging; the prior --gateway issue on 4.25 was a separate precheck bug (filed separately). Both --gateway and bare invocations exhibit hangs on 4.26 but the underlying failure modes may differ.

Code Example

# Both forms hang on 4.26
openclaw infer model run --model openai-codex/gpt-5.5 --prompt 'Reply with exactly: smoke-ok'
openclaw infer model run --local --model openrouter/xiaomi/mimo-v2-pro --prompt 'Reply: ok'
RAW_BUFFERClick to expand / collapse

Upstream bug report DRAFT — pending Trent blind-pass

Target: github.com/openclaw/openclaw/issues Status: DRAFT v0 — not filed.


Title

openclaw infer model run hangs on 4.26; child is killed only by caller timeout

Environment

  • OpenClaw 2026.4.26 (be8c246)
  • Install: npm-global at /home/openclawops/.npm-global/lib/node_modules/openclaw/openclaw.mjs (despite update status reporting "pnpm")
  • ExecStart: /usr/bin/node /home/openclawops/.npm-global/lib/node_modules/openclaw/openclaw.mjs gateway run --port 43127 --bind loopback
  • Production Telegram lane via gateway: works fine (so model lane via gateway is alive — only CLI bare-invocation path affected)

Summary

On 4.26, openclaw infer model run --model <provider/model> --prompt '...' hangs indefinitely after invocation. The only termination is via external timeout — the parent process kills the hung child with SIGKILL. The OS does not kill the process; the kill comes from the caller's exec timeout policy.

Reproduction (on 4.26, both confirmed)

# Both forms hang on 4.26
openclaw infer model run --model openai-codex/gpt-5.5 --prompt 'Reply with exactly: smoke-ok'
openclaw infer model run --local --model openrouter/xiaomi/mimo-v2-pro --prompt 'Reply: ok'

Behavior:

  • Process starts
  • No model output produced
  • Process does not exit on its own
  • Holds resources for as long as the caller is willing to wait
  • Caller's exec timeout (we used 60s+) eventually fires SIGKILL

Verified NOT the cause

Exhaustive root-cause work via VPS-internal diagnostics:

  • Not kernel OOM: dmesg has zero OOM/killed entries during the hang window
  • Not cgroup memory enforcement:
    • memory.events: oom 0, oom_kill 0, max 0, high 0
    • memory.max=max (uncapped)
    • service memory peak ~2.48GB
  • Not systemd kill: systemctl status openclaw reports service healthy throughout, no kill/failure indication
  • Not session-disk-budget enforcement: session.maintenance.mode: "warn" (no enforcement); session-disk-budget warnings firing during the same window are correlated noise, not causal
  • Not the modelRun:true allowlist precheck bug (filed separately — that bug produces an immediate fatal error, not a hang; both --gateway and bare CLI exhibit the hang on 4.26)

The kill is purely the parent-process exec timeout firing on a hung child. The OpenClaw process is alive but never produces output or exits.

Production impact

LOW — production agent runs go through the Telegram/agent gateway path, which works correctly. CLI smoke gates and any user/script that invokes infer model run directly are broken. We marked our 4.26 cutover post-flight CLI smoke as failed/hung but proceeded anyway since the model lane through the gateway is unaffected.

What we tried (4.26)

  • bare openclaw infer model run --model openai-codex/gpt-5.5 --prompt ... — hung
  • bare openclaw infer model run --local --model openrouter/xiaomi/mimo-v2-pro --prompt ... — hung

We did not retest with --gateway on 4.26 because the bare CLI was already hanging; the prior --gateway issue on 4.25 was a separate precheck bug (filed separately). Both --gateway and bare invocations exhibit hangs on 4.26 but the underlying failure modes may differ.

Possible 4.26 contributors (from CHANGELOG inspection)

  • Line 182: massive deferred init — QMD, embedded-run activity reads, plugin HTTP routes, MCP loopback, channel runtime helpers, etc. all moved to lazy/on-request. Could mean the FIRST infer model run on a fresh boot triggers a heavy lazy-init burst that deadlocks
  • Line 115: session write lock acquired only after cold bootstrap, plugin, and tool setup — startup ordering change
  • Line 67: live-session model switch conflicts now classified as "unknown fallback failures" — could mask the real failure mode

We don't have a confirmed root cause; this report is filed pending upstream instrumentation / strace follow-up to narrow the deadlock site.

Suggested investigation

  1. Strace the hung child to identify what syscall it's blocked on
  2. Add a CLI-mode init timeout that surfaces the lazy-init step that deadlocked (instead of hanging silently)
  3. Verify the deferred-init code path doesn't have a circular dependency in CLI-direct invocation (vs gateway-mediated)

extent analysis

TL;DR

The most likely fix or workaround for the openclaw infer model run hang on version 4.26 is to investigate and address potential deadlocks caused by the massive deferred init and startup ordering changes.

Guidance

  • Investigate the hung child process using strace to identify the syscall it's blocked on, which can help narrow down the deadlock site.
  • Implement a CLI-mode init timeout to surface the lazy-init step that deadlocked, providing more insight into the issue.
  • Verify the deferred-init code path for circular dependencies in CLI-direct invocation, which may differ from the gateway-mediated invocation.
  • Review the changes in the 4.26 version, specifically the massive deferred init and startup ordering changes, to identify potential causes of the deadlock.

Example

No code snippet is provided as the issue requires further investigation and debugging.

Notes

The production impact is low since the issue only affects CLI smoke gates and direct invocations, while the Telegram/agent gateway path works correctly. The root cause is still unknown and requires further investigation using strace and debugging.

Recommendation

Apply a workaround by implementing a CLI-mode init timeout and investigating the potential deadlocks caused by the massive deferred init and startup ordering changes, as this will provide more insight into the issue and help identify the root cause.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix openclaw infer model run (without --gateway) hangs indefinitely on 4.26; same prompt completes immediately in 4.25 [1 comments, 2 participants]