openclaw - ✅(Solved) Fix [Bug]: 4.24版本全局安装后卡死,无法使用 [1 pull requests, 2 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#72208Fetched 2026-04-27 05:33:14
View on GitHub
Comments
2
Participants
3
Timeline
5
Reactions
0
Author
Timeline (top)
commented ×2labeled ×2cross-referenced ×1

After upgrading to 2026.4.24, openclaw gateway hangs during startup when LiteLLM and OpenRouter pricing fetch requests time out, making the dashboard inaccessible and Ctrl+C unable to exit.

Root Cause

After upgrading to 2026.4.24, openclaw gateway hangs during startup when LiteLLM and OpenRouter pricing fetch requests time out, making the dashboard inaccessible and Ctrl+C unable to exit.

Fix Action

Fixed

PR fix notes

PR #72306: fix(gateway): unref pricing refresh timer and abort fetches on stop

Description (problem / solution / changelog)

Summary

  • Problem: After openclaw gateway starts, the 24-hour pricing-refresh setTimeout and any in-flight LiteLLM/OpenRouter fetches keep the Node event loop alive. SIGINT/Ctrl+C cannot exit the process; users must kill from another terminal.
  • Why it matters: Regression vs 2026.4.21. Reported on Windows 11 npm-global install (#72208). Affects every gateway run, not only pricing-fetch failures.
  • What changed: .unref() the 24h refresh timer so it never blocks process exit; thread an AbortController through both pricing fetches and abort it from the stop() callback returned by startGatewayModelPricingRefresh.
  • What did NOT change (scope boundary): No change to fetch URLs, timeout values, queueMicrotask deferral, cache TTL, schema, or HTTP server startup. Dashboard reachability during the initial fetch window is out of scope (already targeted by 966e814c5e).

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #72208
  • Related #67605, #65365
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: scheduleRefresh() creates a setTimeout(..., 24h) whose handle holds a ref on the libuv loop. The pricing fetches in refreshGatewayModelPricingCache only honored AbortSignal.timeout(...); nothing connected them to gateway shutdown. When the user pressed Ctrl+C, the SIGINT handler ran but the loop stayed alive on the timer (and on any unsettled fetch socket).
  • Missing detection / guardrail: No test asserted that timers/handles created by gateway runtime services are unref'd or torn down on stop().
  • Contributing context (if known): The 24h refresh path was added with the tiered-pricing work in #67605. The async-bootstrap deferral in 966e814c5e made the timer reachable on every gateway start, exposing the unref gap.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/gateway/model-pricing-cache.test.ts
  • Scenario the test should lock in:
    1. The 24h refresh timer is created with hasRef() === false.
    2. Calling the stop() from startGatewayModelPricingRefresh aborts every in-flight fetch's AbortSignal.
  • Why this is the smallest reliable guardrail: Both failure modes are observable at the cache module's public surface without spinning a real gateway, so a unit test pins them deterministically and runs in milliseconds.
  • Existing test that already covers this (if any): None — defers bootstrap refresh work until after the starter returns covers the prior queueMicrotask fix but not timer ref state or abort propagation.
  • If no new test is added, why not: N/A — two tests added.

User-visible / Behavior Changes

  • Ctrl+C now exits openclaw gateway cleanly even when the most recent pricing refresh has scheduled its next 24h tick or is still waiting on upstream. No config or default changes.

Diagram (if applicable)

Before:
SIGINT -> stop() -> clearRefreshTimer()
       -> in-flight fetch still pending (AbortSignal.timeout only)
       -> 24h setTimeout still ref'd
       -> event loop stays alive -> process hangs

After:
SIGINT -> stop() -> clearRefreshTimer() + abortInFlightRefresh()
       -> AbortSignal.any([timeout, cancel]) -> fetches reject AbortError
       -> any new 24h timer scheduled by the catch path is unref'd
       -> event loop drains -> process exits

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux 6.8 (issue reporter: Windows 11)
  • Runtime/container: Node 22, pnpm
  • Model/provider: any (reporter used openrouter/openrouter/free)
  • Integration/channel (if any): gateway HTTP server only
  • Relevant config (redacted): default agents.defaults.model

Steps

  1. pnpm openclaw gateway with network conditions that delay/timeout openrouter.ai/api/v1/models and the LiteLLM raw-content URL.
  2. Wait for the pricing fetches to complete or time out.
  3. Press Ctrl+C.

Expected

  • Process exits within a second of SIGINT.

Actual (before fix)

  • Process stays alive; only kill -9 recovers it.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

pnpm test src/gateway/model-pricing-cache.test.ts — 13 passed (2 new). New tests:

  • unrefs the 24-hour refresh timer so SIGINT exits cleanly
  • aborts in-flight pricing fetches when stop is called

Human Verification (required)

  • Verified scenarios:
    • Ran pnpm test src/gateway/model-pricing-cache.test.ts — all 13 cases green.
    • Inspected the abort path: AbortSignal.any([timeoutSignal, cancelSignal]) — aborting cancel synchronously flips the combined signal, which both fetch handlers observe.
    • Confirmed the new 24h timer scheduled by scheduleRefresh() is also unref'd, so a refresh that completes after stop() cannot resurrect a ref'd handle.
  • Edge cases checked:
    • Concurrent callers of refreshGatewayModelPricingCache share the in-flight promise; the abort controller belongs to the original caller and is cleared in the finally.
    • __resetGatewayModelPricingCacheForTest aborts and clears, keeping suite isolation intact.
  • What you did not verify:
    • Live Windows-11 npm-global reproduction (issue's host platform).
    • Dashboard reachability during the initial 30s fetch window — that symptom is the queueMicrotask deferral's responsibility (966e814c5e) and is unchanged here.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: After stop() aborts an in-flight refresh, the IIFE's catch handlers still reach scheduleRefresh(), scheduling a fresh 24h timer.
    • Mitigation: That timer is also unref'd via unrefRefreshTimer(), so it cannot keep the process alive. Tradeoff is a single redundant timer entry until GC; intentionally left in to keep the diff minimal.
  • Risk: Test wraps globalThis.setTimeout to capture the 24h handle; a parallel test that also wraps it could collide.
    • Mitigation: Wrap is restored in a finally, and the suite is isolated under the gateway vitest project.

Changed files

  • src/gateway/model-pricing-cache.test.ts (modified, +77/-0)
  • src/gateway/model-pricing-cache.ts (modified, +37/-5)
RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

After upgrading to 2026.4.24, openclaw gateway hangs during startup when LiteLLM and OpenRouter pricing fetch requests time out, making the dashboard inaccessible and Ctrl+C unable to exit.

Steps to reproduce

  1. npm install -g [email protected]
  2. openclaw gateway
  3. Wait for pricing fetch timeout (~30s each)
  4. Process hangs, dashboard at http://127.0.0.1:18789/ unreachable

Expected behavior

In 2026.4.21, the same pricing fetch failures occur but gateway continues to start normally and the dashboard is accessible.

Actual behavior

Gateway hangs after pricing fetch timeout. No further output. Ctrl+C does not respond. Dashboard URL unreachable. Must kill node processes from another terminal.

OpenClaw version

2026.4.24

Operating system

windows11

Install method

npm global

Model

openrouter/openrouter/free

Provider / routing chain

openrouter

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

The issue can be mitigated by adjusting the timeout settings for LiteLLM and OpenRouter pricing fetch requests to prevent the openclaw gateway from hanging during startup.

Guidance

  • Review the configuration options for LiteLLM and OpenRouter to identify where timeout settings can be adjusted.
  • Consider increasing the timeout values or implementing a retry mechanism to handle temporary failures.
  • Investigate the differences in behavior between versions 2026.4.21 and 2026.4.24 to understand what change may have introduced this regression.
  • Test the gateway startup with a stable internet connection to see if the issue persists, which could indicate a problem unrelated to the pricing fetch requests.

Example

No specific code example can be provided without more details on the configuration and implementation of the pricing fetch requests.

Notes

The solution may require changes to the openclaw gateway's configuration or the implementation of the pricing fetch requests. Without access to the specific code or configuration files, it's challenging to provide a precise fix.

Recommendation

Apply workaround: Adjust the timeout settings for the pricing fetch requests to prevent the gateway from hanging. This approach is chosen because it directly addresses the observed behavior and can be implemented without waiting for a potential fix in a future version.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

In 2026.4.21, the same pricing fetch failures occur but gateway continues to start normally and the dashboard is accessible.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - ✅(Solved) Fix [Bug]: 4.24版本全局安装后卡死,无法使用 [1 pull requests, 2 comments, 3 participants]