openclaw - 💡(How to fix) Fix [Bug]: Gateway hard-couples to OpenRouter + LiteLLM pricing fetches at boot; no opt-out [3 comments, 3 participants]

GitHub stats: openclaw/openclaw#73329, fetched 2026-04-29 06:20:55. Comments: 3 · Participants: 3 · Timeline events: 4 (commented ×3, closed ×1) · Reactions: 0


Summary

Gateway boot is hard-coupled to two external HTTP endpoints (OpenRouter /models and the LiteLLM pricing JSON on raw.githubusercontent.com) for the model-pricing cache. There is no config flag, env var, or plugin disable to opt out. On networks where either endpoint is slow or unreachable, the fetches stack up to ~90 seconds of dead time during the post-ready "starting channels and sidecars..." phase, delaying Kitchen, Telegram, and every other channel from coming online — even when the user does not configure either as a model provider.

The pricing-cache code already handles the empty-cache case gracefully (refs.length === 0 early return), so adding an explicit "skip remote pricing bootstrap" knob is purely additive and very small.

Environment

  • OpenClaw CLI: 2026.4.25 (aa36ee6)
  • Host OS: macOS Darwin 25.4.0 arm64
  • Network: residential, both endpoints reachable but slow today; no proxy
  • Configured providers: openai-codex/gpt-5.4 (primary) + anthropic/claude-opus-4-7 (fallback). Neither OpenRouter nor LiteLLM is in plugins.entries, plugins.allow, or anywhere as a model provider.

What happened

A normal gateway restart (launchctl kickstart -k gui/$UID/ai.openclaw.gateway) on 2026-04-28 produced this timeline:

2026-04-28T01:53:21.450-04:00 [gateway] starting...
2026-04-28T01:53:49.110-04:00 [gateway] starting HTTP server...
2026-04-28T01:53:49.344-04:00 [gateway] ready (5 plugins: kitchen, llm-task, memory-core, recipes, telegram; 27.9s)
2026-04-28T01:53:49.383-04:00 [gateway] starting channels and sidecars...
2026-04-28T01:54:21.790-04:00 [model-pricing] OpenRouter pricing fetch failed: TypeError: fetch failed
2026-04-28T01:59:39.105-04:00 [model-pricing] LiteLLM pricing fetch failed (timeout 60s): TimeoutError: The operation was aborted due to timeout
2026-04-28T01:59:41.984-04:00 [telegram] [default] starting provider
2026-04-28T01:59:42.314-04:00 [plugins] [kitchen] listening on http://100.81.189.7:7777 (dev=false)

Ready at 01:53:49; Kitchen and Telegram online at 01:59:42. That is a 5 min 53 s post-ready stall, of which ~90 s is OpenRouter (~32 s) plus LiteLLM (60 s timeout). During that window:

  • openclaw recipes list (and any other gateway RPC) blocks behind the slow boot path. Measured 119,961 ms for a list call that returns 17 entries from on-disk markdown — same I/O work that completes in <50 ms in standalone Node.
  • WebSocket clients see "handshake timeout" and "closed before connect" errors.
  • [telegram] Polling stall detected (active getUpdates stuck for 141.65s) happens repeatedly because the polling runner can't initialize cleanly while the gateway loop is blocked.
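The durations above can be re-derived from the log timestamps. A quick sketch in JavaScript (the event labels are descriptive names chosen here, not OpenClaw identifiers):

```javascript
// Timestamps copied from the gateway log above; labels are illustrative only.
const events = {
  ready: "2026-04-28T01:53:49.344-04:00",
  sidecarsStart: "2026-04-28T01:53:49.383-04:00",
  openRouterFailed: "2026-04-28T01:54:21.790-04:00",
  kitchenListening: "2026-04-28T01:59:42.314-04:00",
};

// Seconds between two ISO-8601 timestamps (Date.parse honors the -04:00 offset).
function secondsBetween(from, to) {
  return (Date.parse(to) - Date.parse(from)) / 1000;
}

// Format seconds as "Xm Ys", rounded to the nearest whole second.
function formatDuration(seconds) {
  const total = Math.round(seconds);
  return `${Math.floor(total / 60)}m ${total % 60}s`;
}

const postReadyStall = secondsBetween(events.ready, events.kitchenListening);
const openRouterWait = secondsBetween(events.sidecarsStart, events.openRouterFailed);

console.log(`post-ready stall: ${formatDuration(postReadyStall)}`); // → post-ready stall: 5m 53s
console.log(`OpenRouter failed after ${openRouterWait.toFixed(1)} s`); // → OpenRouter failed after 32.4 s
```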

OpenRouter and LiteLLM cannot be removed from openclaw.json because they are not configured there; their URLs are hardcoded in dist/usage-format-*.js.

Root cause area

dist/usage-format-ZhKID6__.js:

// Line 57–58 — hardcoded URLs
const OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models";
const LITELLM_PRICING_URL  = "https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json";

// Line 439–445 — bootstrap fires both, regardless of whether the user uses these as providers
const [catalogById, litellmCatalog] = await Promise.all([
  fetchOpenRouterPricingCatalog(fetchImpl).catch((error) => {
    log.warn(formatPricingFetchFailure("OpenRouter", error));
    openRouterFailed = true;
    return new Map();
  }),
  fetchLiteLLMPricingCatalog(fetchImpl).catch((error) => {
    log.warn(formatPricingFetchFailure("LiteLLM", error));
    litellmFailed = true;
    return new Map();
  })
]);

// Line 505 — startGatewayModelPricingRefresh is called unconditionally from
// the gateway boot path in dist/server.impl-*.js:6900:
//   stopModelPricingRefresh: !params.minimalTestGateway && !isVitestRuntimeEnv()
//     ? startGatewayModelPricingRefresh({ config: params.cfgAtStart })
//     : () => {}

collectConfiguredModelPricingRefs(config) walks every agents.*.model, tools.subagents.model, hooks.gmail.model, etc. As soon as ANY model ref exists in config, both fetches fire. There is no early return for "user hasn't opted in to remote pricing data."
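To make the gating concrete, here is a minimal sketch of the decision the bootstrap could make before any network call. It is hypothetical: collectModelRefs and shouldBootstrapRemotePricing are illustrative names, only the three config paths named above are walked (the real collector covers more), and each model field is assumed to be a plain "provider/model" string.

```javascript
// Hypothetical sketch, not OpenClaw's collectConfiguredModelPricingRefs.
// Assumes each model field is a plain "provider/model" string.
function collectModelRefs(config) {
  const refs = [];
  // agents.<id>.model
  for (const agent of Object.values(config?.agents ?? {})) {
    if (typeof agent?.model === "string") refs.push(agent.model);
  }
  // tools.subagents.model and hooks.gmail.model
  for (const model of [config?.tools?.subagents?.model, config?.hooks?.gmail?.model]) {
    if (typeof model === "string") refs.push(model);
  }
  return [...new Set(refs)]; // de-duplicate, keeping first-seen order
}

function shouldBootstrapRemotePricing(config) {
  // Proposed: an explicit opt-out wins before any config walking.
  if (config?.gateway?.modelPricing?.enabled === false) return false;
  // Behavior described in the issue: any model ref at all triggers both fetches.
  return collectModelRefs(config).length > 0;
}
```

For a config like the one in the Environment section (openai-codex and anthropic models only), the refs list is non-empty, so today both fetches fire; with enabled: false the sketch skips them.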

Why this looks like a bug (not just slow networking)

  1. Coupling violates the principle of least surprise. A user with openai + anthropic providers does not expect their gateway to phone OpenRouter and the BerriAI/litellm GitHub repo at boot.
  2. Air-gapped / restricted-internet installs are blocked. The 60 s LiteLLM timeout is paid on every restart; that's a hard latency floor on recovery from any gateway issue.
  3. No documented opt-out. No env var (grep -E 'OPENCLAW_DISABLE_PRICING|DISABLE_PRICING' dist/ → empty). No schema field for gateway.modelPricing. No plugin to disable. Searching the schema for pricing returns only the parseLiteLLMTieredPricing internals.
  4. Failure mode is graceful but the timeouts still serialize sidecar startup. The .catch handlers prevent crash, but they don't prevent the wait.
  5. The endpoints don't sign their data and aren't authenticated. Anyone who hijacks them can poison a user's local pricing cache. That's a separate concern but worth noting since the coupling is hardcoded.
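Items 2 and 4 hinge on how long a single fetch may hang. "The operation was aborted due to timeout" is the message Node.js uses when an AbortSignal.timeout() signal fires, so the existing code likely bounds each request that way (an inference from the log, not confirmed from the source). A minimal sketch of a caller-configurable bound, for example fed from the proposed timeoutMs; fetchJsonWithTimeout is a hypothetical helper:

```javascript
// Hypothetical helper: bound one pricing request by a caller-supplied timeout.
// fetchImpl is injected, as in the dist excerpt above, so tests can fake it;
// in Node 18+ the global fetch can be passed directly.
async function fetchJsonWithTimeout(fetchImpl, url, timeoutMs) {
  // AbortSignal.timeout() aborts the request after timeoutMs; fetch then rejects
  // with a DOMException named "TimeoutError".
  const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} from ${url}`);
  }
  return response.json();
}

// Example (Node 18+, global fetch), with timeoutMs taken from config:
//   const models = await fetchJsonWithTimeout(fetch, "https://openrouter.ai/api/v1/models", 5_000);
```

A shorter bound does not remove the network dependency; it only caps what each restart pays when an endpoint is slow.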

Suggested fix

Add a config flag that defaults to true, preserving existing behavior:

{
  "gateway": {
    "modelPricing": {
      "enabled": true,            // default: true (existing behavior)
      "sources": ["openrouter", "litellm"],   // optional allowlist
      "timeoutMs": 60000           // existing implicit timeout
    }
  }
}
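One way the proposed block could be resolved into concrete options, with defaults that reproduce today's behavior (both sources, 60 s). This is a sketch under assumptions: resolveModelPricingOptions and KNOWN_PRICING_SOURCES are hypothetical names, and rejecting unknown sources or non-positive timeouts is a suggested validation rule, not existing OpenClaw behavior.

```javascript
// Hypothetical sketch: turn gateway.modelPricing into concrete options.
const KNOWN_PRICING_SOURCES = ["openrouter", "litellm"];

function resolveModelPricingOptions(config) {
  const raw = config?.gateway?.modelPricing ?? {};
  // Anything other than an explicit false keeps the current default (enabled).
  const enabled = raw.enabled !== false;

  const sources = raw.sources ?? KNOWN_PRICING_SOURCES;
  if (!Array.isArray(sources)) {
    throw new Error("gateway.modelPricing.sources must be an array");
  }
  const unknown = sources.filter((s) => !KNOWN_PRICING_SOURCES.includes(s));
  if (unknown.length > 0) {
    throw new Error(`gateway.modelPricing.sources: unknown source(s) ${unknown.join(", ")}`);
  }

  // 60 s mirrors the implicit LiteLLM timeout seen in the log.
  const timeoutMs = raw.timeoutMs ?? 60_000;
  if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
    throw new Error("gateway.modelPricing.timeoutMs must be a positive integer");
  }
  return { enabled, sources: [...sources], timeoutMs };
}
```

With this shape, "sources": [] would behave like "enabled": false (no fetches), which is worth settling explicitly when the config shape is reviewed.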

Implementation:

function startGatewayModelPricingRefresh(params) {
  // NEW: respect explicit opt-out
  if (params.config.gateway?.modelPricing?.enabled === false) {
    return () => {};
  }
  // ... existing body
}

The empty-cache path already exists (refs.length === 0 short-circuit at line 432–435), so flipping the flag to false is a clean no-op.
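A sketch of a refresh that honors the allowlist and timeout as well as the opt-out, assuming the options have already been resolved with defaults ({ enabled, sources, timeoutMs }). refreshPricingCatalogs and the fetchers map are hypothetical; each fetcher stands in for something like fetchOpenRouterPricingCatalog and is assumed to be an async function that accepts an AbortSignal and resolves to a Map.

```javascript
// Hypothetical sketch, not the OpenClaw implementation.
// options:  { enabled: boolean, sources: string[], timeoutMs: number }
// fetchers: { [source]: async (signal: AbortSignal) => Map }
async function refreshPricingCatalogs(options, fetchers, log = console) {
  // Explicit opt-out: return an empty result without creating any request.
  if (!options.enabled) return new Map();

  // Only fetch sources that are allowlisted and have a fetcher.
  const selected = options.sources.filter((name) => Object.hasOwn(fetchers, name));

  // Run the selected fetches concurrently; a failure or timeout degrades to an
  // empty Map, mirroring the .catch(() => new Map()) pattern in the dist excerpt.
  const catalogs = await Promise.all(
    selected.map((name) =>
      fetchers[name](AbortSignal.timeout(options.timeoutMs)).catch((error) => {
        log.warn(`${name} pricing fetch failed: ${error}`);
        return new Map();
      }),
    ),
  );

  // Keep one catalog per source, as catalogById / litellmCatalog are kept apart today.
  return new Map(selected.map((name, i) => [name, catalogs[i]]));
}
```

Called with enabled: false, it returns before any fetcher runs, which is the no-op the issue asks for.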

Severity / impact

  • High for restricted-network installs (corporate, regulated, intermittent): each gateway restart costs a hard 60+ s plus whatever variability the OpenRouter endpoint adds.
  • Medium for everyone else: ~30–90 s of post-ready dead time on every restart. Compounds with the worker-storm pattern (#70472 / sibling) when paired.
  • Low functional risk: the existing graceful-fail path means correctness is not affected, only latency.

Additional context

This was caught while debugging a slow-restart symptom on a gateway that also had an orphaned wedged agent:main:main session (#70472). Even after archiving the wedged session and launchctl kickstart-ing, the post-ready stall persisted at ~6 min, of which ~90 s was directly attributable to the two pricing-fetch timeouts. The remaining ~3 min of unaccounted sidecar time appears to be elsewhere in the channel-startup serialization but is out of scope for this issue.

I'd be happy to send a PR if a maintainer can confirm the proposed config shape (gateway.modelPricing.enabled: boolean).

Extent analysis

TL;DR

Add a configuration flag to opt out of remote model-pricing bootstrapping, preventing unnecessary timeouts and delays during gateway startup.

Guidance

  • Identify the hardcoded URLs in dist/usage-format-*.js and consider the impact of removing or modifying them.
  • Review the proposed config shape (gateway.modelPricing.enabled: boolean) to determine its feasibility and potential effects on existing behavior.
  • Consider a shorter or configurable timeout for the pricing fetches (the proposed timeoutMs field), or a retry mechanism that runs off the boot path, to limit the cost of slow or unreachable endpoints.
  • Evaluate the security implications of the hardcoded endpoints and the potential risks of data poisoning.

Example

The proposed fix adds a config flag to opt out of remote model-pricing bootstrapping:

{
  "gateway": {
    "modelPricing": {
      "enabled": true,
      "sources": ["openrouter", "litellm"],
      "timeoutMs": 60000
    }
  }
}

And implementing it in the startGatewayModelPricingRefresh function:

function startGatewayModelPricingRefresh(params) {
  if (params.config.gateway?.modelPricing?.enabled === false) {
    return () => {};
  }
  // ... existing body
}

Notes

The proposed fix assumes that the existing behavior can be preserved by defaulting the enabled flag to true. However, this may not be desirable in all scenarios, and further discussion may be necessary to determine the best approach.

Recommendation

Apply the proposed fix: add a configuration flag to opt out of remote model-pricing bootstrapping. It provides a clear, explicit way to mitigate the issue without significant changes to the existing codebase.

