
openclaw - ✅(Solved) Fix [Bug]: Automatic model fallback does not recover cleanly from invalid primary model ID; user-facing HTTP 400 is surfaced instead of silent failover [1 pull requests, 1 comments, 2 participants]

forrestgdean · 2026-03-18T20:50:10Z

When OpenClaw is configured with a primary model plus valid fallback models, and the primary is intentionally or accidentally set to an invalid model ID, OpenClaw surfaces the provider error directly to the user:

HTTP 400: openrouter/invalid_test_model is not a valid model ID

Instead of transparently failing over to the first valid fallback model, the chat interaction is interrupted and repeated user messages continue to fail until the config is manually restored.

# PR #50028: fix: classify invalid-model fallback errors

- Repository: openclaw/openclaw
- Author: xiwuqi
- State: open | merged: False
- Link: https://github.com/openclaw/openclaw/pull/50028

## Description (problem / solution / changelog)

## Summary

- Problem: invalid primary model IDs returned as HTTP 400/422 were classified as generic `format` failures instead of `model_not_found`, which weakened the automatic fallback path for issue-backed `"… is not a valid model ID"` responses.
- Why it matters: the fallback system, warning path, and retry semantics all treat `model_not_found` as a first-class reason; misclassifying these provider responses makes invalid-model failover behavior less reliable and less observable.
- What changed: `isModelNotFoundErrorMessage()` now recognizes `"not a valid model ID"`, and HTTP 400/422 classification now prefers `model_not_found` before the generic `format` fallback.
- What did NOT change (scope boundary): generic 400/422 request-shape errors still stay in the `format` lane, and billing overrides on 400/422 are still preserved.

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [x] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra

## Linked Issue/PR

- Closes #50017
- Related #

## User-visible / Behavior Changes

Invalid primary model IDs that come back as `HTTP 400/422 ... is not a valid model ID` now stay in the `model_not_found` failover lane instead of being downgraded to a generic format error.

## Security Impact (required)

- New permissions/capabilities? (`Yes/No`): No
- Secrets/tokens handling changed? (`Yes/No`): No
- New/changed network calls? (`Yes/No`): No
- Command/tool execution surface changed? (`Yes/No`): No
- Data access scope changed? (`Yes/No`): No
- If any `Yes`, explain risk + mitigation:

## Repro + Verification

### Environment

- OS: Windows 11
- Runtime/container: Node 20 / pnpm 10 in local worktree validation
- Model/provider: invalid model IDs under `openrouter`-style HTTP 400/422 responses
- Integration/channel (if any): agent model failover
- Relevant config (redacted): primary model set to an invalid model ID with valid configured fallbacks

### Steps

1. Configure a primary model ref that does not exist and at least one valid fallback.
2. Trigger a provider error shaped like `HTTP 400: openrouter/__invalid_test_model__ is not a valid model ID`.
3. Observe failover classification and fallback attempt recording.

### Expected

- Invalid-model failures are classified as `model_not_found`.
- Automatic fallback keeps the not-found semantics instead of treating the error as a generic format failure.

### Actual

- Before this change, the same HTTP 400/422 responses were classified as `format`.

## Evidence

Attach at least one:

- [x] Failing test/log before + passing after
- [ ] Trace/log snippets
- [ ] Screenshot/recording
- [ ] Perf numbers (if relevant)

## Human Verification (required)

What you personally verified (not just CI), and how:

- Verified scenarios: added a regression in `failover-error.test.ts` for raw and HTTP 400/422 invalid-model messages, plus a `model-fallback.test.ts` case that now records the attempt as `model_not_found` during automatic fallback.
- Edge cases checked: status-only `400/422` still map to `format`; `400/422` billing payloads still map to `billing`.
- What you did **not** verify: full `pnpm check` / full repo test lanes did not complete locally in this Windows environment before the local 10s guard, so full coverage is left to CI.

## Review Conversations

- [x] I replied to or resolved every bot review conversation I addressed in this PR.
- [x] I left unresolved only the conversations that still need reviewer or maintainer judgment.

## Compatibility / Migration

- Backward compatible? (`Yes/No`): Yes
- Config/env changes? (`Yes/No`): No
- Migration needed? (`Yes/No`): No
- If yes, exact upgrade steps:

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly: revert this commit to restore the previous `400/422 -> format` behavior.
- Files/config to restore: `src/agents/pi-embedded-helpers/errors.ts`, `src/agents/failover-error.test.ts`, `src/agents/model-fal
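The classification order described in the PR summary (billing override first, then `model_not_found`, then the generic `format` lane for HTTP 400/422) can be illustrated with a minimal sketch. This is not the code from PR #50028: the function bodies, the `FailoverReason` type, the `classifyFailure` name, the billing detector, and every matched phrase other than "not a valid model ID" are assumptions made for illustration.

```typescript
// Hypothetical sketch of the ordering described in the PR summary.
// Only the phrase "not a valid model ID" and the reason names
// `model_not_found`, `format`, and `billing` come from the PR text.
type FailoverReason = "billing" | "model_not_found" | "format" | "unknown";

// Name mirrors the helper mentioned in the PR; the body is an assumption.
function isModelNotFoundErrorMessage(message: string): boolean {
  const text = message.toLowerCase();
  return text.includes("not a valid model id") || text.includes("model not found");
}

// Hypothetical stand-in for whatever billing detection the real code uses.
function isBillingErrorMessage(message: string): boolean {
  return /insufficient credits|billing/i.test(message);
}

function classifyFailure(status: number | undefined, message: string): FailoverReason {
  if (status === 400 || status === 422) {
    // Billing overrides on 400/422 are preserved.
    if (isBillingErrorMessage(message)) return "billing";
    // Invalid-model messages are checked before the generic lane.
    if (isModelNotFoundErrorMessage(message)) return "model_not_found";
    // Status-only or request-shape errors stay in the format lane.
    return "format";
  }
  // Raw messages without a status can still signal a missing model.
  if (isModelNotFoundErrorMessage(message)) return "model_not_found";
  return "unknown";
}
```

Under these assumptions, `classifyFailure(400, "openrouter/invalid_test_model is not a valid model ID")` yields `"model_not_found"`, `classifyFailure(400, "")` yields `"format"`, and `classifyFailure(422, "insufficient credits")` yields `"billing"`, matching the edge cases listed under Human Verification.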


GitHub stats

openclaw/openclaw#50017 • Fetched 2026-04-08 01:00:15

Author: forrestgdean
Participants: forrestgdean, Hollychou924
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1


Bug type

Behavior bug (incorrect output/state without crash)

Summary

HTTP 400: openrouter/invalid_test_model is not a valid model ID

Instead of transparently failing over to the first valid fallback model, the chat interaction is interrupted and repeated user messages continue to fail until the config is manually restored.

Steps to reproduce

1. Configure valid fallback models in agents.defaults.model.fallbacks
2. Temporarily set primary to an invalid model ID, e.g.: openrouter/invalid_test_model
3. Send a normal user request
4. Observe the reply

Expected behavior

If a primary model fails, OpenClaw should:

1. detect the primary failure
2. attempt fallback automatically
3. suppress the primary error from the user-facing reply if fallback succeeds
4. return the fallback model’s answer normally

At minimum, if certain failure classes are intentionally not fallback-eligible, that behavior should be documented clearly.

Actual behavior

- invalid primary model ID returns a user-visible HTTP 400
- fallback does not rescue the interaction before the error is surfaced
- subsequent messages continue failing until config is manually corrected

OpenClaw version

v2026.3.13

Operating system

Ubuntu 24.04.4 LTS

Install method

npm global

Model

openai-codex/gpt-5.4

Provider / routing chain

openclaw -> openai-codex (primary) -> openrouter (fallbacks)

Config file / key location

~/.openclaw/openclaw.json

Additional provider/model setup details

Additional observations

- Explicitly selecting a fallback model for a session works fine
- Fallback model path itself is functional
- The failure appears to be in automatic failover UX/behavior, not in the fallback models themselves
- Previous testing also suggested user-visible errors may leak through during other fallback-triggering cases such as 429s

Logs, screenshots, and evidence

Impact and severity

The problem occurs when the primary model's tokens have been exhausted and OpenClaw needs to fall back to the fallback models.

Severity is not critical, because there is a workaround: manually change the order of the models in the openclaw.json config file.

Additional information

Suggested improvement

A more robust behavior would be:

- if primary fails and a fallback succeeds, only return the fallback result
- log the primary failure internally
- optionally expose failure details in logs/status, not in the user-facing chat path
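The suggested behavior can be sketched as a small runner that separates the user-facing reply from internal diagnostics. This is a minimal illustration, not OpenClaw's implementation: `runWithFallback`, `ModelCall`, `Attempt`, and `FallbackResult` are hypothetical names, and the model call is injected so the sketch stays self-contained.

```typescript
// Hypothetical types; none of these names are OpenClaw APIs.
type ModelCall = (modelId: string, prompt: string) => Promise<string>;

interface Attempt {
  modelId: string;
  error: string;
}

interface FallbackResult {
  reply: string;       // the only part meant for the user-facing chat path
  modelId: string;     // which candidate produced the reply
  attempts: Attempt[]; // failed attempts, intended for logs/status only
}

// Tries each candidate in order (primary first, then fallbacks).
async function runWithFallback(
  candidates: string[],
  prompt: string,
  call: ModelCall,
): Promise<FallbackResult> {
  const attempts: Attempt[] = [];
  for (const modelId of candidates) {
    try {
      const reply = await call(modelId, prompt);
      // Earlier failures travel with the result but are not part of the reply.
      return { reply, modelId, attempts };
    } catch (err) {
      // Record the failure internally and move on to the next candidate.
      attempts.push({ modelId, error: err instanceof Error ? err.message : String(err) });
    }
  }
  // Only when every candidate has failed is an error surfaced.
  const summary = attempts.map((a) => `${a.modelId}: ${a.error}`).join("; ");
  throw new Error(`All models failed: ${summary}`);
}
```

A caller would show only `reply` in the chat and send `attempts` to logs or a status view, so a failed primary stays visible to operators without interrupting the conversation.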

extent analysis

Fix Plan

To make OpenClaw fail over automatically to a valid fallback model when the primary model ID is invalid, the model-invocation path needs to handle this case. The steps:

1. Catch the error when the primary model fails and attempt the configured fallback models in order. The openclaw.js file below is illustrative only; the files named in PR #50028 include src/agents/pi-embedded-helpers/errors.ts.
2. Add a check to ensure that the primary model ID is valid before attempting to use it.
3. If the primary model ID is invalid, log the error internally and use the first valid fallback model.

Example code changes:

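// openclaw.js (illustrative; getModel and respond are hypothetical helpers, not OpenClaw APIs)
const primaryModelId = 'openrouter/__invalid_test_model__';
const fallbackModels = ['openai-codex/gpt-5.4', 'other-fallback-model'];

// getModel is passed in: it maps a model ID to an object with an async respond() method
async function respondWithFallback(userInput, getModel) {
  try {
    // Attempt the primary model; `return await` keeps rejections inside this try block
    const primaryModel = getModel(primaryModelId);
    return await primaryModel.respond(userInput);
  } catch (error) {
    // Log the primary failure internally instead of surfacing it to the user
    console.error(`Primary model ${primaryModelId} failed: ${error}`);
    // Try each fallback in order and return the first successful answer
    for (const fallbackModelId of fallbackModels) {
      try {
        return await getModel(fallbackModelId).respond(userInput);
      } catch (fallbackError) {
        console.error(`Fallback model ${fallbackModelId} failed: ${fallbackError}`);
      }
    }
    // Every candidate failed: only now surface the original error
    throw error;
  }
}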

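Update the openclaw.json config file to include a list of valid fallback models under agents.defaults.model (the location of the primary key next to fallbacks is assumed here).

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openrouter/__invalid_test_model__",
        "fallbacks": ["openai-codex/gpt-5.4", "other-fallback-model"]
      }
    }
  }
}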

Verification

To verify that the fix worked, follow these steps:

1. Set the primary model ID to an invalid value in the openclaw.json config file.
2. Send a user request to OpenClaw.
3. Verify that the response is generated by the first valid fallback model.
4. Check the logs to ensure that the primary model failure is logged internally.

Extra Tips

- Make sure to test the fallback behavior with different types of primary model failures, such as 429 errors.
- Consider adding additional logging and monitoring to detect and respond to primary model failures.
- Review the OpenClaw documentation to ensure that the fallback behavior is clearly documented.
