openclaw: LiveSessionModelSwitchError prevents cross-provider model fallback

GitHub stats: openclaw/openclaw#58045, fetched 2026-04-08 01:54:36. Comments: 0. Participants: 1. Timeline: 0. Reactions: 2.


Summary

When the primary model (e.g. anthropic/claude-sonnet-4-5) returns a 429 rate limit error, the gateway correctly decides to fall back to the next configured model (e.g. google/gemini-2.5-flash). However, the embedded run's resolvePersistedLiveSelection() detects a "model switch" and throws LiveSessionModelSwitchError, which prevents the fallback from executing. Every fallback candidate fails the same way, and the user receives an "internal error".

Reproduction

  1. Configure an agent with:
       primary: anthropic/claude-sonnet-4-5
       fallbacks: [google/gemini-2.5-flash, anthropic/claude-haiku-4-5]
  2. Exhaust the Anthropic rate limit quota.
  3. Send a chat completion request to the agent.

  Expected: the request falls back to Gemini.
  Actual: every fallback candidate fails with LiveSessionModelSwitchError, and the agent returns "internal error".

Log Evidence

[model-fallback/decision] decision=candidate_failed requested=anthropic/claude-sonnet-4-5 candidate=anthropic/claude-sonnet-4-5 reason=rate_limit next=google/gemini-2.5-flash

[agent/embedded] live session model switch detected before attempt: google/gemini-2.5-flash -> anthropic/claude-sonnet-4-5

[model-fallback/decision] decision=candidate_failed requested=anthropic/claude-sonnet-4-5 candidate=google/gemini-2.5-flash reason=unknown next=anthropic/claude-haiku-4-5

[agent/embedded] live session model switch detected before attempt: anthropic/claude-haiku-4-5 -> anthropic/claude-sonnet-4-5

[openai-compat] chat completion failed: LiveSessionModelSwitchError: Live session model switch requested: anthropic/claude-sonnet-4-5

Root Cause

resolvePersistedLiveSelection() returns the agent's default model (Sonnet) from the session store. The fallback caller (runWithModelFallback) changes the runtime model to Gemini, but the embedded run interprets this as a user-initiated model switch and overrides it back to Sonnet, then throws LiveSessionModelSwitchError.

The model switch detection was designed for /model user commands mid-session, not for failover-initiated model changes. There's no flag to distinguish "fallback attempt" from "user switched models."
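The failure mode described above can be modelled in a few lines. This is only a sketch under stated assumptions: the session store shape, the runEmbedded stand-in, and every function signature here are hypothetical and are not taken from the OpenClaw source. Only the names resolvePersistedLiveSelection, runWithModelFallback, and LiveSessionModelSwitchError come from the issue.

```javascript
// Hypothetical model of the reported behaviour; signatures are assumptions.
class LiveSessionModelSwitchError extends Error {
  constructor(model) {
    super(`Live session model switch requested: ${model}`);
    this.name = "LiveSessionModelSwitchError";
  }
}

// Assumed: the session store only remembers the agent's default model.
const sessionStore = { model: "anthropic/claude-sonnet-4-5" };

function resolvePersistedLiveSelection() {
  return sessionStore.model;
}

// Assumed stand-in for the embedded run. The primary is rate limited, and
// any other model is treated as a user-initiated switch.
function runEmbedded(candidate) {
  const persisted = resolvePersistedLiveSelection();
  if (candidate !== persisted) {
    throw new LiveSessionModelSwitchError(persisted);
  }
  const err = new Error("429 rate limit");
  err.reason = "rate_limit";
  throw err;
}

// Tries each candidate in order and records why it failed.
function runWithModelFallback(candidates) {
  const failures = [];
  for (const candidate of candidates) {
    try {
      return { ok: true, result: runEmbedded(candidate) };
    } catch (err) {
      failures.push({ candidate, error: err.name });
    }
  }
  return { ok: false, failures };
}

const outcome = runWithModelFallback([
  "anthropic/claude-sonnet-4-5",
  "google/gemini-2.5-flash",
  "anthropic/claude-haiku-4-5",
]);
console.log(outcome);
```

In this model the primary fails on the rate limit, and both fallback candidates are rejected by the switch check before they can run, which matches the pattern in the log evidence.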

Impact

  • Cross-provider fallback is completely broken (Anthropic → Google, Anthropic → OpenAI)
  • Same-provider fallback (Sonnet → Haiku) may also be affected: the Haiku attempt in the log evidence hits the same switch detection, though this has not been confirmed in isolation
  • Customer-facing agents go down entirely when the primary provider is rate limited, even with fallbacks configured
  • Affects all agents with cross-provider fallback configurations

Suggested Fix

Pass an isFallbackAttempt: true flag from runWithModelFallback into the embedded run context. When this flag is set, resolvePersistedLiveSelection() should skip the model switch detection and allow the fallback model to proceed.
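A minimal sketch of what a flag-aware check might look like, assuming the guard receives the requested model, the persisted model, and the proposed flag. The function name checkLiveSelection and its parameter object are hypothetical; the real resolvePersistedLiveSelection() signature is not shown in the issue.

```javascript
// Hypothetical guard; names and parameters are assumptions for illustration.
class LiveSessionModelSwitchError extends Error {
  constructor(model) {
    super(`Live session model switch requested: ${model}`);
    this.name = "LiveSessionModelSwitchError";
  }
}

function checkLiveSelection({ requestedModel, persistedModel, isFallbackAttempt = false }) {
  // Failover-initiated change: keep the fallback candidate.
  if (isFallbackAttempt) {
    return requestedModel;
  }
  // Anything else that differs from the persisted model is still treated
  // as a user-initiated switch, as before.
  if (requestedModel !== persistedModel) {
    throw new LiveSessionModelSwitchError(persistedModel);
  }
  return requestedModel;
}

const viaFallback = checkLiveSelection({
  requestedModel: "google/gemini-2.5-flash",
  persistedModel: "anthropic/claude-sonnet-4-5",
  isFallbackAttempt: true,
});
console.log(viaFallback);
```

Defaulting the flag to false keeps the existing /model behaviour for every caller that does not opt in, so only the fallback path changes.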

Workaround

Use the fallback provider as the PRIMARY model. E.g., set Gemini as primary (it rarely rate-limits in practice) with Anthropic as fallback. The fallback direction (Gemini → Anthropic) may also hit this bug, but since Gemini rarely fails, the issue doesn't surface in practice.

Environment

  • OpenClaw version: 2026.3.28 (f9b1079)

extent analysis

Fix Plan

To fix the issue, we need to pass a flag from runWithModelFallback to the embedded run context to distinguish between user-initiated model switches and fallback attempts. Here are the steps:

  • Modify runWithModelFallback to pass isFallbackAttempt: true to the embedded run context.
  • Update resolvePersistedLiveSelection() to check for the isFallbackAttempt flag. If it's true, skip the model switch detection.

Example code:

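// runWithModelFallback.js (sketch; the real context shape is not shown in the issue)
const embeddedRunContext = {
  // ... other properties ...
  isFallbackAttempt: true, // marks this run as failover-initiated
};

// resolvePersistedLiveSelection.js (sketch; parameter names are assumptions)
function resolvePersistedLiveSelection(embeddedRunContext, fallbackModel) {
  if (embeddedRunContext.isFallbackAttempt) {
    // Skip model switch detection and keep the fallback candidate
    return fallbackModel;
  }
  // Current model switch detection logic
}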

Verification

To verify the fix, follow these steps:

  1. Configure an agent with a primary model and fallbacks.
  2. Exhaust the primary model's rate limit quota.
  3. Send a chat completion request to the agent.
  4. Check if the agent falls back to the next configured model successfully.
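The verification steps above could also be captured as a small regression check that needs no real provider quota. This is a sketch only: fakeEmbeddedRun is a hypothetical test double, the loop is an assumed shape for runWithModelFallback, and flagging every attempt after the first as a fallback is an illustrative choice rather than something the issue specifies.

```javascript
// Hypothetical test double: Sonnet is rate limited, every other model answers.
function fakeEmbeddedRun({ model, persistedModel, isFallbackAttempt }) {
  // Unflagged runs keep the existing switch detection.
  if (!isFallbackAttempt && model !== persistedModel) {
    throw new Error(`Live session model switch requested: ${persistedModel}`);
  }
  if (model === "anthropic/claude-sonnet-4-5") {
    const err = new Error("rate limited");
    err.status = 429;
    throw err;
  }
  return { model, text: "ok" };
}

// Assumed shape of the fallback loop; only the first attempt is unflagged.
function runWithModelFallback(candidates, persistedModel, run) {
  let lastError;
  for (const [index, model] of candidates.entries()) {
    try {
      return run({ model, persistedModel, isFallbackAttempt: index > 0 });
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

const served = runWithModelFallback(
  ["anthropic/claude-sonnet-4-5", "google/gemini-2.5-flash", "anthropic/claude-haiku-4-5"],
  "anthropic/claude-sonnet-4-5",
  fakeEmbeddedRun
);
console.log(served.model); // prints google/gemini-2.5-flash
```

A check like this fails if the flag is dropped, because the Gemini attempt would then be rejected by the switch detection instead of serving the request.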

Extra Tips

  • Make sure to update the documentation to reflect the new isFallbackAttempt flag.
  • Consider adding a test case to cover the fallback scenario to prevent regressions.
  • Review other parts of the codebase that may be affected by the model switch detection logic.
