openclaw - 💡(How to fix) Fix Live model switch check prevents model fallback from working [3 comments, 2 participants]

GitHub stats
openclaw/openclaw#58556 · Fetched 2026-04-08 02:01:08
Comments: 3 · Participants: 2 · Timeline events: 3 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×1


Live model switch check prevents model fallback from working

Summary

When the heartbeat is configured with a different model than the agent's primary (e.g. heartbeat on claude-haiku-4-5, primary on claude-opus-4-6), the live model switch mechanism intercepts every fallback candidate and forces it back to the primary model. This defeats model fallback entirely — if the primary is overloaded, no fallback candidate ever gets a real attempt.

Steps to reproduce

  1. Configure primary model as anthropic/claude-opus-4-6
  2. Configure heartbeat with a cheaper model:
    {
      "heartbeat": {
        "every": "30m",
        "model": "anthropic/claude-haiku-4-5",
        "lightContext": true,
        "isolatedSession": true
      }
    }
  3. Configure fallbacks: ["openrouter/auto", "anthropic/claude-sonnet-4-6", "google/gemini-2.5-flash"]
  4. Wait for the primary model to return overloaded_error

Expected behavior

The fallback chain should try each candidate in order (OpenRouter, Sonnet, Gemini Flash) when the primary is overloaded.

Actual behavior

Every fallback candidate is rejected by the live model switch check before it can make an API call:

[agent/embedded] live session model switch detected before attempt for 65ea85b7: openrouter/openrouter/auto -> anthropic/claude-opus-4-6
[model-fallback/decision] candidate_failed requested=anthropic/claude-haiku-4-5 candidate=openrouter/openrouter/auto reason=unknown
[agent/embedded] live session model switch detected before attempt for 65ea85b7: anthropic/claude-sonnet-4-6 -> anthropic/claude-opus-4-6
[model-fallback/decision] candidate_failed requested=anthropic/claude-haiku-4-5 candidate=anthropic/claude-sonnet-4-6 reason=unknown
[agent/embedded] live session model switch detected before attempt for 65ea85b7: google/gemini-2.5-flash -> anthropic/claude-opus-4-6
[model-fallback/decision] candidate_failed requested=anthropic/claude-haiku-4-5 candidate=google/gemini-2.5-flash reason=unknown

All candidates are switched to Opus, which is the model that's overloaded, so the entire chain fails.
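Before changing any code, it helps to confirm that a deployment is hitting this bug rather than a genuine outage on every provider. The signature in the logs above is that every fallback candidate is redirected to the same model. The helper below is a hypothetical TypeScript sketch, not part of OpenClaw, and it assumes the "live session model switch detected before attempt" message keeps the format shown in the issue.

```typescript
// Hypothetical diagnostic helper (not part of OpenClaw). It scans log text for the
// pre-attempt live switch message and records which model each candidate was forced to.
interface ForcedSwitch {
  sessionId: string;
  candidate: string;
  forcedTo: string;
}

// Assumes the message format seen in the issue's logs.
const SWITCH_RE =
  /live session model switch detected before attempt for ([^:\s]+): (\S+) -> (\S+)/;

function findForcedSwitches(logText: string): ForcedSwitch[] {
  const results: ForcedSwitch[] = [];
  for (const line of logText.split("\n")) {
    const m = SWITCH_RE.exec(line);
    if (m) {
      results.push({ sessionId: m[1], candidate: m[2], forcedTo: m[3] });
    }
  }
  return results;
}

// True when every candidate was redirected to one single model, which is the
// pattern this bug produces.
function allForcedToSameModel(switches: ForcedSwitch[]): boolean {
  const first = switches[0];
  return first !== undefined && switches.every((s) => s.forcedTo === first.forcedTo);
}

const sample = [
  "[agent/embedded] live session model switch detected before attempt for 65ea85b7: openrouter/openrouter/auto -> anthropic/claude-opus-4-6",
  "[model-fallback/decision] candidate_failed requested=anthropic/claude-haiku-4-5 candidate=openrouter/openrouter/auto reason=unknown",
  "[agent/embedded] live session model switch detected before attempt for 65ea85b7: anthropic/claude-sonnet-4-6 -> anthropic/claude-opus-4-6",
].join("\n");

const switches = findForcedSwitches(sample);
console.log(switches.map((s) => `${s.candidate} -> ${s.forcedTo}`));
console.log(allForcedToSameModel(switches)); // prints: true
```

If the helper reports that every candidate was forced to the overloaded primary, the failures come from the pre-attempt check and not from the fallback providers themselves.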

Root cause

In src/agents/pi-embedded-runner/run.ts at line 452, before every attempt the runner calls resolvePersistedLiveSelection(). This reads the agent's configured default model via resolveDefaultModelForAgent() in src/agents/live-model-switch.ts:41-46. When there is no session-level modelOverride/providerOverride, it returns the agent's primary model (Opus).

When the fallback loop runs a non-primary candidate (e.g. Sonnet), the pre-attempt check compares the candidate (Sonnet) against the persisted live selection (Opus), sees they differ, and throws LiveSessionModelSwitchError. This happens for every fallback candidate except Opus itself.
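The interaction can be reproduced in isolation. The TypeScript sketch below is a simplified model, not OpenClaw's code: resolvePersistedLiveSelection and LiveSessionModelSwitchError borrow their names from the issue, while their signatures, the Session type, runAttempt, runWithFallback, and the "live_switch" label are assumptions made for illustration. Because the reported setup has no session-level override, the persisted selection is always the primary, so every fallback candidate is rejected before any API call.

```typescript
// Simplified model of the pre-attempt check. Names follow the issue, but every
// signature and body here is an assumption, not OpenClaw's implementation.
class LiveSessionModelSwitchError extends Error {
  readonly from: string;
  readonly to: string;
  constructor(from: string, to: string) {
    super(`live session model switch detected before attempt: ${from} -> ${to}`);
    this.name = "LiveSessionModelSwitchError";
    this.from = from;
    this.to = to;
  }
}

interface Session {
  modelOverride?: string; // session-level override; absent in the reported setup
}

const AGENT_PRIMARY = "anthropic/claude-opus-4-6";

// With no session override, the agent's primary model is returned.
function resolvePersistedLiveSelection(session: Session): string {
  return session.modelOverride ?? AGENT_PRIMARY;
}

// Pre-attempt check: a candidate that differs from the persisted selection is
// rejected before any API call is made.
function runAttempt(session: Session, candidate: string): string {
  const persisted = resolvePersistedLiveSelection(session);
  if (candidate !== persisted) {
    throw new LiveSessionModelSwitchError(candidate, persisted);
  }
  return `api call to ${candidate}`;
}

interface FallbackResult {
  output?: string;
  failures: string[];
}

// Fallback loop: try each candidate in order and record why each one failed.
function runWithFallback(session: Session, candidates: string[]): FallbackResult {
  const failures: string[] = [];
  for (const candidate of candidates) {
    try {
      return { output: runAttempt(session, candidate), failures };
    } catch (err) {
      // Name check instead of instanceof so it also works on older compile targets.
      const isSwitch = err instanceof Error && err.name === "LiveSessionModelSwitchError";
      failures.push(`${candidate} (${isSwitch ? "live_switch" : "other"})`);
    }
  }
  return { failures };
}

const fallbacks = [
  "openrouter/openrouter/auto",
  "anthropic/claude-sonnet-4-6",
  "google/gemini-2.5-flash",
];
const result = runWithFallback({}, fallbacks);
console.log(result.output); // prints: undefined (no candidate reached an API call)
console.log(result.failures.length); // prints: 3
```

Only a candidate equal to the persisted selection survives the check, which is why the chain can succeed only on the model that is already overloaded.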

The LiveSessionModelSwitchError is not recognized as a failover reason, so it gets logged as reason=unknown.
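The reason=unknown label follows from how failures are classified. The sketch below is hypothetical: classifyBefore stands in for a classifier that only knows provider errors, and classifyAfter shows one possible way to give the switch error its own reason. The reason names ("overloaded", "live_model_switch", "unknown") are illustrative assumptions, not OpenClaw's actual reason set.

```typescript
// Hypothetical failover-reason classifier; reason names are illustrative only.
type FailoverReason = "overloaded" | "live_model_switch" | "unknown";

class LiveSessionModelSwitchError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "LiveSessionModelSwitchError";
  }
}

function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Stand-in for the current behavior: the switch error matches no known
// provider failure, so it falls through to "unknown".
function classifyBefore(err: unknown): FailoverReason {
  if (errorMessage(err).indexOf("overloaded_error") !== -1) return "overloaded";
  return "unknown";
}

// Possible improvement: give the switch error a dedicated reason. The check uses
// the error name rather than instanceof so it also works on older compile targets.
function classifyAfter(err: unknown): FailoverReason {
  if (err instanceof Error && err.name === "LiveSessionModelSwitchError") {
    return "live_model_switch";
  }
  return classifyBefore(err);
}

const switchError = new LiveSessionModelSwitchError(
  "live session model switch detected before attempt: anthropic/claude-sonnet-4-6 -> anthropic/claude-opus-4-6",
);
console.log(classifyBefore(switchError)); // prints: unknown
console.log(classifyAfter(switchError)); // prints: live_model_switch
console.log(classifyAfter(new Error("overloaded_error"))); // prints: overloaded
```

A dedicated reason only improves diagnostics; the candidate still fails, so this complements the fix to the pre-attempt check rather than replacing it.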

Suggested fix

The live model switch check should either be skipped or made aware of the model-fallback context. When the embedded runner is invoked by the fallback loop with a non-primary candidate, the pre-attempt comparison against the persisted live selection should not override that candidate. The fallback loop has already decided to try a different model, and the live switch check should not fight that decision.
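A minimal TypeScript sketch of this idea, under stated assumptions: isFallbackAttempt is a hypothetical flag that the fallback loop would pass to the runner, overloadedModels simulates the overloaded primary, and runAttempt and runWithFallback are simplified stand-ins whose shapes differ from OpenClaw's real runner and fallback loop. As in the reported setup, the persisted selection equals the primary because there is no session override.

```typescript
// Sketch of the suggested fix. isFallbackAttempt is a hypothetical flag,
// not an existing OpenClaw option.
const AGENT_PRIMARY = "anthropic/claude-opus-4-6";
const overloadedModels: string[] = [AGENT_PRIMARY]; // simulate the overloaded primary

interface AttemptOptions {
  candidate: string;
  persistedSelection: string;
  isFallbackAttempt: boolean;
}

function runAttempt(opts: AttemptOptions): string {
  // Enforce the live switch check only on the primary attempt. A fallback
  // attempt keeps the candidate that the fallback loop chose.
  if (!opts.isFallbackAttempt && opts.candidate !== opts.persistedSelection) {
    throw new Error(`live switch: ${opts.candidate} -> ${opts.persistedSelection}`);
  }
  if (overloadedModels.indexOf(opts.candidate) !== -1) {
    throw new Error("overloaded_error");
  }
  return opts.candidate; // stands in for a successful API call
}

// Tries the primary first, then each fallback; every attempt after the first
// is flagged as a fallback attempt.
function runWithFallback(primary: string, fallbacks: string[]): string | undefined {
  let isFallbackAttempt = false;
  for (const candidate of [primary, ...fallbacks]) {
    try {
      return runAttempt({ candidate, persistedSelection: primary, isFallbackAttempt });
    } catch {
      // This candidate failed; the next one is a fallback attempt.
      isFallbackAttempt = true;
    }
  }
  return undefined;
}

const chosen = runWithFallback(AGENT_PRIMARY, [
  "openrouter/openrouter/auto",
  "anthropic/claude-sonnet-4-6",
  "google/gemini-2.5-flash",
]);
console.log(chosen); // prints: openrouter/openrouter/auto
```

Passing the flag from the loop keeps the decision in one place: the fallback loop knows it is retrying, so the runner does not have to infer it from a model mismatch. Non-fallback attempts still go through the live switch check unchanged.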

Environment

  • OpenClaw 2026.3.28 (f9b1079)
  • Node 22 on Debian (Hetzner ARM64)
  • Docker via Coolify

Extent analysis

TL;DR

Modify the live model switch check to skip or be aware of the model fallback context when the embedded runner is invoked with a non-primary candidate.

Guidance

  • Locate the pre-attempt call to resolvePersistedLiveSelection() in src/agents/pi-embedded-runner/run.ts (around line 452) and consider guarding it with a check for whether the current attempt is part of the fallback loop.
  • If it is a fallback attempt, skip the live model switch check or make it aware of the fallback context to prevent overriding the candidate selection.
  • Review how LiveSessionModelSwitchError is classified: if it can still be thrown during fallback, map it to a dedicated failover reason so the log explains why the candidate was skipped instead of reporting reason=unknown.
  • Consider adding a flag or indicator to the fallback loop to signal that the live model switch check should be skipped or modified.

Example

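// In src/agents/pi-embedded-runner/run.ts
// Sketch only: isFallbackAttempt is a hypothetical flag that the fallback loop
// would need to pass to the embedded runner; it is not an existing option.
if (isFallbackAttempt) {
  // The fallback loop chose this candidate deliberately, so skip the live model
  // switch check (or run a fallback-aware variant) instead of forcing the primary back.
} else {
  // Current live model switch check logic, unchanged for primary attempts
  const persistedLiveSelection = resolvePersistedLiveSelection();
  // ... compare persistedLiveSelection with the candidate and throw
  // LiveSessionModelSwitchError on a mismatch, as today
}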

Notes

The suggested fix touches the resolvePersistedLiveSelection() call site and the live model switch check logic in src/agents/pi-embedded-runner/run.ts. Test it with a heartbeat model that differs from the primary and an overloaded primary: each fallback candidate should reach a real API call, and ordinary (non-fallback) attempts should still honor a live model switch.

Recommendation

Make the live model switch check aware of the model-fallback context. This is a targeted change that addresses the root cause rather than working around it.
