openclaw - ✅ (Solved) Fix Failover model permanently locks session — agent config changes ignored [2 pull requests, 1 comment, 2 participants]

openclaw/openclaw#64571 (fetched 2026-04-11 06:14:21)

Root Cause

  1. setSessionRuntimeModel() sets entry.model/entry.modelProvider to whatever model was used (including fallback)
  2. applyFallbackCandidateSelectionToEntry() sets modelOverride/providerOverride with modelOverrideSource: "auto"
  3. resolveGatewayRunModel() checks entry.model first — if set, returns immediately without consulting agent config
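
To make the three steps concrete, here is a minimal TypeScript model of the sticky behavior. It is a sketch under assumptions: the SessionEntry shape and the *Sketch functions are hypothetical stand-ins for setSessionRuntimeModel(), applyFallbackCandidateSelectionToEntry() and resolveGatewayRunModel(). Only the field names (model, modelProvider, modelOverride, providerOverride, modelOverrideSource) come from the issue; the real signatures are not shown here.

```typescript
// Hypothetical, simplified session entry; the real openclaw type has more fields.
interface SessionEntry {
  model?: string;
  modelProvider?: string;
  modelOverride?: string;
  providerOverride?: string;
  modelOverrideSource?: "auto" | "user";
}

interface ModelRef {
  provider: string;
  model: string;
}

// Stand-in for setSessionRuntimeModel(): records whichever model actually ran.
function setRuntimeModelSketch(entry: SessionEntry, used: ModelRef): void {
  entry.model = used.model;
  entry.modelProvider = used.provider;
}

// Stand-in for applyFallbackCandidateSelectionToEntry(): stores the fallback as an "auto" override.
function applyFallbackSketch(entry: SessionEntry, fallback: ModelRef): void {
  entry.modelOverride = fallback.model;
  entry.providerOverride = fallback.provider;
  entry.modelOverrideSource = "auto";
}

// Stand-in for resolveGatewayRunModel(): entry.model short-circuits the agent config.
function resolveRunModelSketch(entry: SessionEntry, configured: ModelRef): ModelRef {
  if (entry.model) {
    return { provider: entry.modelProvider ?? configured.provider, model: entry.model };
  }
  return configured;
}

// Turn 1: the configured primary fails and the fallback succeeds.
const primary: ModelRef = { provider: "github-copilot", model: "claude-sonnet-4.6" };
const fallback: ModelRef = { provider: "azure", model: "kimi-k2.5-thinking" };
const entry: SessionEntry = {};
applyFallbackSketch(entry, fallback);
setRuntimeModelSketch(entry, fallback);

// Turn 2: even if the primary is healthy again, the fallback is selected.
const next = resolveRunModelSketch(entry, primary);
```

Because the lookup returns as soon as entry.model is set, nothing in this flow ever compares the session against the configured primary again.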

Fix Action

Fixed

PR fix notes

PR #64579: fix(agents): clear auto failover session model stickiness (#64571)

Description (problem / solution / changelog)

Summary

  • Problem: After a successful model failover, persisted modelOverrideSource: "auto" overrides plus the sticky model / modelProvider session fields kept subsequent inbound turns on the fallback model instead of re-consulting the agent-configured primary.
  • Why it matters: Operators who changed the agent primary, or who expected the next message to retry a healthy primary, found the session pinned to the fallback until manual intervention.
  • What changed: At the start of createModelSelectionState, clear auto failover sticky fields (overrides, auth-profile fields written with failover, runtime model identity, cached context window, fallback notice metadata) and persist when a session store path is present. Exposed helper clearAutoFailoverSessionModelStickyState in src/sessions/model-overrides.ts.
  • What did NOT change: User-initiated overrides (modelOverrideSource: "user" or legacy overrides without source), allowlist enforcement, or failover persistence during an active failing run.
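
A rough sketch of what clearing the "auto failover sticky fields" could look like. This is not the code of clearAutoFailoverSessionModelStickyState in src/sessions/model-overrides.ts: the names used for the auth-profile, cached context window and fallback notice fields (authProfileOverride, authProfileOverrideSource, contextTokens, fallbackNotice) and the function clearAutoFailoverStickySketch are hypothetical, chosen only to mirror the categories listed above.

```typescript
// Hypothetical field names; the real session entry in src/sessions may differ.
interface SessionEntry {
  model?: string;
  modelProvider?: string;
  modelOverride?: string;
  providerOverride?: string;
  modelOverrideSource?: "auto" | "user";
  authProfileOverride?: string;
  authProfileOverrideSource?: "auto" | "user";
  contextTokens?: number;
  fallbackNotice?: string;
}

// Only state written by failover ("auto") is cleared; user and legacy (no source)
// overrides are left alone. Returns true when the entry changed, so the caller
// knows it has something to persist.
function clearAutoFailoverStickySketch(entry: SessionEntry): boolean {
  if (entry.modelOverrideSource !== "auto") {
    return false;
  }
  // The auto override itself.
  delete entry.modelOverride;
  delete entry.providerOverride;
  delete entry.modelOverrideSource;
  // Runtime model identity recorded from the fallback run.
  delete entry.model;
  delete entry.modelProvider;
  // Cached metadata that belonged to the fallback model.
  delete entry.contextTokens;
  delete entry.fallbackNotice;
  // Auth-profile state is dropped only if failover wrote it.
  if (entry.authProfileOverrideSource === "auto") {
    delete entry.authProfileOverride;
    delete entry.authProfileOverrideSource;
  }
  return true;
}

const autoEntry: SessionEntry = {
  model: "kimi-k2.5-thinking",
  modelProvider: "azure",
  modelOverride: "kimi-k2.5-thinking",
  providerOverride: "azure",
  modelOverrideSource: "auto",
  authProfileOverride: "azure-backup",
  authProfileOverrideSource: "auto",
  contextTokens: 128000,
  fallbackNotice: "switched to azure/kimi-k2.5-thinking",
};
const userEntry: SessionEntry = { modelOverride: "gpt-5.4", providerOverride: "azure", modelOverrideSource: "user" };

const clearedAuto = clearAutoFailoverStickySketch(autoEntry);
const clearedUser = clearAutoFailoverStickySketch(userEntry);
```

Returning a boolean keeps the persistence decision with the caller, which matches the note that the cleanup is persisted only when a session store path is present.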

Change Type

  • Bug fix

Scope

  • Gateway / orchestration

Linked Issue/PR

  • Closes #64571

Root Cause

  • Root cause: Failover persistence wrote auto overrides and updateSessionStoreAfterAgentRun wrote last-run model identity; inbound selection honored those before agent primary.
  • Missing detection / guardrail: No per-turn reconciliation ensured that auto failover state would not pin the next message when the configured primary should be tried again.

Regression Test Plan

  • Coverage: Unit tests in src/sessions/model-overrides.test.ts and src/auto-reply/reply/model-selection.test.ts.
  • Scenario: Session with modelOverrideSource: "auto" and fallback overrides clears at model selection; resolved provider/model match configured primary; resetModelOverride is set for directive handling.
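
The ordering described for createModelSelectionState (clear auto state first, persist when a store path is present, then resolve) might be sketched as follows. Everything here is hypothetical: the function name, the options object, the "sessions.json" path, and the assumption that resetModelOverride simply mirrors whether auto state was cleared.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

interface SessionEntry {
  modelOverride?: string;
  providerOverride?: string;
  modelOverrideSource?: "auto" | "user";
}

interface ModelSelectionState extends ModelRef {
  resetModelOverride: boolean;
}

// Hypothetical sketch of the selection order; not the real createModelSelectionState.
function createModelSelectionStateSketch(
  entry: SessionEntry,
  primary: ModelRef,
  opts: {
    storePath?: string;
    clearSticky: (entry: SessionEntry) => boolean;
    persist: (storePath: string, entry: SessionEntry) => void;
  },
): ModelSelectionState {
  // 1. Drop auto-failover state before any override is read.
  const cleared = opts.clearSticky(entry);
  // 2. Persist only when this turn has a session store to write to.
  if (cleared && opts.storePath) {
    opts.persist(opts.storePath, entry);
  }
  // 3. Remaining (user) overrides still win; otherwise use the configured primary.
  const resolved: ModelRef = entry.modelOverride
    ? { provider: entry.providerOverride ?? primary.provider, model: entry.modelOverride }
    : primary;
  return { ...resolved, resetModelOverride: cleared };
}

const writes: string[] = [];
const state = createModelSelectionStateSketch(
  { modelOverride: "gpt-5.4", providerOverride: "azure", modelOverrideSource: "auto" },
  { provider: "github-copilot", model: "claude-sonnet-4.6" },
  {
    storePath: "sessions.json",
    clearSticky: (e) => {
      if (e.modelOverrideSource !== "auto") return false;
      delete e.modelOverride;
      delete e.providerOverride;
      delete e.modelOverrideSource;
      return true;
    },
    persist: (path) => {
      writes.push(path);
    },
  },
);
```

In this sketch the resolved model matches the configured primary and resetModelOverride is true, which is the shape of the regression scenario listed above.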

User-visible / Behavior Changes

  • After a prior successful run on a fallback model, the next user message again selects the agent-configured primary (failover may still occur if the primary remains unavailable).

Diagram

N/A

Security Impact

  • New permissions/capabilities? No
  • New network endpoints / listeners? No
  • Touches auth/secrets/pairing? No (clears session-scoped override metadata only for auto failover)

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/auto-reply/reply/model-selection.test.ts (modified, +43/-0)
  • src/auto-reply/reply/model-selection.ts (modified, +24/-1)
  • src/sessions/model-overrides.test.ts (modified, +121/-1)
  • src/sessions/model-overrides.ts (modified, +56/-0)

PR #64591: fix(agents): clear auto failover session-model stickiness (#64571)

Description (problem / solution / changelog)

Summary

  • Problem: successful failover persisted auto model-selection state plus sticky runtime model fields, so later turns stayed pinned to fallback instead of retrying the configured primary.
  • Core fix: clear auto-failover sticky session model state at inbound model selection, before override resolution.
  • Review-driven hardening: preserve user and legacy-user auth-profile overrides, avoid false "model override not allowed" signaling, and persist the cleanup against the latest on-disk entry inside the store-lock path, while guarding against unsafe object keys.
  • Scope boundary: does not change user-initiated model overrides, normal failover behavior during an active failing run, or allowlist semantics.
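
The "persist the cleanup against the latest on-disk entry inside the store lock" hardening can be illustrated with a small read-modify-write helper. StoreIO, withLock, load, save, persistCleanup and the "tg:123" session key are hypothetical. The idea shown is that the entry is re-read under the lock, so fields another turn wrote in the meantime survive, and that keys such as __proto__ are refused before being used on a plain object.

```typescript
// Hypothetical, loosely typed store: session key -> entry.
type SessionEntry = Record<string, unknown>;
type SessionStore = Record<string, SessionEntry>;

// Keys that could reach Object.prototype when used on a plain object.
const UNSAFE_KEYS = new Set(["__proto__", "prototype", "constructor"]);

// Hypothetical I/O surface standing in for the real session store and its lock.
interface StoreIO {
  withLock<T>(fn: () => T): T; // runs fn while holding the store lock
  load(): SessionStore; // reads the current on-disk store
  save(store: SessionStore): void;
}

// Re-reads the latest entry under the lock and writes only if clean() changed it.
function persistCleanup(io: StoreIO, sessionKey: string, clean: (latest: SessionEntry) => boolean): boolean {
  if (UNSAFE_KEYS.has(sessionKey)) return false;
  return io.withLock(() => {
    const store = io.load();
    if (!Object.prototype.hasOwnProperty.call(store, sessionKey)) return false;
    const latest = store[sessionKey];
    if (!clean(latest)) return false;
    io.save(store);
    return true;
  });
}

// In-memory "disk"; the label field simulates a write made by another turn.
let disk = JSON.stringify({
  "tg:123": { modelOverride: "gpt-5.4", modelOverrideSource: "auto", label: "written by another turn" },
});
const io: StoreIO = {
  withLock<T>(fn: () => T): T {
    return fn(); // single-threaded demo: no real locking
  },
  load: () => JSON.parse(disk) as SessionStore,
  save: (store) => {
    disk = JSON.stringify(store);
  },
};
const clearAuto = (e: SessionEntry): boolean => {
  if (e.modelOverrideSource !== "auto") return false;
  delete e.modelOverride;
  delete e.modelOverrideSource;
  return true;
};

const changed = persistCleanup(io, "tg:123", clearAuto);
```

Note that JSON.parse creates "__proto__" as an ordinary own key, so a store loaded from disk can contain it; checking the key up front avoids writing through it later.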

Change Type

  • Bug fix

Scope

  • Gateway / orchestration

Linked Issue/PR

  • Closes #64571

Root Cause

  • Failover persistence and runtime-model persistence outlived the transient fallback turn, and inbound selection consumed that persisted state on later turns before consulting configured primary.

Regression Test Plan

  • Unit tests:
    • src/sessions/model-overrides.test.ts
    • src/auto-reply/reply/model-selection.test.ts
  • Scenarios covered:
    • auto failover sticky model state clears and primary is selected next turn
    • user and legacy-user auth profile overrides are preserved
    • inferred auto auth profile state (compaction-count path) is cleared
    • disallowed-override reset signaling remains distinct from auto-failover cleanup
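
The last scenario, keeping disallowed-override reset signaling separate from auto-failover cleanup, can be modeled as a decision with distinct outcomes. This is an illustrative sketch: the OverrideDecision type, the provider/model allowlist format and decideModelOverride are assumptions. The rule that a missing modelOverrideSource counts as a legacy user override comes from the PR notes.

```typescript
type OverrideSource = "auto" | "user" | undefined;

interface SessionEntry {
  modelOverride?: string;
  providerOverride?: string;
  modelOverrideSource?: OverrideSource;
}

// Distinct outcomes so that clearing failover state is never reported as a rejected override.
type OverrideDecision =
  | { kind: "none" }
  | { kind: "clear-auto" } // failover wrote it: drop silently and retry the primary
  | { kind: "not-allowed"; ref: string } // user/legacy override outside the allowlist
  | { kind: "use"; ref: string }; // user/legacy override that is allowed

function decideModelOverride(entry: SessionEntry, allowlist: ReadonlySet<string>): OverrideDecision {
  const model = entry.modelOverride;
  if (!model) return { kind: "none" };
  // Auto state is checked first, so it can never produce a "not-allowed" signal.
  if (entry.modelOverrideSource === "auto") return { kind: "clear-auto" };
  // "user" or a missing source (legacy override) is treated as user-initiated.
  const ref = entry.providerOverride ? `${entry.providerOverride}/${model}` : model;
  return allowlist.has(ref) ? { kind: "use", ref } : { kind: "not-allowed", ref };
}

const allow = new Set(["github-copilot/claude-sonnet-4.6"]);
const a = decideModelOverride({ modelOverride: "gpt-5.4", providerOverride: "azure", modelOverrideSource: "auto" }, allow);
const b = decideModelOverride({ modelOverride: "gpt-5.4", providerOverride: "azure" }, allow);
const c = decideModelOverride(
  { modelOverride: "claude-sonnet-4.6", providerOverride: "github-copilot", modelOverrideSource: "user" },
  allow,
);
```

Here the auto override outside the allowlist yields clear-auto rather than not-allowed, while the legacy override without a source is still subject to the allowlist.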

User-visible / Behavior Changes

  • After a fallback-success turn, the next inbound turn re-consults agent primary config instead of staying pinned to the prior fallback.

Diagram

N/A

Security Impact

  • New permissions/capabilities? No
  • New network endpoints/listeners? No
  • Touches auth/secrets/pairing? Session auth-profile override persistence only (state correctness hardening).

Changed files

  • CHANGELOG.md (modified, +3/-0)
  • src/auto-reply/reply/model-selection.test.ts (modified, +43/-0)
  • src/auto-reply/reply/model-selection.ts (modified, +53/-1)
  • src/sessions/model-overrides.test.ts (modified, +121/-1)
  • src/sessions/model-overrides.ts (modified, +56/-0)

Original issue report

Problem

After a model failover, the session entry permanently locks to the fallback model via sticky model, modelProvider, modelOverride, and providerOverride fields. Subsequent messages never consult the agent's configured primary model again.

Evidence

The MonicaHall agent was configured with github-copilot/claude-sonnet-4.6, but all of its Telegram sessions were stuck on azure/kimi-k2.5-thinking or azure/gpt-5.4 after failover events.

Proposed Fix

Re-resolve from config at run start: when modelOverrideSource is "auto" and the agent's configured primary differs, clear the override and use the configured primary.
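
The issue's proposed fix is narrower than the PRs: it clears state only when the auto override disagrees with the configured primary. A minimal sketch under that reading, with hypothetical names (resolveAtRunStart, ModelRef). It also drops model and modelProvider, because with root cause 3 in place, clearing only the overrides would still let an entry.model-first lookup return the fallback.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

interface SessionEntry {
  model?: string;
  modelProvider?: string;
  modelOverride?: string;
  providerOverride?: string;
  modelOverrideSource?: "auto" | "user";
}

// Hypothetical run-start resolution following the issue's proposal.
function resolveAtRunStart(entry: SessionEntry, primary: ModelRef): ModelRef {
  const autoPinnedElsewhere =
    entry.modelOverrideSource === "auto" &&
    (entry.modelOverride !== primary.model || entry.providerOverride !== primary.provider);
  if (autoPinnedElsewhere) {
    delete entry.modelOverride;
    delete entry.providerOverride;
    delete entry.modelOverrideSource;
    // Without this, a lookup that checks entry.model first still returns the fallback.
    delete entry.model;
    delete entry.modelProvider;
    return primary;
  }
  if (entry.modelOverride) {
    return { provider: entry.providerOverride ?? primary.provider, model: entry.modelOverride };
  }
  return primary;
}

const primary: ModelRef = { provider: "github-copilot", model: "claude-sonnet-4.6" };
const stuck: SessionEntry = {
  model: "gpt-5.4",
  modelProvider: "azure",
  modelOverride: "gpt-5.4",
  providerOverride: "azure",
  modelOverrideSource: "auto",
};
const userPinned: SessionEntry = { modelOverride: "gpt-5.4", providerOverride: "azure", modelOverrideSource: "user" };

const r1 = resolveAtRunStart(stuck, primary);
const r2 = resolveAtRunStart(userPinned, primary);
```

The user-pinned session keeps its override, which matches the PRs' stated boundary of not touching user-initiated overrides.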

Impact

Any agent that experiences a single failover keeps running on the wrong model until the session is fixed manually.

Related

  • #724

Extended analysis

TL;DR

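Clearing the modelOverride and providerOverride fields when modelOverrideSource is "auto" and the agent config primary differs may resolve the issue, provided the sticky model and modelProvider fields are cleared as well, since resolveGatewayRunModel() returns entry.model before consulting the agent config. Both linked PRs clear both sets of fields at inbound model selection.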

Guidance

  • Review the setSessionRuntimeModel() and applyFallbackCandidateSelectionToEntry() functions to ensure they are not permanently setting the model and modelProvider fields to the fallback model.
  • Verify that the resolveGatewayRunModel() function is correctly checking the agent config primary model when entry.model is set.
  • Consider implementing a check at run start to clear the override and use the config primary model when modelOverrideSource is "auto" and the agent config primary differs.
  • Test the proposed fix with different model configurations and failover scenarios to ensure the issue is fully resolved.

Example

The issue itself includes no code snippet; the changed files listed for PR #64579 and PR #64591 contain the actual fix.

Notes

The proposed fix may not apply to all scenarios, and additional testing is required to ensure the solution works as expected. The issue is related to #724, which may contain additional context or information.

Recommendation

Apply the targeted fix: when modelOverrideSource is "auto" and the agent config primary differs, clear the modelOverride and providerOverride fields, together with the sticky model and modelProvider fields, so that the next turn re-resolves from the agent config.
