openclaw - ✅(Solved) Fix [Bug]: Fallback model can stick to session after auto fallback, causing later runs to start from persisted override [2 pull requests, 1 comment, 1 participant]

GitHub stats: openclaw/openclaw#68706 (fetched 2026-04-19 15:08:27) · Comments: 1 · Participants: 1 · Timeline events: 7 (cross-referenced ×2, labeled ×2, referenced ×2, commented ×1) · Reactions: 0

Automatic fallback model selection appears to persist into session state and can affect later turns as if it were a durable session model override.

I observed a Telegram session that seemed to switch away from the configured primary model and continue from a fallback model on later messages.

Fix Action

Fixed

PR fix notes

PR #68764: fix: ignore stale auto-fallback model overrides on session reload

Description (problem / solution / changelog)

Problem

Closes #68706

When auto-fallback selects an alternative model (e.g. because the primary model is rate-limited), it writes modelOverride/providerOverride/modelOverrideSource:"auto" to the persisted session store. A rollback mechanism clears this after the turn completes, but if the process crashes between persist and rollback, the stale "auto" override survives and pins the session to the fallback model indefinitely.
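The crash window described above can be illustrated with a small in-memory simulation. This is a sketch, not OpenClaw's code: the `SessionEntry` type, the `Map`-backed store, the simplified `persistFallbackCandidateSelection` signature, and the `runTurn` helper are hypothetical; only the three field names come from the PR.

```typescript
// Hypothetical shape of the persisted fields named in the PR.
type ModelOverrideSource = "auto" | "user";

interface SessionEntry {
  providerOverride?: string;
  modelOverride?: string;
  modelOverrideSource?: ModelOverrideSource;
}

// Stand-in for the durable session store (a Map instead of disk).
const sessionStore = new Map<string, SessionEntry>();

// Hypothetical, simplified persistFallbackCandidateSelection: writes the
// fallback choice and returns a closure that restores the previous fields.
function persistFallbackCandidateSelection(
  sessionKey: string,
  provider: string,
  model: string,
): () => void {
  const previous: SessionEntry = { ...(sessionStore.get(sessionKey) ?? {}) };
  sessionStore.set(sessionKey, {
    ...previous,
    providerOverride: provider,
    modelOverride: model,
    modelOverrideSource: "auto",
  });
  return () => {
    sessionStore.set(sessionKey, previous);
  };
}

// Hypothetical turn: if the process "crashes" before rollback runs,
// the "auto" override is left behind in the store.
function runTurn(sessionKey: string, crashBeforeRollback: boolean): void {
  const rollback = persistFallbackCandidateSelection(sessionKey, "opencode-go", "glm-5.1");
  if (crashBeforeRollback) return; // rollback never runs
  rollback();
}
```

After `runTurn("dm", true)`, the entry for `"dm"` still carries `modelOverrideSource: "auto"`, which is the state a later turn would read back.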

Subsequent messages continue using the fallback model even after the primary model becomes available again, because resolveStoredModelOverride() does not distinguish between user-initiated and auto-fallback overrides.

Fix

In resolveStoredModelOverride() (src/auto-reply/reply/stored-model-override.ts), skip the override when modelOverrideSource === "auto". This causes the session to re-resolve the default model on the next turn.

This is safe because:

  • "auto" overrides are transient by design (they exist only for the current turn)
  • If the primary model is still unavailable, auto-fallback will re-select an alternative on the next turn
  • User-sourced overrides (modelOverrideSource: "user" or undefined) are completely unaffected
  • This is the single entry point for reading stored overrides, so all channels (Discord, Telegram, etc.) go through it
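The "skip `auto`" rule can be sketched as follows. The real `resolveStoredModelOverride()` in `src/auto-reply/reply/stored-model-override.ts` has its own signature and callers; the simplified signature, the `ModelRef` type, and the `modelForTurn` caller below are hypothetical and only illustrate the behavior the PR describes.

```typescript
type ModelOverrideSource = "auto" | "user";

interface SessionEntry {
  providerOverride?: string;
  modelOverride?: string;
  modelOverrideSource?: ModelOverrideSource;
}

interface ModelRef {
  provider: string;
  model: string;
}

// Hypothetical, simplified signature: returns the stored override for the next
// turn, or undefined so the caller re-resolves the configured default.
function resolveStoredModelOverride(entry: SessionEntry | undefined): ModelRef | undefined {
  if (!entry) return undefined;
  const { providerOverride, modelOverride, modelOverrideSource } = entry;
  if (!providerOverride || !modelOverride) return undefined;
  // Auto-fallback overrides are transient; a leftover one (e.g. after a crash
  // before rollback) is ignored.
  if (modelOverrideSource === "auto") return undefined;
  // "user" and undefined (older entries) are honored as before.
  return { provider: providerOverride, model: modelOverride };
}

// Hypothetical caller: choose the model a turn starts from.
function modelForTurn(entry: SessionEntry | undefined, defaultModel: ModelRef): ModelRef {
  return resolveStoredModelOverride(entry) ?? defaultModel;
}
```

With the issue's configuration, a stale entry pointing at `opencode-go/glm-5.1` with `modelOverrideSource: "auto"` resolves to the primary `openai-codex/gpt-5.4`, while the same entry marked `"user"` keeps the chosen model.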

Impact

  • 1 file changed, 6 insertions(+), 1 deletion(-)
  • Only affects sessions with stale "auto" source overrides after a crash
  • No changes to normal auto-fallback or user override behavior

Changed files

  • src/auto-reply/reply/stored-model-override.ts (modified, +6/-1)

PR #68798: fix: prevent auto-fallback model from persisting into session state

Description (problem / solution / changelog)

Problem

When the primary model is unavailable (rate limit, overloaded, etc.) and a fallback model is selected automatically, the fallback model override (modelOverrideSource: 'auto') is persisted to session state via persistFallbackCandidateSelection() but never rolled back after a successful run. This causes subsequent turns to start from the fallback model instead of the configured primary model.

Root Cause

In agent-runner-execution.ts, persistFallbackCandidateSelection() writes modelOverride, providerOverride, and modelOverrideSource: 'auto' to both in-memory and on-disk session state. The returned rollback function is only called on error paths. On success, the auto-fallback override persists indefinitely.

Fix

  1. Post-run rollback (agent-runner-execution.ts): After a successful run where fallback was used, clear the auto-fallback model override (providerOverride, modelOverride, modelOverrideSource) from both in-memory and persisted session state. User-initiated model changes (modelOverrideSource: 'user', set via /model command) are preserved.

  2. Defense-in-depth (stored-model-override.ts): resolveStoredModelOverride() now skips overrides with modelOverrideSource: 'auto' when resolving the stored model for the next turn. This prevents stale auto-fallback state from affecting model selection even if the post-run rollback fails.
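The post-run rollback in item 1 can be sketched as a fallback loop that restores the pre-run override fields whether the attempt succeeds or throws. This is not the code from agent-runner-execution.ts: `runWithFallback`, `rollbackAutoFallback`, the synchronous `run` callback, and the `Map` store are hypothetical. Using `finally` is one way to guarantee cleanup on both paths; the PR describes adding a success-path rollback alongside the existing error-path one.

```typescript
type ModelOverrideSource = "auto" | "user";

interface SessionEntry {
  providerOverride?: string;
  modelOverride?: string;
  modelOverrideSource?: ModelOverrideSource;
}

interface ModelRef {
  provider: string;
  model: string;
}

// Hypothetical helper: undo an auto-fallback write. If the source is no longer
// "auto" (e.g. the user ran /model during the run), the entry is left alone.
function rollbackAutoFallback(current: SessionEntry, before: SessionEntry): SessionEntry {
  if (current.modelOverrideSource !== "auto") return current;
  return {
    ...current,
    providerOverride: before.providerOverride,
    modelOverride: before.modelOverride,
    modelOverrideSource: before.modelOverrideSource,
  };
}

// Hypothetical runner: tries candidates in order, marking a fallback as an
// "auto" override while it runs and always rolling it back afterwards.
function runWithFallback(
  store: Map<string, SessionEntry>,
  sessionKey: string,
  candidates: ModelRef[],
  run: (candidate: ModelRef) => string,
): string {
  let lastError: unknown = new Error("no candidates");
  for (let i = 0; i < candidates.length; i++) {
    const candidate = candidates[i];
    const usedFallback = i > 0;
    const before: SessionEntry = { ...(store.get(sessionKey) ?? {}) };
    if (usedFallback) {
      store.set(sessionKey, {
        ...before,
        providerOverride: candidate.provider,
        modelOverride: candidate.model,
        modelOverrideSource: "auto",
      });
    }
    try {
      return run(candidate);
    } catch (err) {
      lastError = err;
    } finally {
      // Runs on success and on error, so no "auto" override outlives the attempt.
      if (usedFallback) {
        store.set(sessionKey, rollbackAutoFallback(store.get(sessionKey) ?? {}, before));
      }
    }
  }
  throw lastError;
}
```

Restoring the saved `before` fields, rather than blanking them, keeps a pre-existing `/model` choice intact when a fallback temporarily replaced it.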

Tests

  • ✅ Auto-fallback model override is rolled back after a successful run
  • ✅ User-initiated model override (/model) is preserved after a fallback run
  • resolveStoredModelOverride() skips auto-fallback overrides (direct and parent session)
  • ✅ All 33 existing tests in agent-runner-execution.test.ts pass (1 updated to match new behavior)

Closes #68706

Changed files

  • src/auto-reply/reply/agent-runner-execution.test.ts (modified, +135/-3)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +41/-0)
  • src/auto-reply/reply/stored-model-override.test.ts (added, +103/-0)
  • src/auto-reply/reply/stored-model-override.ts (modified, +14/-5)

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

Automatic fallback model selection appears to persist into session state and can affect later turns as if it were a durable session model override.

I observed a Telegram session that seemed to switch away from the configured primary model and continue from a fallback model on later messages.

Steps to reproduce

I do not have a minimal clean repro yet, but the observed flow appears to be:

  1. Run a session with a configured primary model and fallbacks.
  2. Trigger a failure or fallback transition during a turn.
  3. OpenClaw persists fallback selection into session state.
  4. If rollback or cleanup does not complete, a later turn may start from the persisted fallback instead of the configured default.

Expected behavior

Automatic fallback should be transient for the current run only, unless the user explicitly changes the session model. Later turns should return to the configured default model, not reuse a stale auto-fallback override.

Actual behavior

A later turn appeared to start from the fallback model again, consistent with fallback state being reused from session storage.

Logs / evidence

During the incident I also saw: Invalid schema for function 'crypto__get_instruments': In context=(), object schema missing properties. That may explain the failed turn itself, but there also appears to be a separate sticky-fallback issue in session state handling.

Relevant installed runtime code paths:

• agent-runner.runtime-CH0aH7T6.js
• persistFallbackCandidateSelection(...)
• applyFallbackCandidateSelectionToEntry(...)
• get-reply-DBfoq_6j.js
• resolveStoredModelOverride(...)
• stored-model-override-DN6f_u7P.js

The fallback state appears to be written into durable session fields such as:

• providerOverride
• modelOverride
• modelOverrideSource: "auto"

If rollback does not complete, later turns may read that override back from session state.

OpenClaw version

2026.4.14

Operating system

Ubuntu 24.04

Install method

npm

Model

Primary model: openai-codex/gpt-5.4; fallbacks: opencode-go/glm-5.1, opencode-go/minimax-m2.7

Provider / routing chain

Telegram -> OpenClaw gateway -> primary route openai-codex/gpt-5.4 -> fallback candidates opencode-go/glm-5.1 and opencode-go/minimax-m2.7

Impact and severity

Affected users/systems/channels: OpenClaw users running Telegram sessions with model fallbacks enabled, specifically observed on a Telegram direct-message session on a Linux-hosted OpenClaw gateway.

Severity: blocks workflow. In the observed incident, a Telegram DM got stuck showing the typing indicator indefinitely and the turn failed.

Frequency: intermittent / edge case. It was observed in a real incident, but I do not have evidence that it happens on every fallback.

Consequence: failed replies, stuck Telegram interaction, and possible unintended routing of later turns through a fallback model instead of the configured default, which can make behavior harder to predict and debug.

Additional information

It may be safer to keep auto fallback state in memory only, or store it separately from durable session override fields. Another option would be to ignore stale persisted overrides where modelOverrideSource === "auto" on later turns.
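The first alternative, keeping auto-fallback state in memory only, could look roughly like this. It is a hypothetical design sketch, not how OpenClaw or either PR implements the fix: `DurableSession`, `userModelOverride`, `activeFallbacks`, `startingModel`, and `runTurnInMemory` are invented names. Durable state holds only user choices; the fallback used by a run lives in a process-local map that is cleared when the run ends and is simply lost on a crash.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Hypothetical durable session state: only user-initiated choices live here.
interface DurableSession {
  userModelOverride?: ModelRef;
}

// Hypothetical process-local record of the fallback used by an in-flight run.
// Never persisted, so a crash forgets it instead of pinning the session.
const activeFallbacks = new Map<string, ModelRef>();

function startingModel(session: DurableSession, primary: ModelRef): ModelRef {
  return session.userModelOverride ?? primary;
}

// Tries the starting model, then each fallback; `attempt` returns true on success.
function runTurnInMemory(
  sessionKey: string,
  session: DurableSession,
  primary: ModelRef,
  fallbacks: ModelRef[],
  attempt: (candidate: ModelRef) => boolean,
): ModelRef | undefined {
  const chain = [startingModel(session, primary), ...fallbacks];
  try {
    for (let i = 0; i < chain.length; i++) {
      if (i > 0) activeFallbacks.set(sessionKey, chain[i]); // memory only
      if (attempt(chain[i])) return chain[i];
    }
    return undefined;
  } finally {
    activeFallbacks.delete(sessionKey); // transient: cleared when the run ends
  }
}
```

The trade-off is that nothing needs rolling back, but observers that currently read the fallback from the session entry would have to read it from the in-memory map instead.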

extent analysis

TL;DR

The issue can be fixed by ensuring that automatic fallback model selection does not persist into session state, potentially by storing auto-fallback state in memory only or ignoring stale persisted overrides.

Guidance

  • Review the persistFallbackCandidateSelection function in agent-runner.runtime-CH0aH7T6.js to ensure it does not write fallback state to durable session fields like providerOverride, modelOverride, and modelOverrideSource.
  • Consider modifying resolveStoredModelOverride in get-reply-DBfoq_6j.js to ignore stale persisted overrides where modelOverrideSource equals "auto" on later turns.
  • Investigate storing auto-fallback state separately from durable session override fields to prevent unintended reuse of fallback models.
  • Verify that the rollback or cleanup process completes successfully to prevent fallback state from being persisted.

Example

No code snippet is provided due to the complexity of the issue and the need for a thorough review of the relevant code paths.

Notes

The issue appears to be intermittent and may not occur on every fallback. However, it can cause significant problems, such as stuck Telegram interactions and unintended routing of later turns through a fallback model. The logs also show a separate tool-schema error for crypto__get_instruments, which may explain the failed turn but does not appear to be directly related to the sticky-fallback problem.

Recommendation

Apply a workaround by modifying the resolveStoredModelOverride function to ignore stale persisted overrides where modelOverrideSource equals "auto" on later turns, as this seems to be the most straightforward way to address the issue without introducing significant changes to the existing codebase.
