openclaw - ✅(Solved) Fix Fallback model persistence overrides channel-level model overrides permanently [3 pull requests, 1 comment, 2 participants]

chakumon · 2026-04-09T11:20:48Z



GitHub stats

openclaw/openclaw#63712 • Fetched 2026-04-10 03:42:07

Author: chakumon

Participants: Artyomkun, chakumon

Timeline (top): cross-referenced ×2, referenced ×2, commented ×1, mentioned ×1

Root Cause

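If the fallback candidate (e.g., openai/gpt-5.4) succeeds, the session-level pin is never rolled back. From that point on, every subsequent message in that session uses the fallback model, because channel overrides are gated by the condition if (!hasSessionModelOverride && channelModelOverride), which is false once a session override exists.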

Fix Action

Workaround

Manually remove modelOverride, providerOverride, and related fields from all session entries in sessions.json across all agents, then restart the gateway. This has been needed 3+ times in 3 days.
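
The manual cleanup can be scripted. The following is a minimal sketch, assuming sessions.json parses to a map from session key to entry object (the real on-disk layout is not shown in the issue). `stripModelOverrides`, `SessionStore`, and the session keys in the usage note are hypothetical names, and the field list covers only the fields named on this page.

```typescript
// Hypothetical shape: the issue does not show the real sessions.json layout.
type SessionEntry = Record<string, unknown>;
type SessionStore = Record<string, SessionEntry>;

// Fields named on this page; the "related fields" are not enumerated in the issue.
const OVERRIDE_FIELDS = ["modelOverride", "providerOverride", "modelOverrideSource"] as const;

// Returns a copy of the store with the override fields removed,
// plus the keys of the entries that were actually changed.
function stripModelOverrides(store: SessionStore): { cleaned: SessionStore; changed: string[] } {
  const cleaned: SessionStore = {};
  const changed: string[] = [];
  for (const [key, entry] of Object.entries(store)) {
    const copy: SessionEntry = { ...entry };
    let touched = false;
    for (const field of OVERRIDE_FIELDS) {
      if (field in copy) {
        delete copy[field];
        touched = true;
      }
    }
    if (touched) changed.push(key);
    cleaned[key] = copy;
  }
  return { cleaned, changed };
}
```

Reading and writing the file is left out on purpose. The parsed JSON would be passed in, the `cleaned` result written back, and the gateway restarted as the workaround describes.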

PR fix notes

PR #63844: fix: don't persist fallback model to session when channel override is active

Repository: openclaw/openclaw
Author: zozo123
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/63844

Description (problem / solution / changelog)

Summary

When a channel-level model override (channels.modelByChannel) is active, the fallback model was being incorrectly persisted to the session. This fix prevents that.

Closes #63712

Testing

Relevant tests pass

This PR was developed with AI assistance (Claude). All code has been reviewed and tested. Built with islo.dev

Changed files

src/auto-reply/reply/agent-runner-execution.test.ts (modified, +63/-0)
src/auto-reply/reply/agent-runner-execution.ts (modified, +12/-1)
src/auto-reply/reply/agent-runner.ts (modified, +3/-0)
src/auto-reply/reply/get-reply-run.ts (modified, +3/-0)
src/auto-reply/reply/get-reply.ts (modified, +7/-1)

PR #63895: fix: don't persist fallback model to session when channel override is active

Repository: openclaw/openclaw
Author: zozo123
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/63895

Description (problem / solution / changelog)

Summary

When a channel-level model override is active and the primary model fails, the fallback model selection was persisted to the session entry (modelOverride/providerOverride). This permanently shadowed the channel override on subsequent turns because channel overrides are gated by !hasSessionModelOverride.

Closes #63712

Changes

Added hasChannelModelOverride parameter through the reply chain (get-reply.ts → get-reply-run.ts → agent-runner.ts → agent-runner-execution.ts)
Skip fallback persistence when hasChannelModelOverride is true
Added regression test verifying fallback doesn't pin to session under channel override
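
The parameter threading described in these changes can be sketched as follows. This is an illustration only: the four functions are stand-ins for the four files named in the PR, and their signatures are assumptions, since the PR text does not show the real ones.

```typescript
// Stand-ins for the four layers named in the PR; real signatures are not shown there.
type SessionEntry = { modelOverride?: string; providerOverride?: string };
type Fallback = { provider: string; model: string };

// agent-runner-execution.ts layer: the only place the flag is consulted.
function runExecution(entry: SessionEntry, fallback: Fallback, hasChannelModelOverride: boolean): void {
  if (hasChannelModelOverride) return; // skip persistence; the channel override stays authoritative
  entry.modelOverride = fallback.model;
  entry.providerOverride = fallback.provider;
}

// agent-runner.ts layer: passes the flag through unchanged.
function runAgent(entry: SessionEntry, fallback: Fallback, hasChannelModelOverride: boolean): void {
  runExecution(entry, fallback, hasChannelModelOverride);
}

// get-reply-run.ts layer: passes the flag through unchanged.
function getReplyRun(entry: SessionEntry, fallback: Fallback, hasChannelModelOverride: boolean): void {
  runAgent(entry, fallback, hasChannelModelOverride);
}

// get-reply.ts layer: derives the flag where the channel configuration is visible.
function getReply(entry: SessionEntry, fallback: Fallback, channelModelOverride?: string): void {
  getReplyRun(entry, fallback, channelModelOverride !== undefined);
}
```

Deriving the boolean once at the top and passing it down keeps the lower layers unaware of channel configuration, at the cost of touching every signature along the chain.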

Testing

New test: does not persist fallback selection to session when channel model override is active

This PR was developed with AI assistance (Claude). Built with islo.dev

Changed files

src/auto-reply/reply/agent-runner-execution.test.ts (modified, +55/-0)
src/auto-reply/reply/agent-runner-execution.ts (modified, +12/-1)
src/auto-reply/reply/agent-runner.ts (modified, +3/-0)
src/auto-reply/reply/get-reply-run.ts (modified, +3/-0)
src/auto-reply/reply/get-reply.ts (modified, +7/-1)

PR #64471: fix: prevent fallback persistence from clobbering user /models picks

Repository: openclaw/openclaw
Author: hoyyeva
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/64471

Description (problem / solution / changelog)

Summary

Problem: Selecting a model via /models shows "Model changed to X" but the next message reverts to a previous model.
Why it matters: /models is the primary way users switch models per-session. When it silently fails, users lose trust in model selection and can get permanently stuck on a fallback model they never chose.
What changed: (1) Skip fallback persistence when the active modelOverride was set by the user (modelOverrideSource === "user", plus legacy entries with no source field). (2) In the Telegram /models callback, use the fresh runtimeCfg instead of the stale startup cfg snapshot for store path and default-model resolution.
What did NOT change: No changes to how the fallback chain itself is resolved, no changes to how /models shows the picker, no schema/config changes. Channels other than Telegram are unaffected by the second fix (they don't have inline-button model pickers).

Change Type (select all)

Bug fix

Scope (select all touched areas)

Gateway / orchestration
Integrations

Linked Issue/PR

Closes #63611
Related #63712, #63900
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: Two independent issues, both leading to the same user-visible symptom.
1. persistFallbackCandidateSelection in agent-runner-execution.ts unconditionally overwrites modelOverride/providerOverride with the fallback model whenever a fallback succeeds. The session entry already carries modelOverrideSource: "user" | "auto" to distinguish user picks from fallback-set values, but the persistence path never checked it. So one transient failure of a user-selected model permanently pinned the session to the fallback.
2. The Telegram /models callback handler in bot-handlers.runtime.ts used the outer cfg (a snapshot captured at handler-registration time, closed over by registerTelegramHandlers) for resolveStorePath and resolveDefaultModelForAgent, while the inbound message pipeline always uses fresh config via loadConfig(). After a config reload, the override could be written to the wrong store file, or the selected model could be misidentified as "the default" — which deletes the override instead of saving it.
Missing detection / guardrail: No regression test exercising "user picks model X via /models → X fails → fallback Y succeeds → next message still targets X." Existing tests covered the fallback persistence path but not the user-override interaction.
Contributing context: modelOverrideSource was added later in the codebase. session-reset-service.ts already treats modelOverride set with modelOverrideSource === undefined as legacy user state for backward compat — that pattern wasn't applied here.
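
The first root cause and its fix can be illustrated with a small guard. This is a sketch under stated assumptions: the field names come from the PR description, but `isUserModelOverride` and `persistFallbackSelection` are hypothetical helpers, not the functions in agent-runner-execution.ts.

```typescript
type ModelOverrideSource = "user" | "auto";

interface SessionEntry {
  modelOverride?: string;
  providerOverride?: string;
  modelOverrideSource?: ModelOverrideSource;
}

// An override with source "user", or with no source at all (legacy entries),
// is treated as a user pick, matching the backward-compat rule in the PR text.
function isUserModelOverride(entry: SessionEntry): boolean {
  if (!entry.modelOverride) return false;
  return entry.modelOverrideSource === "user" || entry.modelOverrideSource === undefined;
}

// Returns true when the fallback was written to the session entry.
function persistFallbackSelection(entry: SessionEntry, provider: string, model: string): boolean {
  if (isUserModelOverride(entry)) return false; // never clobber a user pick
  entry.modelOverride = model;
  entry.providerOverride = provider;
  entry.modelOverrideSource = "auto";
  return true;
}
```

Entries previously written by the fallback path (source "auto") and entries with no override at all still accept a new fallback pin, so only user picks are protected.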

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
Target tests added:
- src/auto-reply/reply/agent-runner-execution.test.ts — "does not persist fallback selection when modelOverrideSource is user" + "does not persist fallback selection for legacy user overrides without modelOverrideSource"
- extensions/telegram/src/bot.test.ts — "persists non-default model override using fresh config, not stale startup snapshot"
Scenario the tests lock in:
1. With modelOverrideSource: "user" (and the legacy undefined case), a successful fallback to a different model must NOT mutate modelOverride/providerOverride.
2. With a Telegram bot started with one default model, after loadConfig returns a different default, the callback must still persist the user's selection correctly.
Why this is the smallest reliable guardrail: Both bugs are pure state-mutation issues at well-defined seams (the persistence helper and the callback handler). Unit tests against those seams catch the regression without needing a running gateway or live model failures.
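
The second scenario is a stale-closure problem. The sketch below reproduces it in isolation, assuming a simplified config with a single `defaultModel` field. `loadConfig` here is a local stand-in for the function of the same name in the PR text, and `reloadConfig` and both `register...` helpers are hypothetical.

```typescript
interface Config {
  defaultModel: string;
}

let currentConfig: Config = { defaultModel: "anthropic/claude-opus-4-6" };

// Local stand-in for loadConfig(): always returns the latest config.
function loadConfig(): Config {
  return currentConfig;
}

// Simulates a config reload after the handlers were registered.
function reloadConfig(next: Config): void {
  currentConfig = next;
}

// Stale variant: closes over the snapshot captured at registration time.
// The returned callback reports whether the picked model is "the default".
function registerStaleHandler(cfg: Config): (picked: string) => boolean {
  return (picked) => picked === cfg.defaultModel;
}

// Fresh variant: re-reads the config on every callback invocation.
function registerFreshHandler(): (picked: string) => boolean {
  return (picked) => picked === loadConfig().defaultModel;
}
```

After a reload that changes the default, the stale handler still classifies the old default as "the default", which per the PR description leads to the override being deleted instead of saved. The fresh handler classifies the same pick as non-default.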

User-visible / Behavior Changes

/models selections now survive transient primary-model failures. When the user-selected model rate-limits and a fallback model serves a single response, the next message attempts the user-selected model again (instead of being permanently routed to the fallback).
No config changes, no command changes, no UI changes.

Diagram

Before (fallback path):
  user picks X via /models   →  modelOverride = X, source = "user"
  send message               →  X fails  →  fallback Y succeeds
  persist fallback           →  modelOverride = Y, source = "auto"  ← clobbered
  next message               →  uses Y (forever)

After:
  user picks X via /models   →  modelOverride = X, source = "user"
  send message               →  X fails  →  fallback Y succeeds
  persist fallback           →  source === "user"  →  skip
  next message               →  uses X (the user's choice)
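
The two flows in the diagram can be replayed with a tiny simulation. This is a sketch only: `simulate` is a hypothetical function, "X" and "Y" are the placeholder model names from the diagram, and the failure of X on the first turn is hard-coded.

```typescript
type Source = "user" | "auto";

interface Session {
  modelOverride: string;
  source: Source;
}

// Replays two turns in which X fails on turn 1 and fallback Y answers.
// Returns the model that each turn tried first.
function simulate(skipWhenUserPicked: boolean): string[] {
  const session: Session = { modelOverride: "X", source: "user" };
  const firstTried: string[] = [];
  for (let turn = 1; turn <= 2; turn++) {
    const primary = session.modelOverride;
    firstTried.push(primary);
    const primaryFails = turn === 1 && primary === "X";
    if (primaryFails) {
      // Fallback Y succeeds; decide whether to persist it to the session.
      const mayPersist = !(skipWhenUserPicked && session.source === "user");
      if (mayPersist) {
        session.modelOverride = "Y";
        session.source = "auto";
      }
    }
  }
  return firstTried;
}
```

`simulate(false)` corresponds to the "Before" flow and `simulate(true)` to the "After" flow: in the first case the second turn starts with Y, in the second it starts with X again.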

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Human Verification (required)

Verified scenarios:
- Full unit test suites for agent-runner-execution, model-selection, model-overrides, and the Telegram bot pass locally (114/114 across the related files, plus the 2 new regression tests).
- Manually traced the flow from /models callback → session store write → next-message session load → createModelSelectionState → fallback persistence to confirm the override survives.
Edge cases checked:
- Legacy session entries with modelOverride set but modelOverrideSource === undefined (backward compat — now covered by an explicit test).
- Cross-provider user selection (e.g., user picks openai/... while config primary is anthropic/...): resolveAgentModelFallbackCandidates already returns no fallbacks in that case, so this PR has no effect there. The user instead sees the existing FallbackSummaryError with cooldown details, which is the correct UX.
What I did NOT verify: End-to-end behavior against a live model that actually rate-limits (no live LLM testing in this PR — covered by mocked unit tests against the same seam).

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No — legacy session entries (with modelOverride set but no modelOverrideSource) are explicitly handled.

Risks and Mitigations

Risk: A user whose selected model is in long cooldown will pay the failed-primary cost on every message (since fallback is no longer persisted), making responses slower until they switch models manually.
- Mitigation: This is strictly better than the previous behavior (silent permanent override). When the entire fallback chain is exhausted, the existing FallbackSummaryError already surfaces a cooldown message to the user, who can then re-select via /models. No new mitigation added in this PR to keep scope tight.

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/telegram/src/bot-handlers.runtime.ts (modified, +9/-4)
extensions/telegram/src/bot.test.ts (modified, +87/-0)
src/auto-reply/reply/agent-runner-execution.test.ts (modified, +134/-0)
src/auto-reply/reply/agent-runner-execution.ts (modified, +20/-0)

Code Example

if (!hasSessionModelOverride && channelModelOverride)


Bug Description

When a primary model (e.g., anthropic/claude-opus-4-6) configured via channels.modelByChannel experiences a transient failure (rate limit, timeout, cooldown), the fallback system in persistFallbackCandidateSelection() writes modelOverride and providerOverride to the session entry in sessions.json before the fallback candidate runs.

If the fallback candidate (e.g., openai/gpt-5.4) succeeds, the session-level pin is never rolled back. From that point on, every subsequent message in that session uses the fallback model because channel overrides are gated by:

if (!hasSessionModelOverride && channelModelOverride)

This means a single transient Opus failure permanently pins the session to GPT-5.4, even though the channel override explicitly requests Opus.
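
To make the shadowing concrete, here is a minimal sketch of a resolution order built around the gate quoted in the issue. `resolveModel` is a hypothetical function, and joining `providerOverride` and `modelOverride` with a slash is an assumption about how the two fields combine.

```typescript
interface SessionEntry {
  modelOverride?: string;
  providerOverride?: string;
}

// Mirrors the gate from the issue: the channel override only applies
// when the session entry carries no model override of its own.
function resolveModel(
  entry: SessionEntry,
  channelModelOverride: string | undefined,
  defaultModel: string,
): string {
  const hasSessionModelOverride = Boolean(entry.modelOverride);
  if (!hasSessionModelOverride && channelModelOverride) {
    return channelModelOverride;
  }
  if (entry.modelOverride) {
    // Assumption: provider and model are stored separately and joined as "provider/model".
    return entry.providerOverride
      ? `${entry.providerOverride}/${entry.modelOverride}`
      : entry.modelOverride;
  }
  return defaultModel;
}
```

With an empty session entry the channel override wins. Once a fallback has written `modelOverride` and `providerOverride`, the same call returns the fallback model on every later turn, which is the pinning described above.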

Steps to Reproduce

1. Configure channels.modelByChannel.discord.<channel_id> to use anthropic/claude-opus-4-6
2. Configure fallback chain: openai/gpt-5.4 → minimax/MiniMax-M2.7
3. Start a new session (/new) in the overridden channel
4. Wait for (or simulate) a transient Opus failure (rate limit, 529, timeout)
5. The fallback candidate (GPT-5.4) succeeds → session is permanently pinned

Expected Behavior

Channel override should be treated as the "intended" model
After a successful fallback run, the session-level pin should be cleared so the next message re-evaluates against the channel override
Alternatively, fallback persistence should not override channel-level config at all

Actual Behavior

Session is permanently pinned to the fallback model
Channel override is never re-evaluated
User must manually run /models or clear modelOverride/providerOverride from sessions.json

Affected Code Paths

agent-runner.runtime → persistFallbackCandidateSelection() (writes session override before fallback runs)
model-overrides → applyModelOverrideToSessionEntry()
reply module → channel override gating: if (!hasSessionModelOverride && channelModelOverride)
runWithModelFallback() → orchestrates fallback attempts

Environment

OpenClaw: 2026.4.9 (commit 0512059)
OS: Debian 13, Linux 6.12.74
Node: v24.14.1
Channels: Discord
Models: anthropic/claude-opus-4-6 (primary), openai-codex/gpt-5.4 (fallback)

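Workaround

Manually remove modelOverride, providerOverride, and related fields from all session entries in sessions.json across all agents, then restart the gateway. This has been needed 3+ times in 3 days.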

Suggested Fix

Option A: Don't persist fallback selections when a channel override is active — treat the channel override as authoritative.

Option B: Clear the fallback pin after a configurable TTL (e.g., one turn or N minutes) so the next message re-evaluates against the channel override.

Option C: Add a fallbackPersistence: false config option to disable this behavior entirely for users who rely on channel overrides.
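
Option B could look roughly like the sketch below. It is not part of any of the PRs on this page: `fallbackPinnedAt`, `pinFallback`, and `clearExpiredFallbackPin` are hypothetical names, and timestamps are passed in explicitly so the logic stays deterministic.

```typescript
interface SessionEntry {
  modelOverride?: string;
  providerOverride?: string;
  fallbackPinnedAt?: number; // hypothetical field: epoch milliseconds when the pin was written
}

// Writes the fallback pin and records when it was written.
function pinFallback(entry: SessionEntry, provider: string, model: string, nowMs: number): void {
  entry.modelOverride = model;
  entry.providerOverride = provider;
  entry.fallbackPinnedAt = nowMs;
}

// Drops a fallback pin that is older than ttlMs. Returns true if a pin was cleared.
// Entries without fallbackPinnedAt (for example user picks) are left alone.
function clearExpiredFallbackPin(entry: SessionEntry, nowMs: number, ttlMs: number): boolean {
  if (entry.fallbackPinnedAt === undefined) return false;
  if (nowMs - entry.fallbackPinnedAt < ttlMs) return false;
  delete entry.modelOverride;
  delete entry.providerOverride;
  delete entry.fallbackPinnedAt;
  return true;
}
```

Calling `clearExpiredFallbackPin` before model resolution on each turn would let the channel override apply again once the TTL has elapsed, while still avoiding a retry of the failing primary on every message inside the TTL window.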

extent analysis

TL;DR

The most likely fix is to modify the persistFallbackCandidateSelection() function to not write modelOverride and providerOverride to the session entry when a channel override is active.

Guidance

Review the persistFallbackCandidateSelection() function to ensure it checks for active channel overrides before writing to the session entry.
Consider implementing a TTL (time-to-live) for fallback pins, allowing the next message to re-evaluate against the channel override after a set period.
Evaluate the applyModelOverrideToSessionEntry() function to ensure it correctly handles channel overrides and fallbacks.
Test the runWithModelFallback() function to verify it correctly orchestrates fallback attempts and clears the fallback pin when necessary.

Example

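// Sketch of a guarded persistFallbackCandidateSelection(); the parameter list is illustrative, not the real signature
function persistFallbackCandidateSelection(
  sessionEntry: { modelOverride?: string; providerOverride?: string },
  fallbackProvider: string,
  fallbackModel: string,
  channelModelOverride?: string,
): void {
  // An active channel override is authoritative, so leave the session entry untouched
  if (channelModelOverride) return;
  sessionEntry.modelOverride = fallbackModel;
  sessionEntry.providerOverride = fallbackProvider;
}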

Notes

The provided workaround of manually removing modelOverride and providerOverride fields from session entries and restarting the gateway is a temporary solution. A more permanent fix is needed to prevent this issue from recurring.

Recommendation

Until a permanent fix is in place, apply the workaround: manually remove modelOverride, providerOverride, and related fields from all session entries in sessions.json across all agents, then restart the gateway.
