openclaw - ✅(Solved) Fix Fallback model persistence overrides channel-level model overrides permanently [3 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#63712 (fetched 2026-04-10 03:42:07)
Comments: 1 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): cross-referenced ×2, referenced ×2, commented ×1, mentioned ×1

Root Cause

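If the fallback candidate (e.g., openai/gpt-5.4) succeeds, the session-level pin is never rolled back. From that point on, every subsequent message in that session uses the fallback model because channel overrides are gated by the condition if (!hasSessionModelOverride && channelModelOverride), which is false once the session carries an override.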

Workaround

Manually remove modelOverride, providerOverride, and related fields from all session entries in sessions.json across all agents, then restart the gateway. This has been needed 3+ times in 3 days.

PR fix notes

PR #63844: fix: don't persist fallback model to session when channel override is active

Description (problem / solution / changelog)

Summary

When a channel-level model override (channels.modelByChannel) is active, the fallback model was being incorrectly persisted to the session. This fix prevents that.

Closes #63712

Testing

  • Relevant tests pass

This PR was developed with AI assistance (Claude). All code has been reviewed and tested. Built with islo.dev

Changed files

  • src/auto-reply/reply/agent-runner-execution.test.ts (modified, +63/-0)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +12/-1)
  • src/auto-reply/reply/agent-runner.ts (modified, +3/-0)
  • src/auto-reply/reply/get-reply-run.ts (modified, +3/-0)
  • src/auto-reply/reply/get-reply.ts (modified, +7/-1)

PR #63895: fix: don't persist fallback model to session when channel override is active

Description (problem / solution / changelog)

Summary

When a channel-level model override is active and the primary model fails, the fallback model selection was persisted to the session entry (modelOverride/providerOverride). This permanently shadowed the channel override on subsequent turns because channel overrides are gated by !hasSessionModelOverride.

Closes #63712

Changes

  • Added hasChannelModelOverride parameter through the reply chain (get-reply.ts → get-reply-run.ts → agent-runner.ts → agent-runner-execution.ts)
  • Skip fallback persistence when hasChannelModelOverride is true
  • Added regression test verifying fallback doesn't pin to session under channel override
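
The change list above can be sketched as a guard at the persistence step. This is a minimal illustration, not the code from the PR: the SessionEntry, FallbackCandidate, and PersistParams shapes and the persistFallbackSelection name are hypothetical, and storing the bare model name in modelOverride is an assumption. Only hasChannelModelOverride, modelOverride, and providerOverride come from the PR notes.

```typescript
// Hypothetical, simplified shapes; the real OpenClaw types are not shown in the PR notes.
interface SessionEntry {
  modelOverride?: string;
  providerOverride?: string;
}

interface FallbackCandidate {
  provider: string;
  model: string;
}

interface PersistParams {
  entry: SessionEntry;
  candidate: FallbackCandidate;
  // Threaded down through the reply chain, as the PR describes.
  hasChannelModelOverride: boolean;
}

// Returns true when the fallback was pinned to the session, false when it was skipped.
function persistFallbackSelection(params: PersistParams): boolean {
  if (params.hasChannelModelOverride) {
    // The channel override stays authoritative, so the session entry is left untouched.
    return false;
  }
  params.entry.modelOverride = params.candidate.model;
  params.entry.providerOverride = params.candidate.provider;
  return true;
}
```

With the guard in place, a fallback that succeeds under a channel override serves that one reply and leaves no session pin behind, so the next turn evaluates the channel override again.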

Testing

  • New test: does not persist fallback selection to session when channel model override is active

This PR was developed with AI assistance (Claude). Built with islo.dev

Changed files

  • src/auto-reply/reply/agent-runner-execution.test.ts (modified, +55/-0)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +12/-1)
  • src/auto-reply/reply/agent-runner.ts (modified, +3/-0)
  • src/auto-reply/reply/get-reply-run.ts (modified, +3/-0)
  • src/auto-reply/reply/get-reply.ts (modified, +7/-1)

PR #64471: fix: prevent fallback persistence from clobbering user /models picks

Description (problem / solution / changelog)

Summary

  • Problem: Selecting a model via /models shows "Model changed to X" but the next message reverts to a previous model.
  • Why it matters: /models is the primary way users switch models per-session. When it silently fails, users lose trust in model selection and can get permanently stuck on a fallback model they never chose.
  • What changed: (1) Skip fallback persistence when the active modelOverride was set by the user (modelOverrideSource === "user", plus legacy entries with no source field). (2) In the Telegram /models callback, use the fresh runtimeCfg instead of the stale startup cfg snapshot for store path and default-model resolution.
  • What did NOT change: No changes to how the fallback chain itself is resolved, no changes to how /models shows the picker, no schema/config changes. Channels other than Telegram are unaffected by the second fix (they don't have inline-button model pickers).

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway / orchestration
  • Integrations

Linked Issue/PR

  • Closes #63611
  • Related #63712, #63900
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Two independent issues, both leading to the same user-visible symptom.
    1. persistFallbackCandidateSelection in agent-runner-execution.ts unconditionally overwrites modelOverride/providerOverride with the fallback model whenever a fallback succeeds. The session entry already carries modelOverrideSource: "user" | "auto" to distinguish user picks from fallback-set values, but the persistence path never checked it. So one transient failure of a user-selected model permanently pinned the session to the fallback.
    2. The Telegram /models callback handler in bot-handlers.runtime.ts used the outer cfg (a snapshot captured at handler-registration time, closed over by registerTelegramHandlers) for resolveStorePath and resolveDefaultModelForAgent, while the inbound message pipeline always uses fresh config via loadConfig(). After a config reload, the override could be written to the wrong store file, or the selected model could be misidentified as "the default" — which deletes the override instead of saving it.
  • Missing detection / guardrail: No regression test exercising "user picks model X via /models → X fails → fallback Y succeeds → next message still targets X." Existing tests covered the fallback persistence path but not the user-override interaction.
  • Contributing context: modelOverrideSource was added later in the codebase. session-reset-service.ts already treats modelOverride set with modelOverrideSource === undefined as legacy user state for backward compat — that pattern wasn't applied here.
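
The second root cause is a stale-closure problem, which a small sketch can make concrete. Everything below is a hypothetical stand-in (the Config shape, createConfigSource, and both handler factories are invented for illustration); it only mirrors the pattern described above, where a handler closes over a startup snapshot instead of reading the current config on each call.

```typescript
// Hypothetical config shape for illustration only.
interface Config {
  defaultModel: string;
  storePath: string;
}

// Hypothetical reloadable config source.
function createConfigSource(initial: Config) {
  let current = initial;
  return {
    load: (): Config => current,
    reload: (next: Config): void => {
      current = next;
    },
  };
}

interface CallbackResult {
  storePath: string;
  selectedIsDefault: boolean;
}

// Stale variant: closes over the snapshot taken at registration time.
function registerStaleHandler(cfg: Config): (selected: string) => CallbackResult {
  return (selected) => ({
    storePath: cfg.storePath,
    selectedIsDefault: selected === cfg.defaultModel,
  });
}

// Fresh variant: reads the current config every time the callback fires.
function registerFreshHandler(loadConfig: () => Config): (selected: string) => CallbackResult {
  return (selected) => {
    const cfg = loadConfig();
    return {
      storePath: cfg.storePath,
      selectedIsDefault: selected === cfg.defaultModel,
    };
  };
}
```

After a reload that changes the default model, the stale handler still compares against the old default, which is how a real selection can be misread as "the default" and have its override deleted rather than saved.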

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
  • Target tests added:
    • src/auto-reply/reply/agent-runner-execution.test.ts — "does not persist fallback selection when modelOverrideSource is user" + "does not persist fallback selection for legacy user overrides without modelOverrideSource"
    • extensions/telegram/src/bot.test.ts — "persists non-default model override using fresh config, not stale startup snapshot"
  • Scenario the tests lock in:
    1. With modelOverrideSource: "user" (and the legacy undefined case), a successful fallback to a different model must NOT mutate modelOverride/providerOverride.
    2. With a Telegram bot started with one default model, after loadConfig returns a different default, the callback must still persist the user's selection correctly.
  • Why this is the smallest reliable guardrail: Both bugs are pure state-mutation issues at well-defined seams (the persistence helper and the callback handler). Unit tests against those seams catch the regression without needing a running gateway or live model failures.

User-visible / Behavior Changes

  • /models selections now survive transient primary-model failures. When the user-selected model rate-limits and a fallback model serves a single response, the next message attempts the user-selected model again (instead of being permanently routed to the fallback).
  • No config changes, no command changes, no UI changes.

Diagram

Before (fallback path):
  user picks X via /models   →  modelOverride = X, source = "user"
  send message               →  X fails  →  fallback Y succeeds
  persist fallback           →  modelOverride = Y, source = "auto"  ← clobbered
  next message               →  uses Y (forever)

After:
  user picks X via /models   →  modelOverride = X, source = "user"
  send message               →  X fails  →  fallback Y succeeds
  persist fallback           →  source === "user"  →  skip
  next message               →  uses X (the user's choice)
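
The "After" flow can be sketched as a source check in front of the write. This is an assumption-laden illustration rather than the PR's code: the helper names and the entry shape are hypothetical, while modelOverrideSource, its "user" | "auto" values, and the treatment of a missing source as legacy user state come from the description above.

```typescript
type ModelOverrideSource = "user" | "auto";

// Hypothetical, simplified session entry.
interface SessionEntryWithSource {
  modelOverride?: string;
  providerOverride?: string;
  modelOverrideSource?: ModelOverrideSource;
}

// True when the current override should be treated as a user pick.
function isUserOwnedOverride(entry: SessionEntryWithSource): boolean {
  if (entry.modelOverride === undefined) {
    return false;
  }
  // Legacy entries have no source field; they are treated as user state.
  return entry.modelOverrideSource === "user" || entry.modelOverrideSource === undefined;
}

// Returns true when the fallback was written, false when the user pick was preserved.
function persistFallbackUnlessUserPick(
  entry: SessionEntryWithSource,
  provider: string,
  model: string,
): boolean {
  if (isUserOwnedOverride(entry)) {
    return false;
  }
  entry.modelOverride = model;
  entry.providerOverride = provider;
  entry.modelOverrideSource = "auto";
  return true;
}
```

An entry that already carries an "auto" override, or no override at all, is still updated, so the fallback pin keeps working for sessions where the user never chose a model.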

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Human Verification (required)

  • Verified scenarios:
    • Full unit test suites for agent-runner-execution, model-selection, model-overrides, and the Telegram bot pass locally (114/114 across the related files, plus the 2 new regression tests).
    • Manually traced the flow from /models callback → session store write → next-message session load → createModelSelectionState → fallback persistence to confirm the override survives.
  • Edge cases checked:
    • Legacy session entries with modelOverride set but modelOverrideSource === undefined (backward compat — now covered by an explicit test).
    • Cross-provider user selection (e.g., user picks openai/... while config primary is anthropic/...): resolveAgentModelFallbackCandidates already returns no fallbacks in that case, so this PR has no effect there. The user instead sees the existing FallbackSummaryError with cooldown details, which is the correct UX.
  • What I did NOT verify: End-to-end behavior against a live model that actually rate-limits (no live LLM testing in this PR — covered by mocked unit tests against the same seam).

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No — legacy session entries (with modelOverride set but no modelOverrideSource) are explicitly handled.

Risks and Mitigations

  • Risk: A user whose selected model is in long cooldown will pay the failed-primary cost on every message (since fallback is no longer persisted), making responses slower until they switch models manually.
    • Mitigation: This is strictly better than the previous behavior (silent permanent override). When the entire fallback chain is exhausted, the existing FallbackSummaryError already surfaces a cooldown message to the user, who can then re-select via /models. No new mitigation added in this PR to keep scope tight.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/telegram/src/bot-handlers.runtime.ts (modified, +9/-4)
  • extensions/telegram/src/bot.test.ts (modified, +87/-0)
  • src/auto-reply/reply/agent-runner-execution.test.ts (modified, +134/-0)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +20/-0)

Code Example

if (!hasSessionModelOverride && channelModelOverride)

Bug Description

When a primary model (e.g., anthropic/claude-opus-4-6) configured via channels.modelByChannel experiences a transient failure (rate limit, timeout, cooldown), the fallback system in persistFallbackCandidateSelection() writes modelOverride and providerOverride to the session entry in sessions.json before the fallback candidate runs.

If the fallback candidate (e.g., openai/gpt-5.4) succeeds, the session-level pin is never rolled back. From that point on, every subsequent message in that session uses the fallback model because channel overrides are gated by:

if (!hasSessionModelOverride && channelModelOverride)

This means a single transient Opus failure permanently pins the session to GPT-5.4, even though the channel override explicitly requests Opus.
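
To see why the gate causes permanent pinning, here is a small sketch of model resolution built around the quoted condition. The ResolveInput shape, the resolveEffectiveModel name, and the behaviour when the gate is false (session override first, then the default) are assumptions for illustration; only the condition itself comes from the issue.

```typescript
// Hypothetical inputs to model resolution.
interface ResolveInput {
  defaultModel: string;
  channelModelOverride?: string;
  sessionModelOverride?: string;
}

function resolveEffectiveModel(input: ResolveInput): string {
  const hasSessionModelOverride = input.sessionModelOverride !== undefined;
  // The gate quoted in the issue: the channel override applies only without a session override.
  if (!hasSessionModelOverride && input.channelModelOverride) {
    return input.channelModelOverride;
  }
  // Assumed ordering for the remaining cases.
  return input.sessionModelOverride ?? input.defaultModel;
}

// Before any fallback, the channel override wins.
const beforeFallback = resolveEffectiveModel({
  defaultModel: "openai/gpt-5.4",
  channelModelOverride: "anthropic/claude-opus-4-6",
});

// Once a fallback has been persisted to the session, the gate is false on every later turn.
const afterFallback = resolveEffectiveModel({
  defaultModel: "openai/gpt-5.4",
  channelModelOverride: "anthropic/claude-opus-4-6",
  sessionModelOverride: "openai/gpt-5.4",
});
```

Under these assumptions beforeFallback resolves to the channel's Opus model and afterFallback resolves to the pinned fallback, which matches the behaviour reported in the issue.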

Steps to Reproduce

  1. Configure channels.modelByChannel.discord.<channel_id> to use anthropic/claude-opus-4-6
  2. Configure fallback chain: openai/gpt-5.4 → minimax/MiniMax-M2.7
  3. Start a new session (/new) in the overridden channel
  4. Wait for (or simulate) a transient Opus failure (rate limit, 529, timeout)
  5. The fallback candidate (GPT-5.4) succeeds → session is permanently pinned

Expected Behavior

  • Channel override should be treated as the "intended" model
  • After a successful fallback run, the session-level pin should be cleared so the next message re-evaluates against the channel override
  • Alternatively, fallback persistence should not override channel-level config at all

Actual Behavior

  • Session is permanently pinned to the fallback model
  • Channel override is never re-evaluated
  • User must manually run /models or clear modelOverride/providerOverride from sessions.json

Affected Code Paths

  • agent-runner.runtime → persistFallbackCandidateSelection() (writes session override before fallback runs)
  • model-overrides → applyModelOverrideToSessionEntry()
  • reply module → channel override gating: if (!hasSessionModelOverride && channelModelOverride)
  • runWithModelFallback() → orchestrates fallback attempts

Environment

  • OpenClaw: 2026.4.9 (commit 0512059)
  • OS: Debian 13, Linux 6.12.74
  • Node: v24.14.1
  • Channels: Discord
  • Models: anthropic/claude-opus-4-6 (primary), openai-codex/gpt-5.4 (fallback)

Workaround

Manually remove modelOverride, providerOverride, and related fields from all session entries in sessions.json across all agents, then restart the gateway. This has been needed 3+ times in 3 days.
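
The manual cleanup could be scripted along these lines. This is a sketch under stated assumptions: it treats the parsed sessions.json as a map from session key to entry object, which the issue does not confirm, and it interprets "related fields" as modelOverrideSource only. Reading and writing the file, and restarting the gateway, are left out.

```typescript
// Assumed shape: session key mapped to an arbitrary entry object.
type SessionRecord = Record<string, unknown>;
type SessionStore = Record<string, SessionRecord>;

// Fields to strip; modelOverrideSource is an assumed "related field".
const OVERRIDE_FIELDS = ["modelOverride", "providerOverride", "modelOverrideSource"] as const;

// Returns a cleaned copy of the store plus the number of entries that were changed.
function clearModelOverrides(store: SessionStore): { store: SessionStore; cleared: number } {
  const next: SessionStore = {};
  let cleared = 0;
  for (const key of Object.keys(store)) {
    const copy: SessionRecord = { ...store[key] };
    let touched = false;
    for (const field of OVERRIDE_FIELDS) {
      if (field in copy) {
        delete copy[field];
        touched = true;
      }
    }
    if (touched) {
      cleared += 1;
    }
    next[key] = copy;
  }
  return { store: next, cleared };
}
```

The function works on a copy so the original data can be kept as a backup before anything is written back to disk.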

Suggested Fix

Option A: Don't persist fallback selections when a channel override is active — treat the channel override as authoritative.

Option B: Clear the fallback pin after a configurable TTL (e.g., one turn or N minutes) so the next message re-evaluates against the channel override.

Option C: Add a fallbackPersistence: false config option to disable this behavior entirely for users who rely on channel overrides.
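
Option B could look roughly like the sketch below. It is purely illustrative: the fallbackPinnedAtMs field, the PinnedSessionEntry shape, and the expireFallbackPin helper are all hypothetical and do not exist in the issue or the PRs. The idea is to timestamp the fallback pin when it is written and drop it once the TTL has elapsed.

```typescript
// Hypothetical session entry with a timestamp for the fallback pin.
interface PinnedSessionEntry {
  modelOverride?: string;
  providerOverride?: string;
  // Hypothetical field: epoch milliseconds at which the fallback pin was written.
  fallbackPinnedAtMs?: number;
}

// Clears an expired fallback pin in place; returns true when the pin was removed.
function expireFallbackPin(entry: PinnedSessionEntry, nowMs: number, ttlMs: number): boolean {
  if (entry.fallbackPinnedAtMs === undefined) {
    // No timestamp means this override was not written by the fallback path.
    return false;
  }
  if (nowMs - entry.fallbackPinnedAtMs < ttlMs) {
    // Still inside the TTL window; keep using the fallback.
    return false;
  }
  delete entry.modelOverride;
  delete entry.providerOverride;
  delete entry.fallbackPinnedAtMs;
  return true;
}
```

Running this check before model resolution on each turn would let the channel override apply again once the pin has aged out, while overrides without a timestamp are left alone.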

extent analysis

TL;DR

The most likely fix is to modify the persistFallbackCandidateSelection() function to not write modelOverride and providerOverride to the session entry when a channel override is active.

Guidance

  • Review the persistFallbackCandidateSelection() function to ensure it checks for active channel overrides before writing to the session entry.
  • Consider implementing a TTL (time-to-live) for fallback pins, allowing the next message to re-evaluate against the channel override after a set period.
  • Evaluate the applyModelOverrideToSessionEntry() function to ensure it correctly handles channel overrides and fallbacks.
  • Test the runWithModelFallback() function to verify it correctly orchestrates fallback attempts and clears the fallback pin when necessary.

Example

// Example of modified persistFallbackCandidateSelection() function
if (!channelModelOverride) {
  // Write modelOverride and providerOverride to session entry
  sessionEntry.modelOverride = fallbackModel;
  sessionEntry.providerOverride = fallbackProvider;
}

Notes

The provided workaround of manually removing modelOverride and providerOverride fields from session entries and restarting the gateway is a temporary solution. A more permanent fix is needed to prevent this issue from recurring.

Recommendation

Apply the workaround until a permanent fix is in place: manually remove modelOverride, providerOverride, and related fields from all session entries in sessions.json across all agents, then restart the gateway.
