openclaw - Persisted auto-fallback model overrides only recover on heartbeats, not on regular turns



Summary

Persisted modelOverrideSource: "auto" session entries (created when the runtime falls back to a non-primary model during a provider failure) are sticky on non-heartbeat turns and never expire. As a result, a single transient provider outage can permanently pin a long-lived Slack/Telegram/DM session to a fallback model — billing the wrong provider indefinitely until manually cleared.

A partial fix exists for heartbeat turns via isStaleHeartbeatAutoFallbackOverride (src/auto-reply/reply/stored-model-override.ts), but the function is gated by params.isHeartbeat !== true and so does not run for regular inbound turns.

Reproduce

  1. Run an agent with a configured fallback chain like anthropic/claude-opus-4-7 → openai/gpt-5.3-codex → google/gemini-2.5-pro.
  2. Wait for or simulate a primary provider failure that exhausts retries (e.g. Anthropic Overloaded). The runtime then persists the following to the active session entry:
    {
      "providerOverride": "openai",
      "modelOverride": "gpt-5.3-codex",
      "modelOverrideSource": "auto",
      "modelOverrideFallbackOriginProvider": "anthropic",
      "modelOverrideFallbackOriginModel": "claude-opus-4-7"
    }
  3. Wait for the primary to recover.
  4. Send a regular inbound message to the same session (not a heartbeat) — e.g. a Slack DM or Telegram message.

Expected: the runtime detects that the persisted auto-override no longer matches the configured primary and runs the turn on the primary, matching the heartbeat behavior.

Actual: the override is honored and the turn runs on the fallback model. The same happens on every subsequent turn.

Root cause

In src/auto-reply/reply/stored-model-override.ts:

function isStaleHeartbeatAutoFallbackOverride(params) {
    if (params.isHeartbeat !== true || params.hasResolvedHeartbeatModelOverride === true) return false;
    // ... rest of the staleness check
}

The params.isHeartbeat !== true clause limits the staleness bypass to heartbeats only. The same logic should apply to every turn.

The caller (get-reply.ts, ~line 4216) already passes the same parameters whether or not the turn is a heartbeat; it only gates the bypass on this function's return value:

const staleHeartbeatAutoFallbackOverride = isStaleHeartbeatAutoFallbackOverride({ ... });
if (storedModelOverride?.model && !hasResolvedHeartbeatModelOverride && !staleHeartbeatAutoFallbackOverride) {
    provider = storedModelOverride.provider ?? defaultProvider;
    model = storedModelOverride.model;
}

So extending the bypass to non-heartbeat turns only requires removing the params.isHeartbeat !== true gate (and, optionally, renaming the function to drop "Heartbeat" from its name).
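The effect of the caller's condition can be reproduced with a small TypeScript model. This is an illustrative sketch, not the code in get-reply.ts: the Resolved type, the resolveTurnModel name, and its boolean parameters are assumptions; only the if-condition mirrors the quoted snippet. It shows that whenever the staleness check returns false, as it always does on non-heartbeat turns today, the stored fallback override wins.

```typescript
// Toy model of the caller's resolution step (not the get-reply.ts source).
// ASSUMPTION: Resolved, resolveTurnModel, and its parameters are hypothetical;
// only the if-condition mirrors the snippet quoted in the issue.

interface StoredModelOverride {
  provider?: string;
  model?: string;
}

interface Resolved {
  provider: string;
  model: string;
}

function resolveTurnModel(
  defaults: Resolved,
  storedModelOverride: StoredModelOverride | undefined,
  hasResolvedHeartbeatModelOverride: boolean,
  staleHeartbeatAutoFallbackOverride: boolean, // return value of the staleness check
): Resolved {
  let provider = defaults.provider;
  let model = defaults.model;
  const o = storedModelOverride;
  // Same gate as the caller: the stored override wins unless one of the flags vetoes it.
  if (o && o.model && !hasResolvedHeartbeatModelOverride && !staleHeartbeatAutoFallbackOverride) {
    provider = o.provider ?? defaults.provider;
    model = o.model;
  }
  return { provider, model };
}

const fmt = (r: Resolved): string => `${r.provider}/${r.model}`;
const configured: Resolved = { provider: "anthropic", model: "claude-opus-4-7" };
const stored: StoredModelOverride = { provider: "openai", model: "gpt-5.3-codex" };

// Regular turn: the check is gated off (false), so the fallback sticks.
console.log(fmt(resolveTurnModel(configured, stored, false, false))); // openai/gpt-5.3-codex
// Turn where the check flags the override as stale: back to the primary.
console.log(fmt(resolveTurnModel(configured, stored, false, true))); // anthropic/claude-opus-4-7
```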

Proposed fix

 function isStaleHeartbeatAutoFallbackOverride(params) {
-    if (params.isHeartbeat !== true || params.hasResolvedHeartbeatModelOverride === true) return false;
+    if (params.hasResolvedHeartbeatModelOverride === true) return false;
     if (params.storedOverride?.source !== "session") return false;
     // ...
 }

Naming-wise, dropping "Heartbeat" from the function name would be clearer (isStaleAutoFallbackOverride), but that's separate from the behavior fix.
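A minimal sketch of the proposed behavior under the suggested name isStaleAutoFallbackOverride. The staleness rule itself is elided in the issue, so the StoredOverride shape, the primaryProvider/primaryModel parameters, and the rule used here (an "auto" override whose fallback origin is the still-configured primary) are assumptions; the real check in the diff also compares storedOverride?.source against "session", which this toy does not model. What it illustrates is that, with the heartbeat gate gone, the check returns the same answer for regular turns and heartbeats.

```typescript
// Sketch of the proposed fix: staleness check without the heartbeat gate.
// ASSUMPTION: StoredOverride, primaryProvider/primaryModel, and the
// staleness rule are hypothetical stand-ins for the elided "// ..." part.

interface StoredOverride {
  source: string;          // "auto" for runtime-created fallbacks
  provider: string;
  model: string;
  originProvider?: string; // provider the runtime fell back from
  originModel?: string;    // model the runtime fell back from
}

interface StalenessParams {
  isHeartbeat?: boolean;   // still passed by the caller, no longer consulted
  hasResolvedHeartbeatModelOverride?: boolean;
  storedOverride?: StoredOverride;
  primaryProvider: string; // currently configured primary
  primaryModel: string;
}

function isStaleAutoFallbackOverride(params: StalenessParams): boolean {
  // Only an explicitly resolved heartbeat model override short-circuits now.
  if (params.hasResolvedHeartbeatModelOverride === true) return false;
  const o = params.storedOverride;
  if (!o || o.source !== "auto") return false;
  // Hypothetical rule: stale if the runtime fell back away from the primary
  // that is still configured and the override does not already point at it.
  const fellBackFromPrimary =
    o.originProvider === params.primaryProvider && o.originModel === params.primaryModel;
  const isPrimary = o.provider === params.primaryProvider && o.model === params.primaryModel;
  return fellBackFromPrimary && !isPrimary;
}

const pinned: StoredOverride = {
  source: "auto",
  provider: "openai",
  model: "gpt-5.3-codex",
  originProvider: "anthropic",
  originModel: "claude-opus-4-7",
};
const primary = { primaryProvider: "anthropic", primaryModel: "claude-opus-4-7" };

console.log(isStaleAutoFallbackOverride({ ...primary, storedOverride: pinned, isHeartbeat: false })); // true
console.log(isStaleAutoFallbackOverride({ ...primary, storedOverride: pinned, isHeartbeat: true }));  // true
```

Fed into the caller's condition, a true result makes the turn fall through to the configured primary; a user-chosen (non-"auto") override is still honored.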

Real-world impact

Across a four-agent fleet running 2026.5.12, we hit this twice in two weeks:

  • An agent:main:main session was pinned to openai/gpt-5.3-codex for 15 days after a single Anthropic Opus overload, costing ~$30/day in misrouted spend until it was detected and manually cleared.
  • A Slack #amw_quoting channel session was pinned to codex after another transient Anthropic outage. The UI continued to show Opus as the configured default; only that specific channel was stuck.

The heartbeat fix from #74284 caught one path but left the more common one (channel turns) exposed.
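Because the stuck sessions were only found by noticing misrouted spend, a scan for pinned entries may help. A sketch, assuming the session entries can be loaded into a plain map keyed by session key (the on-disk layout and loading step are not described in the issue, and the second key in the sample is hypothetical): it flags every entry whose modelOverrideSource is "auto" and reports which model it fell back from.

```typescript
// ASSUMPTION: sessions are available as a plain map from session key to an
// entry carrying the fields shown in the reproduction step; the real store
// layout and how to load it are not covered by the issue.

interface SessionEntry {
  providerOverride?: string;
  modelOverride?: string;
  modelOverrideSource?: string;
  modelOverrideFallbackOriginProvider?: string;
  modelOverrideFallbackOriginModel?: string;
}

interface PinnedSession {
  key: string;
  pinnedTo: string;       // provider/model the session is stuck on
  originallyFrom: string; // provider/model it fell back from
}

function findAutoFallbackPins(sessions: Record<string, SessionEntry>): PinnedSession[] {
  const pinned: PinnedSession[] = [];
  for (const key of Object.keys(sessions)) {
    const e = sessions[key];
    // Only flag runtime-created fallbacks (source "auto"); other sources are skipped.
    if (e.modelOverrideSource !== "auto" || !e.modelOverride) continue;
    pinned.push({
      key,
      pinnedTo: `${e.providerOverride ?? "?"}/${e.modelOverride}`,
      originallyFrom: `${e.modelOverrideFallbackOriginProvider ?? "?"}/${e.modelOverrideFallbackOriginModel ?? "?"}`,
    });
  }
  return pinned;
}

const sessions: Record<string, SessionEntry> = {
  "agent:main:main": {
    providerOverride: "openai",
    modelOverride: "gpt-5.3-codex",
    modelOverrideSource: "auto",
    modelOverrideFallbackOriginProvider: "anthropic",
    modelOverrideFallbackOriginModel: "claude-opus-4-7",
  },
  "example:no-override": {}, // hypothetical session with no override
};

for (const p of findAutoFallbackPins(sessions)) {
  console.log(`${p.key}: ${p.originallyFrom} -> ${p.pinnedTo}`);
}
// agent:main:main: anthropic/claude-opus-4-7 -> openai/gpt-5.3-codex
```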

Workaround we shipped

We patched the dist file directly on all four agents to remove the heartbeat gate. A sentinel-guarded, idempotent re-applier runs nightly via cron so the patch is restored after upgrades. Happy to upstream the fix as a PR if useful.
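The issue does not include the re-applier itself. A minimal sketch of its idempotent core, assuming the compiled dist file contains the gated line verbatim (bundling or minification may change it) and using a made-up SENTINEL marker:

```typescript
// Minimal sketch of a sentinel-guarded, idempotent patch step.
// ASSUMPTION: SENTINEL is a made-up marker, and GATED must match the
// compiled dist text exactly; bundlers may rename or reformat it.

const SENTINEL = "/* local-patch: auto-fallback staleness on all turns */";
const GATED =
  "if (params.isHeartbeat !== true || params.hasResolvedHeartbeatModelOverride === true) return false;";
const UNGATED = "if (params.hasResolvedHeartbeatModelOverride === true) return false;";

type PatchResult = "already-patched" | "patched" | "pattern-not-found";

// Pure function: given file contents, return the new contents and what happened.
function applyPatch(source: string): { result: PatchResult; output: string } {
  // The sentinel makes re-runs (e.g. from a nightly cron job) no-ops.
  if (source.includes(SENTINEL)) return { result: "already-patched", output: source };
  // After an upgrade the line may have changed; report instead of guessing.
  if (!source.includes(GATED)) return { result: "pattern-not-found", output: source };
  return { result: "patched", output: source.replace(GATED, `${UNGATED} ${SENTINEL}`) };
}

const sample = `function isStaleHeartbeatAutoFallbackOverride(params) {\n  ${GATED}\n}`;
const first = applyPatch(sample);
const second = applyPatch(first.output);
console.log(first.result, second.result); // patched already-patched
```

A cron wrapper would read the dist file, pass its contents to applyPatch, and write the output back only when the result is "patched". A "pattern-not-found" result after an upgrade is a signal to inspect the compiled code by hand, for example because an upstream fix has already removed the gate.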

Environment

  • OpenClaw 2026.5.12 (f066dd2)
  • Node v22.22.0 / v22.22.1
  • Linux Ubuntu 22.04/24.04, four-agent fleet
  • Anthropic primary, OpenAI Codex + Google + sonnet fallbacks
