openclaw: Runtime model selection sticks on fallback after `reason=abort`; `/new` does not clear active model


Summary

After a Claude live session closes with reason=abort, the Telegram gateway flips to the configured fallback model (openai-codex/gpt-5.5) and remains there across subsequent turns. A user-issued /new resets the conversation transcript but does not restore the policy primary — the new session continues on the fallback. Active model only returns to primary after manual intervention (e.g. /model claude-cli/claude-opus-4-7).

This violates the implicit contract that primary is the steady-state model and fallback is a temporary recovery state.

Version

openclaw v2026.5.12

Repro

  1. Agent main configured with policy agent.chat.claude-cli → primary claude-cli/claude-opus-4-7, fallback openai-codex/gpt-5.5.
  2. Abort an active Claude turn (or send SIGINT mid-turn).
  3. Send a new prompt several minutes later.

Observed

  • 19:07:04 UTC — gateway log: [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=abort
  • 19:30:25 UTC — gateway log: [agent/embedded] [timeout-compaction] compaction succeeded for openai-codex/gpt-5.5; retrying prompt (already on fallback by this point — switch happened on the next-turn fallback selection between 19:07 and 19:30)
  • 19:31:13 UTC — turn rendered as [Reghar-openai-codex/gpt-5.5] (Telegram banner)
  • 19:42:44 UTC — user runs /new. New session opens on gpt-5.5/openai-codex (per session reset banner).
  • 19:44 UTC and subsequent turns — continued on fallback until manual /model command.

Expected

  • After the abort cooldown elapses or after /new, the next turn re-attempts the primary model.
  • Either:
    • /new clears the runtime active_model so the next turn re-reads the policy primary; or
    • The fallback marker carries a TTL / success-count and reverts to primary after N successful fallback turns or T minutes.
  • Banner surfaces drift (e.g. [Reghar-... ⚠️ on fallback]) when active != primary so users notice without checking config.
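
The drift banner in the last expectation could be rendered along these lines. This is a minimal sketch, not openclaw code: `ModelRef`, `formatModel`, and `renderBanner` are hypothetical names, and the warning text simply follows the example wording in this report.

```typescript
// Hypothetical types and helpers; none of these names come from the openclaw source.
interface ModelRef {
  provider: string;
  model: string;
}

// Formats a model reference as "provider/model", as seen in the reported banner.
function formatModel(ref: ModelRef): string {
  return `${ref.provider}/${ref.model}`;
}

// Renders the response prefix and appends a drift warning whenever the
// active model differs from the policy primary.
function renderBanner(agent: string, active: ModelRef, primary: ModelRef): string {
  const base = `${agent}-${formatModel(active)}`;
  const drifted = formatModel(active) !== formatModel(primary);
  return drifted
    ? `[${base} ⚠️ on fallback (primary=${formatModel(primary)})]`
    : `[${base}]`;
}

// Example values taken from the report.
const primary: ModelRef = { provider: "claude-cli", model: "claude-opus-4-7" };
const fallback: ModelRef = { provider: "openai-codex", model: "gpt-5.5" };

const steadyBanner = renderBanner("Reghar", primary, primary);
// "[Reghar-claude-cli/claude-opus-4-7]"
const driftBanner = renderBanner("Reghar", fallback, primary);
// "[Reghar-openai-codex/gpt-5.5 ⚠️ on fallback (primary=claude-cli/claude-opus-4-7)]"
```

Comparing the formatted strings rather than object identity keeps the check correct even if the active and primary references are separate objects with equal fields.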

Code surfaces (from dist/ inspection of v2026.5.12)

  • `dist/pi-embedded-CPEBK2iK.js:~2579`: `[timeout-compaction]` falls through to failover rotation once `MAX_TIMEOUT_COMPACTION_ATTEMPTS` is reached; rotation appears to select `fallbacks[0]` without first re-attempting primary.
  • `dist/model-fallback-CFP2h22a.js:~504`: `suspendSession`; suspected to be marking the primary in a cooldown state that survives session reset.
  • `dist/model-fallback-CFP2h22a.js:~893`: `shouldProbePrimaryDuringCooldown` exists, suggesting the auto-revert path is implemented but not firing in this scenario. This is the likely root cause: the probe path isn't being entered after an abort-induced fallback.
  • Banner template: `openclaw.json` `responsePrefix: "[<Agent>-{provider}/{model}]"`; the template variables include no drift flag.
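
To make the first surface concrete, here is a sketch of a rotation order that retries primary unless it has a recent hard failure. Everything here is an assumption for illustration: the `FailureReason` values, the five-minute window, and the function names are hypothetical and do not reflect how `dist/pi-embedded-CPEBK2iK.js` actually builds its rotation.

```typescript
// Hypothetical failure classification; openclaw's real failure reasons may differ.
type FailureReason = "abort" | "timeout" | "auth" | "rate_limit";

interface PrimaryFailure {
  reason: FailureReason;
  atMs: number; // timestamp of the failure, in milliseconds
}

// Assumed window during which a hard failure keeps primary at the back of the queue.
const HARD_FAILURE_WINDOW_MS = 5 * 60 * 1000;

// An abort is user-initiated, so it is not treated as evidence that primary is unhealthy.
function isRecentHardFailure(last: PrimaryFailure | null, nowMs: number): boolean {
  if (last === null || last.reason === "abort") return false;
  return nowMs - last.atMs < HARD_FAILURE_WINDOW_MS;
}

// Returns the candidate order for failover rotation: primary first, unless it
// failed hard recently, in which case fallbacks are tried before it.
function rotationOrder(
  primary: string,
  fallbacks: string[],
  last: PrimaryFailure | null,
  nowMs: number
): string[] {
  return isRecentHardFailure(last, nowMs)
    ? fallbacks.concat(primary)
    : [primary].concat(fallbacks);
}
```

Under these assumptions, an abort at 19:07 followed by a turn at 19:30 would still put `claude-cli/claude-opus-4-7` first in the rotation.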

Proposed fixes (any subset)

  1. Compaction retry: on falling through to failover rotation, retry primary before any fallback unless primary has a recent hard failure (not just an abort).
  2. /new clears active model marker: session reset hook nulls the cached active model so the next turn re-evaluates policy.
  3. Fallback TTL / revert: stamp the fallback marker with set_at and revert_after (default 15 min OR 2 successful fallback turns). On new turn after revert condition, attempt primary.
  4. Drift banner: extend template to expose {is_fallback} or render ⚠️ on fallback (primary={...}) when active != policy.primary.
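
Fixes 2 and 3 can be sketched together as a small state holder. This is a sketch under stated assumptions rather than a patch: `SessionState`, `FallbackMarker`, and all function names are hypothetical, and only the `set_at` / `revert_after` idea and the defaults (15 minutes or 2 successful fallback turns) come from the proposal above.

```typescript
// Hypothetical runtime state; field names mirror the proposal (set_at, revert_after)
// but are not taken from the openclaw source.
interface FallbackMarker {
  model: string;           // fallback model currently in use
  setAtMs: number;         // when the fallback was engaged (set_at)
  revertAfterMs: number;   // time-based revert window (revert_after)
  successfulTurns: number; // fallback turns completed successfully so far
}

interface SessionState {
  fallback: FallbackMarker | null;
}

const REVERT_AFTER_MS = 15 * 60 * 1000; // proposed default: 15 minutes
const REVERT_AFTER_TURNS = 2;           // proposed default: 2 successful fallback turns

// Records that the runtime switched to a fallback model.
function engageFallback(state: SessionState, model: string, nowMs: number): void {
  state.fallback = {
    model,
    setAtMs: nowMs,
    revertAfterMs: REVERT_AFTER_MS,
    successfulTurns: 0,
  };
}

// Counts a successful turn served by the fallback.
function recordFallbackSuccess(state: SessionState): void {
  if (state.fallback !== null) state.fallback.successfulTurns += 1;
}

// Fix 2: /new drops the marker so the next turn re-evaluates policy.
function onSessionReset(state: SessionState): void {
  state.fallback = null;
}

// Fix 3: pick the model for the next turn, reverting to primary once either
// the time window has elapsed or enough fallback turns have succeeded.
function selectModelForTurn(state: SessionState, primary: string, nowMs: number): string {
  const marker = state.fallback;
  if (marker === null) return primary;
  const expired = nowMs - marker.setAtMs >= marker.revertAfterMs;
  const enoughTurns = marker.successfulTurns >= REVERT_AFTER_TURNS;
  if (expired || enoughTurns) {
    state.fallback = null; // revert condition met: attempt primary again
    return primary;
  }
  return marker.model;
}
```

If the primary attempt fails again after a revert, the caller would re-engage the fallback with a fresh marker, so a failing primary costs at most one extra attempt per revert window.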

Workarounds for users today

  • /model claude-cli/claude-opus-4-7 to force-restore primary.
  • Restart the gateway (clears session state entirely).

Repro-environment context

  • macOS Darwin 24.6.0
  • openclaw v2026.5.12
  • Telegram channel, agent main, agentRuntime.id = claude-cli.
