openclaw: Model fallback doesn't recover after cooldown and persists across sessions [1 comment, 2 participants]

GitHub stats: openclaw/openclaw#76108, fetched 2026-05-03 04:42:13
Comments: 1 · Participants: 2 · Timeline events: 2 · Reactions: 2
Timeline (top): commented ×1, unsubscribed ×1


Bug description

When minimax-portal/MiniMax-M2.7 triggers a format error and enters cooldown, OpenClaw correctly falls back to the next available model (volcengine-plan/ark-code-latest). However, after the cooldown expires, OpenClaw does NOT automatically recover the primary model. Instead:

  1. The session continues using the fallback model indefinitely
  2. New sessions (e.g., new Feishu DM threads) also use the fallback model instead of the configured primary
  3. Only a manual model switch via /model or restarting the gateway restores the primary model

Steps to reproduce

  1. Configure minimax-portal/MiniMax-M2.7 as the primary model with fallbacks: [minimax/MiniMax-M2.7, volcengine-plan/ark-code-latest, ...]
  2. Use the primary model normally until a format error occurs (e.g., API returns an unexpected response format)
  3. The model enters cooldown and OpenClaw falls back to volcengine-plan/ark-code-latest
  4. Wait for the cooldown to expire (the cooldownUntil timestamp passes)
  5. Observe that new messages still use volcengine-plan/ark-code-latest, not the primary model

Expected behavior

After a cooldown expires, OpenClaw should:

  • Attempt to use the primary model again for new requests
  • Or, at minimum, walk the fallback order from the top again, rather than sticking with the last successful fallback
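The expected behavior can be sketched as a selection function. On every new request it walks the configured chain from the top and skips only models whose cooldown has not yet expired. This is a minimal TypeScript sketch, not OpenClaw's code. `ProfileState`, `pickModel`, and the mapping from a model such as `minimax-portal/MiniMax-M2.7` to the `minimax-portal:default` auth profile are hypothetical, and the chain is shortened from the one in the steps above.

```typescript
// Hypothetical shape of one auth-state.json entry, modeled on the snippet in
// this issue; OpenClaw's internal types may differ.
interface ProfileState {
  errorCount?: number;
  cooldownUntil?: number; // assumed epoch milliseconds
  cooldownReason?: string;
  lastFailureAt?: number; // assumed epoch milliseconds
}

// True while the profile's cooldown window has not yet passed.
function inCooldown(state: ProfileState | undefined, now: number): boolean {
  const until = state?.cooldownUntil;
  return until !== undefined && until > now;
}

// Expected behavior from the issue: on every new request, walk the configured
// chain from the top (primary first) and take the first model that is not in
// cooldown. No "last good" model is consulted.
function pickModel(
  chain: readonly string[],
  stateFor: (model: string) => ProfileState | undefined,
  now: number,
): string | undefined {
  return chain.find((model) => !inCooldown(stateFor(model), now));
}

// Example data: the chain is shortened from the issue's config, and the
// model -> "<provider>:default" profile mapping is an assumption.
const chain = ["minimax-portal/MiniMax-M2.7", "volcengine-plan/ark-code-latest"];
const authState: Record<string, ProfileState> = {
  "minimax-portal:default": {
    errorCount: 1,
    cooldownUntil: 1777728876071,
    cooldownReason: "format",
    lastFailureAt: 1777728846071,
  },
};
const stateFor = (model: string): ProfileState | undefined =>
  authState[`${model.split("/")[0]}:default`];

console.log(pickModel(chain, stateFor, 1777728860000)); // volcengine-plan/ark-code-latest (in cooldown)
console.log(pickModel(chain, stateFor, 1777728880000)); // minimax-portal/MiniMax-M2.7 (cooldown over)
```

This sketch never consults a "last good" model, so it picks the primary again as soon as `cooldownUntil` passes.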

Diagnosis data

auth-state.json after the format error:

{
  "minimax-portal:default": {
    "errorCount": 1,
    "cooldownUntil": 1777728876071,
    "cooldownReason": "format",
    "lastFailureAt": 1777728846071
  }
}
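To check whether a recorded cooldown has actually expired, the timestamps can be decoded directly. This is a minimal sketch. It assumes that `cooldownUntil` and `lastFailureAt` are epoch milliseconds, which fits the values shown but is not stated in the issue. `describeCooldowns` and `loadAuthState` are hypothetical helpers.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical entry shape, based on the diagnosis data above.
interface ProfileState {
  errorCount?: number;
  cooldownUntil?: number; // assumed epoch milliseconds
  cooldownReason?: string;
  lastFailureAt?: number; // assumed epoch milliseconds
}

interface CooldownInfo {
  profile: string;
  cooldownMs: number | undefined; // cooldownUntil - lastFailureAt
  until: string | undefined; // cooldownUntil as an ISO-8601 UTC string
  expired: boolean; // true once now >= cooldownUntil, or if none is recorded
}

// Summarize every profile's cooldown relative to the given time.
function describeCooldowns(state: Record<string, ProfileState>, now: number): CooldownInfo[] {
  return Object.entries(state).map(([profile, s]) => ({
    profile,
    cooldownMs:
      s.cooldownUntil !== undefined && s.lastFailureAt !== undefined
        ? s.cooldownUntil - s.lastFailureAt
        : undefined,
    until: s.cooldownUntil !== undefined ? new Date(s.cooldownUntil).toISOString() : undefined,
    expired: s.cooldownUntil === undefined || s.cooldownUntil <= now,
  }));
}

// Reads a real auth-state.json; pass the path your install uses.
function loadAuthState(path: string): Record<string, ProfileState> {
  return JSON.parse(readFileSync(path, "utf8")) as Record<string, ProfileState>;
}

// The diagnosis data from this issue, inlined so the example runs standalone.
const diagnosis: Record<string, ProfileState> = JSON.parse(`{
  "minimax-portal:default": {
    "errorCount": 1,
    "cooldownUntil": 1777728876071,
    "cooldownReason": "format",
    "lastFailureAt": 1777728846071
  }
}`);

console.log(describeCooldowns(diagnosis, Date.now()));
```

For the values above, cooldownUntil minus lastFailureAt is 30000 ms. The cooldown window was therefore 30 seconds, and the primary should have become eligible again shortly after the format error.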

After the cooldown expires, lastGood still points to the working fallback, not the primary. The selection heuristic appears to prefer the most recently successful model over the configured primary.

System info

  • OpenClaw 2026.4.26 (be8c246)
  • macOS Darwin 24.6.0 (arm64)
  • Node.js v25.9.0
  • Model: minimax-portal/MiniMax-M2.7 (OAuth)
  • Channel: Feishu DM

Suggested fix

Options (any of these would help):

  1. Primary model always-first: After cooldown expires, always attempt the primary model first, not the last-used fallback
  2. Session-level model reset: When a session has been idle for more than X minutes, or on an explicit model switch, re-evaluate the primary model instead of sticking with the fallback
  3. Configurable recovery policy: Add a config option like agents.defaults.model.recoveryPolicy: "primary_always" | "last_successful_preferred" so users can control fallback recovery behavior
  4. Clearer diagnostics: Add a log line when cooldown expires and which model is being selected next, so users can understand why the model hasn't "recovered"
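Options 1, 3, and 4 can be combined in one sketch. `RecoveryPolicy` mirrors the values proposed in option 3, but it is not an existing OpenClaw setting. `selectModel`, `SelectionInput`, and the log text are hypothetical, and cooldowns are keyed by model name rather than by auth profile for brevity.

```typescript
// Hypothetical: the two values the issue proposes for
// agents.defaults.model.recoveryPolicy. Not an existing OpenClaw option.
type RecoveryPolicy = "primary_always" | "last_successful_preferred";

interface SelectionInput {
  chain: readonly string[]; // configured primary first, then fallbacks
  lastGood?: string; // most recently successful model, if any
  cooldownUntil: Record<string, number>; // model -> epoch ms (simplified keying)
  now: number;
  policy: RecoveryPolicy;
}

function isAvailable(model: string, input: SelectionInput): boolean {
  const until = input.cooldownUntil[model];
  return until === undefined || until <= input.now;
}

function selectModel(
  input: SelectionInput,
  log: (message: string) => void = console.log,
): string | undefined {
  // "last_successful_preferred" models the sticky behavior reported in the
  // issue: try lastGood first, then the rest of the chain in order.
  // "primary_always" always walks the chain from the primary.
  const lastGood = input.lastGood;
  const order =
    input.policy === "last_successful_preferred" &&
    lastGood !== undefined &&
    input.chain.includes(lastGood)
      ? [lastGood, ...input.chain.filter((m) => m !== lastGood)]
      : [...input.chain];
  const picked = order.find((m) => isAvailable(m, input));

  // Option 4: log when the primary's cooldown has expired and which model was
  // selected, so a non-recovery shows up in the logs.
  const primary = input.chain[0];
  const primaryUntil = primary === undefined ? undefined : input.cooldownUntil[primary];
  if (primaryUntil !== undefined && primaryUntil <= input.now) {
    log(
      `cooldown for ${primary} expired at ${new Date(primaryUntil).toISOString()}; ` +
        `selected ${picked ?? "none"} (policy=${input.policy})`,
    );
  }
  return picked;
}

const base = {
  chain: ["minimax-portal/MiniMax-M2.7", "volcengine-plan/ark-code-latest"],
  lastGood: "volcengine-plan/ark-code-latest",
  cooldownUntil: { "minimax-portal/MiniMax-M2.7": 1777728876071 },
  now: 1777728880000, // after the primary's cooldown has passed
};
console.log(selectModel({ ...base, policy: "primary_always" }));
console.log(selectModel({ ...base, policy: "last_successful_preferred" }));
```

With `last_successful_preferred`, the sketch reproduces the reported stickiness. Once lastGood points to the fallback, the fallback keeps winning even after the primary's cooldown has passed. With `primary_always`, the sketch returns to the primary immediately, and the log line from option 4 makes either outcome visible.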

Workaround

Users must manually:

  1. Edit ~/.openclaw/agents/main/agent/auth-state.json to reset errorCount: 0 and remove cooldownUntil
  2. Restart the gateway
  3. Run /model <primary-alias> to force-switch back

This is non-obvious and burdensome for end users.
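Step 1 of the workaround can be scripted instead of done by hand. This is a minimal sketch that assumes auth-state.json has the shape shown in the diagnosis data. `resetCooldown` and `resetCooldownInFile` are hypothetical helper names. The script changes only `errorCount` and `cooldownUntil` for the named profile, leaves `cooldownReason` and `lastFailureAt` as they are, and writes a `.bak` copy before saving.

```typescript
import { copyFileSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical entry shape, based on the diagnosis data in this issue.
interface ProfileState {
  errorCount?: number;
  cooldownUntil?: number;
  cooldownReason?: string;
  lastFailureAt?: number;
}

// Path taken from the workaround above; other agents or installs may differ.
const AUTH_STATE_PATH = join(homedir(), ".openclaw", "agents", "main", "agent", "auth-state.json");

// Pure step: set errorCount to 0 and drop cooldownUntil for one profile,
// returning a new object and leaving the input unmodified.
function resetCooldown(
  state: Record<string, ProfileState>,
  profile: string,
): Record<string, ProfileState> {
  const current = state[profile];
  if (current === undefined) return state; // unknown profile: nothing to reset
  const updated: ProfileState = { ...current, errorCount: 0 };
  delete updated.cooldownUntil;
  return { ...state, [profile]: updated };
}

// File step: back up auth-state.json (overwriting any older .bak), then write
// the reset state as pretty-printed JSON.
function resetCooldownInFile(profile: string, path: string = AUTH_STATE_PATH): void {
  const raw = readFileSync(path, "utf8");
  copyFileSync(path, `${path}.bak`);
  const next = resetCooldown(JSON.parse(raw) as Record<string, ProfileState>, profile);
  writeFileSync(path, JSON.stringify(next, null, 2) + "\n");
}

// Demonstration on the diagnosis data only; no file is touched here.
const demo = resetCooldown(
  {
    "minimax-portal:default": {
      errorCount: 1,
      cooldownUntil: 1777728876071,
      cooldownReason: "format",
      lastFailureAt: 1777728846071,
    },
  },
  "minimax-portal:default",
);
console.log(JSON.stringify(demo, null, 2));
```

To apply it, call `resetCooldownInFile("minimax-portal:default")`. Then restart the gateway and run `/model <primary-alias>` as in steps 2 and 3.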

Extent analysis

TL;DR

The primary model is not automatically recovered after the cooldown expires, and OpenClaw continues to use the fallback model.

Guidance

  • After the cooldown expires, OpenClaw should attempt to use the primary model again for new requests, rather than sticking with the last successful fallback model.
  • To verify the issue, check whether lastGood still points to the working fallback rather than the primary after the cooldown expires, and compare the current time with the cooldownUntil value in auth-state.json.
  • A longer-term fix, proposed in the issue, is a configurable recovery policy such as agents.defaults.model.recoveryPolicy, which would let users control fallback recovery behavior.
  • As a stopgap, reset errorCount to 0 and remove cooldownUntil in auth-state.json after the cooldown expires, then restart the gateway and switch back with /model, to force OpenClaw to re-evaluate the primary model.


Notes

The issue seems to be related to the heuristic algorithm used by OpenClaw to select the next model after a cooldown expires. The algorithm appears to prefer the most recently successful model over the configured primary model.

Recommendation

Apply a workaround by editing the auth-state.json file to reset errorCount: 0 and remove cooldownUntil, then restart the gateway and run /model <primary-alias> to force-switch back to the primary model. This is a temporary solution until a more permanent fix is implemented.
