openclaw - 💡(How to fix) Fix Gateway enters infinite model-switch loop when all auth profiles fail [1 participants]

GitHub stats
openclaw/openclaw#57905, fetched 2026-04-08 01:56:17
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Error Message

When all models/providers fail, the gateway should send an error message to the user and stop retrying — not spin forever.

Root Cause

Problem

When all auth profiles enter cooldown (e.g., Anthropic OAuth token rejected + no xAI key), the gateway enters an infinite model-switch loop cycling every ~1 second. The gateway becomes completely unresponsive and cannot process any messages. The loop persists across restarts because session state is recreated from sessions.json.


Expected behavior

When all models/providers fail, the gateway should send an error message to the user and stop retrying — not spin forever.
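
The expected behavior can be sketched as a single pass over the fallback chain: each provider is tried at most once, and exhausting the chain produces one user-facing error instead of a jump back to the primary. This is only an illustration under assumptions; `ProviderError`, `FallbackResult`, and `run_with_fallbacks` are hypothetical names and do not come from the OpenClaw codebase.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a (hypothetical) provider call when auth or the request fails."""


@dataclass
class FallbackResult:
    reply: Optional[str]   # model reply, or None if every provider failed
    error: Optional[str]   # user-facing error text, or None on success
    attempts: int          # number of providers actually tried


def run_with_fallbacks(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> FallbackResult:
    """Try each provider at most once, in order, then give up with an error."""
    failures = []
    for name, call in providers:
        try:
            reply = call(prompt)
            return FallbackResult(reply, None, len(failures) + 1)
        except ProviderError as exc:
            failures.append(f"{name}: {exc}")
    # Chain exhausted: report once instead of looping back to the primary.
    message = "All providers failed (" + "; ".join(failures) + ")"
    return FallbackResult(None, message, len(failures))
```

Because the loop is bounded by the length of `providers`, the number of attempts per incoming message can never exceed the number of configured providers.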

Environment

  • OpenClaw 2026.3.28
  • macOS (arm64)
  • Telegram channel

Steps to reproduce

  1. Have Anthropic as primary, with fallbacks to providers that have no key configured
  2. Anthropic OAuth token gets rejected (401)
  3. Auth profile enters cooldown
  4. Gateway cycles through fallbacks, all fail, loops back to primary
  5. Infinite loop at ~1/sec consuming all gateway resources

Additional context

  • Manual cooldown reset in auth-profiles.json did not take effect on restart
  • The configure wizard also left stale claude-cli/ model prefixes which compounded the issue

extent analysis

Fix Plan

To resolve the infinite model-switch loop, the gateway needs to detect when every auth profile is in cooldown and, at that point, report the failure and stop retrying instead of cycling back to the primary model.

Code Changes

We introduce a flag, allProfilesFailed, that is true while every profile is in cooldown, and check it before each model switch so that no further retries are attempted while it is set. The code below is an illustrative Python sketch, not OpenClaw's actual source.

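# auth_manager.py (illustrative sketch; class and method names are hypothetical)
class AuthManager:
    def __init__(self, profiles=None):
        # Profiles are assumed to be objects with a mutable `status` attribute
        self.profiles = list(profiles or [])
        self.allProfilesFailed = False
        self.retry_timeout = 1.0  # seconds between retries; None disables retrying
        # ... existing code ...

    def switch_model(self):
        if self.allProfilesFailed:
            # Send error message to user and stop retrying
            # (send_error_message is assumed to exist in the surrounding code)
            self.send_error_message("All auth profiles have failed. Please check your configuration.")
            return

        # ... existing code to switch models ...

    def update_profile_status(self, profile, status):
        # Record the new status first so the check below sees it
        profile.status = status
        # Recompute on every update so the flag clears once a profile recovers
        self.allProfilesFailed = bool(self.profiles) and all(
            p.status == "cooldown" for p in self.profiles
        )
        if self.allProfilesFailed:
            # Prevent retries
            self.retry_timeout = None
        # ... existing code ...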

Configuration Changes

No configuration changes are required for this fix.

Verification

To verify the fix, follow these steps:

  • Reproduce the issue by having all auth profiles enter cooldown.
  • Verify that the gateway sends an error message to the user and stops retrying.
  • Check the logs to ensure that the allProfilesFailed variable is set correctly and retries are prevented.
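
The verification steps above could be automated along the following lines. This is a sketch under assumptions: `RecordingManager` is a hypothetical stand-in that mirrors the allProfilesFailed idea from the fix plan and records what would be sent to the user; it is not the gateway's real class.

```python
from types import SimpleNamespace


class RecordingManager:
    """Minimal stand-in for the AuthManager sketch; records outgoing errors."""

    def __init__(self, profiles):
        self.profiles = profiles
        self.allProfilesFailed = False
        self.sent = []       # error messages delivered to the user
        self.switches = 0    # model switches actually performed

    def update_profile_status(self, profile, status):
        profile.status = status
        self.allProfilesFailed = all(p.status == "cooldown" for p in self.profiles)

    def switch_model(self):
        if self.allProfilesFailed:
            if not self.sent:  # report once, not on every call
                self.sent.append("All auth profiles have failed.")
            return False
        self.switches += 1
        return True


def check_stops_when_all_fail():
    profiles = [SimpleNamespace(name=n, status="ok") for n in ("anthropic", "xai")]
    mgr = RecordingManager(profiles)
    assert mgr.switch_model() is True          # healthy profiles: switching works
    for p in profiles:
        mgr.update_profile_status(p, "cooldown")
    results = [mgr.switch_model() for _ in range(5)]
    assert results == [False] * 5              # no further switches are attempted
    assert len(mgr.sent) == 1                  # the user is told exactly once
    assert mgr.switches == 1
    return mgr
```

Calling `check_stops_when_all_fail()` raises an `AssertionError` if the stand-in keeps switching models or sends more than one error after all profiles enter cooldown.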

Extra Tips

  • With multiple auth profiles, recompute allProfilesFailed whenever any profile changes status, so the flag clears as soon as one profile leaves cooldown.
  • Consider capping retries (a maximum attempt count or an overall timeout) before profiles enter cooldown, so a failing fallback chain cannot cycle indefinitely.
  • Check that a cooldown reset made in auth-profiles.json is actually applied after a restart; in the reported case a manual reset did not take effect.
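
One way to keep the flag from going stale is to store each cooldown as a deadline and derive "all failed" from the current time, so recovery needs no manual reset. The sketch below assumes this design; `CooldownTracker` and its methods are hypothetical and do not reflect how OpenClaw stores cooldowns in auth-profiles.json.

```python
import time
from typing import Callable, Dict, List


class CooldownTracker:
    """Tracks a cooldown deadline per profile (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                  # injectable so tests can control time
        self._until: Dict[str, float] = {}   # profile id -> cooldown deadline

    def register(self, profile_id: str) -> None:
        # A deadline of -inf means "not in cooldown"
        self._until.setdefault(profile_id, float("-inf"))

    def fail(self, profile_id: str, cooldown_s: float) -> None:
        # Put the profile in cooldown for cooldown_s seconds from now
        self._until[profile_id] = self._clock() + cooldown_s

    def available(self) -> List[str]:
        # Profiles whose cooldown deadline has passed, in registration order
        now = self._clock()
        return [p for p, deadline in self._until.items() if deadline <= now]

    def all_failed(self) -> bool:
        # True only when at least one profile exists and none is usable
        return bool(self._until) and not self.available()
```

A caller would check `all_failed()` before each model switch and send the error only when it returns true; once any deadline passes, `available()` starts returning that profile again without further bookkeeping.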
