openclaw - ✅(Solved) Fix [Bug]: New sessions inherit authProfileOverride from cooldown/rate-limited backup profile instead of using default [2 pull requests, 1 participant]

GitHub stats: openclaw/openclaw#62412 (fetched 2026-04-08 03:04:39)
Comments: 0 · Participants: 1 · Timeline: 9 (referenced ×7, cross-referenced ×2) · Reactions: 0

New sessions are created with authProfileOverride set to a non-default auth profile (e.g. anthropic:backup1) even when:

  1. The lastGood field in auth-profiles.json correctly points to anthropic:default
  2. The backup profile is in cooldown due to rate limiting (cooldownReason: "rate_limit")
  3. The anthropic:default profile has errorCount: 0 and is fully functional

The gateway auto-failover rotates to a backup profile on a transient rate limit, sets authProfileOverride with authProfileOverrideSource: "auto" in the session store, and then every new session inherits that override — even after the cooldown expires and the default profile is healthy again.

Clearing the override from sessions.json and restarting the gateway does not help — the override is immediately re-written on the next request.


PR fix notes

PR #1: fix(auth): stop new sessions inheriting auto-selected auth profile overrides

Description (problem / solution / changelog)

Summary

  • Problem: New sessions inherit authProfileOverride from backup profiles that are rate-limited or in cooldown, instead of using the default or best available profile. The auto-failover override persists across gateway restarts, /new, /reset, and manual session clearing.
  • Why it matters: Users get stuck on a degraded auth profile with no way to recover short of deleting session state, and the lastGood/cooldownUntil fields are effectively ignored.
  • What changed: Auto-selected auth profile overrides (source === "auto") are now discarded at session boundaries (gateway reset, /new, /reset). Only overrides the user set explicitly (source === "user") carry over. As defense in depth, resolveSessionAuthProfileOverride ensures that new sessions with an auto override always call pickFirstAvailable().
  • What did NOT change (scope boundary): User-set auth profile overrides (/auth command) still carry over across session resets as before. The round-robin and cooldown logic is unchanged for continuing (non-new) sessions.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #62412
  • Related #51251, #55063, #57760, #56393
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Three code paths unconditionally carry over authProfileOverride from old to new sessions regardless of whether the override was auto-selected (failover) or user-set. The authProfileOverrideSource discriminator existed in the data model but was not consulted at session boundaries.
  • Missing detection / guardrail: No distinction between "auto" and "user" source when propagating auth profile overrides across session resets.
  • Contributing context (if known): The auto-failover system correctly sets authProfileOverrideSource: "auto" but the session reset and init paths treated all overrides identically.
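A minimal TypeScript sketch of the carry-over step at a session boundary makes the root cause concrete. The field names authProfileOverride and authProfileOverrideSource come from the issue. The SessionAuthState type and both function names are hypothetical, and the real code paths in session-reset-service.ts and session.ts are not reproduced here. This sketch simply drops legacy entries without a source; the PR's legacy inference is not modeled.

```typescript
// Hypothetical shape of the auth-related fields in a session entry.
type OverrideSource = "auto" | "user";

interface SessionAuthState {
  authProfileOverride?: string;
  authProfileOverrideSource?: OverrideSource;
}

// Before (sketch): the override is copied to the new session regardless of its
// source, so a failover-selected backup profile becomes sticky.
function carryOverUnconditionally(prev: SessionAuthState): SessionAuthState {
  return {
    authProfileOverride: prev.authProfileOverride,
    authProfileOverrideSource: prev.authProfileOverrideSource,
  };
}

// After (sketch): only an override the user chose explicitly survives the
// boundary. An "auto" (or source-less) override is dropped so the new session
// re-selects a profile.
function carryOverUserOnly(prev: SessionAuthState): SessionAuthState {
  if (prev.authProfileOverride !== undefined && prev.authProfileOverrideSource === "user") {
    return {
      authProfileOverride: prev.authProfileOverride,
      authProfileOverrideSource: "user",
    };
  }
  return {};
}

const afterFailover: SessionAuthState = {
  authProfileOverride: "anthropic:backup1",
  authProfileOverrideSource: "auto",
};
console.log(carryOverUnconditionally(afterFailover).authProfileOverride); // → anthropic:backup1
console.log(carryOverUserOnly(afterFailover).authProfileOverride); // → undefined
```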

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/auth-profiles/session-override.test.ts
  • Scenario the test should lock in: A new session with authProfileOverrideSource: "auto" picks the first available profile (rather than round-robining from a stale position); a new session with authProfileOverrideSource: "user" round-robins as before; profiles in cooldown are skipped.
  • Why this is the smallest reliable guardrail: The resolver function is the last line of defense — testing it covers the defense-in-depth layer regardless of upstream session propagation behavior.
  • Existing test that already covers this (if any): None — only provider-alias and early-return tests existed.
  • If no new test is added, why not: 5 new tests added.

User-visible / Behavior Changes

  • New sessions (via /new, /reset, gateway restart, or session expiry) now start with the best available auth profile instead of inheriting a stale auto-failover profile that may be in cooldown or rate-limited.
  • Auth profile overrides set explicitly by the user (via /auth or similar) still persist across session resets.

Diagram (if applicable)

Before:
[rate limit] -> [auto-failover to backup1] -> [/new] -> [backup1 still active, even if in cooldown]

After:
[rate limit] -> [auto-failover to backup1] -> [/new] -> [pickFirstAvailable() -> default profile]
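The defense-in-depth layer from the diagram can be sketched in TypeScript as follows. The PR names resolveSessionAuthProfileOverride and pickFirstAvailable, but these notes do not show their real signatures. The *Sketch functions, the ResolveInput shape, and the all-in-cooldown policy (fall back to the profile whose cooldown ends soonest) are therefore assumptions for illustration. Round-robin rotation for user overrides and continuing sessions is not modeled; those cases simply keep their override.

```typescript
// Hypothetical per-profile stats, modeled on the usageStats entries in auth-profiles.json.
interface ProfileStats {
  errorCount?: number;
  cooldownUntil?: number; // assumed: epoch milliseconds
}

type OverrideSource = "auto" | "user";

interface ResolveInput {
  order: string[]; // configured profile IDs, assumed to list the default first
  stats: Record<string, ProfileStats>;
  current?: { profileId: string; source: OverrideSource };
  isNewSession: boolean;
  now: number;
}

function isCoolingDown(stats: ProfileStats | undefined, now: number): boolean {
  const until = stats?.cooldownUntil;
  return until !== undefined && until > now;
}

// First profile in configured order that is not cooling down. If all of them
// are, fall back to the one whose cooldown ends soonest (an assumed policy).
function pickFirstAvailableSketch(
  order: string[],
  stats: Record<string, ProfileStats>,
  now: number,
): string | undefined {
  const ready = order.find((id) => !isCoolingDown(stats[id], now));
  if (ready !== undefined) return ready;
  let soonest: string | undefined;
  for (const id of order) {
    const until = stats[id]?.cooldownUntil ?? 0;
    if (soonest === undefined || until < (stats[soonest]?.cooldownUntil ?? 0)) {
      soonest = id;
    }
  }
  return soonest;
}

// An "auto" override is not trusted across a session boundary, so a new
// session re-picks. User overrides and continuing sessions keep theirs here.
function resolveOverrideSketch(input: ResolveInput): string | undefined {
  const { current, isNewSession, order, stats, now } = input;
  if (current !== undefined && !(isNewSession && current.source === "auto")) {
    return current.profileId;
  }
  return pickFirstAvailableSketch(order, stats, now);
}

const demoNow = 1_775_552_800_000;
const demoOrder = ["anthropic:default", "anthropic:backup1", "anthropic:backup2"];
const demoStats: Record<string, ProfileStats> = {
  "anthropic:default": { errorCount: 0 },
  "anthropic:backup1": { errorCount: 1, cooldownUntil: demoNow + 60_000 },
};
console.log(
  resolveOverrideSketch({
    order: demoOrder,
    stats: demoStats,
    current: { profileId: "anthropic:backup1", source: "auto" },
    isNewSession: true,
    now: demoNow,
  }),
); // → anthropic:default
```

Picking in configured order, rather than resuming from a stale round-robin position, matches the scenario described in the PR's test plan.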

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux 6.17.0-1010-azure (x64) per issue report
  • Runtime/container: Node v22.22.1
  • Model/provider: anthropic/claude-opus-4-6
  • Integration/channel (if any): Telegram DM
  • Relevant config (redacted): 3 Anthropic auth profiles (default + 2 backups)

Steps

  1. Configure multiple auth profiles for the same provider
  2. Use the system until one profile hits a transient rate limit, triggering auto-failover
  3. Start a new session (/new, /reset, or new message after expiry)

Expected

  • New session uses the best available non-cooldown profile (typically the default)

Actual

  • New session inherits the rate-limited backup profile from the previous session

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

5 new unit tests in src/agents/auth-profiles/session-override.test.ts all pass, covering auto vs user override propagation, cooldown skipping, legacy inference, and all-cooldown fallback.

Human Verification (required)

  • Verified scenarios: All 5 new tests pass; all 7 tests in session-override.test.ts pass; full auth-profiles test suite (109/110 pass, 1 pre-existing unrelated failure in state-observation.test.ts)
  • Edge cases checked: Legacy session entries without authProfileOverrideSource, all-profiles-in-cooldown fallback, provider alias normalization
  • What you did not verify: End-to-end gateway restart with real multi-profile setup; the gateway session-reset-service path is tested indirectly via the defense-in-depth layer

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: Existing sessions with authProfileOverrideSource: "auto" will lose their override on the next session reset, changing profile selection behavior.
    • Mitigation: This is the intended fix. Auto overrides were never meant to be permanent; they should be re-evaluated per session.

🤖 Generated with Claude Code

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/auth-profiles/session-override.test.ts (modified, +205/-1)
  • src/agents/auth-profiles/session-override.ts (modified, +4/-1)
  • src/auto-reply/reply/session.ts (modified, +5/-3)
  • src/gateway/session-reset-service.ts (modified, +12/-3)

PR #62710: fix(auth): stop new sessions inheriting auto-selected auth profile overrides

Description (problem / solution / changelog)

Identical to the PR #1 description above (summary, root cause, regression test plan, behavior changes, security impact, repro, verification, compatibility, and risks). This PR additionally includes the AI Disclosure and CI Note sections below.

🤖 Generated with Claude Code

AI Disclosure

  • This PR was AI-assisted (built with Claude Code / Claude Opus 4.6)
  • Fully tested: 5 new unit tests added, all 7 tests in session-override.test.ts pass, full auth-profiles/ suite passes (109/110, 1 pre-existing unrelated failure)
  • I understand what the code does
  • Bot review conversations will be resolved as addressed

CI Note

The check and check-additional CI failures are pre-existing on main — type errors in extensions/slack/src/approval-handler.runtime.test.ts and extensions/telegram/src/approval-handler.runtime.test.ts (TS18046: 'payload' is of type 'unknown'). These are unrelated to this PR's changes.

Changed files

  • CHANGELOG.md (modified, +3/-0)
  • src/agents/auth-profiles/session-override.test.ts (modified, +295/-1)
  • src/agents/auth-profiles/session-override.ts (modified, +7/-1)
  • src/auto-reply/reply/session.ts (modified, +11/-3)
  • src/gateway/session-reset-service.ts (modified, +9/-2)

Original issue report (#62412)

Bug type

Behavior bug (incorrect state without crash)

Summary

New sessions are created with authProfileOverride set to a non-default auth profile (e.g. anthropic:backup1) even when:

  1. The lastGood field in auth-profiles.json correctly points to anthropic:default
  2. The backup profile is in cooldown due to rate limiting (cooldownReason: "rate_limit")
  3. The anthropic:default profile has errorCount: 0 and is fully functional

The gateway auto-failover rotates to a backup profile on a transient rate limit, sets authProfileOverride with authProfileOverrideSource: "auto" in the session store, and then every new session inherits that override — even after the cooldown expires and the default profile is healthy again.

Clearing the override from sessions.json and restarting the gateway does not help — the override is immediately re-written on the next request.

Environment

  • OpenClaw version: 2026.4.5 (3e72c03)
  • OS: Linux 6.17.0-1010-azure (x64)
  • Node: v22.22.1
  • Gateway mode: local
  • Channel: Telegram DM

Configuration

Three Anthropic auth profiles configured:

  • anthropic:default (primary, working)
  • anthropic:backup1 (backup, rate-limited)
  • anthropic:backup2 (backup)

Primary model: anthropic/claude-opus-4-6

Steps to Reproduce

  1. Configure multiple auth profiles for the same provider (e.g., 3 Anthropic token profiles)
  2. Use the system normally until one profile hits a transient rate limit
  3. Gateway auto-rotates to a backup profile and sets authProfileOverride in session state
  4. The backup profile also hits rate limits and enters cooldown
  5. Observe that auth-profiles.json correctly shows:
    • lastGood.anthropic = anthropic:default
    • backup profile has cooldownUntil set and cooldownReason: "rate_limit"
    • default profile has errorCount: 0
  6. Start a new session (new message, cron run, etc.)
  7. New session is created with authProfileOverride: "anthropic:backup1" despite the above

Expected Behavior

  • New sessions should start with no authProfileOverride (use the default profile)
  • If auto-failover must set an override, it should respect lastGood and cooldown state
  • A profile in cooldown should never be selected as the override for a new session
  • Clearing the override from sessions.json + gateway restart should be sufficient to reset

Actual Behavior

  • New sessions are immediately created with authProfileOverride pointing to the rate-limited backup
  • The override persists across:
    • Gateway restarts
    • Manual clearing of sessions.json fields (re-written on next request)
    • /new and /reset commands
  • The lastGood and cooldownUntil fields in auth-profiles.json are ignored when selecting the profile for new sessions

Evidence

auth-profiles.json usageStats showing the contradiction:

{
  "lastGood": {
    "anthropic": "anthropic:default"
  },
  "usageStats": {
    "anthropic:default": {
      "errorCount": 0,
      "lastUsed": 1775552848071
    },
    "anthropic:backup1": {
      "errorCount": 1,
      "cooldownUntil": 1775552797656,
      "cooldownReason": "rate_limit"
    }
  }
}

Yet sessions.json for every new session shows:

{
  "authProfileOverride": "anthropic:backup1",
  "authProfileOverrideSource": "auto"
}
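As a sanity check on the evidence, the TypeScript sketch below embeds the usageStats excerpt and lists which profiles are cooling down at a chosen time. It assumes that cooldownUntil and lastUsed are epoch-millisecond timestamps (their magnitude matches Date.now() output) and that a profile leaves cooldown once now reaches cooldownUntil. The helper name is hypothetical. With these values, backup1's cooldown ended 50,415 ms (about 50 s) before the default profile's lastUsed timestamp, while sessions.json still carries the auto override.

```typescript
// Shapes modeled on the auth-profiles.json excerpt in the issue.
interface UsageStats {
  errorCount: number;
  lastUsed?: number; // assumed: epoch milliseconds
  cooldownUntil?: number; // assumed: epoch milliseconds
  cooldownReason?: string;
}

interface AuthProfilesState {
  lastGood: Record<string, string>;
  usageStats: Record<string, UsageStats>;
}

const evidence: AuthProfilesState = {
  lastGood: { anthropic: "anthropic:default" },
  usageStats: {
    "anthropic:default": { errorCount: 0, lastUsed: 1775552848071 },
    "anthropic:backup1": {
      errorCount: 1,
      cooldownUntil: 1775552797656,
      cooldownReason: "rate_limit",
    },
  },
};

// Profile IDs whose cooldown has not yet expired at `now`.
function profilesInCooldown(state: AuthProfilesState, now: number): string[] {
  return Object.entries(state.usageStats)
    .filter(([, s]) => s.cooldownUntil !== undefined && s.cooldownUntil > now)
    .map(([id]) => id);
}

const defaultLastUsed = evidence.usageStats["anthropic:default"]?.lastUsed ?? 0;
const backup1Until = evidence.usageStats["anthropic:backup1"]?.cooldownUntil ?? 0;

console.log(profilesInCooldown(evidence, backup1Until - 1)); // → [ 'anthropic:backup1' ]
console.log(profilesInCooldown(evidence, defaultLastUsed)); // → []
console.log(defaultLastUsed - backup1Until); // → 50415
```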

Related Issues

  • #51251 — Session modelOverride persists across gateway restarts
  • #55063 — /new and /reset preserve session model override
  • #57760 — Automatic model failover loop on rate limits
  • #56393 — Bot not responding after API key update (session override suspected)

These cover the model override variant; this issue is specifically about auth profile override stickiness and the failure to respect lastGood / cooldown state when creating new sessions.

Extent analysis

TL;DR

The most likely fix involves modifying the gateway's auto-failover logic to respect the lastGood and cooldown state when selecting an auth profile for new sessions.

Guidance

  1. Review auto-failover logic: Examine the code responsible for auto-failover to ensure it checks the lastGood field in auth-profiles.json and the cooldown state of backup profiles before setting authProfileOverride.
  2. Update session creation: Modify the session creation process to clear authProfileOverride if the selected profile is in cooldown or not the lastGood profile.
  3. Verify auth-profiles.json usage: Confirm that auth-profiles.json is being updated correctly, reflecting the lastGood profile and cooldown states.
  4. Test with cooldown scenarios: Thoroughly test the updated logic with various cooldown scenarios to ensure new sessions are created with the correct auth profile.

Example

A code snippet to illustrate the logic update might look like this:

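// Assumes backupProfile has { name, errorCount?, cooldownUntil? } and lastGoodProfileName is
// the profile ID stored in auth-profiles.json lastGood (e.g. "anthropic:default").
const now = Date.now();
const inCooldown = (backupProfile.cooldownUntil ?? 0) > now;
const hasErrors = (backupProfile.errorCount ?? 0) > 0;
if (inCooldown || hasErrors) {
  // Never pick a profile that is cooling down or has recent errors; fall back to lastGood
  authProfileOverride = lastGoodProfileName;
} else {
  // The backup is usable: it is not in cooldown and has no recorded errors
  authProfileOverride = backupProfile.name;
}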

However, without the exact codebase, this remains a conceptual example.
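Guidance step 2 (clearing an inherited override at session creation) can be sketched in the same conceptual spirit. The sketch follows the analysis's suggested rule: clear the override when its profile is cooling down or is not the provider's lastGood. The PRs above took a different, source-based approach. The function names, the input shapes, and the assumption that profile IDs have the form provider:name (as in anthropic:backup1) are all hypothetical.

```typescript
// Hypothetical shapes modeled on auth-profiles.json.
interface ProfileUsage {
  errorCount?: number;
  cooldownUntil?: number; // assumed: epoch milliseconds
}

interface AuthState {
  lastGood: Record<string, string>; // provider -> profile ID
  usageStats: Record<string, ProfileUsage>;
}

// Assumes profile IDs look like "<provider>:<name>", e.g. "anthropic:backup1".
function providerOf(profileId: string): string {
  const idx = profileId.indexOf(":");
  return idx === -1 ? profileId : profileId.slice(0, idx);
}

// Keep the inherited override only if its profile is out of cooldown and is the
// provider's lastGood; otherwise return undefined so the session uses the default.
function sanitizeInheritedOverride(
  inherited: string | undefined,
  state: AuthState,
  now: number,
): string | undefined {
  if (inherited === undefined) return undefined;
  const until = state.usageStats[inherited]?.cooldownUntil;
  const coolingDown = until !== undefined && until > now;
  const isLastGood = state.lastGood[providerOf(inherited)] === inherited;
  return !coolingDown && isLastGood ? inherited : undefined;
}

const demoState: AuthState = {
  lastGood: { anthropic: "anthropic:default" },
  usageStats: {
    "anthropic:default": { errorCount: 0 },
    "anthropic:backup1": { errorCount: 1, cooldownUntil: 2_000 },
  },
};
console.log(sanitizeInheritedOverride("anthropic:backup1", demoState, 1_000)); // → undefined
console.log(sanitizeInheritedOverride("anthropic:default", demoState, 1_000)); // → anthropic:default
```

One trade-off is visible in the rule itself. It would also clear an override the user chose deliberately via /auth whenever that profile is not lastGood, which the source-based check in the PRs avoids.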

Notes

The provided information suggests a logic issue within the auto-failover mechanism rather than a configuration problem. The fix should focus on ensuring the gateway respects the lastGood and cooldown states when managing auth profiles for new sessions.

Recommendation

As a workaround, patch the auto-failover logic so that it never sets authProfileOverride to a profile in cooldown, letting the system fall back to the lastGood profile as intended. Doing this safely requires checking the cooldown and error state of every configured profile.
