openclaw - ✅(Solved) Fix [Bug]:auth_permanent cache never expires — gateway restart required to recover from transient Google API failures [1 pull requests, 1 participants]

openclaw · 2026-03-29 06:30:53


GitHub stats

openclaw/openclaw#56838•Fetched 2026-04-08 01:47:10


Author

fenglanhua

Participants

fenglanhua

Timeline (top)

referenced ×3closed ×1cross-referenced ×1labeled ×1

When a Google API call fails transiently (e.g., due to a momentary GCP outage or rate limit), OpenClaw marks the entire Google provider as auth_permanent, causing all Google models to be permanently skipped for the rest of the gateway session. Even after the GCP issue resolves, the gateway continues to skip all Google models indefinitely until manually restarted.

Fix / Workaround

Frequency: Occurs every time GCP has even a brief transient failure
Severity: Medium-High. The agent becomes completely non-functional until the user manually restarts the gateway
Workaround: Manually run systemctl --user restart openclaw-gateway.service. This works but requires user intervention

PR fix notes

PR #60404: fix(auth): use shorter backoff for auth_permanent failures

Repository: openclaw/openclaw
Author: extrasmall0
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/60404

Description (problem / solution / changelog)

Problem

When a provider returns a transient auth error (e.g. API_KEY_INVALID during a GCP outage), OpenClaw marks it as auth_permanent and applies the same 5h–24h exponential backoff used for billing failures. This effectively disables the provider for hours even after the upstream issue resolves — the only workaround is restarting the gateway.

Reported in #56838.

Fix

Give auth_permanent its own backoff curve with much shorter defaults:

Config key	Default	Description
`auth.cooldowns.authPermanentBackoffMinutes`	10	Base backoff (minutes)
`auth.cooldowns.authPermanentMaxMinutes`	60	Backoff cap (minutes)

The exponential progression (base × 2^(n-1), capped) still applies, so repeated failures ramp up to 1 hour max rather than 24 hours.
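To make the progression concrete, here is a minimal Python sketch of the formula above. It is not the code from src/agents/auth-profiles/usage.ts (which is TypeScript); the function name backoff_minutes and the two dictionaries are hypothetical, and the billing values assume the "5h–24h" range quoted in the Problem section means a 5-hour base with a 24-hour cap.

```python
def backoff_minutes(failure_count: int, base_minutes: float, max_minutes: float) -> float:
    """Return base * 2^(n-1), capped at max_minutes, for the n-th consecutive failure."""
    if failure_count < 1:
        return 0.0  # no failures recorded yet, so no cooldown
    # Clamp the exponent so a very long failure streak cannot produce a huge number.
    exponent = min(failure_count - 1, 32)
    return min(base_minutes * 2 ** exponent, max_minutes)


# Defaults from the table above (new auth_permanent curve).
AUTH_PERMANENT = {"base_minutes": 10, "max_minutes": 60}
# Assumed reading of the "5h-24h" billing curve from the PR description.
BILLING = {"base_minutes": 5 * 60, "max_minutes": 24 * 60}

for n in range(1, 6):
    print(
        f"failure {n}: auth_permanent={backoff_minutes(n, **AUTH_PERMANENT)} min, "
        f"billing={backoff_minutes(n, **BILLING)} min"
    )
```

Under these assumptions the auth_permanent cooldown runs 10, 20, 40, 60, 60 minutes over five consecutive failures, while the billing curve runs 300, 600, 1200, 1440, 1440 minutes.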

Changes

src/agents/auth-profiles/usage.ts — separate auth_permanent from billing in computeNextProfileUsageStats, add new config resolution
src/config/types.auth.ts — add authPermanentBackoffMinutes and authPermanentMaxMinutes to cooldown config type
src/config/schema.labels.ts / schema.help.ts — labels and help text for new config keys
src/agents/auth-profiles/usage.test.ts — update test expectations for the new backoff values

All existing tests pass (136/136).

Changed files

CHANGELOG.md (modified, +1/-0)
docs/.generated/config-baseline.core.json (modified, +32/-0)
docs/.generated/config-baseline.json (modified, +32/-0)
docs/gateway/configuration-reference.md (modified, +4/-0)
src/agents/auth-profiles.markauthprofilefailure.test.ts (modified, +29/-1)
src/agents/auth-profiles/usage.test.ts (modified, +3/-3)
src/agents/auth-profiles/usage.ts (modified, +34/-5)
src/agents/failover-error.test.ts (modified, +9/-13)
src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +15/-6)
src/agents/pi-embedded-helpers/failover-matches.ts (modified, +3/-4)
src/config/config-misc.test.ts (modified, +14/-0)
src/config/schema.base.generated.ts (modified, +18/-0)
src/config/schema.help.ts (modified, +4/-0)
src/config/schema.labels.ts (modified, +2/-0)
src/config/types.auth.ts (modified, +9/-0)
src/config/zod-schema.ts (modified, +2/-0)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Steps to reproduce

1. Configure OpenClaw with Google Gemini as primary model and fallback chain.
2. Wait for a transient GCP API failure (e.g., INVALID_ARGUMENT: API_KEY_INVALID or UNAUTHENTICATED); this can happen during periods of GCP instability.
3. Observe auth_permanent being set for the Google provider in logs.
4. Wait for GCP to recover (confirmed working via GCP Console).
5. Send a message to the agent.

Expected behavior

After a transient failure, OpenClaw should:

- Retry the provider after a cooldown period (e.g., 5–10 minutes), OR
- Expose a CLI command to clear auth_permanent state without restarting the gateway, OR
- Automatically clear auth_permanent on the next heartbeat/health check if the provider recovers

Actual behavior

model fallback decision: decision=skip_candidate reason=auth_permanent
model fallback decision: decision=skip_candidate reason=auth_permanent
model fallback decision: decision=skip_candidate reason=auth_permanent
Embedded agent failed before reply: All models failed:
google/gemini-3-flash-preview: Provider google has auth_permanent issue (skipping all models)
google/gemini-3.1-flash-lite-preview: Provider google has auth_permanent issue (skipping all models)
...
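The skip decisions in the log above can be pictured with a small Python sketch of a fallback chain that consults a per-provider cooldown deadline. This is an illustration only, not OpenClaw's routing code: pick_candidate, cooldown_until, and the timestamps are hypothetical names and values. The point it shows is that a deadline which never expires blocks every model of the provider, while a finite deadline lets the first candidate through once it has passed.

```python
from typing import Dict, List, Optional


def pick_candidate(chain: List[str], cooldown_until: Dict[str, float], now: float) -> Optional[str]:
    """Return the first model whose provider is not cooling down, or None if all are skipped."""
    for model in chain:
        provider = model.split("/", 1)[0]  # "google/gemini-2.5-flash" -> "google"
        if cooldown_until.get(provider, 0.0) > now:
            continue  # provider still in cooldown: skip this candidate
        return model
    return None


chain = ["google/gemini-3-flash-preview", "google/gemini-2.5-flash"]
failed_at = 1_000.0  # hypothetical timestamp (seconds) of the transient failure

# Cooldown with a finite deadline (10 minutes after the failure).
finite = {"google": failed_at + 10 * 60}
print(pick_candidate(chain, finite, now=failed_at + 60))       # still cooling down
print(pick_candidate(chain, finite, now=failed_at + 11 * 60))  # deadline has passed

# Cooldown that never expires, which is the behavior reported in this issue.
never = {"google": float("inf")}
print(pick_candidate(chain, never, now=failed_at + 24 * 3600))
```

With the finite deadline the chain returns None one minute after the failure and the first model eleven minutes after it; with the infinite deadline it returns None no matter how much time passes.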

OpenClaw version

2026.3.24

Operating system

Rocky 10.1

Install method

No response

Model

Google Gemini (gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, gemini-2.5-flash)

Provider / routing chain

gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, gemini-2.5-flash

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Additional information

Suggested fix: Add a TTL (e.g., 5–15 minutes) to auth_permanent state so it automatically retries after a cooldown period, similar to how transient errors are handled in other systems. Alternatively, expose a CLI command such as:

openclaw models auth reset --provider google

to clear the cached failure state without requiring a full gateway restart.

extent analysis

Fix Plan

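To address the issue, this plan proposes a TTL (time-to-live) for the auth_permanent state, so that the Google provider is retried automatically after a cooldown period. The merged PR #60404 took a related but different route: it kept the exponential backoff and gave auth_permanent its own shorter curve (10-minute base, 60-minute cap).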

Step-by-Step Solution

1. Add a TTL to the auth_permanent state:
   - Introduce a new configuration option, e.g., auth_permanent_ttl, to set the cooldown period (e.g., 5-15 minutes).
   - Update the auth_permanent state to include a timestamp when it is set.
2. Implement automatic retry:
   - Create a scheduled task (e.g., every 1-2 minutes) to check the auth_permanent state for each provider.
   - If the cooldown period has passed, clear the auth_permanent state and allow the provider to be retried.
3. Expose a CLI command to clear the auth_permanent state:
   - Add a new CLI command, e.g., openclaw models auth reset --provider google, to manually clear the auth_permanent state for a specific provider.

Example Code (Python)

from datetime import datetime, timedelta

# Configuration option for the auth_permanent TTL, in minutes
auth_permanent_ttl = 10

# In-memory store: provider name -> {'timestamp': datetime, 'ttl': minutes}
auth_permanent_state = {}

# Example function to set auth_permanent state with TTL
def set_auth_permanent(provider):
    auth_permanent_state[provider] = {
        'timestamp': datetime.now(),
        'ttl': auth_permanent_ttl
    }

# Example function to check and clear auth_permanent state.
# Returns True only when an expired entry was cleared (the provider may be
# retried); returns False if the entry is still active or never existed.
def check_auth_permanent(provider):
    if provider in auth_permanent_state:
        state = auth_permanent_state[provider]
        if datetime.now() - state['timestamp'] > timedelta(minutes=state['ttl']):
            del auth_permanent_state[provider]
            return True
    return False

# Example CLI command handler to clear auth_permanent state manually
def clear_auth_permanent(provider):
    if provider in auth_permanent_state:
        del auth_permanent_state[provider]
        print(f"Cleared auth_permanent state for {provider}")
    else:
        print(f"No auth_permanent state found for {provider}")

Verification

To verify the fix, follow these steps:

1. Configure the auth_permanent_ttl option to a suitable value (e.g., 5 minutes).
2. Simulate a transient GCP API failure to trigger the auth_permanent state.
3. Wait for the cooldown period to pass and verify that the auth_permanent state is automatically cleared.
4. Use the CLI command to manually clear the auth_permanent state and verify that it is removed.
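The verification steps above can be exercised without actually waiting for the cooldown by injecting a controllable clock. The following is a minimal Python sketch under that assumption; CooldownTracker and FakeClock are hypothetical names introduced for illustration and are not part of OpenClaw.

```python
from datetime import datetime, timedelta


class CooldownTracker:
    """Tracks providers marked auth_permanent and expires the mark after a TTL."""

    def __init__(self, ttl_minutes, clock=datetime.now):
        self._ttl = timedelta(minutes=ttl_minutes)
        self._clock = clock  # injectable so tests can control time
        self._marked_at = {}

    def mark(self, provider):
        self._marked_at[provider] = self._clock()

    def is_blocked(self, provider):
        marked = self._marked_at.get(provider)
        if marked is None:
            return False
        if self._clock() - marked >= self._ttl:
            del self._marked_at[provider]  # TTL elapsed: clear and allow a retry
            return False
        return True

    def reset(self, provider):
        """Manual clear, as a CLI reset command might do. Returns True if state existed."""
        return self._marked_at.pop(provider, None) is not None


class FakeClock:
    """Callable clock whose time only moves when advance() is called."""

    def __init__(self, start):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, minutes):
        self.now += timedelta(minutes=minutes)


clock = FakeClock(datetime(2026, 3, 29, 6, 30))
tracker = CooldownTracker(ttl_minutes=5, clock=clock)

tracker.mark("google")                    # step 2: simulated transient failure
assert tracker.is_blocked("google")
clock.advance(4)
assert tracker.is_blocked("google")       # still inside the 5-minute TTL
clock.advance(1)
assert not tracker.is_blocked("google")   # step 3: cleared automatically

tracker.mark("google")
assert tracker.reset("google")            # step 4: manual clear
assert not tracker.is_blocked("google")
```

Passing the clock as a parameter keeps the production default (datetime.now) while letting the test advance time deterministically.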

Extra Tips

Monitor


#api #tool integration #LLM response #prompt template #agent execution
