hermes: [Bug]: _is_entitlement_failure over-matches xAI 'bad-credentials' 403; long-running TUI sessions can't auto-refresh stale OAuth tokens

hermes2026-05-20 14:41:50


_is_entitlement_failure in run_agent.py over-matches on xAI Grok 403 responses, causing legitimate "OAuth access token failed validation" errors to be misclassified as unsubscribed-account entitlement failures. The defensive guard against entitlement refresh loops (existing test references issue #26847) suppresses the refresh-on-401 path for both real cases, leaving long-running TUI sessions stuck on a stale token with no recovery.

Workaround: exit and reopen the TUI — the startup refresh path bypasses the broken classifier.

Error Message

"error": "The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]"

Root Cause

xAI's API returns the same code field text for two distinct conditions:

| Condition | `code` (same) | `error` field (the disambiguator) |
| --- | --- | --- |
| Entitlement (account isn't SuperGrok-subscribed) | `"The caller does not have permission to execute the specified operation"` | `"... active Grok subscription. Manage at https://grok.com"` (or similar entitlement language) |
| Bad credentials (access token failed validation) | `"The caller does not have permission to execute the specified operation"` | `"The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]"` |

The existing tests in tests/run_agent/test_codex_xai_oauth_recovery.py cover the entitlement case correctly (test_is_entitlement_failure_matches_real_xai_bodies), but there's no test case for the bad-credentials variant — so the classifier treats both identically.

The [WKE=unauthenticated:bad-credentials] suffix is xAI's authoritative disambiguator. Hermes currently ignores it.
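To make the disambiguator concrete, here is a minimal sketch of pulling the marker out of the `error` text. The helper name `parse_wke` and the regular expression are hypothetical and not part of Hermes; the only marker confirmed by this report is `[WKE=unauthenticated:bad-credentials]`, so the general `[WKE=<category>:<reason>]` shape is an assumption.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: assumes markers look like "[WKE=<category>:<reason>]".
# Only "[WKE=unauthenticated:bad-credentials]" has actually been observed.
_WKE_RE = re.compile(r"\[WKE=([a-z_-]+):([a-z_-]+)\]", re.IGNORECASE)


def parse_wke(error_text: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (category, reason) from a WKE marker, or None if absent."""
    match = _WKE_RE.search(error_text or "")
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()
```

With the two bodies from the table, the bad-credentials text yields a marker while the entitlement text yields nothing, which is the distinction the classifier currently misses.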

Repro

1. Open a Hermes TUI session against provider/xai-oauth (SuperGrok).
2. Let it sit idle long enough that the access token goes stale by xAI's server-side criteria (in my case, ~22 hours; it can happen sooner if xAI rotates session-side).
3. Send a request.
4. xAI returns HTTP 403 with this body:

{
  "code": "The caller does not have permission to execute the specified operation",
  "error": "The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]"
}

5. Hermes logs Non-retryable client error and surfaces it to the user. No refresh attempt happens, even though the credential pool's _refresh_entry for this provider works fine (confirmed by opening a new TUI session: the startup-resolve path refreshes successfully).

Expected

The [WKE=unauthenticated:bad-credentials] suffix unambiguously indicates this is a credential-validation failure, not an entitlement failure. Hermes should:

1. Call _recover_with_credential_pool → try_refresh_current() → _swap_credential
2. Retry the request with the refreshed token
3. Either succeed (the typical case after a stale token) or, if the refresh itself fails terminally, fall through to the existing terminal-quarantine path
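The expected sequence can be sketched as below. This is an illustration under stated assumptions, not Hermes code: `FakePool`, `fake_send`, and `send_with_recovery` are hypothetical stand-ins, and only the method name `try_refresh_current` comes from the report. The real path also goes through `_recover_with_credential_pool` and `_swap_credential`, which are not modeled here.

```python
from typing import Callable


class FakePool:
    """Hypothetical stand-in for the credential pool."""

    def __init__(self, refresh_ok: bool) -> None:
        self.refresh_ok = refresh_ok
        self.token = "stale"
        self.quarantined = False

    def try_refresh_current(self) -> bool:
        # Swap in a fresh token only when the refresh succeeds.
        if self.refresh_ok:
            self.token = "fresh"
        return self.refresh_ok


def fake_send(token: str) -> int:
    """Hypothetical backend: rejects the stale token with a 403."""
    return 200 if token == "fresh" else 403


def send_with_recovery(send: Callable[[str], int], pool: FakePool) -> int:
    """Send once; on a bad-credentials 403, refresh once and retry."""
    status = send(pool.token)
    if status != 403:
        return status
    if pool.try_refresh_current():
        # Retry exactly once with the refreshed token.
        return send(pool.token)
    # Refresh failed terminally: fall through to quarantine.
    pool.quarantined = True
    return status
```

Retrying only once keeps the existing loop protection meaningful: a second 403 after a successful refresh is returned to the caller rather than triggering another refresh.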

Actual

_is_entitlement_failure returns True because the response body matches its substring heuristic on "caller does not have permission". Recovery then short-circuits and returns False, so the error surfaces as non-retryable.

Proposed fixes (escalating, pick one)

1. Tightest: In _is_entitlement_failure, check the body's error field first: if it contains [WKE=unauthenticated: (or specifically [WKE=unauthenticated:bad-credentials]), return False immediately. Refresh path then handles it.
2. Pragmatic: Require BOTH the entitlement keyword AND the absence of "OAuth2 access token could not be validated" before classifying as entitlement.
3. Safest: When the WKE suffix says unauthenticated, attempt refresh-once before classifying. The existing loop-protection still kicks in on the second 403 if refresh didn't actually help.

Fix #1 is mechanical and matches the explicit disambiguator xAI sends. Recommended.
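A rough sketch of what Fix #1 could look like follows. It is a standalone function rather than the actual `AIAgent._is_entitlement_failure`, whose current body is not shown in this report; the `(body, status_code)` argument order mirrors the suggested tests, while the 403-only check and the constant names are assumptions.

```python
from typing import Any, Mapping

# Substring the report says the existing heuristic matches on.
ENTITLEMENT_HINT = "caller does not have permission"
# Prefix of xAI's disambiguator, lowercased for comparison.
UNAUTHENTICATED_MARKER = "[wke=unauthenticated:"


def is_entitlement_failure(body: Mapping[str, Any], status_code: int) -> bool:
    """Classify a response as an entitlement failure (sketch of Fix #1)."""
    # Assumption: only 403 responses are candidates for this classification.
    if status_code != 403:
        return False
    error_text = str(body.get("error", "")).lower()
    # Check the disambiguator first: an unauthenticated marker means the
    # token failed validation, so the refresh path should handle it.
    if UNAUTHENTICATED_MARKER in error_text:
        return False
    code_text = str(body.get("code", "")).lower()
    return ENTITLEMENT_HINT in code_text or ENTITLEMENT_HINT in error_text
```

Checking the marker before the substring heuristic leaves the entitlement case classified as before and changes the outcome only for bodies that carry the unauthenticated marker.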

Test additions

Suggested cases for tests/run_agent/test_codex_xai_oauth_recovery.py:

def test_is_entitlement_failure_false_for_bad_credentials_wke_suffix():
    """403 with WKE=unauthenticated:bad-credentials is auth failure, not entitlement."""
    from run_agent import AIAgent
    assert not AIAgent._is_entitlement_failure(
        {
            "code": "The caller does not have permission to execute the specified operation",
            "error": "The OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]",
        },
        403,
    )

def test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403():
    """A bad-credentials 403 from xai-oauth must trigger refresh."""
    # Same scaffolding as test_recover_with_credential_pool_still_refreshes_genuine_auth_failure,
    # but with status_code=403 and the bad-credentials error body. Should call try_refresh_current().

Impact

Any long-running TUI / chat session against provider/xai-oauth will eventually 403 once the token goes stale, and the user has to exit/reopen to recover.
Bridge adapters (Discord, Telegram, etc.) appear unaffected in practice because their process lifecycle / proactive refresh cadence keeps tokens fresh enough that the reactive-recovery path is rarely exercised. But they're vulnerable to the same bug under the right timing.
Reproduced on two independent installations of Hermes against two separate SuperGrok-active xAI OAuth accounts — same exact symptom, same exact 403 body.

Environment

Hermes — recent v0.14.x snapshot (cloned source, current main)
Python 3.11.15 on Linux
provider/xai-oauth source manual:xai_pkce (not loopback_pkce, but the bug is upstream of the loopback-vs-manual distinction)
xAI Grok backend, grok-4.3 model, https://api.x.ai/v1
