openclaw - 💡(How to fix) Fix OAuth refresh is not single-flight — concurrent runtime turns revoke the refresh token family

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Fix Action

Fix / Workaround

Workaround (operator-side, not a fix)

Code Example

const inflightRefreshes = new Map<string, Promise<RefreshResult>>();

async function refreshOAuthCredentialForRuntime(
  provider: string,
  profileId: string,
  /* ... */
): Promise<RefreshResult> {
  const key = `${provider}::${profileId}`;
  const existing = inflightRefreshes.get(key);
  if (existing) return existing;

  const refreshPromise = (async () => {
    try {
      return await performRefresh(/* ... */);
    } finally {
      inflightRefreshes.delete(key);
    }
  })();
  inflightRefreshes.set(key, refreshPromise);
  return refreshPromise;
}
RAW_BUFFERClick to expand / collapse

Problem

OAuth refresh in the runtime is not single-flight. When the access token expires while multiple concurrent runtime turns are in flight (lane workers, parallel agent calls, etc.), every concurrent caller independently invokes refreshXaiOAuthCredential (and the equivalent code paths for other OAuth providers like OpenAI Codex). Each call holds the same in-memory refresh token. Providers implementing OAuth 2.1 token-rotation with reuse detection — xAI is one — react to this by revoking the entire token family, forcing the user to manually re-authenticate.

Reproduce

In a Miles runtime with 4 trading lane workers active:

  1. Wait until access-token expiry (xAI: 6h after issue).
  2. The next request from any lane triggers refreshXaiOAuthCredential.
  3. Within ~1.5s, 3 more lanes each trigger their own refresh attempt with the same refresh token.
  4. The first refresh succeeds. The 3 subsequent refreshes are now using a rotated-out refresh token; xAI returns 400 "invalid_grant" / revoked.
  5. Worse: xAI sees the rotated token replay as a reuse-detection trigger and revokes the entire token family, including the just-issued new tokens.
  6. Operator sees the runtime stop talking to xAI and must openclaw login xai to restore.

This pattern reproduces every ~6h on the Miles runtime under normal load.

Affects

This is not xAI-specific — every OAuth provider implementing OAuth 2.1 refresh-token rotation with reuse detection will hit this. xAI just happens to be one of the strictest. OpenAI Codex with OAuth (vs. API key) is equally exposed in principle.

Concrete fix

Serialize refreshOAuthCredentialForRuntime per (provider, profileId) via a per-key promise cache (single-flight pattern):

const inflightRefreshes = new Map<string, Promise<RefreshResult>>();

async function refreshOAuthCredentialForRuntime(
  provider: string,
  profileId: string,
  /* ... */
): Promise<RefreshResult> {
  const key = `${provider}::${profileId}`;
  const existing = inflightRefreshes.get(key);
  if (existing) return existing;

  const refreshPromise = (async () => {
    try {
      return await performRefresh(/* ... */);
    } finally {
      inflightRefreshes.delete(key);
    }
  })();
  inflightRefreshes.set(key, refreshPromise);
  return refreshPromise;
}

This is the canonical fix for any refresh-token rotation system. All concurrent callers await the same promise; only one HTTP refresh fires; the new tokens are persisted once; all callers see the new access token.

Workaround (operator-side, not a fix)

For now, Miles ships a miles-xai-keepalive plugin (extension in our fork) that pre-refreshes via a single serialized cron call at 4h cadence (2h margin before xAI's 6h expiry). This sidesteps the runtime refresh path entirely — the runtime always finds a fresh access token in the credential store and never hits the refresh code path under concurrency.

This works but doesn't generalize. Every consumer hitting the same problem will need to roll their own pre-refresh sidecar until single-flight lands in core.

Severity

Operator-impact: user has to manually re-authenticate xAI every ~6h. This breaks "always-on Jarvis" semantics. Lane workers fall back to other providers (OpenAI/Anthropic) but observability surfaces flag the gap.

Architecturally: this is a class of bug, not a single bug. The same pattern will bite OpenAI Codex OAuth, Anthropic OAuth, and any future provider with rotation + reuse detection.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING