openclaw - ✅(Solved) Fix [Bug]: v2026.4.26: OAuth Refresh Lock Fails in Multi-Agent Swarm (401 refresh_token_reused) [1 pull request, 4 comments, 3 participants]

openclaw · 2026-04-29 03:58:25


openclaw/openclaw#74055 · fetched 2026-04-30 06:29:19 · timeline: commented ×4, cross-referenced ×3, labeled ×2, closed ×1

Description

OS: Windows 11 (PowerShell)

OpenClaw Version: 2026.4.26

Provider: openai-codex (ChatGPT Plus OAuth)

Model Routing: openai/gpt-5.5 canonical mapping

Steps to Reproduce

Purge Cache: Stopped the gateway and completely deleted the root credentials folder, the global workspace .openclaw folder, and all hidden .openclaw state folders inside the individual agent directories to ensure a sterile environment.

Generate Master Token: Ran openclaw models auth login --provider openai-codex and completed the browser OAuth handshake.

Mirror Auth Profile: Used PowerShell to strictly copy the resulting auth-profiles.json into all 5 isolated agent directories (Atlas, Axon, Nova, Nexis, Lumi).

Configure JSON: Updated openclaw.json to use canonical naming (openai/gpt-5.5) and mapped the auth routing at the bottom:

"auth": {
  "profiles": {
    "openai-codex:[REDACTED_EMAIL]": { "provider": "openai-codex", "mode": "oauth" }
  },
  "routing": { "openai": "openai-codex:[REDACTED_EMAIL]" }
}

Boot Gateway: Ran openclaw doctor (archived orphan transcripts, declined forced token refresh), then openclaw gateway restart.

Trigger Swarm: Sent a prompt to one of the agents via Discord bindings.
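The auth block from the Configure JSON step routes openai to a profile ID that must be declared under auth.profiles. The TypeScript sketch below checks that each routing target exists and that the text before the colon in the profile ID matches the profile's provider. The AuthConfig type and the "<provider>:<identity>" prefix convention are assumptions inferred from this single fragment, not OpenClaw's documented schema.

```typescript
// Hypothetical shape inferred from the "auth" fragment in this issue;
// OpenClaw's real config schema may differ or contain more fields.
interface AuthProfile {
  provider: string;
  mode: string;
}

interface AuthConfig {
  profiles: Record<string, AuthProfile>;
  routing: Record<string, string>;
}

// Returns one message per routing entry whose target profile is missing, or
// whose profile-ID prefix (assumed "<provider>:<identity>") disagrees with
// the declared provider.
function checkAuthRouting(auth: AuthConfig): string[] {
  const problems: string[] = [];
  for (const [route, profileId] of Object.entries(auth.routing)) {
    const profile = auth.profiles[profileId];
    if (profile === undefined) {
      problems.push(`routing.${route} -> ${profileId}: profile not declared`);
      continue;
    }
    const prefix = profileId.split(":")[0];
    if (prefix !== profile.provider) {
      problems.push(`${profileId}: prefix "${prefix}" != provider "${profile.provider}"`);
    }
  }
  return problems;
}

const reported: AuthConfig = {
  profiles: {
    "openai-codex:[REDACTED_EMAIL]": { provider: "openai-codex", mode: "oauth" },
  },
  routing: { openai: "openai-codex:[REDACTED_EMAIL]" },
};

console.log(checkAuthRouting(reported)); // []
```

The reporter's configuration passes this check, which fits the PR's finding that the failure lies in the refresh hand-off between agents rather than in auth routing.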

Expected Behavior

OpenClaw's internal OAuth refresh lock recognizes that all 5 agents are sharing identical token footprints via their mirrored auth-profiles.json files, serializes the refresh requests, and safely rotates the token for the entire swarm without causing a race condition.

Actual Behavior

The gateway boots cleanly and connects to Discord (sessions.subscribe succeeds). However, upon receiving a prompt, the lock fails to serialize the requests. Multiple agents attempt to spend the rotating token, triggering a hard block from OpenAI's servers and crashing the connection.

Redacted Logs

21:21:47 [openai-codex] Token refresh failed: 401 { "error": { "message": "Your refresh token has already been used to generate a new access token. Please try signing in again.", "type": "invalid_request_error", "param": null, "code": "refresh_token_reused" } }
21:21:47 [diagnostic] lane task error: lane=session:agent:nova:discord:channel:[REDACTED] durationMs=238566 error="Error: OAuth token refresh failed for openai-codex: Failed to refresh OpenAI Codex token. Please try again or re-authenticate."
21:21:47 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.5 reason=auth next=none detail=OAuth token refresh failed
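The first log line embeds the token endpoint's JSON error body. A minimal TypeScript sketch, assuming the body is the only {...} span on the line, that extracts error.code and flags refresh_token_reused; both helper names are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical helper: pulls the OAuth error code out of a gateway log line of
// the form "<time> [openai-codex] Token refresh failed: 401 { ...json... }".
// Assumes the JSON body spans from the first "{" to the last "}" on the line.
function oauthErrorCode(logLine: string): string | undefined {
  const start = logLine.indexOf("{");
  const end = logLine.lastIndexOf("}");
  if (start === -1 || end < start) return undefined;
  try {
    const body = JSON.parse(logLine.slice(start, end + 1)) as {
      error?: { code?: unknown };
    };
    const code = body.error?.code;
    return typeof code === "string" ? code : undefined;
  } catch {
    return undefined; // not valid JSON: treat the code as unknown
  }
}

// A reused refresh token has already been rotated server-side, so retrying
// with the same token cannot succeed; the caller needs the newer credential
// (or a fresh login) instead.
function isRotatedTokenReuse(logLine: string): boolean {
  return oauthErrorCode(logLine) === "refresh_token_reused";
}

const line =
  '21:21:47 [openai-codex] Token refresh failed: 401 { "error": { "message": "Your refresh token has already been used to generate a new access token. Please try signing in again.", "type": "invalid_request_error", "param": null, "code": "refresh_token_reused" } }';

console.log(isRotatedTokenReuse(line)); // true
```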

Diagnostics

openclaw status --all output: not provided.

openclaw agents list --bindings output: not provided.

Additional Context

This appears to be a regression or a broken lock mechanism specific to 2026.4.26. Staggered booting (starting the gateway with one agent, authenticating, then hot-reloading the JSON with the remaining agents) produces the same 401 crash the moment a newly loaded agent is pinged. The only current workarounds are dropping to a single-agent architecture or abandoning OAuth for a paid stateless API key.

Error Message

In version 2026.4.26, the internal OAuth refresh lock fails to serialize token refresh requests when a multi-agent swarm shares a single ChatGPT Plus account (openai-codex provider). Even when strictly following the "intended shape" architecture (mirroring a single auth-profiles.json across separate agent directories), concurrent agent activity bypasses the lock. Multiple agents then attempt a token refresh simultaneously, and OpenAI immediately invalidates the session with a 401 refresh_token_reused error. The gateway fails to process the prompt, no reply is posted in Discord, and the Discord UI reports "Model login failed on the gateway for openai-codex".
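The behavior the reporter expected can be sketched in TypeScript as an in-process, per-profile FIFO lock in which each waiter re-reads the shared credential after acquiring the lock and only calls the token endpoint if the credential is still expired. This is an illustrative sketch, not OpenClaw's refreshOAuthTokenWithLock: KeyedMutex, getAccess, the Credential shape, and the fake endpoint are all hypothetical, and the sketch ignores cross-process file locks.

```typescript
// Hypothetical credential shape for this sketch.
interface Credential {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // epoch milliseconds
}

// Per-key FIFO mutex built from a promise chain: each caller waits until the
// previous holder for the same key releases before running its task.
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  async run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release: () => void = () => {};
    const held = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    const tail = previous.then(() => held);
    this.tails.set(key, tail);
    await previous;
    try {
      return await task();
    } finally {
      release();
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}

const mutex = new KeyedMutex();

// Under the lock, re-read the shared credential and only spend the refresh
// token if no earlier holder has already rotated it.
async function getAccess(
  key: string,
  readStore: () => Credential,
  writeStore: (c: Credential) => void,
  refreshAtEndpoint: (refreshToken: string) => Promise<Credential>,
): Promise<string> {
  return mutex.run(key, async () => {
    const current = readStore(); // re-read after acquiring the lock
    if (current.expiresAt > Date.now()) return current.accessToken;
    const next = await refreshAtEndpoint(current.refreshToken);
    writeStore(next);
    return next.accessToken;
  });
}

// Demo: five agents share one profile; the fake endpoint rotates the refresh
// token on every success and rejects any token that is no longer current.
async function demo(key: string): Promise<{ tokens: string[]; endpointCalls: number }> {
  let current = "rt-0";
  let calls = 0;
  let store: Credential = { accessToken: "at-0", refreshToken: "rt-0", expiresAt: 0 };
  const endpoint = async (rt: string): Promise<Credential> => {
    calls += 1;
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated latency
    if (rt !== current) throw new Error("401 refresh_token_reused");
    current = `rt-${calls}`;
    return { accessToken: `at-${calls}`, refreshToken: current, expiresAt: Date.now() + 60_000 };
  };
  const agents = ["atlas", "axon", "nova", "nexis", "lumi"];
  const tokens = await Promise.all(
    agents.map(() => getAccess(key, () => store, (c) => { store = c; }, endpoint)),
  );
  return { tokens, endpointCalls: calls };
}

void demo("openai-codex:shared").then((r) => console.log(r)); // endpointCalls 1, five "at-1" tokens
```

The re-read inside the lock is what makes serialization safe: a queue alone, with each waiter holding the credential it read before waiting, would still let later waiters spend an already-rotated refresh token.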


Fix / Workaround

PR #74214: fix(agents): adopt peer-rotated OAuth credential from in-process cache

Repository: openclaw/openclaw
Author: openperf
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/74214

Description (problem / solution / changelog)

Summary

Problem: Multi-agent OAuth swarms that share a single Codex profile via mirrored auth-profiles.json files (the documented multi-agent shape) crash with OAuthManagerRefreshError: ... refresh_token_reused 401s the moment a queued peer in the same gateway process picks up a refresh request after a peer has already rotated the token. The user-visible failure is reproduced verbatim by src/agents/auth-profiles/oauth.concurrent-agents.test.ts > rescues a queued peer when the leader's mirror to main is lost (issue #74055), which fails on current main with the same error string the user reported.
Root Cause: doRefreshOAuthTokenWithLock (src/agents/auth-profiles/oauth-manager.ts) and resolveOAuthAccess rely on a single hand-off between peers. When the leader rotates the token, it persists the new credential to its own per-agent store and then calls mirrorRefreshedCredentialIntoMainStore, which performs a best-effort updateAuthProfileStoreWithLock(undefined, …) write to the main store. The next queued peer is supposed to discover that mirror through one of three adoption checkpoints (adoptNewerMainOAuthCredential outside the lock, the inside-lock loadAuthProfileStoreForSecretsRuntime(undefined) check, and the post-failure main-store fallback in resolveOAuthAccess), and every one of those checkpoints reads main from the filesystem. The disk-side mirror can be dropped in several ways: mirrorRefreshedCredentialIntoMainStore's catch block silently swallows errors at log.debug, withFileLock(mainPath, …) can time out under transient Windows file-lock contention with antivirus tooling, and external processes (e.g. a doctor flow writing back a cached store) can roll the file back. In each case the leader's refresh has succeeded, but main's on-disk store is stale. The queued peer then loads the stale main view, falls through to adapter.refreshCredential(credentialToRefresh) with its own already-rotated refresh_token, and the OpenAI Codex token endpoint replies 401 refresh_token_reused. There is no in-process channel that lets the peer discover the leader's result independently of the disk mirror.
Fix: Make the leader's refresh result first-class state inside the OAuth manager. createOAuthManager now keeps a small recentlyRefreshedCredentials map keyed by (provider, profileId) that is populated the moment a refresh succeeds (before the disk mirror runs) and is consulted by all three adoption checkpoints: adoptNewerMainOAuthCredential (pre-queue), the inside-lock adoption in doRefreshOAuthTokenWithLock (just after hasUsableOAuthCredential rejects the agent's stale view), and the post-failure main-store fallback in resolveOAuthAccess (so a refresh that already raised refresh_token_reused can still recover). Cache hits require a strict positive identity match: both the requesting and the cached credential must carry an accountId or email and pass isSafeToAdoptMainStoreOAuthIdentity, which prevents fuzzy hits from leaking across unrelated profiles or test fixtures. The disk-side adoption paths are unchanged; the cache is purely additive defense in depth and survives any failure mode that drops the mirror to main on disk. The cache is reset alongside refreshQueues in the existing resetRefreshQueuesForTest hook, so the test surface stays unchanged.
What changed:
- src/agents/auth-profiles/oauth-manager.ts: introduce recentlyRefreshedCredentials and the rememberRefreshedCredential / findInProcessRefreshedCredential helpers inside createOAuthManager; consult the cache in adoptNewerMainOAuthCredential, in the inside-lock adoption block of doRefreshOAuthTokenWithLock, and in the post-failure main-store fallback of resolveOAuthAccess; publish to the cache the moment a refresh succeeds; clear it in resetRefreshQueuesForTest; import hasOAuthIdentity from oauth-shared.js for the strict identity gate.
- src/agents/auth-profiles/oauth.concurrent-agents.test.ts: add the rescues a queued peer when the leader's mirror to main is lost (issue #74055) regression. It seeds a leader and a follower sub-agent plus main with the same expired credential, lets the leader rotate the token through the real flow, then forcibly reverts main on disk to the pre-mirror state and configures the refresh mock to throw refresh_token_reused for any subsequent call. Without this PR the test fails with the exact user-reported OAuthManagerRefreshError; with it the follower adopts the leader's result from the in-process cache without ever spending its own already-rotated refresh_token. Adds AuthProfileStore to the type imports.
What did NOT change (scope boundary):
- The cross-process file-lock backbone in src/plugin-sdk/file-lock.ts and the per-profile lock path in src/agents/auth-profiles/path-resolve.ts.
- The in-process refresh queue (refreshQueues) and the FIFO gate semantics in refreshOAuthTokenWithLock.
- The doRefreshOAuthTokenWithLock critical section ordering (global lock → per-agent store lock → reload → adopt → refresh → save → mirror).
- mirrorRefreshedCredentialIntoMainStore, shouldMirrorRefreshedOAuthCredential, isSafeToAdoptMainStoreOAuthIdentity, and every other identity / mirror policy helper in oauth-shared.ts and oauth-identity.ts.
- The disk-side fallback chain in resolveOAuthAccess (loadFreshStoredOAuthCredential re-read, retry-once after refresh_token_reused, ensure-from-main inheritance, final OAuthManagerRefreshError).
- The Codex provider plugin (extensions/openai/openai-codex-provider.ts), the embedded runner, the Discord channel, doctor flows, secrets runtime activation, and any public Plugin SDK / API surface.
- CHANGELOG.md, docs, and the existing concurrent-agent test are unmodified.
- No any types, no public type changes, and no new exports outside the manager closure.
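The Fix paragraph above can be sketched as a small closure-scoped cache. The names rememberRefreshedCredential and findInProcessRefreshedCredential come from the PR description, but their signatures, the OAuthCred type, the expiry-based "newer" test, and the sameIdentity gate (a simplified stand-in for hasOAuthIdentity plus isSafeToAdoptMainStoreOAuthIdentity) are assumptions for illustration, not the PR's code.

```typescript
// Hypothetical credential shape; the real store format is not shown in the PR text.
interface OAuthCred {
  provider: string;
  access: string;
  refresh: string;
  expires: number; // epoch milliseconds
  accountId?: string;
  email?: string;
}

// Simplified stand-in for the strict identity gate: the credentials must share
// at least one identity field and agree on every field both of them carry.
function sameIdentity(a: OAuthCred, b: OAuthCred): boolean {
  if (a.accountId && b.accountId && a.accountId !== b.accountId) return false;
  if (a.email && b.email && a.email !== b.email) return false;
  return Boolean((a.accountId && b.accountId) || (a.email && b.email));
}

function createRefreshCache() {
  // Keyed by (provider, profileId); the NUL separator is assumed absent from both parts.
  const recentlyRefreshed = new Map<string, OAuthCred>();
  const keyOf = (provider: string, profileId: string) => `${provider}\u0000${profileId}`;

  return {
    // Publish on success only: call after the token endpoint returns a new
    // credential and before the best-effort mirror to main on disk.
    rememberRefreshedCredential(profileId: string, cred: OAuthCred): void {
      recentlyRefreshed.set(keyOf(cred.provider, profileId), cred);
    },
    // Return the peer's rotated credential only when it expires later than the
    // caller's copy (assumed "newer" test) and the identities positively match.
    findInProcessRefreshedCredential(profileId: string, mine: OAuthCred): OAuthCred | undefined {
      const hit = recentlyRefreshed.get(keyOf(mine.provider, profileId));
      if (!hit || hit.expires <= mine.expires || !sameIdentity(hit, mine)) return undefined;
      return hit;
    },
    // Counterpart of the test-only reset hook described in the PR.
    reset(): void {
      recentlyRefreshed.clear();
    },
  };
}

const cache = createRefreshCache();
const profileId = "openai-codex:user@example.com";
const stale: OAuthCred = {
  provider: "openai-codex", access: "at-0", refresh: "rt-0", expires: 1_000, email: "user@example.com",
};

// Leader (atlas) rotates the token and publishes before mirroring to disk.
cache.rememberRefreshedCredential(profileId, { ...stale, access: "at-1", refresh: "rt-1", expires: 2_000 });

// Follower (nova) still holds rt-0; even if main on disk was rolled back,
// it adopts rt-1 instead of spending rt-0.
console.log(cache.findInProcessRefreshedCredential(profileId, stale)?.refresh); // rt-1
```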

Reproduction

1. Install OpenClaw 2026.4.26 on Windows 11 with PowerShell.
2. Run openclaw models auth login --provider openai-codex to write a fresh credential into ~\.openclaw\agents\main\agent\auth-profiles.json.
3. Mirror that file byte-for-byte into five per-agent dirs (~\.openclaw\agents\<atlas|axon|nova|nexis|lumi>\agent\auth-profiles.json).
4. In ~\.openclaw\openclaw.json, configure each agent to use openai-codex/gpt-5.5 and route auth.profiles.openai-codex:<email> -> openai-codex with auth.routing.openai -> openai-codex:<email>.
5. Bind each agent to its own Discord channel and run openclaw gateway restart.
6. Wait for the shared OAuth credential to enter its near-expiry window. Send a prompt to one agent (atlas); when its rotated credential is mirrored to main, immediately have an antivirus tool, an openclaw doctor invocation, or any process briefly hold ~\.openclaw\agents\main\agent\auth-profiles.json so the mirror's withFileLock write times out and is silently swallowed. Now post a prompt to a different agent (nova).
7. Observe: nova's lane fails with OAuth token refresh failed for openai-codex: 401 refresh_token_reused. Inspect nova's on-disk auth-profiles.json and confirm it still holds the pre-rotation credential despite atlas having a freshly rotated token in its own per-agent store.
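Steps 6 and 7 can be modelled as a toy, synchronous TypeScript simulation: a server that accepts each refresh token exactly once, per-agent stores seeded from one mirrored credential, and a main store that a peer consults before refreshing. Everything here (RotatingTokenServer, simulate, the store names) is hypothetical and only reproduces the shape of the failure described in the PR, not OpenClaw's code.

```typescript
type Cred = { access: string; refresh: string };
type StoreName = "main" | "atlas" | "nova";

// Toy OAuth server: each refresh token may be exchanged exactly once.
class RotatingTokenServer {
  private spent = new Set<string>();
  private counter = 0;

  refresh(refreshToken: string): Cred {
    if (this.spent.has(refreshToken)) throw new Error("401 refresh_token_reused");
    this.spent.add(refreshToken);
    this.counter += 1;
    return { access: `at-${this.counter}`, refresh: `rt-${this.counter}` };
  }
}

// atlas refreshes first (the leader), then nova; only the stores are shared.
function simulate(mirrorSucceeds: boolean): string[] {
  const server = new RotatingTokenServer();
  const initial: Cred = { access: "at-0", refresh: "rt-0" };
  // Mirrored auth-profiles.json: every store starts with the same credential.
  const stores: Record<StoreName, Cred> = { main: initial, atlas: initial, nova: initial };

  const refreshAgent = (agent: Exclude<StoreName, "main">, mirror: boolean): string => {
    const own = stores[agent];
    // Disk-only adoption checkpoint: take main's copy if it differs from ours.
    if (stores.main.refresh !== own.refresh) {
      stores[agent] = stores.main;
      return `adopted ${stores.main.access}`;
    }
    try {
      const next = server.refresh(own.refresh);
      stores[agent] = next;
      if (mirror) stores.main = next; // best-effort mirror; may be dropped
      return `refreshed ${next.access}`;
    } catch (err) {
      return (err as Error).message;
    }
  };

  return [refreshAgent("atlas", mirrorSucceeds), refreshAgent("nova", true)];
}

console.log(simulate(true).join(" | ")); // refreshed at-1 | adopted at-1
console.log(simulate(false).join(" | ")); // refreshed at-1 | 401 refresh_token_reused
```

With the mirror dropped, the follower fails exactly as in step 7, because its only adoption checkpoint reads a main store that still holds the spent token.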

After this PR, step 7 instead resolves successfully. nova discovers atlas's rotation through the in-process refresh cache, adopts the credential, and writes it to its own per-agent store — without ever touching main on disk and without spending its own already-rotated refresh_token. The deterministic version of step 6 is exercised in CI by the new regression test, which forcibly reverts main on disk after the leader's refresh.

Risk / Mitigation

Risk: Adding a process-scoped credential cache could let a stale refresh result mask a fresh disk-side update, or could leak between unrelated profiles in the same process (including across tests that share the manager singleton).
Mitigation: The cache is keyed by (provider, profileId) and gated by a strict positive identity match (hasOAuthIdentity plus isSafeToAdoptMainStoreOAuthIdentity), so a freshly-seeded credential without identity (and any unrelated profile) cannot fuzzy-match a peer's rotated token. The cache is publish-on-success only — failed refreshes leave it untouched — and it is cleared by the existing resetRefreshQueuesForTest hook so vitest isolation works without changing other test files. The disk-side adoption chain is preserved unchanged: the cache is consulted before the disk paths, but the disk paths still run if the cache misses (e.g., across process restarts), and a successful disk-side adoption still saves to the agent's own store. All three adoption call sites that consult the cache only ever return its credential after passing the existing identity-safety check, which already gates the disk-side adoption path. The full pnpm test src/agents/auth-profiles/ suite passes (16 files / 162 tests, including the 22 oauth-mirror, fallback-to-main, openai-codex-refresh-fallback, and adopt-identity invariants), pnpm exec oxfmt --check and pnpm tsgo:core are clean, and the new regression flips from RED (exact user error) on main to GREEN with this patch.

Change Type (select all)

Bug fix

Scope (select all touched areas)

Agents / OAuth refresh manager (src/agents/auth-profiles/oauth-manager.ts)
Tests (src/agents/auth-profiles/oauth.concurrent-agents.test.ts)

Linked Issue/PR

Fixes #74055

Changed files

CHANGELOG.md (modified, +1/-0)
src/agents/auth-profiles/oauth-manager.ts (modified, +176/-4)
src/agents/auth-profiles/oauth.concurrent-agents.test.ts (modified, +405/-0)


Bug type

Regression (worked before, now fails)

Beta release blocker


Steps to reproduce

Start OpenClaw 2026.4.26 with a multi-agent config sharing a mirrored openai-codex auth-profile across separate agent directories.
Send a prompt to the swarm via Discord bindings to trigger concurrent activity.
Observe gateway crash and confirm the 401 refresh_token_reused log line as multiple agents bypass the refresh lock and attempt simultaneous token refreshes.

Expected behavior

According to the OpenClaw multi-agent OAuth documentation, the system's internal refresh lock should serialize the token refresh requests for all agents sharing the mirrored auth-profiles.json, safely rotating the token without triggering a race condition.

Actual behavior

The gateway fails to process the prompt and no reply is posted in Discord; the attached gateway log shows [openai-codex] Token refresh failed: 401 {"error": {"message": "Your refresh token has already been used to generate a new access token...", "code": "refresh_token_reused"}} and the Discord UI returns Model login failed on the gateway for openai-codex.

OpenClaw version

2026.4.26

Operating system

Windows 11

Install method

No response

Model

openai-codex/gpt-5.5

Provider / routing chain

openclaw -> openai-codex

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

The internal OAuth refresh lock in OpenClaw version 2026.4.26 fails to serialize token refresh requests for multiple agents sharing a single ChatGPT Plus account, causing a 401 refresh_token_reused error.

Guidance

Verify that the auth-profiles.json files are identical across all agent directories to ensure the refresh lock can recognize shared token footprints.
Check the OpenClaw documentation for any updates or changes to the multi-agent OAuth configuration that may have introduced this regression.
Staggered booting is not a reliable workaround: the reporter saw the same 401 crash as soon as a hot-reloaded agent was pinged, so fall back to a single-agent architecture to avoid the shared-token race.
Review the openclaw.json configuration file to ensure the auth routing is correctly set up for the shared openai-codex provider.
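The first guidance item can be checked with a short Node.js script in TypeScript that hashes each agent's auth-profiles.json. The path layout ~/.openclaw/agents/<agent>/agent/auth-profiles.json comes from the PR's Reproduction steps; the helper name and output format are hypothetical. Differences are only meaningful right after mirroring: the PR describes a refreshing agent writing the rotated credential to its own per-agent store, so divergence is expected after any refresh.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: groups agents by the SHA-256 of their auth-profiles.json
// so that an agent holding a different (for example, stale) file stands out.
// Agents whose file does not exist are grouped under "missing".
function hashAuthProfiles(root: string, agents: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const agent of agents) {
    const file = join(root, "agents", agent, "agent", "auth-profiles.json");
    const digest = existsSync(file)
      ? createHash("sha256").update(readFileSync(file)).digest("hex")
      : "missing";
    groups.set(digest, [...(groups.get(digest) ?? []), agent]);
  }
  return groups;
}

const agents = ["main", "atlas", "axon", "nova", "nexis", "lumi"];
const groups = hashAuthProfiles(join(homedir(), ".openclaw"), agents);
console.log(groups.size === 1 && !groups.has("missing") ? "all identical" : groups);
```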

Example

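No configuration snippet resolves this: PR #74214 traces the failure to the peer hand-off in the OAuth refresh manager (src/agents/auth-profiles/oauth-manager.ts), so the fix is a code change rather than a configuration change.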

Notes

The issue was reported as a regression in version 2026.4.26. PR #74214 attributes it to the leader's rotated credential failing to reach main's on-disk store, after which a queued peer spends its already-rotated refresh_token. The workarounds of dropping to a single-agent architecture or using a paid stateless API key may not be feasible for all users.

Recommendation

Until a release containing the fix is available, use a single-agent architecture or a stateless API key. According to the reporter, staggered booting still triggers the 401 refresh_token_reused error, so it does not help.
