openclaw - ✅(Solved) Fix EPERM on auth-profiles.json causes full gateway failure cascade (Windows) [2 pull requests, 1 participant]

GitHub stats
openclaw/openclaw#62099 · fetched 2026-04-08 03:09:00
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

auth-profiles.json can acquire a Windows ReadOnly attribute during concurrent config writes, causing every LLM request to fail with EPERM: operation not permitted. The error is treated as fatal rather than non-fatal, which cascades through the fallback chain and makes the gateway completely unresponsive.

Error Message

Error: EPERM: operation not permitted, copyfile 
  'C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json.<uuid>.tmp' 
  -> 'C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json'
    at Object.copyFileSync (node:fs:3104:11)
    at renameJsonFileWithFallback (json-file-1PGlTqjr.js:63:7)
    at saveJsonFile (json-file-1PGlTqjr.js:98:3)
    at saveAuthProfileStore (store-HF_Z-jKz.js:427:2)
    at markAuthProfileGood (profiles-DKQdaSwr.js:76:2)
    at pi-embedded-DWASRjxE.js:36473:7

Workaround

attrib -R "C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json"

Then restart the gateway.

PR fix notes

PR #67064: fix(auth-profiles): make post-success bookkeeping saves non-fatal

Description (problem / solution / changelog)

Summary

Fixes #62099. On Windows, concurrent config hot-reload can leave auth-profiles.json with a ReadOnly attribute. The atomic write in saveAuthProfileStore then throws EPERM, and because markAuthProfileGood / markAuthProfileUsed / markAuthProfileFailure run as post-completion bookkeeping, that throw used to cascade into the LLM request that had already succeeded. Fallback triggers, hits the same read-only file, fails the same way. The gateway becomes unresponsive; restarts don't help because the file attribute persists.

The fix wraps the body of each mark* function in try/catch, logging the persistence failure and continuing. Caller-visible behavior is unchanged on the happy path.
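A minimal sketch of that pattern in plain Node.js. `markProfileGoodSketch`, `saveStoreEperm`, the `log` object, and the `lastGood` store shape are hypothetical stand-ins, not the actual code in profiles.ts; the sketch only shows that a failed save is logged and swallowed, so the returned promise still resolves.

```javascript
// Hypothetical logger standing in for the auth-profiles subsystem log.
const log = { warn: (msg) => console.warn(`[auth-profiles] ${msg}`) };

// Hypothetical save that fails the way the issue describes.
async function saveStoreEperm() {
  const err = new Error("EPERM: operation not permitted, copyfile");
  err.code = "EPERM";
  throw err;
}

// Sketch of a post-success bookkeeping function: the whole body sits in
// try/catch, so a persistence error becomes a warning instead of a rejection.
async function markProfileGoodSketch(store, provider, profileId, save) {
  try {
    store.lastGood = { ...store.lastGood, [provider]: profileId };
    await save(store);
  } catch (err) {
    log.warn(`could not persist lastGood for ${profileId}: ${err.message}`);
  }
}

// Usage: the call resolves even though the save rejects.
const store = { lastGood: {} };
markProfileGoodSketch(store, "anthropic", "anthropic:default", saveStoreEperm)
  .then(() => console.log("resolved; in-memory lastGood:", store.lastGood.anthropic));
```

The in-memory update happens before the save, so the caller's view of the profile is current even when the disk write fails.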

What this does NOT change

saveAuthProfileStore itself still throws on failure. OAuth token refresh (in oauth.ts) depends on that behavior, since a silent token-save failure would be a security concern. Only the three mark* functions that run after a successful provider call now tolerate save errors.
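The split can be pictured as two save policies. In this hypothetical sketch (all function names are illustrative, not OpenClaw's), a token-refresh path keeps the throwing save so the caller learns that new credentials were not persisted, while bookkeeping goes through a best-effort wrapper.

```javascript
// Hypothetical save that always fails with EPERM.
function failingSave() {
  const err = new Error("EPERM: operation not permitted");
  err.code = "EPERM";
  throw err;
}

// Strict policy: errors propagate. Suits credential writes such as a
// refreshed OAuth token, where a silent loss would be a security problem.
function persistStrict(save, store) {
  save(store);
}

// Best-effort policy: errors are logged and swallowed. Suits bookkeeping
// such as lastGood timestamps or usage counters.
function persistBestEffort(save, store, what) {
  try {
    save(store);
    return true;
  } catch (err) {
    console.warn(`[auth-profiles] ${what} not persisted: ${err.code ?? err.message}`);
    return false;
  }
}

function refreshTokenSketch(store, token) {
  store.token = token;
  persistStrict(failingSave, store); // throws to the caller
}

function markUsedSketch(store) {
  store.useCount = (store.useCount ?? 0) + 1;
  return persistBestEffort(failingSave, store, "usage stats"); // never throws
}
```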

Scope

  • src/agents/auth-profiles/profiles.ts - wrap markAuthProfileGood body in try/catch, log warn
  • src/agents/auth-profiles/usage.ts - wrap markAuthProfileUsed and markAuthProfileFailure bodies in try/catch, log warn
  • src/agents/auth-profiles/usage.persist-nonfatal.test.ts - new regression tests, one per mark function, simulating EPERM from both the lock-guarded and direct save paths

markAuthProfileCooldown delegates to markAuthProfileFailure so it's covered transitively.

Before

Error: EPERM: operation not permitted, copyfile
  'auth-profiles.json.<uuid>.tmp' -> 'auth-profiles.json'
    at Object.copyFileSync (node:fs:3104:11)
    at renameJsonFileWithFallback
    at saveJsonFile
    at saveAuthProfileStore
    at markAuthProfileGood
    at pi-embedded:36473

The LLM response arrived, then the save threw, then every fallback hit the same file, then the gateway ran out of models. The user had to run attrib -R on the file and restart the gateway to recover.

After

The save throws, the catch runs, the warning lands in the subsystem log, the mark function returns void, the LLM request completes normally. lastGood / usage stats stay slightly stale until the next successful save, which is the right tradeoff for a bookkeeping write.

Testing

  • New tests in usage.persist-nonfatal.test.ts pass (3/3). Mocks both updateAuthProfileStoreWithLock and saveAuthProfileStore to throw EPERM, asserts the mark* functions resolve without throwing.
  • Existing usage.test.ts (39 tests) and auth-profiles.markauthprofilefailure.test.ts (9 tests) still pass. Also verified auth-profiles.runtime-snapshot-save.test.ts (1 test).
  • Two pre-existing failures, in state-observation.test.ts and oauth.fallback-to-main-agent.test.ts, also occur on main and are unrelated to this change.
  • oxlint --type-aware clean on modified files.
  • tsgo --noEmit clean (exit 0).
  • oxfmt --check clean.

Risk

Low. Behavioral change is scoped to the error path of a bookkeeping function. Callers that previously got an unhandled rejection on EPERM now get a resolved promise, which is the intended outcome. Profile state in memory stays authoritative; disk just gets slightly stale until the next successful write.
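One way to read "memory stays authoritative; disk gets slightly stale" is sketched below. It assumes each save writes the full in-memory snapshot rather than a delta, which the PR does not state explicitly; `ProfileStoreSketch` is hypothetical, and its "disk" is a string so the example runs anywhere.

```javascript
// Hypothetical in-memory store whose disk copy is a plain string.
// `failNext` simulates one EPERM on the next write.
class ProfileStoreSketch {
  constructor() {
    this.state = { usage: {} };              // authoritative copy in memory
    this.disk = JSON.stringify(this.state);  // last successfully written snapshot
    this.failNext = false;
  }

  // Writes the full in-memory snapshot, so any later successful write also
  // carries updates whose own write failed.
  persist() {
    if (this.failNext) {
      this.failNext = false;
      const err = new Error("EPERM: operation not permitted");
      err.code = "EPERM";
      throw err;
    }
    this.disk = JSON.stringify(this.state);
  }

  // Best-effort bookkeeping: update memory first, then try to persist.
  markUsed(id) {
    this.state.usage[id] = (this.state.usage[id] ?? 0) + 1;
    try {
      this.persist();
    } catch (err) {
      console.warn(`usage for ${id} not persisted (${err.code}); disk is stale`);
    }
  }
}

const s = new ProfileStoreSketch();
s.failNext = true;
s.markUsed("anthropic:default"); // write fails: memory has 1, disk has nothing
s.markUsed("anthropic:default"); // write succeeds: disk catches up
console.log(JSON.parse(s.disk).usage["anthropic:default"]); // 2
```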

AI disclosure

This change was drafted with Claude Code acting as a coding assistant. The issue was picked from the triaged backlog, the root-cause analysis and implementation plan were produced interactively, and tests were written against the exact stack trace in the issue report. A human reviewed the patch and regression tests before submission.

Understanding confirmation

Yes, I've read CONTRIBUTING.md and VISION.md. This is a single-concern bug fix with no dependency updates, no schema changes, and no new public APIs. The affected module is src/agents/auth-profiles/. No Carbon changes, no @ts-nocheck, no lint suppression.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/auth-profiles/profiles.ts (modified, +32/-20)
  • src/agents/auth-profiles/usage.persist-nonfatal.test.ts (added, +87/-0)
  • src/agents/auth-profiles/usage.ts (modified, +113/-91)

PR #67077: fix(auth-profiles): make post-success bookkeeping saves non-fatal

Description (problem / solution / changelog)

Summary

Fixes #62099. On Windows, concurrent config hot-reload can leave auth-profiles.json with a ReadOnly attribute. The atomic write in saveAuthProfileStore then throws EPERM, and because markAuthProfileGood / markAuthProfileUsed / markAuthProfileFailure run as post-completion bookkeeping, that throw used to cascade into the LLM request that had already succeeded. Fallback triggers, hits the same read-only file, fails the same way. The gateway becomes unresponsive; restarts don't help because the file attribute persists.

The fix wraps the body of each mark* function in try/catch, logging the persistence failure and continuing. Caller-visible behavior is unchanged on the happy path.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Auth / tokens

Linked Issue/PR

  • Closes #62099

User-visible / Behavior Changes

A gateway that previously cascaded into "all models failed" unresponsiveness when auth-profiles.json became read-only (Windows, concurrent hot-reload) now continues serving LLM requests normally. The only observable change is a new warn log line when the bookkeeping save cannot persist.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

saveAuthProfileStore itself still throws on failure. OAuth token refresh (in oauth.ts) depends on that behavior, since a silent token-save failure would be a security concern. Only the three mark* functions that run after a successful provider call now tolerate save errors. markAuthProfileCooldown delegates to markAuthProfileFailure so it's covered transitively.

Repro + Verification

Environment

  • OS: Windows 11 (reproduced from user report); Linux in CI
  • Runtime/container: Node 22
  • Model/provider: Anthropic primary, Ollama fallback (per user report)
  • Integration/channel (if any): None
  • Relevant config (redacted): auth-profiles.json with Windows ReadOnly attribute set

Steps

  1. Gateway runs on Windows with primary + fallback providers configured
  2. User adds a new model to openclaw.json while the gateway is hot-reloading
  3. Windows sets ReadOnly on auth-profiles.json during the concurrent rename
  4. Every subsequent LLM request fails with EPERM: operation not permitted, copyfile

Expected

LLM requests complete. Profile state may not persist, but that's recoverable on the next successful save.

Actual

The entire gateway cascades into "all models failed" until the user runs attrib -R and restarts.

Evidence

  • Failing test/log before + passing after - see new usage.persist-nonfatal.test.ts
  • Stack trace from the issue:
Error: EPERM: operation not permitted, copyfile
  'auth-profiles.json.<uuid>.tmp' -> 'auth-profiles.json'
    at Object.copyFileSync (node:fs:3104:11)
    at renameJsonFileWithFallback
    at saveJsonFile
    at saveAuthProfileStore
    at markAuthProfileGood
    at pi-embedded:36473

Human Verification (required)

Verified scenarios:

  • Ran new regression tests (usage.persist-nonfatal.test.ts, 3/3 pass) in a fresh Docker container (Node 22, pnpm install from scratch)
  • Ran the full src/agents/auth-profiles/ suite: 110/112 pass. The 2 failures (in session-override.test.ts and oauth.openai-codex-refresh-fallback.test.ts, varying between runs) also occur on clean main with identical counts, so they are pre-existing flakiness unrelated to this change
  • pnpm tsgo --noEmit clean (exit 0)
  • oxlint --type-aware on modified files: 0 warnings, 0 errors
  • oxfmt --check on modified files: clean

Edge cases checked:

  • Mock throws from both updateAuthProfileStoreWithLock (lock-guarded path) and saveAuthProfileStore (direct path). Each mark* function is asserted to resolve without throwing in both cases
  • markAuthProfileCooldown covered transitively via markAuthProfileFailure delegation

What I did not verify:

  • Live Windows reproduction of the ReadOnly race. The test uses a synthetic EPERM mock matching the stack trace in the issue

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

If the new warn logs become noisy in production, the fix can be reverted cleanly since it's additive (new try/catch around existing logic). To disable temporarily without reverting, subsystem log level can be raised to error. Files to restore: src/agents/auth-profiles/profiles.ts, src/agents/auth-profiles/usage.ts. Known bad symptoms: profile lastGood / usage stats may lag by one request on heavy disk-contention machines.

Risks and Mitigations

  • Risk: Silencing all throws in mark* could mask a genuine bug in the usage-stats computation

    • Mitigation: The try/catch is narrow (function body only), computation is pure (no IO), and failures land in log.warn with the full error message so ops can still see them
  • Risk: saveAuthProfileStore being called from other paths (OAuth token refresh, store inheritance) might also hit EPERM and still throw

    • Mitigation: Intentional. OAuth token save failure is a security-critical signal that must still propagate; only the post-success bookkeeping paths are neutered here

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/auth-profiles/profiles.ts (modified, +38/-20)
  • src/agents/auth-profiles/usage.persist-nonfatal.test.ts (added, +89/-0)
  • src/agents/auth-profiles/usage.ts (modified, +142/-109)


Bug Report: EPERM on auth-profiles.json causes full gateway failure cascade

Summary

auth-profiles.json can acquire a Windows ReadOnly attribute during concurrent config writes, causing every LLM request to fail with EPERM: operation not permitted. The error is treated as fatal rather than non-fatal, which cascades through the fallback chain and makes the gateway completely unresponsive.

Environment

  • OpenClaw: 2026.4.5 (3e72c03)
  • OS: Windows 11 (10.0.26200, x64)
  • Node: v24.14.1
  • Providers: Anthropic (claude-opus-4-6), Ollama (glm-4.7-flash, gemma4:26b)

Steps to Reproduce

  1. Have a running gateway with Anthropic as primary model and Ollama as fallback (or vice versa)
  2. Add a new Ollama model to the config while the gateway is running (e.g., adding gemma4:26b to openclaw.json models list)
  3. The gateway hot-reloads the config and updates models.json and auth-profiles.json
  4. Under certain timing conditions, auth-profiles.json acquires the Windows ReadOnly file attribute
  5. All subsequent LLM requests fail

Observed Behavior

Once ReadOnly is set on auth-profiles.json:

  1. Every LLM request attempts to write to auth-profiles.json (via markAuthProfileGood)
  2. The atomic write (copyFileSync from .tmp to target) fails with EPERM
  3. This error is treated as a request-level failure, not just a profile-save failure
  4. The fallback system activates: primary model (e.g., ollama/glm-4.7-flash) → fallback model (e.g., anthropic/claude-opus-4-6)
  5. The fallback model hits the same EPERM on the same file → also fails
  6. Result: "All models failed" — complete gateway unresponsiveness
  7. Gateway restarts do NOT fix it (the ReadOnly attribute persists on disk)
  8. Each retry cycle inflates the session context with error metadata, rapidly consuming the context window
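The cascade in steps 1 to 6 can be reproduced in miniature. This is a hypothetical simplification, not OpenClaw's fallback code: every provider call succeeds, but the bookkeeping save after it always fails with EPERM. The model names come from the report; `runWithFallback` and the `markGood*` functions are illustrative.

```javascript
// Simulated save against a read-only auth-profiles.json.
function epermSave() {
  const err = new Error("EPERM: operation not permitted, copyfile");
  err.code = "EPERM";
  throw err;
}

// Before the fix: the bookkeeping error escapes and fails the attempt.
function markGoodFatal() {
  epermSave();
}

// After the fix: the bookkeeping error is logged and swallowed.
function markGoodNonFatal(model) {
  try {
    epermSave();
  } catch (err) {
    console.warn(`lastGood for ${model} not persisted: ${err.code}`);
  }
}

// Try each model in order; any thrown error moves on to the next fallback.
function runWithFallback(models, markGood) {
  const errors = [];
  for (const model of models) {
    try {
      const reply = `reply from ${model}`; // the provider call itself succeeds
      markGood(model);                      // post-success bookkeeping
      return reply;
    } catch (err) {
      errors.push(`${model}: ${err.code}`);
    }
  }
  throw new Error(`All models failed (${errors.join("; ")})`);
}

const chain = ["ollama/glm-4.7-flash", "anthropic/claude-opus-4-6"];

try {
  runWithFallback(chain, markGoodFatal);
} catch (err) {
  console.log(err.message); // All models failed (ollama/glm-4.7-flash: EPERM; anthropic/claude-opus-4-6: EPERM)
}
console.log(runWithFallback(chain, markGoodNonFatal)); // reply from ollama/glm-4.7-flash
```

Because every model shares the same file, switching models cannot help; only making the bookkeeping failure non-fatal (or fixing the file) breaks the loop.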

Stack Trace

Error: EPERM: operation not permitted, copyfile 
  'C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json.<uuid>.tmp' 
  -> 'C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json'
    at Object.copyFileSync (node:fs:3104:11)
    at renameJsonFileWithFallback (json-file-1PGlTqjr.js:63:7)
    at saveJsonFile (json-file-1PGlTqjr.js:98:3)
    at saveAuthProfileStore (store-HF_Z-jKz.js:427:2)
    at markAuthProfileGood (profiles-DKQdaSwr.js:76:2)
    at pi-embedded-DWASRjxE.js:36473:7

Expected Behavior

  1. Profile write failure should be non-fatal. Failing to save "this API key worked" should not abort the entire LLM request. The response was already received — the profile write is bookkeeping.
  2. Atomic file writes should handle ReadOnly gracefully. renameJsonFileWithFallback should detect the ReadOnly attribute and either clear it or log a warning rather than throwing a fatal error.
  3. Error-loop inflation should be bounded. Failed retries should not dump error metadata into the session context, as this accelerates context exhaustion.
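Expected behavior 2 could look roughly like the sketch below. `writeJsonAtomicSketch` is hypothetical and only loosely modeled on the rename-then-copy behavior visible in the stack trace; it is not OpenClaw's renameJsonFileWithFallback. On POSIX systems rename replaces a read-only target anyway (permission comes from the directory), so the chmod branch matters mainly on Windows, where Node's fs.chmod can only change the write permission, which corresponds to the ReadOnly attribute. The single retry keeps the behavior bounded, in the spirit of point 3.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const crypto = require("node:crypto");

// Hypothetical atomic JSON write: write a temp file, rename it over the
// target, fall back to copyFileSync, and if the copy fails because the
// target is read-only, clear the read-only state once and retry.
function writeJsonAtomicSketch(target, data, warn = console.warn) {
  const tmp = `${target}.${crypto.randomUUID()}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  try {
    try {
      fs.renameSync(tmp, target);
      return;
    } catch {
      // Rename failed (for example on Windows); fall back to copying,
      // as the issue's stack trace shows.
    }
    try {
      fs.copyFileSync(tmp, target);
    } catch (err) {
      const permissionError = err.code === "EPERM" || err.code === "EACCES";
      if (!permissionError || !fs.existsSync(target)) throw err;
      const mode = fs.statSync(target).mode;
      if (mode & 0o200) throw err; // already writable: not a read-only problem
      warn(`${path.basename(target)} was read-only; clearing it and retrying once`);
      // On Windows, adding the write bit clears the ReadOnly attribute.
      fs.chmodSync(target, (mode & 0o777) | 0o200);
      fs.copyFileSync(tmp, target); // a second failure propagates
    }
  } finally {
    fs.rmSync(tmp, { force: true }); // no-op if the rename already moved it
  }
}

// Usage: simulate a read-only auth-profiles.json in a temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "auth-profiles-"));
const file = path.join(dir, "auth-profiles.json");
fs.writeFileSync(file, "{}");
fs.chmodSync(file, 0o444);
writeJsonAtomicSketch(file, { version: 1 });
console.log(fs.readFileSync(file, "utf8"));
```

Clearing the attribute automatically also overrides a read-only flag that someone may have set deliberately, which is why the sketch logs a warning when it does so.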

Workaround

attrib -R "C:\Users\OpenClaw\.openclaw\agents\main\agent\auth-profiles.json"

Then restart the gateway.

Impact

  • Gateway becomes completely unresponsive (no LLM requests succeed)
  • Gateway restarts do not fix it (file attribute persists)
  • Fallback chain burns paid API tokens on requests that will fail anyway
  • Session context inflates rapidly from error metadata (about 84% of a 200k-token context window consumed within minutes)
  • User must manually identify and fix the file attribute — no error message points to the actual cause
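The last point suggests turning a bare EPERM into an actionable message. The sketch below is a hypothetical helper, not part of either PR: given the failed write's error and the target path, it checks whether the target lacks write permission (on Windows, the ReadOnly attribute) and, if so, returns a message naming the file and the command that clears it.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Returns an explanatory message if `err` looks like a write blocked by a
// read-only target, otherwise null (keep the original error as-is).
function explainWriteFailure(err, target) {
  if (err.code !== "EPERM" && err.code !== "EACCES") return null;
  let mode;
  try {
    mode = fs.statSync(target).mode;
  } catch {
    return null; // cannot inspect the target
  }
  if (mode & 0o200) return null; // writable, so read-only is not the cause
  const fix = process.platform === "win32"
    ? `attrib -R "${target}"`
    : `chmod u+w "${target}"`;
  return `${target} is read-only, so it cannot be updated. Clear it with: ${fix}`;
}

// Usage: a read-only file plus a synthetic EPERM error.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "explain-")), "auth-profiles.json");
fs.writeFileSync(file, "{}");
fs.chmodSync(file, 0o444);
const fake = Object.assign(new Error("EPERM: operation not permitted"), { code: "EPERM" });
console.log(explainWriteFailure(fake, file));
```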

Probable Root Cause

Race condition in the atomic JSON file write logic (renameJsonFileWithFallback) when multiple config files are being updated concurrently during hot-reload. On Windows, a failed rename falling back to copyFileSync may leave the target file with a ReadOnly attribute under certain timing conditions, or Windows itself may set ReadOnly as a protective measure during concurrent file access.

extent analysis

TL;DR

  • The most likely fix is to modify the renameJsonFileWithFallback function to handle the ReadOnly attribute on Windows by either clearing it or logging a warning instead of throwing a fatal error.

Guidance

  • Identify and modify renameJsonFileWithFallback (in the JSON file helper module, bundled as json-file-1PGlTqjr.js in the stack trace) to check for and handle the ReadOnly attribute before attempting to write to auth-profiles.json.
  • Consider implementing a retry mechanism with a bounded number of attempts to prevent error-loop inflation and session context exhaustion.
  • Review the atomic file write logic to ensure it can handle concurrent updates and avoid leaving files in a ReadOnly state.
  • Test the changes on Windows to ensure the fix works as expected and does not introduce new issues.
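A bounded retry, as the second guidance item suggests, might look like the hypothetical `retryBounded` helper below. It caps how many failures a persistent error can produce; on its own it does not fix a ReadOnly attribute that stays set, it only limits the cost.

```javascript
// Run `fn` at most `maxAttempts` times, retrying only errors accepted by
// `isRetryable`. Returns the result or rethrows the last error, so a
// persistent failure cannot loop forever.
async function retryBounded(fn, { maxAttempts = 3, delayMs = 50, isRetryable = () => true } = {}) {
  let lastErr;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastErr = err;
      if (attempt === maxAttempts || !isRetryable(err)) break;
      // Linear backoff between attempts.
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastErr;
}

// Usage: a save that fails twice with EPERM, then succeeds.
let calls = 0;
const flakySave = async () => {
  calls += 1;
  if (calls < 3) throw Object.assign(new Error("EPERM"), { code: "EPERM" });
  return "saved";
};

retryBounded(flakySave, { isRetryable: (e) => e.code === "EPERM" })
  .then((r) => console.log(`${r} after ${calls} attempts`)); // saved after 3 attempts
```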

Example

// Example of how to check and clear the ReadOnly attribute in Node.js on Windows.
// Node maps the ReadOnly attribute to the write-permission bits: a read-only
// file reports no write bits in stats.mode, and chmod with the write bit set
// clears the attribute (on Windows, chmod can only change write permission).
const fs = require('fs');

function clearReadOnlyAttribute(filePath) {
  try {
    const stats = fs.statSync(filePath);
    // The file is read-only if the owner write bit (0o200) is not set.
    if ((stats.mode & 0o200) === 0) {
      // Add the write bit while keeping the existing read permissions.
      fs.chmodSync(filePath, (stats.mode & 0o777) | 0o200);
    }
  } catch (error) {
    console.error(`Error clearing ReadOnly attribute: ${error}`);
  }
}

// Call clearReadOnlyAttribute before attempting to write to auth-profiles.json
clearReadOnlyAttribute('C:\\Users\\OpenClaw\\.openclaw\\agents\\main\\agent\\auth-profiles.json');

Notes

  • The attrib -R workaround can be used as a temporary fix, but it does not address the underlying race and may need to be reapplied if a later hot-reload sets the ReadOnly attribute again.
  • The root cause of the issue is likely related to the atomic file write logic and the handling of concurrent updates on Windows.

Recommendation

  • Apply the workaround using attrib -R as a temporary fix, and then modify the renameJsonFileWithFallback function to handle the ReadOnly attribute as described in the guidance section. This will provide a more permanent solution to the issue.
