openclaw - 💡(How to fix) Fix EmbeddedAttemptSessionTakeoverError: self-inflicted session file modification during lock-free window (race condition)

Bug Description

Cron jobs using idealab/claude-opus-4-6 as the model consistently fail with EmbeddedAttemptSessionTakeoverError when the session file fingerprint (dev/ino/size/mtimeNs/ctimeNs) changes between releaseForPrompt() and the subsequent assertSessionFileFence() check.

The modification is self-inflicted: one of the gateway's own internal async processes (likely memory-core plugin indexing, a model-snapshot write, or trajectory sync) modifies the .jsonl session file during the lock-free window, while the attempt is still waiting for the model response.
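To narrow down which internal writer is responsible, it may help to dump whatever was appended to the session file between the fence snapshot and the failed check. The following is a minimal diagnostic sketch, not part of OpenClaw: readAppendedText and summarizeEntries are hypothetical helpers, and the assumption that each JSONL entry carries a string "type" field is not confirmed by the issue.

```javascript
'use strict';
const fs = require('node:fs/promises');

// Hypothetical helper: returns the text appended to `file` since it was
// `previousSize` bytes long (for example, the size recorded at snapshot time).
async function readAppendedText(file, previousSize) {
  const handle = await fs.open(file, 'r');
  try {
    const { size } = await handle.stat();
    const length = size - previousSize;
    if (length <= 0) return ''; // nothing appended (or the file shrank)
    const buffer = Buffer.alloc(length);
    const { bytesRead } = await handle.read(buffer, 0, length, previousSize);
    return buffer.toString('utf8', 0, bytesRead);
  } finally {
    await handle.close();
  }
}

// Hypothetical helper: labels each appended JSONL line by its "type" field.
// The field name is an assumption; adjust it to the real entry schema.
function summarizeEntries(text) {
  return text.split('\n').filter(Boolean).map((line) => {
    try {
      const entry = JSON.parse(line);
      return typeof entry.type === 'string' ? entry.type : 'unknown';
    } catch {
      return 'unparseable';
    }
  });
}
```

Logging the result of summarizeEntries just before EmbeddedAttemptSessionTakeoverError is thrown would show whether the extra entries come from a plugin, a snapshot write, or something else.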

Reproduction

  • Version: 2026.5.20
  • Trigger: Any isolated cron job with idealab/claude-opus-4-6 that has a ~20s+ model response time
  • Frequency: 100% reproducible once timing corridor is hit (7/7 consecutive failures for the same job)
  • Workaround: Switching to a different model (e.g. dashscope/deepseek-v4-pro, or even idealab/gpt-5.4 on the same provider) avoids the issue, which suggests the race is provider-specific in the streaming/auth initialization path

Evidence

  1. The error has appeared in logs since 5/19, intermittently hitting multiple job types:

    • memory-capture-fallback (ops agent)
    • daily-ops-review (ops agent)
    • dreaming-narrative (main agent)
    • Dashboard sessions (main agent)

  2. Same agent + same model + different prompt size = different outcome:

    • Short prompt job (capture-fallback, ~4s model response) → succeeds
    • Long prompt job (daily-ops-review, ~24s model response) → fails every time

  3. Gateway restart does NOT fix it (confirmed: restarted, immediately failed again)

  4. Switching the model to a non-Opus provider → the session lock error disappears (the job fails on a tool error instead, but the lock race is gone)

Root Cause Analysis

Source: dist/selection-BmjEdnnA.js lines 7945-8050

// releaseForPrompt() records fingerprint then releases lock
async releaseForPrompt() {
    fenceFingerprint = await readSessionFileFingerprint(sessionFile);
    fenceActive = true;
    await lock.release();
}

// assertSessionFileFence() checks fingerprint hasn't changed
async function assertSessionFileFence() {
    const current = await readSessionFileFingerprint(sessionFile);
    if (!sameSessionFileFingerprint(fenceFingerprint, current)) {
        // Only exception: growth is pure assistant transcript entries
        if (await changeLooksLikeOwnedPromptOutput({...})) {
            fenceFingerprint = current; return;
        }
        throw new EmbeddedAttemptSessionTakeoverError(sessionFile);
    }
}

// Fingerprint uses nanosecond-precision mtime
function sameSessionFileFingerprint(left, right) {
    return left.dev === right.dev && left.ino === right.ino
        && left.size === right.size
        && left.mtimeNs === right.mtimeNs
        && left.ctimeNs === right.ctimeNs;
}

The fingerprint comparison is correct in principle, but the invariant it relies on ("no internal process will write to the session file during the lock-free window") is violated by the gateway's own async pipeline.
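The effect can be reproduced outside the gateway. The sketch below is a stand-in under stated assumptions: readFingerprint is a hypothetical replacement for readSessionFileFingerprint() (whose real body is not shown in the excerpt), built on Node's fs.stat with bigint: true, which exposes mtimeNs and ctimeNs. fenceHoldsAfter is likewise hypothetical and only simulates the lock-free window.

```javascript
'use strict';
const fs = require('node:fs/promises');

// Hypothetical stand-in for readSessionFileFingerprint().
async function readFingerprint(file) {
  // bigint: true makes Node report nanosecond timestamps as BigInt values.
  const st = await fs.stat(file, { bigint: true });
  return { dev: st.dev, ino: st.ino, size: st.size, mtimeNs: st.mtimeNs, ctimeNs: st.ctimeNs };
}

// Same field-by-field comparison as sameSessionFileFingerprint() in the excerpt.
function sameFingerprint(left, right) {
  return left.dev === right.dev && left.ino === right.ino
    && left.size === right.size
    && left.mtimeNs === right.mtimeNs
    && left.ctimeNs === right.ctimeNs;
}

// Simulates the lock-free window: snapshot the fence, optionally perform an
// "internal" append, then report whether the fence still matches.
async function fenceHoldsAfter(file, internalWrite) {
  const fence = await readFingerprint(file);
  if (internalWrite) await fs.appendFile(file, internalWrite);
  return sameFingerprint(fence, await readFingerprint(file));
}
```

With no write in between, fenceHoldsAfter resolves to true. Any append changes at least the size, so it resolves to false, which is the condition that leads to EmbeddedAttemptSessionTakeoverError unless changeLooksLikeOwnedPromptOutput() accepts the change.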

Suggested Fixes

  1. Drain all pending session writes before recording fingerprint — ensure no async internal write is in-flight when releaseForPrompt() snapshots the fingerprint
  2. Relax fingerprint to size-only — if the file grew by known-good internal entries (not just assistant output), allow it
  3. Add grace period — re-read fingerprint after a short delay if mismatch detected, to handle writes that were "in the pipeline" at snapshot time
  4. Provider-aware lock timing — if certain providers trigger additional session writes during auth/streaming setup, account for them in the lock lifecycle
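Suggested fix 1 could look roughly like the sketch below. It assumes all internal session writes can be routed through a single queue, which the issue does not confirm. SessionWriteQueue and releaseForPromptDrained are hypothetical names, and lock is any object with an async release() method.

```javascript
'use strict';
const fs = require('node:fs/promises');

// Hypothetical helper: serializes appends to one session file and lets
// callers wait until everything queued so far has been written.
class SessionWriteQueue {
  constructor(file) {
    this.file = file;
    this.tail = Promise.resolve(); // settles when all queued writes have settled
  }

  // Queue an append; writes run strictly one after another.
  append(text) {
    const run = this.tail.then(() => fs.appendFile(this.file, text));
    // Swallow failures on the chain so one failed write does not block
    // later writes or make drain() reject; callers still see it via `run`.
    this.tail = run.catch(() => {});
    return run;
  }

  // Resolves once every write queued before this call has finished.
  drain() {
    return this.tail;
  }
}

// Sketch of suggested fix 1: drain, then snapshot, then release the lock.
async function releaseForPromptDrained(queue, lock) {
  await queue.drain();
  const st = await fs.stat(queue.file, { bigint: true });
  const fingerprint = {
    dev: st.dev, ino: st.ino, size: st.size, mtimeNs: st.mtimeNs, ctimeNs: st.ctimeNs,
  };
  await lock.release();
  return fingerprint;
}
```

Draining only covers writes that were already queued when the snapshot is taken. A write that an internal process queues later in the lock-free window would still trip the fence, so those writers would also need to be deferred until the lock is re-acquired or recognized by the fence check.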

Environment

  • macOS 14.6 (arm64)
  • Node v22.19.0
  • OpenClaw 2026.5.20
  • Plugins: browser, memory-core, searxng, skill-trigger-engine
  • 4 configured agents (main, ops, scout, editor)
