openclaw - 💡(How to fix) Fix EmbeddedAttemptSessionTakeoverError: completed LLM call silently discarded under concurrent same-session writes

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

When a second message arrives in the same agent session before the first LLM call completes, the first call is discarded with EmbeddedAttemptSessionTakeoverError — the lock is released during LLM inference, allowing the second message to overwrite the session file and invalidate the first call's fingerprint.

Error Message

gateway.err.log contains 84 occurrences. Representative sample:

2026-05-18T20:56:16.827-07:00 [diagnostic] lane task error: lane=main durationMs=246652 error="EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released: /Users/xiaoou/lobster-team/agents/main/.openclaw-runtime/sessions/ 43febae9-82cf-4e7a-9a49-120018b31401.jsonl"

Some discarded calls had run for as long as 830,691ms (~14 min) before being thrown away.

Root Cause

Root cause analysis:

Code Example

gateway.err.log contains 84 occurrences. Representative sample:

2026-05-18T20:56:16.827-07:00 [diagnostic] lane task error: lane=main
durationMs=246652 error="EmbeddedAttemptSessionTakeoverError: session file
changed while embedded prompt lock was released:
/Users/xiaoou/lobster-team/agents/main/.openclaw-runtime/sessions/
43febae9-82cf-4e7a-9a49-120018b31401.jsonl"

Some discarded calls had run for as long as 830,691ms (~14 min) before 
being thrown away.
RAW_BUFFERClick to expand / collapse

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

When a second message arrives in the same agent session before the first LLM call completes, the first call is discarded with EmbeddedAttemptSessionTakeoverError — the lock is released during LLM inference, allowing the second message to overwrite the session file and invalidate the first call's fingerprint.

Steps to reproduce

  1. Start OpenClaw gateway (2026.4.26) on macOS, single agent "main" with multiple entry points (Feishu DM + dashboard + CLI)
  2. Send a message that triggers an LLM call lasting >30 seconds (e.g., a multi-step autonomous task with tool calls)
  3. Within 30 seconds, send a second message in the same session
  4. Observe: the first message never returns a response; the user sees "..." indefinitely

This reproduces consistently whenever two messages arrive within the same LLM inference window. It is not a race condition requiring precise timing — any second message during the first call's inference phase triggers it.

Expected behavior

The first LLM call should complete and return its response to the user. The second message should either wait (pessimistic lock) or be written independently without invalidating the first (sharded sessions). At minimum, a completed LLM call that cost tokens should not be silently discarded.

Actual behavior

The first LLM call's response is silently discarded. The gateway error log records:

EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released: <session-path>.jsonl

84 occurrences of this error in gateway.err.log between 2026-05-18 and 2026-05-19. The discarded call had already completed its full LLM inference (some for up to 830 seconds / ~14 minutes), burning tokens with zero user-visible output.

OpenClaw version

2026.5.18

Operating system

macOS 26.4.1 (Build 25E253)

Install method

npm global install, located at ~/.openclaw/tools/node-v22.22.0/lib/node_modules/openclaw/

Model

mimo/mimo-v2.5-pro (default). Also reproducible with apiset-anthropic/claude-sonnet-4-6 and apiset/qwen3.5-flash. The issue is model-agnostic — it occurs at the session/lock layer, not the LLM layer.

Provider / routing chain

Local OpenClaw gateway at 127.0.0.1:18789, routing to multiple providers (apiset, apiset-anthropic, mimo). Multi-entry: Feishu DM, dashboard web UI, and CLI all targeting the same "main" agent.

Additional provider/model setup details

Single agent ("main") with a model allowlist of 17 models across 3 providers. The agent has 3 Feishu accounts bound to it plus dashboard access. No custom lock configuration — using default OpenClaw session locking. Session files are .jsonl format stored at: agents/main/.openclaw-runtime/sessions/<uuid>.jsonl

Logs, screenshots, and evidence

gateway.err.log contains 84 occurrences. Representative sample:

2026-05-18T20:56:16.827-07:00 [diagnostic] lane task error: lane=main
durationMs=246652 error="EmbeddedAttemptSessionTakeoverError: session file
changed while embedded prompt lock was released:
/Users/xiaoou/lobster-team/agents/main/.openclaw-runtime/sessions/
43febae9-82cf-4e7a-9a49-120018b31401.jsonl"

Some discarded calls had run for as long as 830,691ms (~14 min) before 
being thrown away.

Impact and severity

Severity: Workflow-blocking

Affected: Any user with multiple entry points to the same agent (Feishu + dashboard + CLI), or any user who sends a follow-up message before the first response arrives.

Frequency: Reproducible every time two messages arrive in the same session within one LLM inference window.

Consequences:

  • Tokens burned on discarded LLM calls with zero user-visible output
  • Agent appears frozen/broken (user sees "..." indefinitely)
  • Multi-step autonomous tasks (where the agent chains tool calls) are particularly vulnerable — a user checking progress triggers the bug and kills the running task
  • Makes concurrent use of the same agent (dashboard monitoring + Feishu chat) effectively unsafe

Additional information

Root cause analysis:

The session lock is released before LLM inference begins, not after:

T0: Message A → acquire lock → write session → release lock → LLM inference T1: Message B → acquire lock → overwrite session → release lock → LLM inference T2: A's LLM returns → fingerprint mismatch → takeover error → A discarded

The lock guards only the session file write, not the full request lifecycle.

Suggested fix (pessimistic lock):

acquire_lock → write session → run LLM → write response → release_lock

For single-user and small-team deployments, serializing same-session access is far preferable to silently discarding completed work with burned tokens.

Alternative (sharded sessions): write each turn as an independent file (turn_001_user.json, turn_001_assistant.json, etc.) to eliminate fingerprint conflicts and allow true concurrent writes without a lock.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

The first LLM call should complete and return its response to the user. The second message should either wait (pessimistic lock) or be written independently without invalidating the first (sharded sessions). At minimum, a completed LLM call that cost tokens should not be silently discarded.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix EmbeddedAttemptSessionTakeoverError: completed LLM call silently discarded under concurrent same-session writes