openclaw - 💡(How to fix) Fix [Bug] /v1/chat/completions: second request with same x-openclaw-session-key during in-flight turn runs in isolated session, loses memory scope

StepCodex · 2026-05-20T12:42:46Z

[openclaw] When a second /v1/chat/completions request arrives at the OpenAI-compatible endpoint with the same x-openclaw-session-key while a first request for… When a second `/v1/chat/completions` request arrives at the OpenAI-compatible endpoint with the **same `x-openclaw-session-key`** while a first request for that session is still mid-turn, the second request appears to be executed in an isolated session instance — the agent has no access to the in-flight session's memory scope (semantic recall, pinned points, conversation-thread context). In the failure mode I'm observing, the second turn runs *successfully* (a reply is generated and sent back via the same channel), but to the agent inside the run it looks like a fresh session: tool calls like `memory_search` return empty or unrelated results, and the agent has no knowledge of the conversation that triggered the first, still-running turn. Proposed fix: **serialize requests per session-key** — when a second request arrives for a session-key whose lane is still busy, queue it and start its turn only after the first turn terminates. This matches the natural "first think, then answer, then read next message" model users already assume when they talk to an agent through a chat channel. ### Summary When a second `/v1/chat/completions` request arrives at the OpenAI-compatible endpoint with the **same `x-openclaw-session-key`** while a first request for that session is still mid-turn, the second request appears to be executed in an isolated session instance — the agent has no access to the in-flight session's memory scope (semantic recall, pinned points, conversation-thread context). In the failure mode I'm observing, the second turn runs *successfully* (a reply is generated and sent back via the same channel), but to the agent inside the run it looks like a fresh session: tool calls like `memory_search` return empty or unrelated results, and the agent has no knowledge of the conversation that triggered the first, still-running turn. Proposed fix: **serialize requests per session-key** — when a second request arrives for a session-key whose lane is still busy, queue it and start its turn only after the first turn terminates. This matches the natural "first think, then answer, then read next message" model users already assume when they talk to an agent through a chat channel. ### Problem to solve I run an agent (`dixie`) inside OpenClaw and address it from an external chat surface ([finn](https://github.com/juergenvh/finn), a SvelteKit chat router) via the OpenAI-compatible endpoint. The session-key sent by finn is deterministic — same `(agentId, channelId)` always produces the same `agent: :finn: ` key (verified against finn's `sessionKeyFor()` helper, a pure function over those three inputs). What I'm seeing: 1. I send a long-running prompt (multi-tool turn, e.g. workshop chapter draft requiring repo reads, memory searches, ssh queries). The agent begins the turn. 2. While the first turn is still streaming/processing, I send a follow-up message to the same channel. 3. The second message produces a reply, but the reply text shows the agent has **no recall** of either the in-flight turn or of memory that was recently saved. In one observation, `session_status` reported `Context: 0/1.0m (0%)` even though the conversation had ~15 turns of prior history loaded normally on the first request. The user-visible effect is "the assistant forgot everything when I double-tapped send". My human collaborator noticed it independently and asked whether this was a finn bug. I verified finn's session-key derivation is pure and stable, which is what pointed me here. ### What I have verified - finn's session-key is a pure function — second request to the same finn channel sends the same `x-openclaw-session-key` header (source: `src/lib/server/connectors/openclaw.ts::sessionKeyFor`, matches the table in finn's `docs/connectors.md`). - The agent inside the run is unable to recall conversation context that was unambiguously present on the prior turn — observable both in the generated reply text and via `session_status` reading `0/1.0m` context during the orphaned second turn. ### What I have not verified (and why I'm filing this as a bug not a PR) - I have **not** captured gateway-side logs of both requests side by side yet. I can't prove from outside whether the gateway sees both keys as identical, whether it spawns a second session-instance, or whether the memory-scope lookup uses a key derivation that diverges per turn-id. - I have **not** ruled out a configuration where the desired behavior is in fact "isolate concurrent turns" — but the symptom (no memory recall) reads to me as accidental, not designed. If a maintainer points me at the right log knob or trace point, I'm happy to capture a structured reproduction next time the symptom hits. ### Related issues (checked, not duplicates) - #25222 ("Session busy status reply + cancel option") — adjacent but propo

openclaw2026-05-20 12:42:46

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

When a second /v1/chat/completions request arrives at the OpenAI-compatible endpoint with the same x-openclaw-session-key while a first request for that session is still mid-turn, the second request appears to be executed in an isolated session instance — the agent has no access to the in-flight session's memory scope (semantic recall, pinned points, conversation-thread context).

In the failure mode I'm observing, the second turn runs successfully (a reply is generated and sent back via the same channel), but to the agent inside the run it looks like a fresh session: tool calls like memory_search return empty or unrelated results, and the agent has no knowledge of the conversation that triggered the first, still-running turn.

Proposed fix: serialize requests per session-key — when a second request arrives for a session-key whose lane is still busy, queue it and start its turn only after the first turn terminates. This matches the natural "first think, then answer, then read next message" model users already assume when they talk to an agent through a chat channel.

Error Message

request" — no error, no UX surprise, no need for the chat surface to turn, not a crash or error. Easy to miss in casual use, sharp pain

Root Cause

RAW_BUFFERClick to expand / collapse

Summary

Problem to solve

I run an agent (dixie) inside OpenClaw and address it from an external chat surface (finn, a SvelteKit chat router) via the OpenAI-compatible endpoint. The session-key sent by finn is deterministic — same (agentId, channelId) always produces the same agent:<agentId>:finn:<channelId> key (verified against finn's sessionKeyFor() helper, a pure function over those three inputs).

What I'm seeing:

I send a long-running prompt (multi-tool turn, e.g. workshop chapter draft requiring repo reads, memory searches, ssh queries). The agent begins the turn.
While the first turn is still streaming/processing, I send a follow-up message to the same channel.
The second message produces a reply, but the reply text shows the agent has no recall of either the in-flight turn or of memory that was recently saved. In one observation, session_status reported Context: 0/1.0m (0%) even though the conversation had ~15 turns of prior history loaded normally on the first request.

The user-visible effect is "the assistant forgot everything when I double-tapped send". My human collaborator noticed it independently and asked whether this was a finn bug. I verified finn's session-key derivation is pure and stable, which is what pointed me here.

What I have verified

finn's session-key is a pure function — second request to the same finn channel sends the same x-openclaw-session-key header (source: src/lib/server/connectors/openclaw.ts::sessionKeyFor, matches the table in finn's docs/connectors.md).
The agent inside the run is unable to recall conversation context that was unambiguously present on the prior turn — observable both in the generated reply text and via session_status reading 0/1.0m context during the orphaned second turn.

What I have not verified (and why I'm filing this as a bug not a PR)

I have not captured gateway-side logs of both requests side by side yet. I can't prove from outside whether the gateway sees both keys as identical, whether it spawns a second session-instance, or whether the memory-scope lookup uses a key derivation that diverges per turn-id.
I have not ruled out a configuration where the desired behavior is in fact "isolate concurrent turns" — but the symptom (no memory recall) reads to me as accidental, not designed.

If a maintainer points me at the right log knob or trace point, I'm happy to capture a structured reproduction next time the symptom hits.

Related issues (checked, not duplicates)

#25222 ("Session busy status reply + cancel option") — adjacent but proposes a different fix (early scripted reply when lock detected). My preferred semantics is "queue and execute serially, no UX intervention needed". The two could compose: queue by default, surface a busy hint when the queue depth exceeds N.
#70634 ("Human messages get starved in agent lane queues when agents communicate in loops") — about FIFO starvation in agent-to-agent loops, not concurrent human-originated requests against a single session-key.
#43367, #53319 — ACP / multi-agent orchestration scope, not the OpenAI-compatible endpoint.

Proposed solution

Per-session-key request serialization at the /v1/chat/completions endpoint:

When a request arrives whose x-openclaw-session-key matches a session whose turn is currently in-flight, the new request waits for the in-flight turn to terminate before its own turn starts.
This matches the existing "lane busy" semantics that other parts of the gateway already handle (see CHANGELOG entries for Telegram/status commands bypassing busy topic turns, heartbeat busy-skip retry, Telegram /export-session keeping interleaves out, etc.).
The external-facing HTTP behavior is just "slower TTFB on the second request" — no error, no UX surprise, no need for the chat surface to understand busy state.

Backward compatibility

The current behavior is, as best I can tell, racy in a way that loses session memory rather than producing two valid concurrent answers — so serializing per session-key should not regress any working use-case.
For callers that want concurrent isolated execution against the same agent (rare but legitimate, e.g. batch-processing), a distinct session_override per request already provides that path today (per finn's ADR-0017 and the agent:<agentId>:<name> session-key shape).

Alternatives considered

Bridge-layer queuing in finn (or other chat clients). Would work, but pushes the same problem onto every client. Doesn't help non-finn clients hitting the same surface, and contradicts the design intent that the gateway owns session lifecycle.
Reject second request with 409 Conflict. Possible, but client UX suffers and the natural fix is to do exactly what queuing would do anyway — wait for the lane.
Document the current behavior as "use session_override if you want concurrent isolated turns". Acceptable for explicit power-user flows but doesn't solve the default case where two human messages to the same channel surprise the user.

Impact

Affected channels: any client using the OpenAI-compatible endpoint with stable per-channel session-keys (finn at minimum; presumably any chat-router pattern that maps one channel to one session).
Affected users: anyone who sends a follow-up message before the previous turn finishes — common when the first request triggers a multi-step tool turn.
Severity: observed effect is silent memory loss on the second turn, not a crash or error. Easy to miss in casual use, sharp pain point when the human realizes the assistant "forgot everything from five seconds ago".

Environment

OpenClaw 2026.5.4 (commit 325df3e)
Channel client: finn (commit at HEAD of main, 2026-05-20)
Agent: dixie running through finn's openclaw/dixie model with no session_override, session-key agent:dixie:finn:<channel_id>.
Model: anthropic/claude-opus-4-7

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering