openclaw - ✅(Solved) Fix [Bug]: chat final event can be suppressed when lifecycle registry shift misses after streamed output [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#74614 (fetched 2026-04-30 06:22:18)
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0
Timeline (top): cross-referenced ×1

A completed chat run can leave the Control UI/TUI stuck in a running or streaming state because the lifecycle terminal handler may return early after streamed assistant deltas were already delivered, suppressing the final chat event.

Error Message

The report classifies this as a behavior bug without a crash. The trigger is a lifecycle terminal event (phase: "end" or phase: "error") for which chatRunState.registry.peek(evt.runId) previously found a link, but chatRunState.registry.shift(evt.runId) returns undefined.

Root Cause

In the terminal lifecycle handler in src/gateway/server-chat.ts, if peek() succeeds but shift() misses, the code clears the agent run context and returns instead of using the same fallback path used when chatLink is absent. emitChatFinal(...) is therefore skipped even though the streamed assistant deltas were already delivered.

Fix / Workaround

A local dist patch that falls back to the already-resolved sessionKey and eventRunId when registry.shift(evt.runId) misses stopped the stuck-streaming symptom without changing transcript content or producing duplicate final assistant messages in the observed case.

PR fix notes

PR #74678: fix(gateway/chat): emit terminal chat final via fallback when registry entry drifts between peek and shift

Description (problem / solution / changelog)

Problem

Fixes #74614.

When chatRunState.registry.peek(evt.runId) succeeds but the subsequent registry.shift(evt.runId) returns undefined (race between concurrent terminal lifecycle events for the same run), the gateway lifecycle handler was returning early before calling emitChatFinal. This left Control UI and TUI stuck in streaming / running state even though the assistant text had already arrived.

// Before: returns early and swallows the terminal event
const finished = chatRunState.registry.shift(evt.runId);
if (!finished) {
  clearAgentRunContext(evt.runId);
  return;  // ← silent drop
}
emitChatFinal(finished.sessionKey, ...);

Fix

Invert the condition so that when shift() misses, the code falls through to the same sessionKey/eventRunId fallback path used when chatLink is absent — the terminal event is never silently suppressed.

// After: fallback to sessionKey/eventRunId when shift() misses
if (finished) {
  emitChatFinal(finished.sessionKey, finished.clientRunId, ...);
} else {
  // Registry entry drifted — use the fallback path
  emitChatFinal(sessionKey, eventRunId, ...);
}
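
To make the before/after difference concrete, here is a minimal self-contained sketch. It is not the code from the PR: the stub registry, the FinalEvent type, the fallBackOnMiss flag, and the emitted array are hypothetical stand-ins, and emitChatFinal is modeled as a push onto an array.

```typescript
// Hypothetical stand-ins; the real types live in src/gateway/server-chat.ts.
type ChatLink = { sessionKey: string; clientRunId: string };
type FinalEvent = { sessionKey: string; runId: string };

// Stub registry that reproduces the drift: peek() finds a link, shift() misses.
const driftedRegistry = {
  peek: (_runId: string): ChatLink | undefined => ({ sessionKey: "s1", clientRunId: "c1" }),
  shift: (_runId: string): ChatLink | undefined => undefined,
};

function handleTerminal(runId: string, fallBackOnMiss: boolean, emitted: FinalEvent[]): void {
  const chatLink = driftedRegistry.peek(runId);
  const sessionKey = chatLink?.sessionKey;
  const eventRunId = chatLink?.clientRunId ?? runId;
  if (!sessionKey) return;

  const finished = chatLink ? driftedRegistry.shift(runId) : undefined;
  if (finished) {
    emitted.push({ sessionKey: finished.sessionKey, runId: finished.clientRunId });
  } else if (chatLink && !fallBackOnMiss) {
    return; // old behavior: the terminal event is silently dropped
  } else {
    emitted.push({ sessionKey, runId: eventRunId }); // new behavior: fallback emit
  }
}

const before: FinalEvent[] = [];
const after: FinalEvent[] = [];
handleTerminal("run-1", false, before);
handleTerminal("run-1", true, after);
console.log(before.length, after.length);
```

With the drifted stub, the old branch emits nothing while the fallback branch emits exactly one final event addressed to the already-resolved session and run.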

Changes

  • src/gateway/server-chat.ts — invert the shift-miss branch to fall through to fallback emitChatFinal
  • src/gateway/server-chat.agent-events.test.ts — add regression test for the peek-succeeds-shift-misses race
  • CHANGELOG.md — Unreleased entry

Tests

53/53 server-chat.agent-events tests pass locally including the new regression test.

Test Files  1 passed (1)
     Tests  53 passed (53)

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/gateway/server-chat.agent-events.test.ts (modified, +37/-0)
  • src/gateway/server-chat.ts (modified, +19/-7)

Original issue report (#74614)

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

A completed chat run can leave the Control UI/TUI stuck in a running or streaming state because the lifecycle terminal handler may return early after streamed assistant deltas were already delivered, suppressing the final chat event.

Steps to reproduce

  1. Start a long-running chat run through the OpenClaw gateway with live chat streaming enabled.
  2. Let assistant deltas stream successfully to the client so the response text is visible.
  3. Hit a lifecycle terminal event (phase: "end" or phase: "error") where chatRunState.registry.peek(evt.runId) previously found a link, but chatRunState.registry.shift(evt.runId) returns undefined.
  4. Observe that emitChatFinal(...) is skipped and the client can remain in streaming / running / Stop state even though the assistant text arrived.
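
Step 3 depends on timing. The sketch below shows one hypothetical way such a drift could arise: two terminal handlers for the same run both peek successfully, then yield before shifting. The RunRegistry class and the yield point are assumptions for illustration only; the report does not say where the real drift happens.

```typescript
type ChatLink = { sessionKey: string; clientRunId: string };

// Hypothetical FIFO registry keyed by run ID (not the real implementation).
class RunRegistry {
  private readonly links = new Map<string, ChatLink[]>();

  add(runId: string, link: ChatLink): void {
    const queue = this.links.get(runId) ?? [];
    queue.push(link);
    this.links.set(runId, queue);
  }

  peek(runId: string): ChatLink | undefined {
    return this.links.get(runId)?.[0];
  }

  shift(runId: string): ChatLink | undefined {
    return this.links.get(runId)?.shift();
  }
}

const registry = new RunRegistry();
registry.add("run-1", { sessionKey: "s1", clientRunId: "c1" });

// Assumption: something yields between peek() and shift().
async function terminalHandler(runId: string): Promise<"shifted" | "missed" | "no-link"> {
  if (!registry.peek(runId)) return "no-link";
  await Promise.resolve(); // yield point where the other handler can run
  return registry.shift(runId) ? "shifted" : "missed";
}

// Two terminal events for the same run are handled concurrently.
const raceResults = Promise.all([terminalHandler("run-1"), terminalHandler("run-1")]);
raceResults.then((results) => console.log(results));
```

Both handlers see a link at peek() time, but only the first one to resume gets an entry from shift(); the second hits the peek-succeeds, shift-misses case described in step 3.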

Expected behavior

A terminal lifecycle event should always retire the visible chat run when enough session/run context exists to do so. If the registry entry drifted between peek() and shift(), the gateway should use the already-resolved sessionKey / eventRunId fallback and still emit the final chat event.
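
The rule above can be read as a small decision function. The following is a sketch only; resolveFinalTarget and FinalTarget are hypothetical names, not identifiers from the gateway code.

```typescript
type ChatLink = { sessionKey: string; clientRunId: string };
type FinalTarget = { sessionKey: string; runId: string };

// Hypothetical helper: decides which session/run the final event is addressed to.
function resolveFinalTarget(
  shifted: ChatLink | undefined,
  fallbackSessionKey: string | undefined,
  fallbackRunId: string,
): FinalTarget | undefined {
  // Normal case: shift() returned the registry entry.
  if (shifted) return { sessionKey: shifted.sessionKey, runId: shifted.clientRunId };
  // Drift case: use the session key and run ID resolved earlier.
  if (fallbackSessionKey) return { sessionKey: fallbackSessionKey, runId: fallbackRunId };
  // Not enough context to retire a visible run.
  return undefined;
}

const normalTarget = resolveFinalTarget({ sessionKey: "s1", clientRunId: "c1" }, "s1", "c1");
const driftedTarget = resolveFinalTarget(undefined, "s1", "c1");
const noContextTarget = resolveFinalTarget(undefined, undefined, "run-1");
console.log(normalTarget, driftedTarget, noContextTarget);
```

The only case that yields no target is the one with no session key at all, which matches the "when enough session/run context exists" qualifier.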

Actual behavior

The current terminal lifecycle path can return early before emitting chat final:

const chatLink = chatRunState.registry.peek(evt.runId);
// ... sessionKey/clientRunId/eventRunId resolved from chatLink or fallbacks ...

if (chatLink) {
  const finished = chatRunState.registry.shift(evt.runId);
  if (!finished) {
    clearAgentRunContext(evt.runId);
    return;
  }
  emitChatFinal(finished.sessionKey, finished.clientRunId, ...);
} else {
  emitChatFinal(sessionKey, eventRunId, ...);
}

If peek() succeeds but shift() misses, the code clears context and returns instead of using the same fallback path used when chatLink is absent.

OpenClaw version

Observed against 2026.4.26-era gateway behavior; current upstream main still contains the same code shape in src/gateway/server-chat.ts.

Operating system

Ubuntu 24.04 / Linux 6.17.0-22-generic x86_64

Install method

npm global

Model

venice/openai-gpt-54 in the observed local incident. The bug is in gateway lifecycle finalization and is not model-specific.

Provider / routing chain

OpenClaw gateway -> Venice/OpenAI-compatible provider in the observed local incident.

Additional provider/model setup details

The provider had already streamed assistant text successfully. The failure mode is downstream of provider generation: gateway lifecycle finalization skips emitChatFinal(...) after the streamed text is visible.

Logs, screenshots, and evidence

Observed symptoms:
- Full assistant reply text was visible from streamed deltas.
- Session transcript recorded completed assistant messages with stopReason: "stop".
- Session store later reported status: "done".
- Follow-up turns continued working on the same session.
- The UI/TUI activity state sometimes remained `streaming | connected` / running instead of retiring to idle.

Upstream code path observed in src/gateway/server-chat.ts on main:

const chatLink = chatRunState.registry.peek(evt.runId);
const eventSessionKey =
  typeof evt.sessionKey === "string" && evt.sessionKey.trim() ? evt.sessionKey : undefined;
const sessionKey =
  chatLink?.sessionKey ?? eventSessionKey ?? resolveSessionKeyForRun(evt.runId);
const clientRunId = chatLink?.clientRunId ?? evt.runId;
const eventRunId = chatLink?.clientRunId ?? evt.runId;

if (isControlUiVisible && sessionKey) {
  if (!isAborted) {
    if (chatLink) {
      const finished = chatRunState.registry.shift(evt.runId);
      if (!finished) {
        clearAgentRunContext(evt.runId);
        return;
      }
      emitChatFinal(finished.sessionKey, finished.clientRunId, ...);
    } else {
      emitChatFinal(sessionKey, eventRunId, ...);
    }
  }
}
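
The session key in the excerpt above is resolved through a chain of nullish-coalescing fallbacks. A minimal sketch of that precedence follows; the sessionsByRun map is a hypothetical stand-in for resolveSessionKeyForRun, and the LifecycleEvent type is assumed.

```typescript
type ChatLink = { sessionKey: string; clientRunId: string };
type LifecycleEvent = { runId: string; sessionKey?: unknown };

// Hypothetical lookup table standing in for resolveSessionKeyForRun.
const sessionsByRun = new Map<string, string>([["run-3", "session-from-lookup"]]);

function resolveSessionKey(evt: LifecycleEvent, chatLink?: ChatLink): string | undefined {
  // Accept the event's key only if it is a non-blank string.
  const eventSessionKey =
    typeof evt.sessionKey === "string" && evt.sessionKey.trim() ? evt.sessionKey : undefined;
  // Precedence: registry link, then event payload, then lookup by run ID.
  return chatLink?.sessionKey ?? eventSessionKey ?? sessionsByRun.get(evt.runId);
}

const fromLink = resolveSessionKey({ runId: "run-1" }, { sessionKey: "from-link", clientRunId: "c1" });
const fromEvent = resolveSessionKey({ runId: "run-2", sessionKey: "from-event" });
const blankKey = resolveSessionKey({ runId: "run-2", sessionKey: "   " });
const fromLookup = resolveSessionKey({ runId: "run-3", sessionKey: 42 });
console.log(fromLink, fromEvent, blankKey, fromLookup);
```

Because the key is resolved before shift() is called, a usable sessionKey is already in scope when shift() misses, which is what makes the fallback emit possible.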

Impact and severity

  • Affected: users of live gateway chat surfaces (Control UI/TUI/websocket clients) when the run registry entry drifts before terminal lifecycle handling.
  • Severity: Medium to high. The generated answer may be visible, but the client can remain stuck in a running/streaming state and operators may think the run is still active.
  • Frequency: Intermittent; depends on registry/lifecycle timing.
  • Consequence: stuck Stop / streaming UI state, confusing operator feedback, possible need to restart or manually clear state.
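
To illustrate why a missing final event leaves the client stuck even though the text is complete, here is a sketch of a client-side activity model. It assumes, hypothetically, that the client only returns to idle when it receives a final event; the event kinds and state names are illustrative and not taken from the Control UI or TUI code.

```typescript
// Hypothetical client-side model; event and state names are illustrative only.
type Activity = "idle" | "streaming";
type ChatEvent = { kind: "delta"; text: string } | { kind: "final" };

function reduceActivity(events: ChatEvent[]): { activity: Activity; text: string } {
  let activity: Activity = "idle";
  let text = "";
  for (const evt of events) {
    if (evt.kind === "delta") {
      activity = "streaming";
      text += evt.text;
    } else {
      activity = "idle"; // assumption: only the final event retires the run
    }
  }
  return { activity, text };
}

const deltas: ChatEvent[] = [
  { kind: "delta", text: "Hello, " },
  { kind: "delta", text: "world." },
];
const withFinal = reduceActivity([...deltas, { kind: "final" }]);
const withoutFinal = reduceActivity(deltas);
console.log(withFinal, withoutFinal);
```

Both runs end with the same visible text, but only the one that receives the final event returns to idle, which mirrors the reported symptom.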

Additional information

Suggested low-risk fix:

When chatLink exists but registry.shift(evt.runId) returns undefined, do not return before finalizing. Fall back to the already-resolved sessionKey and eventRunId and emit the final event once:

if (chatLink) {
  const finished = chatRunState.registry.shift(evt.runId);
  if (finished) {
    emitChatFinal(finished.sessionKey, finished.clientRunId, ...);
  } else {
    emitChatFinal(sessionKey, eventRunId, ...);
  }
} else {
  emitChatFinal(sessionKey, eventRunId, ...);
}

A local dist patch using this approach stopped the stuck-streaming symptom without changing transcript content or producing duplicate final assistant messages in the observed case.

Related-but-not-identical issue found before filing:

  • #42011 describes Control UI stuck on Stop after embedded timeout / missing terminal lifecycle. This report is narrower: it identifies a specific terminal lifecycle finalization path where peek() can succeed, shift() can miss, and emitChatFinal(...) is suppressed.

extent analysis

TL;DR

The issue can be fixed by modifying the terminal lifecycle handling to emit the final chat event even when registry.shift(evt.runId) returns undefined, using the already-resolved sessionKey and eventRunId as a fallback.

Guidance

  • Identify the specific code path in src/gateway/server-chat.ts where chatLink exists but registry.shift(evt.runId) returns undefined, and modify it to emit the final chat event using the fallback values.
  • Verify that the modified code handles the case where registry.shift(evt.runId) returns undefined without clearing the agent run context prematurely.
  • Test the modified code with the suggested fix to ensure it resolves the stuck-streaming symptom without introducing duplicate final assistant messages.
  • Review related issues, such as #42011, to ensure the fix does not introduce new problems.

Notes

The suggested fix is a low-risk change to the existing code. In the observed case, a local dist patch using it stopped the stuck-streaming symptom without changing transcript content or producing duplicate final assistant messages.

Recommendation

Apply the suggested fix: when registry.shift(evt.runId) returns undefined, emit the final chat event using the already-resolved sessionKey and eventRunId instead of returning early. The change is narrowly targeted, and PR #74678 implements it together with a regression test.
