openclaw - ✅(Solved) Fix TUI: WebSocket connection silently stalls — messages stuck in client, no disconnect detected [1 pull requests, 1 participants]

wbavon · 2026-04-02T10:51:43Z


Issue: openclaw/openclaw#59627 (fetched 2026-04-08 02:42:18)


Root Cause

Examined the source code in method-scopes-0x4tgmV6.js. The current tick-watch mechanism is passive only:

startTickWatch() {
    // ...
    this.tickTimer = setInterval(() => {
        if (Date.now() - this.lastTick > this.tickIntervalMs * 2)
            this.ws?.close(4e3, "tick timeout");
    }, interval);
}

This only checks whether the gateway is still sending ticks to the client. It does not detect:

- Client-side event loop stalls: if the Node.js event loop is blocked, incoming ticks may sit in the buffer and lastTick never updates, but setInterval also does not fire, so the stall goes undetected.
- Half-open connections: the TCP connection may be alive (ticks arriving at OS level) but the application layer is not processing them.
- Send-side failures: there is no mechanism to detect that outbound chat.send requests are failing or timing out silently.
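
The first gap above, an event-loop stall, can be approximated by measuring how late a periodic timer fires. The following is a minimal sketch and not OpenClaw code: `createLagMonitor`, `expectedMs`, and `maxLagMs` are hypothetical names, and the clock is injected so the logic can be exercised without real timers.

```javascript
// Hypothetical helper: detects event-loop stalls by comparing when a periodic
// check was expected to run against when it actually ran.
function createLagMonitor({ expectedMs, maxLagMs, now = Date.now }) {
    let last = now();
    return {
        // Call this from a setInterval(..., expectedMs) callback.
        // Returns the measured lag and whether it exceeded the threshold.
        check() {
            const current = now();
            const lagMs = Math.max(0, current - last - expectedMs);
            last = current;
            return { lagMs, stalled: lagMs > maxLagMs };
        },
    };
}

// Usage sketch with real timers (commented out to keep the example side-effect free):
// const monitor = createLagMonitor({ expectedMs: 1000, maxLagMs: 2000 });
// setInterval(() => {
//     const { lagMs, stalled } = monitor.check();
//     if (stalled) console.warn(`event loop was blocked for ~${lagMs} ms`);
// }, 1000);
```

A check like this can only report a stall after the loop resumes, so it would complement a liveness probe on the socket rather than replace one.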

Fix Action

Fixed

Fixed by PR: fix(tui): preserve pending sends and busy-state visibility (https://github.com/openclaw/openclaw/pull/59800)

PR fix notes

PR #59800: fix(tui): preserve pending sends and busy-state visibility

Repository: openclaw/openclaw
Author: vincentkoc
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/59800

Description (problem / solution / changelog)

Summary

Problem: the TUI could lose track of optimistic local sends during history reload/reconnect paths, show confusing busy/error state during fallback/terminal-error transitions, and waste horizontal width on long links and paths.
Why it matters: users could see prompts disappear and later reappear, get stuck in unclear run state, and struggle to read or copy long terminal output.
What changed: pending local user turns are preserved and reconciled through transcript rebuilds, active-run/error cleanup is more coherent, Esc/editor handling is covered more directly, and chat rendering reclaims width for long links and paths.
What did NOT change (scope boundary): this PR does not add full Pi-style runtime-owned steer/follow-up queues or a new pending queue panel; it stays focused on stabilizing the existing TUI state model.

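Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor required for the fix
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

Scope (select all touched areas)

- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [ ] Integrations
- [ ] API / contracts
- [x] UI / DX
- [ ] CI/CD / infra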

Linked Issue/PR

- Related #59014
- Related #59627
- Related #59570
- Related #55300
- [x] This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

Root cause: the TUI relied on optimistic local-send state that was not durably reconciled with history rebuilds and run lifecycle transitions, so reload/error paths could desynchronize visible user turns from real run state.
Missing detection / guardrail: there was no focused coverage for pending-send reconciliation across history rebuilds and not enough direct tests around the TUI error/final cleanup paths.
Prior context (git blame, prior PR, issue, or refactor if known): issue reports in #59014, #59627, #59570, and #55300 all point at state-coherence failures in the TUI.
Why this regressed now: the optimistic-send path assumed fast run attribution and simple finalization, which breaks under reconnect/history replay and fallback/error timing.
If unknown, what was ruled out: not just markdown rendering; the disappearing-send behavior came from TUI transcript/state handling.
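
One way to picture the run-lifecycle half of this root cause is a small reducer in which every terminal event clears the active run, so the status cannot stay busy after an error or abort. This is an illustrative sketch only: the state shape and the event names (`start`, `final`, `error`, `abort`) are assumptions and are not taken from the OpenClaw source.

```javascript
// Hypothetical run-status reducer: every terminal event returns to a defined
// non-busy state and clears the active run id.
function reduceRunState(state, event) {
    switch (event.type) {
        case "start":
            return { status: "busy", activeRunId: event.runId, lastError: null };
        case "final":
        case "abort":
            // Ignore terminal events for a run that is not the active one.
            if (event.runId !== state.activeRunId) return state;
            return { status: "idle", activeRunId: null, lastError: null };
        case "error":
            if (event.runId !== state.activeRunId) return state;
            return { status: "error", activeRunId: null, lastError: event.message };
        default:
            return state;
    }
}

const initialRunState = { status: "idle", activeRunId: null, lastError: null };
```

Keeping the transitions in one pure function makes the error and final cleanup paths straightforward to cover with unit tests, which matches the kind of guardrail the PR describes.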

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- [x] Unit test
- [ ] Seam / integration test
- [ ] End-to-end test
- [ ] Existing coverage already sufficient
Target test or file: src/tui/components/chat-log.test.ts, src/tui/tui-command-handlers.test.ts, src/tui/tui-event-handlers.test.ts, src/tui/tui-session-actions.test.ts, src/tui/tui.test.ts, src/tui/components/custom-editor.test.ts
Scenario the test should lock in: pending local sends survive history rebuilds until a matching run is anchored or dropped, run/error cleanup returns the TUI to a coherent state, and the editor key handling stays stable.
Why this is the smallest reliable guardrail: these are TUI-local state-machine bugs, so focused unit coverage hits the failure path directly without network flake.
Existing test that already covers this (if any): N/A
If no new test is added, why not: N/A

User-visible / Behavior Changes

Pending local sends no longer disappear during history reload/reconnect windows.
Busy/error state is less likely to get stuck or look idle at the wrong time.
Long links and paths get more usable width in chat rendering.
TUI editor key handling now has direct regression coverage.

Diagram (if applicable)

Before:
[user send] -> [optimistic local state] -> [history reload or error transition] -> [message/status can disappear or desync]

After:
[user send] -> [tracked pending local state] -> [history rebuild reconciles it] -> [run anchors or drops explicitly] -> [status stays coherent]
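
The "After" flow can be sketched as a reconciliation step that runs on every history rebuild. The snippet below is a simplified illustration under assumed data shapes: `reconcilePending`, the `clientId` field, and the `droppedIds` set are hypothetical and do not describe the actual contents of `src/tui/components/pending-messages.ts`.

```javascript
// Hypothetical reconciliation: rebuild the transcript from server history and
// re-append local sends that the history does not contain yet.
function reconcilePending(history, pending, droppedIds = new Set()) {
    // Ids of local sends that the server history already anchors.
    const anchored = new Set(
        history.filter((m) => m.clientId != null).map((m) => m.clientId),
    );
    // A pending send stays pending until it is anchored or explicitly dropped.
    const stillPending = pending.filter(
        (p) => !anchored.has(p.clientId) && !droppedIds.has(p.clientId),
    );
    const transcript = [
        ...history,
        ...stillPending.map((p) => ({ ...p, role: "user", pending: true })),
    ];
    return { transcript, stillPending };
}
```

Because the pending list is an input rather than something derived from history, a reload cannot make an unanchored send vanish; it disappears only when a matching history entry arrives or when it is dropped explicitly.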

Security Impact (required)

New permissions/capabilities? (Yes/No): No
Secrets/tokens handling changed? (Yes/No): No
New/changed network calls? (Yes/No): No
Command/tool execution surface changed? (Yes/No): No
Data access scope changed? (Yes/No): No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: macOS
Runtime/container: Node 22 / pnpm worktree
Model/provider: N/A
Integration/channel (if any): TUI + gateway chat client
Relevant config (redacted): default TUI config

Steps

1. Send a local prompt, then trigger a session reload/reconnect or history rebuild path.
2. Exercise error/final cleanup paths for the active run.
3. Render long links/paths in the TUI transcript.

Expected

Pending local sends remain visible and reconcile cleanly.
Busy/error state returns to a sensible status.
Long terminal-style text keeps more horizontal width.

Actual

Before this change, pending sends could disappear and later reappear, busy/error handling could become misleading, and transcript padding wasted horizontal space.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios: focused TUI tests passed, pnpm build passed, and the rebased branch was run locally for interactive TUI validation.
Edge cases checked: pending-send reconciliation on history rebuild, no-active-run Esc/abort handling, and error/final cleanup paths.
What you did not verify: full Pi-style queued steer/follow-up runtime semantics; that remains follow-up work.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No): Yes
Config/env changes? (Yes/No): No
Migration needed? (Yes/No): No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: TUI state handling remains easy to regress because optimistic sends, history replay, and run lifecycle events are loosely coupled.
- Mitigation: added focused tests around chat-log reconciliation, editor handling, and event/session cleanup.
Risk: full Pi-style queue/runtime semantics are still not present.
- Mitigation: call that scope boundary out explicitly rather than implying this PR solves the larger parity effort.

Changed files

CHANGELOG.md (modified, +1/-0)
src/tui/components/assistant-message.ts (modified, +1/-1)
src/tui/components/chat-log.test.ts (modified, +57/-0)
src/tui/components/chat-log.ts (modified, +104/-1)
src/tui/components/custom-editor.test.ts (added, +32/-0)
src/tui/components/custom-editor.ts (modified, +5/-0)
src/tui/components/markdown-message.ts (modified, +5/-4)
src/tui/components/pending-messages.test.ts (added, +25/-0)
src/tui/components/pending-messages.ts (added, +35/-0)
src/tui/tui-types.ts (modified, +9/-0)

Bug Description

When using openclaw tui, the WebSocket connection can silently stall. The TUI appears connected (no disconnect message shown), but user-typed messages are not delivered to the gateway. The user has to close and reopen the TUI window to restore communication.

Steps to Reproduce

1. Open openclaw tui and chat normally.
2. Wait for a period (observed after ~5-10 minutes of the agent processing long tasks).
3. Type a message: it appears in the TUI input but never reaches the gateway.
4. Type more messages: same result, no response from the agent.
5. Open a new terminal and run openclaw tui again.
6. All previously stuck messages are delivered at once, and the agent responds to all of them.

Evidence from Gateway Logs

Gateway logs show the original TUI connection (conn=26932be5…5dd4) had its last activity at 18:12:48, then went completely silent. No chat.send requests were received between 18:12 and 18:16, despite the user actively typing messages during that window. A new connection (conn=bffeaa43…37ba) appeared at 18:16:40 when the user reopened TUI, and all queued messages arrived.

Suggested Fix

Active client-side ping: Send periodic WebSocket pings from the TUI client and expect pongs within a timeout. If no pong is received, force-close and reconnect.
Send timeout detection: If a chat.send request does not receive a response within N seconds, show a warning in the TUI and attempt reconnection.
Input watchdog: If the user types a message and no gateway acknowledgment arrives within a reasonable timeout, display a "connection may be stale" warning.
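
The send-timeout and input-watchdog suggestions both reduce to tracking each outbound request until it is acknowledged. A minimal sketch follows, assuming each chat.send carries a client-generated id and that the gateway's response can be matched to it; `SendWatchdog` and its methods are hypothetical names, and the 10-second default is an arbitrary placeholder for the "N seconds" mentioned above.

```javascript
// Hypothetical watchdog: records when each send was issued and reports the
// ones that have waited longer than timeoutMs without an acknowledgment.
class SendWatchdog {
    constructor(timeoutMs = 10000, now = Date.now) {
        this.timeoutMs = timeoutMs;
        this.now = now;
        this.sentAt = new Map(); // request id -> timestamp of the send
    }
    // Call when a request is written to the socket.
    track(id) {
        this.sentAt.set(id, this.now());
    }
    // Call when the gateway acknowledges the request.
    ack(id) {
        this.sentAt.delete(id);
    }
    // Ids that are overdue; a non-empty result suggests a stale connection.
    overdue() {
        const cutoff = this.now() - this.timeoutMs;
        return [...this.sentAt].filter(([, t]) => t <= cutoff).map(([id]) => id);
    }
}
```

A periodic timer could call `overdue()` and, when it returns anything, show the "connection may be stale" warning or force a reconnect.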

Environment

OpenClaw version: 2026.3.23-1
OS: macOS (Apple Silicon)
Connection: local WebSocket (ws://127.0.0.1:18789)
TUI client: openclaw tui

extent analysis

TL;DR

Implement an active client-side ping mechanism to detect and recover from silent WebSocket connection stalls.

Guidance

Introduce a periodic WebSocket ping from the TUI client to the gateway, expecting a pong response within a specified timeout.
Implement send timeout detection for chat.send requests, displaying a warning and attempting reconnection if no response is received within a reasonable timeframe.
Consider adding an input watchdog to detect stale connections when user-typed messages do not receive gateway acknowledgments.

Example

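// Example of an active client-side ping. Assumes the Node.js `ws` library,
// whose client sockets expose ping(), terminate(), and a 'pong' event.
const PING_INTERVAL_MS = 10000; // send a ping every 10 seconds
const PONG_TIMEOUT_MS = 5000;   // kept shorter than the ping interval
let pongTimer = null;

// Register the pong handler once, not on every tick, to avoid leaking listeners.
ws.on('pong', () => {
    clearTimeout(pongTimer);
});

const pingTimer = setInterval(() => {
    if (ws.readyState !== 1) return; // 1 === OPEN; skip if the socket is not open
    ws.ping();
    pongTimer = setTimeout(() => {
        console.log('Ping timeout, forcing reconnect');
        // Drop the socket without waiting for a close handshake that a stalled
        // peer may never answer; reconnect logic is assumed to run on 'close'.
        ws.terminate();
    }, PONG_TIMEOUT_MS);
}, PING_INTERVAL_MS);

// Stop both timers once the socket closes.
ws.on('close', () => {
    clearInterval(pingTimer);
    clearTimeout(pongTimer);
});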

Notes

The provided example is a basic illustration and may require adjustments based on the specific WebSocket library and implementation details.

Recommendation

Implement an active client-side ping so that the TUI client can detect and recover from silent connection stalls. This directly addresses the reported problem: the client does not notice when the WebSocket connection has become unresponsive.
