openclaw: How to fix [Bug]: Control UI / WebChat local websocket disconnects with code=1001 during long-running tasks; reconnect recovers via chat.history instead of stable live updates (1 participant)

GitHub issue openclaw/openclaw#60930 (fetched 2026-04-08 02:45:30): 0 comments, 1 participant, 0 timeline events, 0 reactions.

OpenClaw Issue Draft: Control UI / WebChat local websocket disconnects with code=1001 during long-running tasks, then reconnects and recovers via chat.history

Suggested title

[Bug]: Control UI / WebChat local websocket disconnects with code=1001 during long-running tasks; reconnect recovers via chat.history instead of stable live updates

Summary

When the Control UI / webchat runs locally on macOS (client=openclaw-control-ui webchat v2026.3.28), the UI websocket disconnects with code=1001 and reconnects immediately. This tends to happen when long-running subagent / embedded-agent tasks complete and during direct-announce retry windows.

The gateway itself does not appear to crash. The visible behavior is:

  • webchat connection drops with code=1001
  • a new webchat connection is established almost immediately
  • long-running task state is sometimes only fully visible after reconnect / chat.history

This makes the issue look like a UI/live-stream instability problem rather than a provider connectivity problem.

Environment

  • OpenClaw installed locally via npm (package path: /opt/homebrew/lib/node_modules/openclaw)
  • macOS arm64
  • Control UI / webchat surface
  • client label: openclaw-control-ui
  • webchat version observed in logs: v2026.3.28
  • local loopback gateway target: ws://127.0.0.1:18789

Observed gateway log pattern

From local ~/.openclaw/logs/gateway.log:

2026-04-04T23:26:56.639+08:00 [warn] Subagent announce completion direct announce agent call transient failure, retrying 2/4 in 5s: gateway timeout after 90000ms
2026-04-04T23:28:31.650+08:00 [warn] Subagent announce completion direct announce agent call transient failure, retrying 3/4 in 10s: gateway timeout after 90000ms
2026-04-04T23:28:35.968+08:00 [ws] webchat disconnected code=1001 reason=n/a conn=...
2026-04-04T23:28:36.247+08:00 [ws] webchat connected conn=... remote=127.0.0.1 client=openclaw-control-ui webchat v2026.3.28
2026-04-04T23:28:54.863+08:00 [ws] webchat disconnected code=1001 reason=n/a conn=...
2026-04-04T23:28:55.024+08:00 [ws] webchat connected conn=... remote=127.0.0.1 client=openclaw-control-ui webchat v2026.3.28
2026-04-04T23:30:11.660+08:00 [warn] Subagent announce completion direct announce agent call transient failure, retrying 4/4 in 20s: gateway timeout after 90000ms
2026-04-04T23:35:02.150+08:00 [ws] webchat disconnected code=1001 reason=n/a conn=...
2026-04-04T23:35:02.381+08:00 [ws] webchat connected conn=... remote=127.0.0.1 client=openclaw-control-ui webchat v2026.3.28
2026-04-04T23:35:28.737+08:00 [ws] ⇄ res ✓ chat.history 50ms conn=... id=...
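The three announce warnings above are spaced almost exactly by the announced backoff plus the 90000 ms gateway timeout. The worked check below makes this visible. The timestamps and backoffs are copied from the log. `retryGaps` is a hypothetical helper. The reading that each attempt ran to its full timeout is an inference from the timing, not something the log states directly.

```javascript
// Worked check on the three announce warnings quoted in the log above.
// Hypothesis: the gap between consecutive warnings equals the previous
// backoff plus the full 90000 ms gateway timeout of the next attempt.
const TIMEOUT_MS = 90000; // "gateway timeout after 90000ms"

const announceWarnings = [
  { at: "2026-04-04T23:26:56.639+08:00", backoffMs: 5000 }, // retrying 2/4 in 5s
  { at: "2026-04-04T23:28:31.650+08:00", backoffMs: 10000 }, // retrying 3/4 in 10s
  { at: "2026-04-04T23:30:11.660+08:00", backoffMs: 20000 }, // retrying 4/4 in 20s
];

// Hypothetical helper: compare observed gaps with backoff + timeout.
function retryGaps(warnings, timeoutMs) {
  const rows = [];
  for (let i = 1; i < warnings.length; i++) {
    const observedMs = Date.parse(warnings[i].at) - Date.parse(warnings[i - 1].at);
    const predictedMs = warnings[i - 1].backoffMs + timeoutMs;
    rows.push({ observedMs, predictedMs, slackMs: observedMs - predictedMs });
  }
  return rows;
}

console.table(retryGaps(announceWarnings, TIMEOUT_MS));
```

Both gaps (95011 ms and 100010 ms) exceed the prediction by only 10 to 11 ms. This fits the reading that each attempt in this window ran to the full gateway timeout. If the final attempt (starting about 20 s after 23:30:11.660) also timed out, the announce path stayed busy until roughly 23:32:01.66. The excerpt does not show how that attempt ended.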

There were also earlier same-day occurrences:

  • 22:02:19 disconnected code=1001 -> immediate reconnect
  • 23:09:13 disconnected code=1001 -> immediate reconnect
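The same pattern recurs through the day, so scanning the whole gateway.log may be more reliable than reading it by eye. The sketch below pairs each webchat disconnect with its reconnect and notes the most recent announce retry. It assumes every relevant line uses exactly the formats quoted in this issue. `analyzeGatewayLog` is a hypothetical helper, not part of OpenClaw.

```javascript
// Hypothetical helper: pair each "webchat disconnected" line with the next
// "webchat connected" line and note the most recent announce retry warning.
// The regexes match only the line formats quoted in this issue.
const ANNOUNCE_RE = /^(\S+) \[warn\] Subagent announce completion .* retrying (\d+)\/(\d+)/;
const DISCONNECT_RE = /^(\S+) \[ws\] webchat disconnected code=(\d+)/;
const CONNECT_RE = /^(\S+) \[ws\] webchat connected /;

function analyzeGatewayLog(lines) {
  const events = [];
  let pending = null; // disconnect still waiting for its reconnect
  let lastAnnounce = null; // most recent announce retry warning
  for (const line of lines) {
    let m;
    if ((m = ANNOUNCE_RE.exec(line))) {
      lastAnnounce = { at: Date.parse(m[1]), attempt: `${m[2]}/${m[3]}` };
    } else if ((m = DISCONNECT_RE.exec(line))) {
      pending = { at: Date.parse(m[1]), code: Number(m[2]) };
    } else if ((m = CONNECT_RE.exec(line)) && pending) {
      events.push({
        code: pending.code,
        reconnectMs: Date.parse(m[1]) - pending.at,
        lastAnnounceAttempt: lastAnnounce ? lastAnnounce.attempt : null,
        secondsSinceAnnounce: lastAnnounce ? (pending.at - lastAnnounce.at) / 1000 : null,
      });
      pending = null;
    }
  }
  return events;
}
```

To run it against the real log, read the file with `fs.readFileSync` (expanding `~/.openclaw/logs/gateway.log` via `os.homedir()`), split it on newlines, and pass the array to `analyzeGatewayLog`. `console.table` then gives a readable summary. On the excerpt above, it reports three 1001 closes with reconnect gaps of 279 ms, 161 ms, and 231 ms.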

What we checked in local code

In local built gateway code (dist/gateway-cli-*.js), the relevant webchat disconnect log is emitted inside the socket close handler:

socket.once("close", (code, reason) => {
  ...
  if (client && isWebchatClient(client.connect.client))
    logWsControl.info(`webchat disconnected code=${code} reason=${logReason || "n/a"} conn=${connId}`);
})

This handler appears to log a close event after it has already happened. Nothing visible in it forces close(1001) from the gateway side.
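The useful question is which endpoint sent the 1001. Assume the gateway uses the Node `ws` library (its close listener has the same `(code, reason)` signature). In that case, the code reported is the one carried in the close frame received from the Control UI. A gateway that never calls `close(1001)` itself is therefore probably seeing a client-originated close. If the Control UI runs in a browser or webview, its own code cannot choose 1001: the WebSocket API's `close()` throws for any code other than 1000 or 3000 to 4999. The helpers below have hypothetical names and turn this reasoning into a more informative log line.

```javascript
// Meanings of the close codes most relevant here (RFC 6455, section 7.4.1).
const CLOSE_CODES = new Map([
  [1000, "normal closure"],
  [1001, "going away (endpoint shutting down or page navigated away)"],
  [1005, "no status code in close frame (reserved, never sent on the wire)"],
  [1006, "closed without a close frame (reserved, never sent on the wire)"],
  [1008, "policy violation"],
  [1009, "message too big"],
  [1011, "unexpected server condition"],
]);

// Builds a richer log line than "code=1001 reason=n/a".
// Assumption: `reason` is a Buffer (Node ws library) or a string (browser CloseEvent).
function describeClose(code, reason) {
  let meaning = CLOSE_CODES.get(code);
  if (meaning === undefined) {
    meaning = code >= 4000 && code <= 4999 ? "application-defined" : "unrecognized";
  }
  const text = reason && reason.length > 0 ? reason.toString() : "n/a";
  return `code=${code} (${meaning}) reason=${text}`;
}

// Browser WebSocket.close() rejects codes other than 1000 and 3000-4999,
// so browser-side UI code cannot send 1001 deliberately.
function browserMayCloseWith(code) {
  return code === 1000 || (code >= 3000 && code <= 4999);
}
```

If the gateway never initiates the close, a logged 1001 from a browser-based Control UI most likely comes from the browser itself rather than from UI reconnect code.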

Interpretation

This currently looks more like:

  1. local Control UI / webchat websocket gets closed and re-established
  2. the disconnect appears correlated with long-running task completion / direct-announce retry pressure
  3. UI state is recovered after reconnect / chat.history

This does not look primarily like remote LLM provider connectivity loss, because:

  • tasks often still complete successfully
  • transcripts contain the final results
  • the websocket reconnect is local loopback (127.0.0.1)
  • the issue is about live UI updates / event delivery stability

Suspected areas

Backend / gateway

  • embedded agent live stream / direct announce path under long-running completion pressure
  • any websocket event delivery path that may stall or degrade until reconnect
  • timeout / retry interaction between announce delivery and webchat live subscriptions

Frontend / Control UI

  • reconnect behavior, and why the existing local websocket is closed with 1001 and replaced
  • whether live subscriptions degrade during long-running tasks
  • whether chat.history reconciliation is masking missed realtime events
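On the frontend question of why the socket is closed with 1001: a browser or webview sends 1001 on its own when a page is reloaded, navigated away, or unloaded. The unloading page may not survive long enough to log its own socket close. The minimal sketch below assumes a browser environment and stores lifecycle events in sessionStorage, which survives a reload in the same tab, so the next page load can read them back. The function names and the storage key are hypothetical.

```javascript
// Hypothetical Control UI diagnostic. Lifecycle events are stored in a
// Storage object (sessionStorage in the browser, which survives a reload in
// the same tab) because the unloading page may never log its own socket close.
const LIFECYCLE_KEY = "openclaw-debug-lifecycle"; // hypothetical key

function recordLifecycle(sources, storage) {
  for (const [target, type] of sources) {
    target.addEventListener(type, () => {
      const entries = JSON.parse(storage.getItem(LIFECYCLE_KEY) || "[]");
      // toISOString() is UTC; gateway.log uses local time (+08:00 in this issue).
      entries.push({ type, at: new Date().toISOString() });
      storage.setItem(LIFECYCLE_KEY, JSON.stringify(entries.slice(-20))); // keep the last 20
    });
  }
}

// Read back (and clear) what the previous page recorded before it went away.
function takeLifecycleLog(storage) {
  const entries = JSON.parse(storage.getItem(LIFECYCLE_KEY) || "[]");
  storage.removeItem(LIFECYCLE_KEY);
  return entries;
}
```

In the UI's start-up code, this would mean calling `console.info(takeLifecycleLog(sessionStorage))` and then `recordLifecycle([[window, "pagehide"], [document, "visibilitychange"]], sessionStorage)`. A `pagehide` in the same second as a gateway-side 1001 would point to the page going away. If no such entry exists, the UI's reconnect logic or the hosting webview deserves a closer look.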

Expected behavior

  • local webchat websocket should remain stable during long-running tasks
  • if reconnect is unavoidable, live state reconciliation should be automatic and lossless
  • users should not need reconnect / refresh / chat.history to see task completion consistently
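The second expectation (automatic, lossless reconciliation when a reconnect is unavoidable) can be made concrete. The sketch below assumes each chat message carries a stable `id` and a numeric timestamp `ts`. The actual chat.history payload is not shown in the issue, so both fields and both function names are hypothetical. `missedLiveEvents` also addresses the frontend question of whether chat.history is masking missed realtime events: any message it returns arrived only through history.

```javascript
// Hypothetical reconciliation step for the Control UI: after a reconnect,
// merge the chat.history response into the messages already rendered from
// live events, so nothing is lost and nothing is shown twice.
// Assumes each message has a stable `id` and a numeric `ts`.
function reconcile(liveMessages, historyMessages) {
  const byId = new Map();
  for (const m of liveMessages) byId.set(m.id, m);
  // History is treated as authoritative and replaces live copies with the same id;
  // live-only messages (newer than the history snapshot) are kept.
  for (const m of historyMessages) byId.set(m.id, m);
  return [...byId.values()].sort((a, b) => a.ts - b.ts);
}

// Messages that reached the UI only through chat.history, never via live events.
function missedLiveEvents(liveMessages, historyMessages) {
  const seen = new Set(liveMessages.map((m) => m.id));
  return historyMessages.filter((m) => !seen.has(m.id));
}
```

Logging the output of `missedLiveEvents` after each reconnect would show how often the live stream actually drops completions, rather than leaving the history refresh to hide them.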

Actual behavior

  • websocket disconnects with code=1001
  • reconnect happens quickly
  • long-running completion visibility seems tied to reconnect / history refresh

Impact

This makes Control UI feel flaky during long-running or subagent-heavy workflows, even when backend task execution itself succeeds.

Related symptoms

This may be related to other live-update / reconnect / long-task visibility issues where:

  • completions surface only after refresh
  • webchat misses live updates during reconnect
  • timeout / announce behavior is misleading even though underlying session transcripts are correct

Extended analysis

TL;DR

The root cause is not confirmed yet. The most promising lead is the embedded agent live stream and direct-announce path under long-running completion pressure. Its announce calls time out after 90000 ms around the same time as the 1001 disconnects, so that path should be investigated first.

Guidance

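  1. Identify which side initiated the close: The socket.once("close", ...) handler in the gateway only logs a close that has already happened, so it cannot be what forces code 1001. Search the gateway and Control UI code for explicit close(1001) calls, and determine whether the close frame came from the client or the gateway.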
  2. Investigate embedded agent live stream: Look into the embedded agent live stream and direct announce path to identify potential bottlenecks or issues that may cause the websocket to disconnect during long-running tasks.
  3. Review timeout and retry settings: The announce path retries with backoffs of 5 s, 10 s, and 20 s, and each failed attempt in the log times out after 90000 ms. Check how this retry window interacts with webchat live subscriptions before tuning it.
  4. Improve reconnect behavior: Enhance the reconnect behavior in the Control UI to ensure seamless live state reconciliation and lossless event delivery.
  5. Verify websocket event delivery: Monitor websocket event delivery paths to detect any stalls or degradation that may occur during long-running tasks.
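One inexpensive way to act on item 5 is to check whether the gateway's Node.js event loop stalls during long completions, because a blocked loop delays every websocket frame at once. The logs neither confirm nor rule out this hypothesis. The sketch uses Node's built-in `perf_hooks.monitorEventLoopDelay`. `startLoopDelayWatch` and its thresholds are hypothetical and would need to be wired into the gateway process.

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Hypothetical gateway diagnostic: report event-loop delay spikes.
// A blocked loop delays every websocket frame, including webchat live events.
function startLoopDelayWatch({ thresholdMs = 200, intervalMs = 5000, onSpike = console.warn } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    const maxMs = histogram.max / 1e6; // histogram values are in nanoseconds
    if (maxMs > thresholdMs) {
      const p99Ms = histogram.percentile(99) / 1e6;
      onSpike(`event loop delay spike: max=${maxMs.toFixed(0)}ms p99=${p99Ms.toFixed(0)}ms at ${new Date().toISOString()}`);
    }
    histogram.reset(); // start a fresh window for the next check
  }, intervalMs);
  timer.unref(); // never keep the process alive just for this check
  return function stop() {
    clearInterval(timer);
    histogram.disable();
  };
}
```

If the watch were started once at gateway boot, spikes that cluster around the announce timeouts would support the stall hypothesis for missed live events. A quiet event loop would shift attention back to the client side.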

Example

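The fix itself is not known yet. As a starting point, the close handler can record more than the bare code, so that each 1001 can be matched against the announce retries in gateway.log. This sketch assumes the Node ws library, where reason arrives as a Buffer:

socket.once("close", (code, reason) => {
  // Assumes the ws library: `reason` is a Buffer, empty when the peer sent none.
  const reasonText = reason && reason.length > 0 ? reason.toString() : "n/a";
  if (code === 1001) {
    // 1001 = "going away" (RFC 6455): the peer is shutting down or leaving the page.
    // The ISO timestamp (UTC) makes it easy to match against gateway.log entries.
    console.warn(`websocket closed with 1001 reason=${reasonText} at ${new Date().toISOString()}`);
  }
});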

Notes

The root cause of the issue is still uncertain and may require further investigation. The provided guidance is based on the information available in the issue description.

Recommendation

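Treat the embedded agent live stream and direct-announce path under long-running completion pressure as the leading hypothesis, not a confirmed cause. Instrument that path first, since its timeouts coincide with the disconnect windows, and confirm which endpoint sends the 1001 before changing any behavior.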
