codex - ✅ (Solved) Fix: App-server disconnects remote TUI mid-turn when outbound queue fills (128 messages) [1 pull request, 1 participant]

GitHub stats

openai/codex#18203 (fetched 2026-04-17 08:31:24)
Comments: 0 · Participants: 1 · Timeline events: 7 · Reactions: 0
Timeline: labeled ×5, cross-referenced ×1, unlabeled ×1

Error Message

ERROR: remote app server at ws://127.0.0.1:<port>/ transport failed: WebSocket protocol error: Connection reset without closing handshake

Root Cause

The root cause is in send_message_to_connection() at codex-rs/app-server/src/transport/mod.rs, lines 308-316. WebSocket clients are marked disconnectable (websocket.rs:176 sets disconnect_sender: Some(...)), so when try_send() returns TrySendError::Full on the 128-slot bounded channel, the connection is terminated immediately.
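The failure mode can be reproduced in miniature. The following is a minimal, dependency-free Rust sketch. It assumes a synchronous std::sync::mpsc::sync_channel as a stand-in for the app-server's outbound channel, because the real code is async and its exact types are not shown here. SendOutcome, send_instant_disconnect, and demo_burst are hypothetical names. A burst larger than the 128-slot capacity, with a reader that is connected but not draining, ends in a disconnect on the 129th message.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Result of one outbound send attempt (hypothetical type, not from codex-rs).
#[derive(Debug, PartialEq)]
pub enum SendOutcome {
    Delivered,
    Disconnected,
}

/// Pre-fix behavior as described in the issue: a disconnectable connection
/// never waits, so a full queue is treated as a dead client.
pub fn send_instant_disconnect(tx: &SyncSender<String>, msg: String) -> SendOutcome {
    match tx.try_send(msg) {
        Ok(()) => SendOutcome::Delivered,
        // Full: the client is only momentarily behind, but is dropped anyway.
        // Disconnected: the writer side is already gone.
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
            SendOutcome::Disconnected
        }
    }
}

/// Sends `burst` notifications into a queue of `capacity` slots whose reader is
/// alive but not draining. Returns how many were accepted and the final outcome.
pub fn demo_burst(capacity: usize, burst: usize) -> (usize, SendOutcome) {
    // `_rx` (not `_`) keeps the receiver alive so the channel reports Full,
    // not Disconnected.
    let (tx, _rx): (SyncSender<String>, Receiver<String>) = sync_channel(capacity);
    let mut delivered = 0;
    for i in 0..burst {
        match send_instant_disconnect(&tx, format!("notification {i}")) {
            SendOutcome::Delivered => delivered += 1,
            SendOutcome::Disconnected => return (delivered, SendOutcome::Disconnected),
        }
    }
    (delivered, SendOutcome::Delivered)
}
```

With demo_burst(128, 200), 128 messages are accepted and the 129th returns SendOutcome::Disconnected; a burst of 100 fits and the connection survives.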

PR fix notes

PR #18265: Avoid instant remote disconnects under websocket queue pressure

Description (problem / solution / changelog)

Fixes #18203.

Why

Remote TUI clients connected through codex app-server --listen ws://... can receive bursts of outbound turn/progress notifications. Before this change, a full per-connection writer queue was treated as an immediate slow-client signal, so a healthy WebSocket client that was only momentarily behind could be disconnected mid-turn.

What Changed

Disconnectable WebSocket connections now get a bounded grace window when their outbound writer queue is full. If the queue drains within that window, the message is delivered and the connection stays open. If it remains full, the connection is cancelled so the outbound router cannot be blocked indefinitely.
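A bounded grace window amounts to "retry until a deadline, then give up." Below is a minimal std-only Rust sketch of that idea, not the PR's implementation. It assumes a polling loop over SyncSender::try_send with a 1 ms sleep, where real async code would more likely await the channel under a timeout. GraceOutcome, send_with_grace, and demo are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum GraceOutcome {
    Delivered,
    TimedOut,
    WriterClosed,
}

/// Retries a full queue until `grace` elapses; only then reports a timeout.
pub fn send_with_grace(tx: &SyncSender<String>, mut msg: String, grace: Duration) -> GraceOutcome {
    let deadline = Instant::now() + grace;
    loop {
        match tx.try_send(msg) {
            Ok(()) => return GraceOutcome::Delivered,
            Err(TrySendError::Disconnected(_)) => return GraceOutcome::WriterClosed,
            Err(TrySendError::Full(returned)) => {
                if Instant::now() >= deadline {
                    return GraceOutcome::TimedOut; // caller would cancel the connection
                }
                msg = returned; // take the message back and try again shortly
                thread::sleep(Duration::from_millis(1));
            }
        }
    }
}

/// Starts with a full 1-slot queue. If `drain_after` is Some, a reader starts
/// draining after that delay; if None, the reader stays connected but never reads.
pub fn demo(drain_after: Option<Duration>, grace: Duration) -> GraceOutcome {
    let (tx, rx) = sync_channel::<String>(1);
    tx.try_send("already queued".to_string()).unwrap(); // queue is now full
    match drain_after {
        Some(delay) => {
            let reader = thread::spawn(move || {
                thread::sleep(delay);
                rx.iter().count() // drains until every sender is dropped
            });
            let outcome = send_with_grace(&tx, "next".to_string(), grace);
            drop(tx); // lets the reader's iterator finish
            reader.join().unwrap();
            outcome
        }
        None => {
            let outcome = send_with_grace(&tx, "next".to_string(), grace);
            drop(rx); // reader was alive (but idle) for the whole attempt
            outcome
        }
    }
}
```

demo(Some(Duration::from_millis(20)), Duration::from_secs(2)) delivers the message once the reader catches up, while demo(None, Duration::from_millis(30)) times out. These are the two outcomes the PR description distinguishes.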

The implementation also adds a per-connection bounded overflow channel for WebSocket sends. This keeps the shared outbound router from waiting on one slow connection, preserves message order for that connection while the overflow drains, and avoids creating an unbounded backlog of detached send waiters.
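The subtle part of the overflow design is ordering. Once any message for a connection is waiting in overflow, later messages must queue behind it instead of going straight to the writer. Below is a minimal std-only Rust sketch of that rule. It assumes one thread per connection for the overflow worker and a shared counter of pending overflow messages. ConnectionQueues and pending_overflow are hypothetical. The worker's plain blocking send stands in for the PR's grace-window wait. When the overflow queue is full, this sketch simply reports failure, although the PR applies the grace window there too.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;
use std::thread;

/// Hypothetical per-connection handle owned by the router.
pub struct ConnectionQueues {
    writer: SyncSender<String>,
    overflow: SyncSender<String>,
    /// Messages pushed to overflow but not yet forwarded to the writer.
    pending_overflow: Arc<AtomicUsize>,
}

impl ConnectionQueues {
    /// Creates both bounded queues and spawns the overflow worker.
    /// Returns the handle plus the receiving end of the writer queue.
    pub fn new(writer_cap: usize, overflow_cap: usize) -> (Self, Receiver<String>) {
        let (writer, writer_rx) = sync_channel(writer_cap);
        let (overflow, overflow_rx) = sync_channel::<String>(overflow_cap);
        let pending_overflow = Arc::new(AtomicUsize::new(0));
        let worker_writer = writer.clone();
        let worker_pending = Arc::clone(&pending_overflow);
        thread::spawn(move || {
            // Forward overflow messages in FIFO order; this blocks only this
            // connection's worker, never the shared router.
            for msg in overflow_rx {
                if worker_writer.send(msg).is_err() {
                    break; // writer closed
                }
                // Decrement only after the message is in the writer queue.
                worker_pending.fetch_sub(1, Ordering::SeqCst);
            }
        });
        (ConnectionQueues { writer, overflow, pending_overflow }, writer_rx)
    }

    /// Non-blocking send used by the router. Returns false if the connection
    /// should be dropped.
    pub fn send(&self, msg: String) -> bool {
        if self.pending_overflow.load(Ordering::SeqCst) == 0 {
            match self.writer.try_send(msg) {
                Ok(()) => true,
                Err(TrySendError::Disconnected(_)) => false,
                Err(TrySendError::Full(m)) => self.push_overflow(m),
            }
        } else {
            // Earlier messages are still in overflow: follow them to keep order.
            self.push_overflow(msg)
        }
    }

    fn push_overflow(&self, msg: String) -> bool {
        // Count first, so the next send() already sees pending > 0.
        self.pending_overflow.fetch_add(1, Ordering::SeqCst);
        match self.overflow.try_send(msg) {
            Ok(()) => true,
            Err(_) => {
                // Overflow full or worker gone: undo the count and give up.
                self.pending_overflow.fetch_sub(1, Ordering::SeqCst);
                false
            }
        }
    }
}
```

With a 2-slot writer queue and no reader, ten sends place two messages in the writer and eight in overflow. The reader still receives them in the original order once it starts draining.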

When a connection times out or its writer closes, the overflow worker now notifies the outbound router so the connection is removed from routing promptly instead of waiting for the later ConnectionClosed event. The overflow-channel-full path also uses the same grace window before disconnecting, so short bursts that saturate both bounded queues do not immediately drop the client.
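The prompt-removal step can be sketched as a side channel from the workers back to the router. This minimal std-only Rust sketch assumes unbounded std::sync::mpsc channels for simplicity. A worker that times out or sees its writer close sends its connection id on a shared channel, and the router drains that channel before each broadcast instead of waiting for a later ConnectionClosed event. Router, ConnId, closed_notifier, and reap_closed are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the app-server's ConnectionId.
pub type ConnId = u64;

/// Hypothetical router state: open connections plus a channel on which
/// per-connection workers report that they have given up.
pub struct Router {
    connections: HashMap<ConnId, Sender<String>>,
    closed_tx: Sender<ConnId>,
    closed_rx: Receiver<ConnId>,
}

impl Router {
    pub fn new() -> Self {
        let (closed_tx, closed_rx) = channel();
        Router { connections: HashMap::new(), closed_tx, closed_rx }
    }

    /// A clone of this sender is handed to each connection's worker.
    pub fn closed_notifier(&self) -> Sender<ConnId> {
        self.closed_tx.clone()
    }

    pub fn add(&mut self, id: ConnId, writer: Sender<String>) {
        self.connections.insert(id, writer);
    }

    /// Removes connections whose workers reported a timeout or a closed writer.
    pub fn reap_closed(&mut self) -> Vec<ConnId> {
        let mut removed = Vec::new();
        // The router holds closed_tx itself, so try_recv only ever stops on Empty.
        while let Ok(id) = self.closed_rx.try_recv() {
            if self.connections.remove(&id).is_some() {
                removed.push(id);
            }
        }
        removed
    }

    /// Reaps first, then fans the message out; returns how many writers accepted it.
    pub fn broadcast(&mut self, msg: &str) -> usize {
        self.reap_closed();
        let mut sent = 0;
        for writer in self.connections.values() {
            if writer.send(msg.to_string()).is_ok() {
                sent += 1;
            }
        }
        sent
    }
}
```

Because reaping happens at the top of every broadcast, a connection reported by its worker no longer receives routed messages, even if its close event has not arrived yet.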

The stdio and in-process transports keep their existing reliable-delivery behavior; the grace/overflow path applies to disconnectable WebSocket connections.

Testing

Added transport tests covering:

  • temporary WebSocket queue pressure draining without disconnecting
  • per-connection message ordering while overflow messages drain
  • disconnect after the queue remains full past the grace window
  • applying the grace window before disconnecting when both the writer queue and overflow queue are saturated

I also manually confirmed that the repro methodology from #18203 disconnects before the fix and stays connected with this change.

Changed files

  • codex-rs/app-server/src/in_process.rs (modified, +2/-0)
  • codex-rs/app-server/src/lib.rs (modified, +50/-36)
  • codex-rs/app-server/src/transport/mod.rs (modified, +447/-59)

Original issue

What version of Codex CLI is running?

v0.121.0

What subscription do you have?

None, API Keys

Which model were you using?

gpt-5.4

What platform is your computer?

Darwin 25.4.0 arm64 arm

What terminal emulator and version are you using (if applicable)?

Warp

What issue are you seeing?

The app-server's WebSocket transport disconnects remote TUI clients instantly when its 128-message outbound queue fills, instead of applying backpressure. This makes --remote mode unusable for any turn that produces moderate output volume.

The TUI exits with:

ERROR: remote app server at `ws://127.0.0.1:<port>/` transport failed:
  WebSocket protocol error: Connection reset without closing handshake

The app-server logs:

disconnecting slow connection after outbound queue filled: ConnectionId(0)

This reproduces on stock upstream Codex with a single remote TUI client and no third-party tooling.

What steps can reproduce the bug?

Terminal 1 — start the app-server:

codex app-server --listen ws://127.0.0.1:0
# note the printed ws:// URL

Terminal 2 — attach the remote TUI:

codex --remote ws://127.0.0.1:<port>

Then send a prompt that produces moderate streaming output:

Run `rg -n '120|timeout' ~/.codex` and answer briefly.

The TUI disconnects within seconds. Smaller prompts sometimes survive, but any turn with realistic tool output triggers it.

What is the expected behavior?

The remote TUI should remain connected during normal turns. A client that falls 128 messages behind during streaming is momentarily slower than the producer, not stuck, and it should not be terminated instantly.

Stdio clients already handle this correctly: they use blocking .send().await (stdio.rs:35 sets disconnect_sender: None), so they never hit the disconnect path. WebSocket clients should get equivalent resilience, either through a larger queue, a send timeout, or per-connection send tasks.
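The split described above, keyed on whether disconnect_sender is set, can be sketched in std-only Rust. This assumes synchronous std::sync::mpsc channels. OutboundConnection, Delivery, new_connection, and send_message are hypothetical; only the disconnect_sender field name comes from the issue. The WebSocket branch shows the pre-fix behavior.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical connection record. As in the issue, stdio connections have
/// disconnect_sender = None and WebSocket connections have Some(..).
pub struct OutboundConnection {
    pub writer: SyncSender<String>,
    pub disconnect_sender: Option<SyncSender<()>>,
}

#[derive(Debug, PartialEq)]
pub enum Delivery {
    Sent,
    Disconnected,
}

/// Builds a connection whose writer queue holds `capacity` messages.
pub fn new_connection(
    capacity: usize,
    disconnect_sender: Option<SyncSender<()>>,
) -> (OutboundConnection, Receiver<String>) {
    let (writer, rx) = sync_channel(capacity);
    (OutboundConnection { writer, disconnect_sender }, rx)
}

pub fn send_message(conn: &OutboundConnection, msg: String) -> Delivery {
    match &conn.disconnect_sender {
        // Reliable path (stdio): block until the writer has room.
        None => match conn.writer.send(msg) {
            Ok(()) => Delivery::Sent,
            Err(_) => Delivery::Disconnected,
        },
        // Disconnectable path (WebSocket, pre-fix): never wait; a full queue
        // signals the connection to shut down.
        Some(disconnect) => match conn.writer.try_send(msg) {
            Ok(()) => Delivery::Sent,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                let _ = disconnect.try_send(()); // best-effort disconnect signal
                Delivery::Disconnected
            }
        },
    }
}
```

With a 1-slot queue, a stdio connection delivers every message as long as a reader eventually drains it. A WebSocket connection with an idle reader is disconnected on the second message.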

Additional information

The root cause is in send_message_to_connection() at codex-rs/app-server/src/transport/mod.rs, lines 308-316. WebSocket clients are marked disconnectable (websocket.rs:176 sets disconnect_sender: Some(...)), so when try_send() returns TrySendError::Full on the 128-slot bounded channel, the connection is terminated immediately.

The outbound router is a single select! loop (lib.rs:643), so blocking on a slow WebSocket is not safe. However, the remote_control transport already solves this differently: remote_control/websocket.rs uses a per-stream BoundedOutboundBuffer with backpressure instead of instant disconnect.

Design options that preserve the non-blocking router:

  1. Per-connection send tasks. Structurally similar to remote_control's existing approach.
  2. Larger queue + send_timeout. Simplest fix, small blast radius.
  3. Priority-based notification dropping under queue pressure. opt_out_notification_methods already has the right shape.
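Option 3 can be sketched as a queue that sheds only low-priority notifications once it passes a high-water mark, while always keeping responses. This is a minimal Rust sketch under stated assumptions. The 75% threshold, the Outbound, PressureQueue, and Enqueue types, and the method name example/progress are all hypothetical. The droppable set only mirrors the general shape of opt_out_notification_methods, whose real definition is not shown in the issue.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical outbound message kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum Outbound {
    /// Reply to a client request: never shed.
    Response { id: u64 },
    /// Server-initiated notification, identified by its method name.
    Notification { method: String },
}

#[derive(Debug, PartialEq)]
pub enum Enqueue {
    Queued,
    /// Shed under pressure; the client never sees it.
    Dropped,
    /// Full with a non-droppable message; the caller must wait or disconnect.
    Full,
}

pub struct PressureQueue {
    queue: VecDeque<Outbound>,
    capacity: usize,
    droppable: HashSet<String>,
}

impl PressureQueue {
    pub fn new(capacity: usize, droppable: &[&str]) -> Self {
        PressureQueue {
            queue: VecDeque::with_capacity(capacity),
            capacity,
            droppable: droppable.iter().map(|m| m.to_string()).collect(),
        }
    }

    fn is_droppable(&self, msg: &Outbound) -> bool {
        matches!(msg, Outbound::Notification { method } if self.droppable.contains(method))
    }

    pub fn push(&mut self, msg: Outbound) -> Enqueue {
        // "Under pressure" means at least 75% full (integer math, no floats).
        let under_pressure = self.queue.len() * 4 >= self.capacity * 3;
        if under_pressure && self.is_droppable(&msg) {
            return Enqueue::Dropped;
        }
        if self.queue.len() >= self.capacity {
            return Enqueue::Full;
        }
        self.queue.push_back(msg);
        Enqueue::Queued
    }

    /// Called by the writer as it drains messages to the socket.
    pub fn pop(&mut self) -> Option<Outbound> {
        self.queue.pop_front()
    }
}
```

In this design, shedding buys headroom for responses, but a queue filled with non-droppable messages still reports Full. A grace window or disconnect policy is therefore still needed behind it.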

The existing test broadcast_does_not_block_on_slow_connection (line 878) uses channel capacity 1 and validates instant disconnect as expected behavior. That test encodes the wrong contract — it should validate a grace period, not instant termination.

Related: #13949 (static analysis of same path), #15355 (local ingress feature request).

Reproduced on codex-cli 0.120.0 and 0.121.0, both stock upstream and with a third-party wrapper using --remote mode.

extent analysis

TL;DR

The most likely fix for the instant disconnection of remote TUI clients when the outbound queue fills is to implement per-connection send tasks or increase the queue size with a send timeout.

Guidance

  • Review the send_message_to_connection() function at codex-rs/app-server/src/transport/mod.rs lines 308-316 to understand how a full queue on a disconnectable WebSocket connection leads to immediate termination.
  • Consider implementing per-connection send tasks, similar to the approach used in remote_control/websocket.rs, to provide backpressure instead of instant disconnect.
  • Alternatively, increase the queue size and implement a send timeout to prevent instant disconnection when the queue is full.
  • Update the existing test broadcast_does_not_block_on_slow_connection to validate a grace period instead of instant termination.


Notes

The provided information suggests that the issue is specific to WebSocket clients and does not affect stdio clients. The root cause is identified as the send_message_to_connection() function, and the suggested design options provide possible solutions.

Recommendation

Apply a workaround by increasing the queue size and implementing a send timeout, as this is the simplest fix with a small blast radius. This approach can help prevent instant disconnection of remote TUI clients when the outbound queue fills.
