codex: WebSocket fallback should recover after transient transport failures


What problem are you seeing?

When supports_websockets = true is configured for a Responses provider, Codex correctly starts by using the WebSocket transport. However, if a WebSocket streaming request disconnects and exhausts the configured stream_max_retries, the session falls back to HTTPS/SSE and appears to keep WebSockets disabled for the rest of that Codex session.

This is helpful for environments where WebSockets are genuinely unsupported, such as some proxies, but it also means a transient network/provider interruption can leave an otherwise healthy session on HTTPS/SSE until Codex is restarted.

Why this matters

For temporary failures, the service or network may recover shortly after fallback. In that case, users would expect Codex to eventually try WebSockets again without needing to restart the app/session.

At the same time, the recovery behavior should not make permanently broken proxy environments worse by repeatedly trying WebSockets forever.

Suggested behavior

Keep the existing supports_websockets configuration and avoid adding a new user-facing setting.

A possible approach:

  • after WebSocket fallback activates, disable WebSockets only for a cooldown window instead of permanently for the session
  • allow a small bounded number of recovery attempts per session
  • after the recovery budget is exhausted, leave the session on HTTPS/SSE
  • reset the recovery budget only after a WebSocket stream reaches response.completed, not just after the WebSocket connection opens

This would let transient failures recover automatically while preserving the current fallback behavior for proxies or environments that consistently fail WebSocket transport.
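The suggested behavior above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the code from the prototype branch: the names `WebSocketRecovery`, `State`, `Transport`, `choose`, `on_fallback`, and `on_response_completed` are hypothetical, and the cooldown length and recovery budget are arbitrary values the caller would supply.

```rust
use std::time::{Duration, Instant};

/// Transport chosen for the next request (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transport {
    WebSocket,
    HttpsSse,
}

/// Per-session WebSocket availability (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    /// WebSockets may be used.
    Enabled,
    /// WebSockets are paused until the cooldown window ends.
    CoolingDown { until: Instant },
    /// Recovery budget exhausted: stay on HTTPS/SSE for the session.
    Disabled,
}

#[derive(Debug)]
struct WebSocketRecovery {
    cooldown: Duration,
    max_recovery_attempts: u32,
    recovery_attempts_used: u32,
    state: State,
}

impl WebSocketRecovery {
    fn new(cooldown: Duration, max_recovery_attempts: u32) -> Self {
        Self {
            cooldown,
            max_recovery_attempts,
            recovery_attempts_used: 0,
            state: State::Enabled,
        }
    }

    /// Pick the transport for the next request. Once the cooldown has
    /// elapsed, this spends one recovery attempt and retries WebSockets.
    fn choose(&mut self, now: Instant) -> Transport {
        match self.state {
            State::Enabled => Transport::WebSocket,
            State::Disabled => Transport::HttpsSse,
            State::CoolingDown { until } if now < until => Transport::HttpsSse,
            State::CoolingDown { .. } => {
                self.recovery_attempts_used += 1;
                self.state = State::Enabled;
                Transport::WebSocket
            }
        }
    }

    /// Call when a WebSocket stream exhausts its retries and the session
    /// falls back to HTTPS/SSE.
    fn on_fallback(&mut self, now: Instant) {
        self.state = if self.recovery_attempts_used >= self.max_recovery_attempts {
            State::Disabled
        } else {
            State::CoolingDown { until: now + self.cooldown }
        };
    }

    /// Call only when a WebSocket stream reaches `response.completed`,
    /// not when the connection merely opens.
    fn on_response_completed(&mut self) {
        self.recovery_attempts_used = 0;
    }
}
```

With a budget of one attempt, a session that falls back, waits out the cooldown, and then completes a WebSocket response gets its budget back. A session whose single recovery attempt also fails stays on HTTPS/SSE, which matches the current behavior for proxies that never support WebSockets.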

Prototype branch

I prepared a small prototype branch here:

https://github.com/qindongliang/codex/tree/codex/websocket-fallback-recovery

Commit:

067853cb79 Recover websocket transport after fallback cooldown

The branch uses cooldown-based recovery and resets the recovery budget after a successful WebSocket completion. I could not open a PR because this repository currently limits pull request creation to collaborators.

Local testing

From the prototype branch:

  • just fmt
  • git diff --check
  • RUSTC="$HOME/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/rustc" "$HOME/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/cargo" test -p codex-core websocket_
  • PATH="$HOME/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin:$PATH" RUSTC="$HOME/.rustup/toolchains/1.93.0-aarch64-apple-darwin/bin/rustc" just fix -p codex-core

The targeted WebSocket test run passed: 6 unit tests and 48 filtered integration tests.
