codex - ✅(Solved) Fix `--remote` resume fails on large saved sessions after large `thread/resume` response [1 pull requests, 1 comments, 2 participants]

GitHub stats: openai/codex#19837 (fetched 2026-04-28 06:36:11)
Comments: 1 · Participants: 2 · Timeline events: 16 · Reactions: 1
Timeline (top): labeled ×5, unlabeled ×3, renamed ×2, assigned ×1

Error Message

Error: Failed to resume session from /home/<user>/.codex/sessions/2026/04/23/rollout-2026-04-23T19-13-22-<thread-id>.jsonl

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>

Fix Action

Fix / Workaround

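Operator workaround: run codex resume <thread-id> without --remote. This recovers the session manually but does not help tools that need the remote app-server transport. The fix is PR #19920, summarized below.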

PR fix notes

PR #19920: Allow large remote app-server resume responses

Description (problem / solution / changelog)

Why

Remote TUI resume uses the app-server websocket client. That client inherited tungstenite's default 16 MiB frame limit, so a large saved session could make thread/resume return a single JSON-RPC response frame that the client rejected before the TUI could deserialize or render it.

Fixes #19837
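To make the limit concrete, here is a minimal Rust sketch of a per-frame size check. The names `DEFAULT_MAX_FRAME`, `RAISED_MAX_FRAME`, and `frame_within_limit` are hypothetical and are not the code from PR #19920. In tungstenite the corresponding knobs are the `max_frame_size` and `max_message_size` options on `WebSocketConfig`, but the exact configuration the PR uses is not reproduced here. The sketch only shows why a response payload one byte over 16 MiB is rejected under the default and accepted under a 128 MiB bound.

```rust
/// Default per-frame limit the client inherited: 16 MiB (16 << 20 bytes).
const DEFAULT_MAX_FRAME: usize = 16 << 20;
/// The bounded limit described in PR #19920: 128 MiB.
const RAISED_MAX_FRAME: usize = 128 << 20;

/// Hypothetical helper modelled on the size check a websocket reader applies
/// to an incoming frame before the JSON-RPC layer ever sees the payload.
/// `None` means "no limit configured".
fn frame_within_limit(payload_len: usize, max_frame: Option<usize>) -> Result<(), String> {
    match max_frame {
        Some(limit) if payload_len > limit => Err(format!(
            "frame payload of {payload_len} bytes exceeds limit of {limit} bytes"
        )),
        _ => Ok(()),
    }
}
```

Because the check runs before deserialization, the TUI never gets a chance to parse or render the response; only raising (or streaming around) the limit changes the outcome.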

What Changed

  • Configure the remote app-server websocket client with a bounded 128 MiB max frame/message size.
  • Preserve the concrete remote worker exit reason when completing pending requests after a transport/read failure instead of replacing it with a generic channel-closed error.
  • Add a regression test that sends a single >16 MiB JSON-RPC response frame and verifies the typed request succeeds.
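The second bullet concerns error propagation rather than size limits. A minimal standard-library sketch of the idea: in-flight requests wait on channels, and when the transport worker exits, every waiter is completed with the worker's concrete exit reason. `PendingRequests`, `register`, and `fail_all` are hypothetical names; the real client in `codex-rs/app-server-client/src/remote.rs` may be structured quite differently (for example, around async channels).

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical request id and reply types, for illustration only.
type RequestId = u64;
type Reply = Result<String, String>;

/// Hypothetical table of in-flight JSON-RPC requests awaiting a response.
#[derive(Default)]
struct PendingRequests {
    waiters: HashMap<RequestId, Sender<Reply>>,
}

impl PendingRequests {
    /// Registers a request and returns the receiver its caller blocks on.
    fn register(&mut self, id: RequestId) -> Receiver<Reply> {
        let (tx, rx) = channel();
        self.waiters.insert(id, tx);
        rx
    }

    /// Called once the transport worker has exited (for example, after a
    /// read error). Each waiter gets the worker's concrete exit reason rather
    /// than a generic "channel closed" error.
    fn fail_all(&mut self, exit_reason: &str) {
        for (id, tx) in self.waiters.drain() {
            // A send error only means that caller already stopped waiting.
            let _ = tx.send(Err(format!("request {id} failed: {exit_reason}")));
        }
    }
}
```

If the senders were simply dropped instead, each caller would only see a disconnected channel, which is the kind of generic error that hid the cause in this issue.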

Note: This isn't a perfect fix. It really just moves the limit to a much larger value. I looked at several other potential fixes (both server-side and client-side), and they all involved significant complexity, affected backward compatibility, or hurt performance in common use cases. This simple fix should address the vast majority of remote use cases.

Verification

I reproduced the problem locally using a long rollout and verified that the fix addresses the connection drop.

Changed files

  • codex-rs/app-server-client/src/lib.rs (modified, +49/-0)
  • codex-rs/app-server-client/src/remote.rs (modified, +72/-38)

Code Example

Error: Failed to resume session from /home/<user>/.codex/sessions/2026/04/23/rollout-2026-04-23T19-13-22-<thread-id>.jsonl

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>

---

codex --remote ws://127.0.0.1:<port> resume <thread-id>

---

codex resume <thread-id>


What version of Codex CLI is running?

v0.125.0

What subscription do you have?

PRO

Which model were you using?

gpt-5.5

What platform is your computer?

Ubuntu 24.04.4 LTS

What terminal emulator and version are you using (if applicable)?

Terminal

What issue are you seeing?

codex --remote <ws> resume <thread-id> fails on large saved sessions. The same session resumes with plain codex resume <thread-id>.

Observed user-facing error:

Error: Failed to resume session from /home/<user>/.codex/sessions/2026/04/23/rollout-2026-04-23T19-13-22-<thread-id>.jsonl

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>

The visible output does not include a caused-by chain, and codex-tui.log does not show the underlying failure reason.

Environment:

  • codex-cli 0.125.0
  • Ubuntu 24.04, x86_64
  • Isolated CODEX_HOME copied from the user's auth/config only
  • codex --remote ws://127.0.0.1:<port> resume <thread-id> through a WebSocket proxy to a stock Codex app-server child

What was observed on the wire for the smallest failing fixture:

  • The visible TUI opens the WebSocket and completes the upgrade.
  • It sends normal startup RPCs (initialize, account/thread/model reads).
  • It sends thread/resume.
  • The app-server side returns a thread/resume result.
  • The response is large: the WebSocket frame length observed by strace was about 16 MiB for a 56 MiB JSONL fixture.
  • The visible TUI exits with Failed to resume session... just after it begins receiving that large response.
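For context on what strace was measuring: per RFC 6455 (section 5.2), a WebSocket frame header stores its payload length in 7 bits, or as a 16-bit or 64-bit big-endian extension when the 7-bit field is 126 or 127. The hypothetical `payload_len` helper below decodes that length from raw header bytes; a response of about 16 MiB needs the 64-bit form. It assumes the header bytes are already in hand and does not parse the mask key that follows on masked frames.

```rust
/// Hypothetical helper: decodes the payload length from the first bytes of a
/// WebSocket frame header (RFC 6455, section 5.2). Returns None if the slice
/// is too short to contain the length fields.
fn payload_len(header: &[u8]) -> Option<u64> {
    // Byte 1: high bit is the MASK flag, low 7 bits are the base length.
    let base = *header.get(1)? & 0x7F;
    match base {
        // 126: the real length follows as a 16-bit big-endian integer.
        126 => {
            let b = header.get(2..4)?;
            Some(u64::from(u16::from_be_bytes([b[0], b[1]])))
        }
        // 127: the real length follows as a 64-bit big-endian integer.
        127 => {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(header.get(2..10)?);
            Some(u64::from_be_bytes(buf))
        }
        // 0..=125: the base value is the length itself.
        n => Some(u64::from(n)),
    }
}
```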

This does not look like a bad saved-session file:

  • Plain non-remote codex resume <thread-id> works on the original large session.
  • Plain non-remote codex resume <thread-id> also stayed alive on the generated 56 MiB fixture until manually interrupted.

What steps can reproduce the bug?

  1. Prepare a Codex saved-session JSONL large enough to produce a large remote thread/resume response. In our bisect, 52 MiB passed and 56 MiB failed.

  2. Start a Codex app-server reachable over WebSocket. In our setup a WebSocket proxy sits in front of a stock Codex app-server child.

  3. Run:

    codex --remote ws://127.0.0.1:<port> resume <thread-id>
  4. Observe that the TUI exits with Failed to resume session from ....

  5. Run:

    codex resume <thread-id>
  6. Observe that the same saved session opens without the remote path.
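Step 1 depends on having a large enough session. A minimal sketch for spotting candidates: walk a sessions directory (the `sessions/YYYY/MM/DD/rollout-*.jsonl` layout is taken from the error path above) and list `.jsonl` files at or above a size threshold. `large_sessions` is a hypothetical helper, and JSONL size is only a rough proxy: the limit that actually failed applies to the `thread/resume` response frame, and the 52/56 MiB boundary was measured on a single fixture.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collects `.jsonl` files under `dir`
/// whose size is at least `min_bytes`, sorted largest first.
fn large_sessions(dir: &Path, min_bytes: u64) -> io::Result<Vec<(PathBuf, u64)>> {
    let mut found = Vec::new();
    let mut pending_dirs = vec![dir.to_path_buf()];
    while let Some(current) = pending_dirs.pop() {
        for entry in fs::read_dir(&current)? {
            let entry = entry?;
            let path = entry.path();
            let meta = entry.metadata()?;
            if meta.is_dir() {
                pending_dirs.push(path);
            } else if path.extension().and_then(|e| e.to_str()) == Some("jsonl")
                && meta.len() >= min_bytes
            {
                found.push((path, meta.len()));
            }
        }
    }
    // Largest sessions first: these are the likeliest to hit the limit.
    found.sort_by(|a, b| b.1.cmp(&a.1));
    Ok(found)
}
```

For example, calling it on `/home/<user>/.codex/sessions` with `52 << 20` would list sessions at or above the largest size that still passed in the measurements.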

Measured fixture results:

JSONL size                                               Result
4 MiB                                                    Pass
16 MiB                                                   Pass
32 MiB                                                   Pass
48 MiB                                                   Pass
52 MiB                                                   Pass
56 MiB                                                   Fail
64 MiB                                                   Fail
70.8 MiB (full fixture, cli_version edited to 0.125.0)   Fail
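The measurements read like a bisection over fixture size. A minimal sketch of that search, assuming pass/fail is monotonic in size: `bisect_boundary` is a hypothetical helper, and `passes` stands in for whatever check was actually run (for example, attempting a remote resume against a fixture of that size). The issue does not describe its procedure beyond the sizes tested.

```rust
/// Hypothetical helper: narrows a (passing, failing) size bracket, in MiB,
/// until the two ends are at most `step` apart.
/// Assumes results are monotonic: every size below the true boundary passes
/// and every size at or above it fails.
/// Returns (largest known passing size, smallest known failing size).
fn bisect_boundary(mut pass: u64, mut fail: u64, step: u64, passes: impl Fn(u64) -> bool) -> (u64, u64) {
    assert!(pass < fail && step > 0, "need pass < fail and a positive step");
    while fail - pass > step {
        // Midpoint computed without overflow.
        let mid = pass + (fail - pass) / 2;
        if passes(mid) {
            pass = mid;
        } else {
            fail = mid;
        }
    }
    (pass, fail)
}
```

For instance, starting from a 4 MiB / 64 MiB bracket with a 4 MiB step and a hypothetical boundary at 54 MiB, it returns (52, 56), the same shape of answer as the last pass/first fail pair in the table.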

The original session recorded cli_version: 0.122.0; editing the copied fixture header to 0.125.0 did not change the failure, so this does not appear to be caused only by the recorded version stamp.

What is the expected behavior?

codex --remote <ws> resume <thread-id> should handle large saved sessions at parity with plain codex resume <thread-id>.

If a large remote resume cannot be supported, the TUI should report a specific cause, such as a response-size limit, timeout, decode failure, or transport close reason. The current output hides the actionable cause.
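A minimal sketch of the reporting this asks for: attach the transport cause as the `source()` of the resume error, then print the whole chain. `ResumeError`, `TransportError`, and `render_chain` are hypothetical; only the "Failed to resume session from ..." message comes from the issue. Crates such as anyhow already print a "Caused by:" section in their debug output when a source is attached, so the core point is keeping the concrete cause attached instead of replacing it.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical transport-level causes a remote resume could surface.
#[derive(Debug)]
enum TransportError {
    FrameTooLarge { size: usize, max: usize },
    Closed { reason: String },
}

impl fmt::Display for TransportError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TransportError::FrameTooLarge { size, max } => {
                write!(f, "websocket frame of {size} bytes exceeds the {max}-byte limit")
            }
            TransportError::Closed { reason } => write!(f, "transport closed: {reason}"),
        }
    }
}

impl Error for TransportError {}

/// Hypothetical top-level error that keeps the transport cause attached.
#[derive(Debug)]
struct ResumeError {
    path: String,
    cause: TransportError,
}

impl fmt::Display for ResumeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Failed to resume session from {}", self.path)
    }
}

impl Error for ResumeError {
    // Exposing the cause here is what lets reporters print a caused-by chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

/// Renders an error followed by every cause in its `source()` chain.
fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut next = err.source();
    while let Some(cause) = next {
        out.push_str("\nCaused by: ");
        out.push_str(&cause.to_string());
        next = cause.source();
    }
    out
}
```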

Additional information

Operator workaround:

codex resume <thread-id>

That bypasses --remote, so it is usable for manual recovery but does not work for tools that need the remote app-server transport.
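For tools that can tolerate the local path, a minimal sketch of building either invocation with `std::process::Command`. `resume_command` and `args_of` are hypothetical helpers; the command is only constructed, not spawned, and the argument order mirrors the commands in the reproduction steps. Falling back to the local path still does not help tools that specifically need the remote app-server transport.

```rust
use std::process::Command;

/// Hypothetical helper: builds the `codex` invocation for resuming a thread,
/// using the remote app-server when a websocket URL is given and the local
/// (non-remote) path otherwise. Nothing is executed here.
fn resume_command(remote_ws: Option<&str>, thread_id: &str) -> Command {
    let mut cmd = Command::new("codex");
    if let Some(url) = remote_ws {
        cmd.arg("--remote").arg(url);
    }
    cmd.arg("resume").arg(thread_id);
    cmd
}

/// Collects the argument list, e.g. for logging which path was chosen.
fn args_of(cmd: &Command) -> Vec<String> {
    cmd.get_args().map(|a| a.to_string_lossy().into_owned()).collect()
}
```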

extent analysis

TL;DR

The issue can be worked around by using the codex resume <thread-id> command instead of codex --remote <ws> resume <thread-id> for large saved sessions.

Guidance

  • The failure appears once the saved-session JSONL reaches about 56 MiB (52 MiB passed). At that size the thread/resume response arrives as a single WebSocket frame of about 16 MiB, which points to a frame-size limit on the WebSocket path rather than a corrupt session file.
  • To verify the issue, try reproducing the bug with a large saved-session JSONL file (>= 56 MiB) and observe the TUI exit with a Failed to resume session error.
  • The codex resume <thread-id> command can be used as a temporary workaround for manual recovery, but it does not support the remote app-server transport.
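  • The root cause, identified in PR #19920, is on the client side: the remote app-server websocket client inherited tungstenite's default 16 MiB frame limit and rejected the oversized thread/resume response before the TUI could deserialize it.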

Example

No code snippet is provided here; the relevant change is in PR #19920 (codex-rs/app-server-client/src/lib.rs and codex-rs/app-server-client/src/remote.rs).

Notes

The failure comes from WebSocket response handling in the remote app-server client. PR #19920 raises the client's frame/message limit to a bounded 128 MiB and preserves the concrete transport error; its author notes this moves the limit rather than removing it. The workaround only avoids the symptom, not the underlying issue.

Recommendation

Use codex resume <thread-id> for large saved sessions when manual recovery is enough. Tools that need the remote app-server transport should use a build that includes PR #19920.
