codex: codex exec resume <missing-uuid> surfaces noisy "thread <uuid> not found"; fall back to thread/start instead

GitHub stats: openai/codex#22064 (fetched 2026-05-11 03:19:48)
Comments: 0 · Participants: 1 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×4, unlabeled ×2, cross-referenced ×1



Problem

codex exec resume <uuid> passes the supplied id verbatim to thread/resume without first verifying that the thread actually exists in the active $CODEX_HOME. When the rollout has been pruned, was created in a different home, or never existed at all (e.g. an automation mistyped a UUID), this surfaces the underlying "thread <uuid> not found" error on stderr:

```
$ codex exec resume bc0d74c3-91c6-4dd9-b7e3-e69965ce762d "hi"
Error: thread/read: thread/read failed: thread not loaded: bc0d74c3-91c6-4dd9-b7e3-e69965ce762d (code -32600)
```

This is mostly a UX problem: the function resolve_resume_thread_id already has a graceful fall-through path for the --last / non-UUID cases (it returns Ok(None) and the caller transparently starts a fresh thread via thread/start). The UUID branch is the only path that skips the existence check.
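The fall-through convention described above can be sketched as follows. This is a minimal illustration, not the actual codex-rs code: `NextRequest` and `next_request` are hypothetical names, and a plain `String` stands in for the real thread id type. It only shows how an `Option` result from the resolver maps onto the two requests named in the issue.

```rust
/// Which JSON-RPC request the caller should issue next.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
enum NextRequest {
    /// A thread id was resolved: resume it via `thread/resume`.
    Resume(String),
    /// Nothing was resolved: start a fresh thread via `thread/start`.
    Start,
}

/// Hypothetical caller-side dispatch: `None` from the resolver means
/// "no matching thread", which transparently becomes a new thread.
fn next_request(resolved: Option<String>) -> NextRequest {
    match resolved {
        Some(id) => NextRequest::Resume(id),
        None => NextRequest::Start,
    }
}
```

Under this convention, extending the fall-back to the explicit-UUID case only requires the resolver to return `None` for a missing id; the caller needs no changes.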

Concrete pain in downstream harnesses

The cdx exo TUI (a Codex wrapper) currently filters this exact line out with a regex:

```js
if (/^(items:\s*)?thread\s+[0-9a-f-]+\s+not found$/i.test(trimmed)) return false;
```

That's a user-visible signal the harness is suppressing because there is no clean way to opt into the existing fall-back behaviour.

Proposal

Probe the thread store with thread/read (include_turns: false) before issuing thread/resume. If the probe succeeds, resume as before. If the probe reports the thread is missing, log a single human-readable line and fall back to the existing thread/start path. Any other probe error is propagated to the user unchanged, as today.

Output after the patch:

```
$ codex exec resume bc0d74c3-91c6-4dd9-b7e3-e69965ce762d "hi"
No previous session `bc0d74c3-91c6-4dd9-b7e3-e69965ce762d` found; starting a new thread.
[normal turn output...]
```

The fall-back is identical to what --last already does when no thread matches the cwd filter (Ok(None) returned to the caller), so this just extends that behaviour to the explicit-id case.

The probe substring-matches three stable invalid-request strings (not found, no rollout found, not loaded) that the app-server emits depending on which layer notices the missing thread first. Each is pinned by existing upstream tests in app-server/tests/suite/v2/{compaction,mcp_*}.rs and tui/src/app/session_lifecycle.rs, so the substring pattern is stable. Genuine app-server errors that don't match those substrings are still propagated unchanged.
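A rough sketch of the probe-and-classify logic, under stated assumptions: the names `is_missing_thread_error` and `resolve_explicit_id` are hypothetical, the `probe` closure stands in for the real thread/read call, errors are modelled as plain strings, and the match is case-sensitive. The reference branch may differ on all of these points.

```rust
/// The three substrings the issue lists as markers of a missing thread.
const MISSING_THREAD_MARKERS: [&str; 3] = ["not found", "no rollout found", "not loaded"];

/// Hypothetical helper: true when an error message looks like a missing thread.
/// Case-sensitive matching is an assumption of this sketch.
fn is_missing_thread_error(message: &str) -> bool {
    MISSING_THREAD_MARKERS
        .iter()
        .any(|marker| message.contains(marker))
}

/// Hypothetical resolver for the explicit-id case.
/// `probe` stands in for a thread/read request with `include_turns: false`.
fn resolve_explicit_id<F>(thread_id: &str, probe: F) -> Result<Option<String>, String>
where
    F: FnOnce(&str) -> Result<(), String>,
{
    match probe(thread_id) {
        // Thread exists: hand the id back so the caller issues thread/resume.
        Ok(()) => Ok(Some(thread_id.to_string())),
        // Thread is missing: print one notice and fall back to thread/start.
        Err(message) if is_missing_thread_error(&message) => {
            eprintln!("No previous session `{thread_id}` found; starting a new thread.");
            Ok(None)
        }
        // Anything else is a genuine error and is propagated unchanged.
        Err(message) => Err(message),
    }
}
```

Keeping the classifier separate from the resolver means the list of markers can be pinned by its own unit tests if the app-server wording ever changes.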

Reference implementation

Reference branch on a fork: team-wcv/codex@feat/thread-resume-fallback (one commit, +133/-1).

Key changes:

  • codex-rs/exec/src/lib.rs: introduce probe_resume_thread_exists and call it from resolve_resume_thread_id before returning a UUID verbatim.
  • codex-rs/exec/tests/suite/resume.rs: end-to-end test exec_resume_with_missing_uuid_falls_back_to_new_thread asserting (a) success exit, (b) the human-readable notice, (c) the absence of the historical noisy error, and (d) a freshly-generated thread id in the new rollout.

The full `exec` resume suite (8 tests) passes, and clippy and fmt are clean.

Alternative protocol-level design (out of scope, noted for completeness)

If you'd prefer a structural fix at the protocol layer rather than the exec-layer probe, a richer alternative would be:

  • Add resume_id?: ThreadId to ThreadStartParams. When set, app-server tries to resume that thread and falls back to a fresh thread on ThreadNotFound.
  • Emit a thread/resumeFailed notification carrying { thread_id, reason } so live JSON-RPC clients can render "Resume failed; started new thread" without parsing stderr.

That's a larger surface change (new param, new notification, plumbing through process_start_request) and would benefit from explicit design input. Happy to author it if you'd like; otherwise the smaller exec-layer fix above already removes the user-visible pain for codex exec resume consumers.
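To make the protocol-level alternative more concrete, here is a minimal model of it. Only `resume_id`, `thread_id`, and `reason` come from the issue. Everything else is an assumption of the sketch: the struct and enum shapes, the `start_or_resume` function, the `known` slice standing in for the thread store, and `String` standing in for `ThreadId`.

```rust
/// Sketch of ThreadStartParams with the proposed optional field.
struct ThreadStartParams {
    resume_id: Option<String>, // `String` stands in for `ThreadId`
}

/// Payload of the proposed thread/resumeFailed notification.
#[derive(Debug, PartialEq)]
struct ResumeFailed {
    thread_id: String,
    reason: String,
}

/// Hypothetical result of handling a start request.
#[derive(Debug, PartialEq)]
enum StartOutcome {
    Resumed(String),
    Fresh {
        thread_id: String,
        notification: Option<ResumeFailed>,
    },
}

/// Hypothetical server-side handling: try to resume, else start fresh
/// and attach a notification so clients need not parse stderr.
fn start_or_resume(params: &ThreadStartParams, known: &[&str], fresh_id: &str) -> StartOutcome {
    match params.resume_id.as_deref() {
        Some(id) if known.iter().any(|k| *k == id) => StartOutcome::Resumed(id.to_string()),
        Some(id) => StartOutcome::Fresh {
            thread_id: fresh_id.to_string(),
            notification: Some(ResumeFailed {
                thread_id: id.to_string(),
                reason: "thread not found".to_string(),
            }),
        },
        None => StartOutcome::Fresh {
            thread_id: fresh_id.to_string(),
            notification: None,
        },
    }
}
```

In this shape the fall-back decision lives in one place on the server, which is the main structural difference from the exec-layer probe.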

Context

This is part of a small batch of UX patches a downstream Codex wrapper (cdx exo) accumulated to keep codex exec output clean. Related issues filed earlier: #22054 (rate-limit telemetry gating), #22055 (rollout-persistence log demotion), #22059 (MCP startup notification collapsing), #22061 (local-provider reasoning-summary heuristic), #22063 (provider discovery_url).
