codex - 💡(How to fix) Fix TUI /resume picker can block on global rollout scan despite cwd filter [2 comments, 2 participants]

majiayu000 · 2026-05-10T16:37:12Z

What version of Codex CLI is running?

codex-cli 0.130.0

Local source checkout used for code-path inspection: openai/codex at cac5354455.

What subscription do you have?

N/A for this report. This is local CLI/TUI resume picker behavior before a model request is made.

Which model were you using?

N/A. The observed latency is in local session listing / resume picker loading.

What platform is your computer?

Darwin 25.2.0 arm64 arm

What terminal emulator and version are you using (if applicable)?

VS Code integrated terminal, vscode 1.109.5.

What issue are you seeing?

The TUI /resume picker can be slow in profiles with many local rollout files because the cwd-filtered resume list still goes through a filesystem-first scan of the global rollout tree.

This is a narrower CLI/TUI variant of the larger local-history performance class discussed in #18693, and it also overlaps with resume listing correctness issues such as #20165 and #21619. The specific problem here is the first-screen /resume list path: even when the picker is scoped to the current working directory, the storage layout and listing path can force global rollout scanning before the user sees the picker results.

The current flow appears to be:

1. /resume opens the TUI resume picker.
2. The picker builds a thread/list request with a cwd filter and source filter.
3. The request still sets use_state_db_only: false.
4. The backend uses filesystem-first listing so it can repair / validate SQLite metadata.
5. For updated_at ordering, the rollout list path must scan files because updated time is not encoded in filenames.
6. After selecting a session, cold resume can additionally load the full rollout JSONL into memory.
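To illustrate step 5, here is a minimal std-only sketch of why ordering by update time costs one metadata call per file, while ordering by creation time can be done on names alone. This is not the code in codex-rs/rollout/src/list.rs; the function names are hypothetical, and the sortable `rollout-<timestamp>-<id>.jsonl` name shape is an assumption made for the example.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::time::SystemTime;

/// Newest-first by creation time: pure string work, no filesystem access.
/// Assumes (hypothetically) names shaped like `rollout-<sortable-timestamp>-<id>.jsonl`,
/// so lexicographic order matches chronological order.
pub fn newest_created_first(mut names: Vec<String>) -> Vec<String> {
    names.sort_by(|a, b| b.cmp(a));
    names
}

/// Newest-first by update time: the name carries no update timestamp,
/// so every candidate file needs its own metadata lookup before sorting.
pub fn newest_updated_first(paths: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut stamped: Vec<(SystemTime, PathBuf)> = Vec::with_capacity(paths.len());
    for path in paths {
        let modified = fs::metadata(path)?.modified()?;
        stamped.push((modified, path.clone()));
    }
    stamped.sort_by(|a, b| b.0.cmp(&a.0));
    Ok(stamped.into_iter().map(|(_, path)| path).collect())
}
```

With thousands of rollout files, the second function's cost grows with the size of the whole tree, which is the pressure the cwd filter alone does not remove.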

Relevant code paths from current main:

- codex-rs/tui/src/resume_picker.rs: thread_list_params includes cwd/source filters but sets use_state_db_only: false.
- codex-rs/rollout/src/recorder.rs: list_threads_with_db_fallback performs filesystem-first listing and read-repair before returning filtered listings.
- codex-rs/rollout/src/list.rs: the UpdatedAt sort path documents that it must scan files up to the scan cap because updated_at is not encoded in filenames.
- codex-rs/thread-store/src/local/read_thread.rs and codex-rs/rollout/src/recorder.rs: cold resume history loading ultimately calls RolloutRecorder::load_rollout_items, which uses tokio::fs::read_to_string(path) for the rollout.
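For the last code path, a rough sketch of what a streaming read could look like in place of loading the whole file as one string. It uses the synchronous standard library rather than tokio to stay dependency-free, and `stream_jsonl` is a hypothetical helper, not a function in the Codex codebase.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Streams a JSONL file line by line and hands each non-empty line to `on_item`.
/// Peak memory is bounded by the longest single line instead of the file size.
/// Returns the number of lines that were passed to the callback.
pub fn stream_jsonl<F>(path: &Path, mut on_item: F) -> io::Result<usize>
where
    F: FnMut(&str),
{
    let reader = BufReader::new(File::open(path)?);
    let mut count = 0;
    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // tolerate blank lines between records
        }
        on_item(trimmed);
        count += 1;
    }
    Ok(count)
}
```

For a rollout on the order of the 141 MB file measured below, this would let the caller parse or discard items incrementally. Whether the resume path can actually consume items one at a time is an open question this sketch does not answer.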

Local measurement from my profile:

~/.codex/sessions rollout files: 4829
~/.codex/sessions size: 2.1G
largest rollout file: 141,283,019 bytes
state_5.sqlite threads: 4541 total, 330 cli/vscode, 1 for the current cwd
raw stat/sort of ~/.codex/sessions rollout files: 10.70s

For comparison, Claude Code stores transcripts under per-project directories:

~/.claude/projects/<sanitized-cwd>/<session-id>.jsonl

On the same machine, scanning one heavy Claude project directory and sorting by mtime took about 0.74s. The point is not that Claude's whole profile is smaller; the full ~/.claude/projects tree has 6317 jsonl files. The faster common path comes from physical per-project partitioning, so "continue current directory" does not need to inspect all projects first.
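The partitioning idea can be sketched as follows. The sanitizing rule here (every non-alphanumeric character becomes `-`) is an assumption made for illustration and may not match what Claude Code actually does; `sanitize_cwd`, `project_dir`, and `list_project_sessions` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sanitizer: every non-alphanumeric character becomes '-'.
pub fn sanitize_cwd(cwd: &str) -> String {
    cwd.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
        .collect()
}

/// Directory that would hold all transcripts for one working directory.
pub fn project_dir(root: &Path, cwd: &str) -> PathBuf {
    root.join(sanitize_cwd(cwd))
}

/// Lists `.jsonl` files for a single project, reading only that one directory.
/// A project with no directory yet simply has no sessions.
pub fn list_project_sessions(root: &Path, cwd: &str) -> io::Result<Vec<PathBuf>> {
    let mut sessions = Vec::new();
    let entries = match fs::read_dir(project_dir(root, cwd)) {
        Ok(entries) => entries,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(sessions),
        Err(err) => return Err(err),
    };
    for entry in entries {
        let path = entry?.path();
        if path.extension().and_then(|ext| ext.to_str()) == Some("jsonl") {
            sessions.push(path);
        }
    }
    sessions.sort();
    Ok(sessions)
}
```

The cost of `list_project_sessions` depends only on the entries in one directory, regardless of how many other projects exist under the root.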

What steps can reproduce the bug?

1. Accumulate many Codex local rollout files under ~/.codex/sessions, especially across multiple projects.
2. Make sure only a small subset belongs to the current working directory.
3. Open Codex TUI in one project directory.
4. Run /resume.
5. Observe that the resume picker can take noticeable time to show or refresh results.
6. Inspect the listing path: the picker passes a cwd filter, but because use_state_db_only is false, the backend can still scan the global rollout tree and repair/validate metadata before returning the page.

A local way to estimate the worst-case scan pressure is:

find ~/.codex/sessions -type f -name 'rollout-*.jsonl' | wc -l
find ~/.codex/sessions -type f -name 'rollout-*.jsonl' \
  | while IFS= read -r f; do stat -f '%m %z %N' "$f"; done \
  | sort -nr \
  | head -25 >/dev/null
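The `stat -f` flags above are BSD/macOS specific. For other platforms, a roughly equivalent measurement can be sketched with the Rust standard library. `scan_rollouts` and `RolloutStat` are hypothetical names for this example, and the walk is a plain recursive directory scan, not the listing code Codex uses.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Modification time, size, and path for one rollout file.
pub struct RolloutStat {
    pub modified: SystemTime,
    pub size: u64,
    pub path: PathBuf,
}

/// Walks `root` recursively, stats every `rollout-*.jsonl` regular file,
/// and returns the results sorted newest-first by modification time.
pub fn scan_rollouts(root: &Path) -> io::Result<Vec<RolloutStat>> {
    let mut pending = vec![root.to_path_buf()];
    let mut found = Vec::new();
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            let path = entry.path();
            if file_type.is_dir() {
                pending.push(path);
                continue;
            }
            let name = entry.file_name();
            let name = name.to_string_lossy();
            if file_type.is_file() && name.starts_with("rollout-") && name.ends_with(".jsonl") {
                let meta = entry.metadata()?;
                found.push(RolloutStat { modified: meta.modified()?, size: meta.len(), path });
            }
        }
    }
    found.sort_by(|a, b| b.modified.cmp(&a.modified));
    Ok(found)
}
```

Wrapping a call such as `scan_rollouts(Path::new("/path/to/.codex/sessions"))` between two `std::time::Instant::now()` readings gives a number comparable to the shell timing, with the caveat that filesystem caches make repeated runs faster than the first.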

What is the expected behavior?

Opening /resume in the TUI should make the common current-directory picker path fast and bounded by current-project metadata, not by the number of rollout files across all projects.

In particular, if the SQLite state DB already has current cwd/source/provider metadata, the first page should be able to render from that index without waiting for a global filesystem scan/repair.
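One possible shape for this behavior, sketched with plain threads and a channel: the first page comes from the index right away, and the slower reconciliation result arrives later as a second message. `PickerUpdate` and `open_picker` are hypothetical names, and the two closures stand in for the real state DB query and filesystem scan, which this sketch does not model.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Messages the picker UI would receive, in order.
#[derive(Debug, Clone, PartialEq)]
pub enum PickerUpdate {
    /// Rows read from the index, available immediately.
    FirstPage(Vec<String>),
    /// Rows after the slow filesystem reconciliation has finished.
    Reconciled(Vec<String>),
}

/// Sends the indexed first page before starting the slow scan on a
/// background thread, so the scan never delays the first visible rows.
pub fn open_picker<I, S>(read_index: I, slow_scan: S) -> Receiver<PickerUpdate>
where
    I: FnOnce() -> Vec<String>,
    S: FnOnce() -> Vec<String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    // Fast path: runs on the caller's thread and is queued first.
    let _ = tx.send(PickerUpdate::FirstPage(read_index()));
    // Slow path: the sender moves into the thread and is dropped when it ends.
    thread::spawn(move || {
        let _ = tx.send(PickerUpdate::Reconciled(slow_scan()));
    });
    rx
}
```

A UI loop would render on `FirstPage` and refresh the list in place on `Reconciled`. In the actual TUI this would presumably use the project's async runtime rather than a raw thread.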

Additional information

Suggested phased fix:

1. For the default TUI /resume current-cwd list, try use_state_db_only: true first.
2. If the DB returns no usable results, errors, or the user explicitly selects a global/show-all mode, fall back to the existing filesystem scan path.
3. Move filesystem repair/reconciliation for normal picker opening to a background task so it does not block the first visible page.
4. Longer term: add a per-cwd sidecar/index or encode enough metadata to avoid global rollout scans for project-scoped listing.
5. Separately, consider making cold resume avoid synchronously reading very large rollout files in full before the UI can recover.

This keeps the existing correctness fallback while making the common interactive path closer to a DB-first, project-scoped resume picker.
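The decision logic of the first two phases can be sketched in a few lines. `ListSource` and `list_for_picker` are hypothetical names, and the closures stand in for the state DB query and the existing filesystem scan. The real change would live in `thread_list_params` and the backend listing path rather than in a helper like this.

```rust
/// Which path produced the rows shown in the picker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ListSource {
    StateDb,
    FilesystemScan,
}

/// DB-first listing with a filesystem fallback.
/// The scan runs only when the user asked for show-all, the DB query
/// failed, or the DB returned no rows.
pub fn list_for_picker<D, F, E>(
    show_all: bool,
    query_db: D,
    scan_fs: F,
) -> (ListSource, Vec<String>)
where
    D: FnOnce() -> Result<Vec<String>, E>,
    F: FnOnce() -> Vec<String>,
{
    if !show_all {
        if let Ok(rows) = query_db() {
            if !rows.is_empty() {
                return (ListSource::StateDb, rows);
            }
        }
    }
    (ListSource::FilesystemScan, scan_fs())
}
```

One design question this leaves open is what counts as "no usable results": here an empty result always triggers the scan, which means a project with genuinely no sessions still pays the full scan cost on every open.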

If this direction matches the maintainers' intent, I am happy to send a small invited PR for the Phase 1 change.
