codex - 💡(How to fix) Fix Codex App crashes after 0.130 → 0.131 auto-update: logs_2.sqlite migrations modified in place (sqlx checksum drift) + 30s GUI backfill cap incompatible with 900s backend lease

What version of the Codex App are you using (From “About Codex” dialog)?

26.519.2081.0 (Codex.exe ProductVersion: 26.519.21041)
Bundled backend CLI (Linux ELF run inside WSL2): codex-cli 0.131.0-alpha.9
Last known-good version before this crash: 0.130.0-alpha.5

What subscription do you have?

ChatGPT Pro

What platform is your computer?

Microsoft Windows NT 10.0.26200.0 x64
Detail: Windows 11 Enterprise, build 26200, 64-bit. Codex App backend runs as a Linux ELF inside WSL2 (Ubuntu-24.04, kernel 6.6.87.2-microsoft-standard-WSL2).

What issue are you seeing?

After Codex Desktop auto-updated from 0.130.0-alpha.5 to 0.131.0-alpha.9, the app refuses to start. Two distinct bugs fire in sequence:

Symptom A (fires immediately on launch)

Codex cannot access its local database
  Location: /mnt/c/Users/<user>/.codex/state_5.sqlite
  Cause: failed to initialize state runtime at /mnt/c/Users/<user>/.codex:
         migration 1 was previously applied but has been modified

The "Repair Codex local data now? [y/N]" prompt is destructive — accepting wipes thread metadata. Declining leaves Codex unusable.

Symptom B (fires after Symptom A is patched)

timed out waiting for state db backfill at /mnt/c/Users/<user>/.codex
  after 30s (status: running)

The GUI gives up after 30 s even though the backend's own backfill lease is 900 s (PR #11377), and the backend would have completed in ~50 s on my install (325 active sessions + 40 archived = ~3.5 GB total session jsonl).

Root cause A — logs_2.sqlite migrations modified in place

The SQL bytes of migration 1 (logs) and migration 2 (logs feedback log body) were edited in place between 0.130.x and 0.131.x. sqlx hashes each migration's SQL bytes with SHA-384 at build time, stores the hash in the binary, and refuses any DB whose stored checksum doesn't match — even when the resulting final table schema is fully forward-compatible. This violates sqlx's documented "migrations are immutable once published" contract.
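To make the byte-level sensitivity concrete, here is a minimal Python sketch. It assumes only what is stated above (SHA-384 over the migration's SQL bytes); the two SQL strings are hypothetical stand-ins, not the real Codex migrations.

```python
import hashlib


def migration_checksum(sql: str) -> str:
    """SHA-384 over the exact UTF-8 bytes of the migration SQL, as uppercase hex."""
    return hashlib.sha384(sql.encode("utf-8")).hexdigest().upper()


# Two hypothetical variants of one migration: same resulting table, different bytes.
original = "CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT);\n"
reformatted = "CREATE TABLE logs (\n    id INTEGER PRIMARY KEY,\n    message TEXT\n);\n"

# Any whitespace or comment edit changes the digest, even though the schema is identical.
drifted = migration_checksum(original) != migration_checksum(reformatted)
print("checksum drift:", drifted)
```

A purely cosmetic reformat is enough to change the digest, which is why an in-place edit of a published migration is indistinguishable from a real schema change as far as the checksum comparison is concerned.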

Concrete checksum drift (full anchor list in attached codex-checksums-0.131.0.json):

Migration                        | DB-stored hash (post-0.130) | Binary-embedded hash (0.131)
logs_2 m1 logs                   | F477E605…                   | 009639EAFE599BE9…
logs_2 m2 logs feedback log body | 5C82B1A6…                   | CF6C93AF074A9022…

All 32 state_5.sqlite migration checksums do match between versions, so this is isolated to logs_2.sqlite. Both old and new SQL produce the same final 12-column logs table — the difference is in the SQL bytes themselves, not in any schema-meaningful change.

Root cause B — hard-coded 30 s GUI backfill cap

The GUI startup gate waits for state_5.sqlite.backfill_state.status='complete' with a hard-coded 30 s deadline. The string literal "timed out waiting for state db backfill at {} after {}s (status: {})" is in the binary, but there is no env var, config knob, or CLI flag controlling that 30 s. The backend's own lease is 900 s, so the GUI cap is internally inconsistent with the backend's design.

For users with non-trivial session histories (a few hundred MB+), a cold backfill routinely takes 30–120 s, so the GUI gives up while the backend is still making progress.
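The mismatch can be illustrated with a small simulation. This is a sketch, not the Codex implementation: `wait_for_backfill`, `FakeBackfill`, and `simulate` are hypothetical names, and the fake clock stands in for real time so the example runs instantly. The 30 s, 900 s, and ~50 s figures come from the report above.

```python
import time
from typing import Callable


def wait_for_backfill(
    get_status: Callable[[], str],
    timeout_secs: float,
    poll_secs: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status is 'complete' or the deadline passes; True on success."""
    deadline = clock() + timeout_secs
    while True:
        if get_status() == "complete":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_secs)


class FakeBackfill:
    """Simulated backend whose backfill completes after `duration` fake seconds."""

    def __init__(self, duration: float) -> None:
        self.now = 0.0
        self.duration = duration

    def clock(self) -> float:
        return self.now

    def sleep(self, secs: float) -> None:
        self.now += secs  # advance fake time instead of blocking

    def status(self) -> str:
        return "complete" if self.now >= self.duration else "running"


def simulate(timeout_secs: float, backfill_secs: float = 50.0) -> bool:
    sim = FakeBackfill(backfill_secs)
    return wait_for_backfill(
        sim.status, timeout_secs, poll_secs=1.0, clock=sim.clock, sleep=sim.sleep
    )


print("30 s cap succeeds:", simulate(30.0))
print("900 s deadline succeeds:", simulate(900.0))
```

With a ~50 s backfill, the gate with a 30 s deadline gives up while the status is still "running", whereas a deadline aligned with the 900 s lease returns as soon as the backfill finishes.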

0.132.0 does NOT fix either

The rust-v0.132.0 changelog (latest as of 2026-05-20) lists 24 bug fixes; none touches the migrator validation path or the GUI startup timeout.

The closest upstream work is PR #16924 (merged 2026-04-06), which relaxes the migrator only when the DB has migrations the binary doesn't know about (DB newer than binary). That does NOT cover the symmetric case here, where the binary has the same migration's SQL bytes hashing to a different value than what's recorded in the DB.

What steps can reproduce the bug?

  1. Install Codex Desktop 0.130.0-alpha.5 (or any 0.130.x release) and use it daily for ≥1 day so logs_2.sqlite accumulates rows and _sqlx_migrations is populated with checksums computed from that version's migration SQL bytes.
  2. Let auto-update jump to 0.131.x (0.131.0-alpha.9 in my case via MSIX OpenAI.Codex 26.519.2081.0).
  3. Launch Codex. It crashes immediately with "migration 1 was previously applied but has been modified" (Symptom A).
  4. Manually run UPDATE _sqlx_migrations SET checksum = ? WHERE version = ? against logs_2.sqlite for versions 1 and 2, binding each migration's binary-expected value, then relaunch. With > ~50 MB total session jsonl on disk: GUI hits the 30 s backfill timeout (Symptom B).

A reproducible recovery toolkit at https://github.com/xdifu/codex-repair extracts the binary-expected checksums automatically (via SHA-384 anchor scanning + DB description-based cluster localization) and applies the schema-verified fixes. Running python codex-repair.py doctor against any affected install reports the same drift, and python codex-repair.py extract-checksums --json produces the full evidence list attached here.

Investigation summary

  1. Located the actual backend binary: %USERPROFILE%\.codex\bin\wsl\<hash>\codex (Linux ELF in WSL2, not the MSIX-bundled Windows Codex.exe). The crash path /mnt/c/... was the WSL view of the Windows drive.
  2. Extracted all 34 embedded migration checksums (32 for state_5.sqlite, 2 for logs_2.sqlite) from the backend ELF by scanning for (sql, sha384(sql)) byte-adjacency anchors.
  3. Diffed extracted checksums against each DB's _sqlx_migrations rows — found mismatch only for logs_2 m1 and m2.
  4. Verified the actual logs table schema (PRAGMA table_info(logs) shows all 12 expected columns including feedback_log_body, thread_id, process_uuid, estimated_bytes) is fully compatible with both the old and new migration SQL — proving the change is cosmetic.
  5. Rewrote _sqlx_migrations.checksum for the 2 affected rows. Symptom A cleared; Symptom B appeared.
  6. Confirmed Symptom B's 30 s timeout is hard-coded by grepping the binary for the timeout literal and related env-var names; no config path exists.
  7. Backfilled manually from Python (parsing sessions/**/*.jsonl first-line session_meta), bypassing the 30 s GUI cap, then UPDATE backfill_state SET status='complete'. Codex started cleanly with full thread history intact.
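The diff in step 3 can be sketched in a few lines of Python. This is an illustration, not the toolkit's code: `find_checksum_drift` is a hypothetical helper, the demo table is a reduced version of `_sqlx_migrations` (the real table has more columns), and the checksums are computed from placeholder bytes rather than real migration SQL.

```python
import hashlib
import sqlite3


def find_checksum_drift(conn: sqlite3.Connection, expected: dict[int, str]) -> list[int]:
    """Return migration versions whose stored checksum differs from the expected hex."""
    rows = conn.execute(
        "SELECT version, checksum FROM _sqlx_migrations ORDER BY version"
    ).fetchall()
    drift = []
    for version, checksum in rows:
        want = expected.get(version)
        # Checksums are stored as BLOBs; compare them as uppercase hex strings.
        if want is not None and bytes(checksum).hex().upper() != want.upper():
            drift.append(version)
    return drift


# Demo: in-memory DB standing in for logs_2.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE _sqlx_migrations ("
    "version BIGINT PRIMARY KEY, description TEXT NOT NULL, checksum BLOB NOT NULL)"
)
old_sql = {1: b"old bytes of migration 1", 2: b"bytes of migration 2"}
for version, sql in old_sql.items():
    conn.execute(
        "INSERT INTO _sqlx_migrations VALUES (?, ?, ?)",
        (version, f"migration {version}", hashlib.sha384(sql).digest()),
    )

# Pretend the new binary embeds a different hash for migration 1 only.
expected = {
    1: hashlib.sha384(b"edited bytes of migration 1").hexdigest(),
    2: hashlib.sha384(old_sql[2]).hexdigest(),
}
drifted = find_checksum_drift(conn, expected)
print("drifted versions:", drifted)
```

Running the same comparison against every DB is what isolates the problem to a specific file: a DB whose rows all match yields an empty list.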

Full archeology and the 5-phase timeline are in docs/root-cause-analysis.md.

What is the expected behavior?

Symptom A: a Codex App update should never fail to open a 0.130.x-created logs_2.sqlite when the final table schema is fully forward-compatible. Either:

  • Fix 1 (preferred, OpenAI-internal hygiene): never modify a published migration. Express the new desired schema as 003_…sql / 004_…sql rather than editing 001_…sql / 002_…sql in place. This is the canonical sqlx approach and avoids any client-side compatibility shim.

  • Fix 2 (defensive, in codex-rs/state/src/runtime.rs): in the migrator's VersionMismatch arm, resolve the binary-expected SQL for the failing migration, parse its CREATE TABLE / ALTER TABLE / ADD COLUMN targets, run PRAGMA table_info(<table>) on the live DB, and if every expected column is already present, UPDATE _sqlx_migrations SET checksum = <new_hash> and log a warning. This is symmetric to PR #16924, which already forgives the opposite direction.

  • Fix 3 (minimum-effort escape hatch): accept CODEX_TOLERATE_MODIFIED_MIGRATIONS=1 env var that triggers Fix 2's code path explicitly.
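The core of Fix 2 is small enough to sketch. The version below is Python rather than Rust and is only an outline of the logic under stated assumptions: `reconcile_checksum` is a hypothetical name, the expected column list is passed in by the caller instead of being parsed from migration SQL, and the demo uses a reduced schema and placeholder checksums.

```python
import hashlib
import sqlite3


def reconcile_checksum(
    conn: sqlite3.Connection,
    version: int,
    table: str,
    expected_columns: list[str],
    new_checksum: bytes,
) -> bool:
    """Rewrite one _sqlx_migrations checksum only if the live table already
    has every column the binary's migration SQL expects."""
    present = {
        row[0] for row in conn.execute("SELECT name FROM pragma_table_info(?)", (table,))
    }
    missing = set(expected_columns) - present
    if missing:
        return False  # the schema really differs: refuse to touch the checksum
    conn.execute(
        "UPDATE _sqlx_migrations SET checksum = ? WHERE version = ?",
        (new_checksum, version),
    )
    conn.commit()
    return True


# Demo on an in-memory DB with a reduced, hypothetical logs schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, thread_id TEXT, feedback_log_body TEXT)"
)
conn.execute("CREATE TABLE _sqlx_migrations (version BIGINT PRIMARY KEY, checksum BLOB NOT NULL)")
conn.execute(
    "INSERT INTO _sqlx_migrations VALUES (1, ?)", (hashlib.sha384(b"old sql").digest(),)
)
new_checksum = hashlib.sha384(b"new sql").digest()

ok = reconcile_checksum(conn, 1, "logs", ["id", "thread_id", "feedback_log_body"], new_checksum)
refused = reconcile_checksum(conn, 1, "logs", ["id", "process_uuid"], new_checksum)
stored = conn.execute("SELECT checksum FROM _sqlx_migrations WHERE version = 1").fetchone()[0]
print(ok, refused, stored == new_checksum)
```

The design point is that the checksum is rewritten only after the live schema has been verified, so a genuinely incompatible DB still fails instead of being silently accepted.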

Symptom B: the GUI's 30 s backfill cap should either be removed (let the backend run to its own 900 s lease boundary with a progress indicator) or made configurable via ~/.codex/config.toml (e.g. [startup] backfill_timeout_secs = 30). The current 30 s vs 900 s mismatch between GUI and backend is internally inconsistent and breaks any install with non-trivial session history.

Additional information

Platform reach

This is not a Windows-only bug, even though Windows users hit it first/worst:

  • Symptom A (sqlx migration checksum drift): 100% platform-agnostic. The migration SQL bytes and their SHA-384 hashes are computed at compile time from codex-rs/state/migrations/logs_2/*.sql and baked into the Rust binary, so every platform's binary embeds the same hashes. Any user (macOS, Linux, or Windows) going 0.130.x → 0.131.x with prior logs_2.sqlite history hits the same "migration 1 was previously applied but has been modified" wall. Mac users have already reported drift symptoms in #20608, #18364, #17304.

  • Symptom B (30 s GUI backfill timeout): the 30 s constant itself is also in the cross-platform Rust source. But the practical trigger rate is much higher on Windows because the backend runs inside WSL2 and accesses sessions/*.jsonl via the 9P protocol over /mnt/c/ — roughly 5–10× slower than native APFS/ext4. A 200-MB history that backfills in ~8 s on macOS will routinely take 40–120 s on Windows. Mac users with multi-GB histories or spinning-rust HDDs are still latently affected.

A fix should target the cross-platform Rust source (codex-rs/state/src/runtime.rs), not any Windows-specific code path.

User-side recovery (already works today, no upstream fix needed)

A standalone Python toolkit is published under Apache-2.0 at https://github.com/xdifu/codex-repair.

Capabilities:

  1. Auto-locates the active backend binary (no hard-coded hash subdir).
  2. Extracts every embedded migration checksum by scanning the ELF for (sql, sha384(sql)) anchors, using DB-known migration description strings as a cluster locator (robust across future binary versions — no version-pinned constants).
  3. Diffs against each DB's _sqlx_migrations.
  4. Verifies schema compatibility via PRAGMA table_info before rewriting any checksum (refuses unsafe updates).
  5. Reproduces backfill in Python independent of the 30 s GUI cap, then marks backfill_state.status='complete' with full session metadata.
  6. Has a --use-isolated-copy mode that copies the DBs to a temp dir before reading, so it's safe to run a diagnose pass while Codex is open.
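The session scan behind capability 5 can be approximated as follows. This is a sketch rather than the toolkit's actual code: `iter_session_meta` is a hypothetical name, and the assumption that the first line of each file is a JSON object with `"type": "session_meta"` is inferred from the description above, not from a documented format.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_session_meta(sessions_dir: Path) -> Iterator[tuple[Path, dict]]:
    """Yield (path, record) for each *.jsonl file whose first line parses as a
    session_meta object. Only the first line is read, so large files stay cheap."""
    for path in sorted(sessions_dir.rglob("*.jsonl")):
        with path.open("r", encoding="utf-8") as fh:
            first = fh.readline()
        try:
            record = json.loads(first)
        except json.JSONDecodeError:
            continue  # skip empty or corrupt files instead of aborting the scan
        if isinstance(record, dict) and record.get("type") == "session_meta":
            yield path, record


# Demo with a throwaway directory; file names and fields are made up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "sessions" / "2026" / "05"
    root.mkdir(parents=True)
    (root / "a.jsonl").write_text(
        '{"type": "session_meta", "id": "demo-1"}\n{"type": "event"}\n', encoding="utf-8"
    )
    (root / "b.jsonl").write_text("not json\n", encoding="utf-8")
    found = [(p.name, rec["id"]) for p, rec in iter_session_meta(Path(tmp) / "sessions")]

print(found)
```

Because the scan touches one line per file, its cost scales with the number of sessions rather than with total jsonl size.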

Usage by an affected user is one command:

python codex-repair.py fix --apply

Run history on my install: full repair from initial crash to healthy state (365 threads indexed, no errors) completed in under 5 minutes once root cause was identified.

Related upstream issues / PRs

  • #23251: WSL CLI cannot share Windows Codex App CODEX_HOME: migration 1 was previously applied but has been modified (open; my own earlier report; describes the WSL-sharing subset of this same root cause)
  • #17304: Desktop project sidebar hides active threads after state DB migration drift (open; family of related drift bugs)
  • #17354, #17540, #18364, #19873: overlapping sidebar / thread-disappearing reports stemming from _sqlx_migrations drift after auto-update
  • #16924: fix(sqlite): don't hard fail migrator if DB is newer (merged; opposite direction)
  • #11377: feat: prevent double backfill (introduced the 900 s lease, the figure that makes the 30 s GUI cap so glaringly inconsistent)
  • #16877: Make thread metadata updates tolerate pending backfill (open)
  • #13772: Move sqlite logs to a dedicated database (context for why logs_2.sqlite exists as a separate DB from state_5.sqlite)

Attached evidence

codex-checksums-0.131.0.json — full list of all 34 migration anchors (32 state_5 + 2 logs_2) extracted from my 0.131.0-alpha.9 backend binary at %USERPROFILE%\.codex\bin\wsl\7945a00f33bdc140\codex. Anyone with the same backend version can reproduce by running:

python codex-repair.py extract-checksums --json > codex-checksums-0.131.0.json

and diffing against mine to confirm identical hashes per migration.

Note

I am happy to contribute the schema-diff helper or a CODEX_TOLERATE_MODIFIED_MIGRATIONS runtime flag as a PR upstream if maintainers consider Fix 2 or Fix 3 the right direction — per the contributing guide, I'll wait for an explicit invitation before opening one.
