codex - 💡(How to fix) Fix Multi-terminal codex CLI freezes due to SQLite lock contention with no BUSY retry [1 comments, 2 participants]

GitHub stats
openai/codex#20213 (fetched 2026-04-30 06:31:49)
Comments: 1 · Participants: 2 · Timeline events: 7 (labeled ×4, unlabeled ×2, commented ×1) · Reactions: 0

Running multiple codex CLI instances against the same $CODEX_HOME causes TUI freezes: input echo lags by seconds, and streamed assistant output can deadlock entirely (only ctrl-C recovers). The root cause is contention on the shared state_5.sqlite and logs_2.sqlite, combined with the absence of SQLITE_BUSY retry logic in codex-rs/state/src/runtime/logs.rs.

Error Message

On contention, insert_logs surfaces the error to upstream stream-handling code, which appears to drop the channel and leave the TUI waiting forever.

Environment

  • codex-cli 0.125.0
  • macOS 26.3.1, Apple Silicon
  • Single $CODEX_HOME shared across multiple terminals (default usage)

Reproduction

  1. Open 2+ terminals, all running codex against the same $CODEX_HOME.
  2. In each terminal, send a prompt at roughly the same instant.
  3. Observe: TUI input lag, streamed output stalling mid-response, and an occasional permanent freeze that requires killing the process.

Evidence

  • logs_2.sqlite (the OTel trace sink) grew to 249 MB / 45,000+ rows in ~1.5 days of normal use; every SSE chunk emits a TRACE row.
  • state_5.sqlite is in WAL mode with busy_timeout = 5s (set in state/src/runtime.rs), but no BUSY retry is implemented in state/src/runtime/logs.rs::insert_logs. On contention the call surfaces the error to upstream stream-handling code, which appears to drop the channel and leave the TUI waiting forever.
  • Truncating logs_2.sqlite (DELETE FROM logs; VACUUM;) immediately reduces the freeze frequency by ~80%, which points to the OTel sink as the dominant source of contention.
  • Even after truncation, simultaneous prompts across terminals still produce ~1-2s input lag, indicating residual contention on state_5.sqlite (threads / agent_jobs writes).
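
As a quick cross-check of the figures above, the reported size and duration imply an average growth rate and row size. The sketch below only redoes that arithmetic; the helper names are made up for illustration, and the per-row figure is an upper bound because the report says "45,000+" rows and the file size also includes indexes and free pages.

```rust
/// Average growth in MB per day for a file of `total_mb` written over `days`.
pub fn mb_per_day(total_mb: f64, days: f64) -> f64 {
    total_mb / days
}

/// Average size per row in KB, taking 1 MB = 1000 KB.
pub fn kb_per_row(total_mb: f64, rows: f64) -> f64 {
    total_mb * 1000.0 / rows
}
```

With the reported numbers, mb_per_day(249.0, 1.5) is 166 MB/day and kb_per_row(249.0, 45_000.0) is roughly 5.5 KB, which is in the same range as the ~150 MB/day estimate given under Suggested fixes.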

Suggested fixes

  1. Add BUSY retry to insert_logs (and other sqlx writes against logs_2.sqlite / state_5.sqlite): wrap in a bounded retry loop with exponential backoff before propagating the error to TUI.
  2. Increase busy_timeout from 5 s to 30 s (or make it configurable). 5 s is too short for 200 MB+ WAL files.
  3. Per-process OTel sink: write logs_2.sqlite to a per-PID file (e.g. logs_2.<pid>.sqlite) and merge offline. OTel traces are write-only and rarely read interactively, so sharding is safe.
  4. Bound the OTel TRACE level by default: writing every SSE chunk at TRACE inflates the DB by ~150 MB/day. Default to INFO and let users opt into TRACE.
  5. Treat sink-write failure as non-fatal for the stream pipeline: even if the log insert fails, the model stream consumer should not abandon the channel — log the error and keep reading SSE.
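
Fix 5 can be illustrated with a minimal, std-only sketch. It does not reflect the actual codex stream pipeline: the LogSink trait, consume_stream, and FlakySink are hypothetical names, and a plain mpsc channel stands in for the SSE stream. The point is only that a failed log insert is reported and counted while the consumer keeps draining the channel.

```rust
use std::sync::mpsc;

/// Hypothetical stand-in for the log sink; `Err` models a failed insert
/// (for example a SQLITE_BUSY error).
pub trait LogSink {
    fn insert(&mut self, line: &str) -> Result<(), String>;
}

/// Drains every chunk from `rx` and forwards it to `out`, even when the
/// sink write fails. Returns the number of sink failures that were tolerated.
pub fn consume_stream<S: LogSink>(
    rx: mpsc::Receiver<String>,
    sink: &mut S,
    out: &mut Vec<String>,
) -> usize {
    let mut sink_failures = 0;
    for chunk in rx {
        if let Err(err) = sink.insert(&chunk) {
            // Logging is best-effort: report the failure and keep reading
            // instead of abandoning the channel.
            eprintln!("log sink write failed (ignored): {err}");
            sink_failures += 1;
        }
        out.push(chunk);
    }
    sink_failures
}

/// Test double that fails every second insert.
pub struct FlakySink {
    pub calls: usize,
}

impl LogSink for FlakySink {
    fn insert(&mut self, _line: &str) -> Result<(), String> {
        self.calls += 1;
        if self.calls % 2 == 0 {
            Err("database is locked".to_string())
        } else {
            Ok(())
        }
    }
}
```

The design choice is that the sink result never influences control flow of the consumer loop; whether failed rows should be buffered and retried later is a separate decision.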

Bonus issue (related, may want a separate ticket)

With $CODEX_HOME set to a non-default directory, codex appears to load hooks from both $CODEX_HOME/hooks.json and ~/.codex/hooks.json, firing each Stop hook twice. Confirmed by paired log entries with identical timestamps in the hook script's own log file.
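
One possible resolution, assuming the intended behavior is that a custom $CODEX_HOME replaces ~/.codex rather than adding to it, is to pick exactly one hooks file. This is a hypothetical sketch; hooks_file is not a codex function, and the real precedence rules may differ.

```rust
use std::path::{Path, PathBuf};

/// Returns the single hooks file to load: `<codex_home>/hooks.json` when a
/// custom CODEX_HOME is given, otherwise `<user_home>/.codex/hooks.json`.
/// Assumption: the two locations are alternatives, never merged.
pub fn hooks_file(codex_home: Option<&Path>, user_home: &Path) -> PathBuf {
    match codex_home {
        Some(dir) => dir.join("hooks.json"),
        None => user_home.join(".codex").join("hooks.json"),
    }
}
```

Under this rule each Stop hook would be registered once, because only one file is ever read.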

Extended analysis

TL;DR

Implementing a BUSY retry mechanism in insert_logs and increasing the busy_timeout can help mitigate the TUI freezes caused by contention on shared SQLite databases.

Guidance

  • Implement a bounded retry loop with exponential backoff in insert_logs to handle BUSY errors when writing to logs_2.sqlite and state_5.sqlite.
  • Increase the busy_timeout from 5s to a higher value (e.g., 30s) or make it configurable, so that writers wait longer for the lock instead of failing with BUSY.
  • Consider implementing a per-process OTel sink to reduce write contention on logs_2.sqlite.
  • Review the OTel TRACE level and consider bounding it to reduce the database size and write frequency.
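
The configurable timeout and the per-process sink from the guidance above could look roughly like this. It is a std-only sketch with hypothetical helper names; the environment variable CODEX_SQLITE_BUSY_TIMEOUT_MS mentioned in the comments is an assumption and not an existing codex setting. Only the 5 s default and the logs_2.<pid>.sqlite naming come from the issue.

```rust
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Default taken from the issue report (busy_timeout = 5 s).
pub const DEFAULT_BUSY_TIMEOUT: Duration = Duration::from_secs(5);

/// Parses an optional override in milliseconds, e.g. the value of a
/// hypothetical CODEX_SQLITE_BUSY_TIMEOUT_MS variable. Missing, non-numeric,
/// or zero values fall back to the default.
pub fn busy_timeout_from(raw: Option<&str>) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|ms| *ms > 0)
        .map(Duration::from_millis)
        .unwrap_or(DEFAULT_BUSY_TIMEOUT)
}

/// Builds a per-process log database path of the form
/// `<codex_home>/logs_2.<pid>.sqlite`.
pub fn per_process_log_path(codex_home: &Path, pid: u32) -> PathBuf {
    codex_home.join(format!("logs_2.{pid}.sqlite"))
}
```

A caller might use busy_timeout_from(std::env::var("CODEX_SQLITE_BUSY_TIMEOUT_MS").ok().as_deref()) and per_process_log_path(codex_home, std::process::id()); merging the per-PID files afterwards is out of scope for this sketch.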

Example

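// Example of a bounded retry loop with exponential backoff.
// Assumption: insert_logs(db, log) is a synchronous call returning
// Result<_, sqlx::Error>. In async code, replace std::thread::sleep with the
// runtime's sleep (e.g. tokio::time::sleep) so the executor is not blocked.
let mut retry_count: u32 = 0;
let max_retries: u32 = 5;
let initial_backoff: u64 = 100; // milliseconds
loop {
    match insert_logs(db, log) {
        Ok(_) => break,
        // sqlx's SQLite driver is expected to report the result code as a
        // decimal string; the low byte 5 is SQLITE_BUSY. Verify against the
        // sqlx version in use.
        Err(sqlx::Error::Database(db_error))
            if retry_count < max_retries
                && db_error
                    .code()
                    .and_then(|c| c.parse::<u32>().ok())
                    .map_or(false, |c| c & 0xff == 5) =>
        {
            // Waits 100, 200, 400, 800, 1600 ms before successive retries.
            let backoff = initial_backoff * 2u64.pow(retry_count);
            retry_count += 1;
            std::thread::sleep(std::time::Duration::from_millis(backoff));
        }
        Err(e) => {
            // Non-BUSY error, or retries exhausted: report it and give up on this insert.
            eprintln!("insert_logs failed: {e}");
            break;
        }
    }
}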

Notes

The provided example is a simplified illustration of a retry mechanism and may need to be adapted to the specific requirements of the codex application.
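
For experimenting with the retry policy outside of codex, the same idea can be written as a small std-only helper. This is a sketch under stated assumptions: retry_on_busy and is_busy are hypothetical names, the operation reports failure as a raw SQLite result code (i32) rather than a sqlx error, and the sleep function is injected so that tests or async callers can substitute their own. The numeric codes are SQLite's documented values (SQLITE_BUSY is 5, and extended BUSY codes keep 5 in the low byte).

```rust
use std::time::Duration;

/// SQLite primary result code for SQLITE_BUSY.
const SQLITE_BUSY: i32 = 5;

/// True for SQLITE_BUSY and its extended variants (261, 517, 773).
pub fn is_busy(result_code: i32) -> bool {
    result_code & 0xff == SQLITE_BUSY
}

/// Calls `op` until it succeeds, fails with a non-BUSY code, or
/// `max_retries` retries have been used. `sleep` is injected so callers can
/// pass `std::thread::sleep` and tests can pass a recorder.
pub fn retry_on_busy<T>(
    max_retries: u32,
    initial_backoff: Duration,
    mut op: impl FnMut() -> Result<T, i32>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, i32> {
    let mut attempt = 0u32;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(code) if is_busy(code) && attempt < max_retries => {
                // Backoff doubles on every retry: initial, 2x, 4x, ...
                sleep(initial_backoff * 2u32.saturating_pow(attempt));
                attempt += 1;
            }
            // Non-BUSY error, or retries exhausted: hand the code back.
            Err(code) => return Err(code),
        }
    }
}
```

Keeping the retry bound small matters here because SQLite's own busy_timeout already waits before returning BUSY, so the total worst-case delay is the sum of both.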

Recommendation

Apply the suggested fix of implementing a BUSY retry mechanism in insert_logs to mitigate the TUI freezes, as it directly addresses the root cause of the issue.
