codex - 💡(How to fix) Fix codex-cli stalls for exactly 300s between tool result and next model request (stream disconnected)

Error Message

2026-05-20T12:26:44.545840Z WARN codex_core::session::turn: stream disconnected - retrying sampling request (1/5 in 197ms)...
2026-05-20T13:17:50.046571Z WARN codex_core::session::turn: stream disconnected - retrying sampling request (1/5 in 218ms)...

  • Multiple codex_core::tools::router: error=Exit code: 1 errors in the same session (tool execution failures)

Root Cause

This is NOT a network issue because:

  • The same local proxy handles Claude Code traffic without any stalls
  • The retry after 300s succeeds in <250ms every time
  • time.idle values on normal requests (22-134s) show the model endpoint is responsive

Fix Action

Fix / Workaround

The report does not identify a fix; it describes the expected behavior. After client: close, the turn should immediately either:

  1. Dispatch the next tool call from the response, or
  2. Send the next client: new request with tool results, or
  3. Complete the turn

Suggested investigation areas:

  1. The 300s constant in codex_core::session::turn — is this a stream keepalive/watchdog timer?
  2. Possible race condition between the response parser completing and the turn state machine's next-step dispatch
  3. Whether app-server event consumer lagged indicates backpressure that blocks the turn from advancing
  4. Whether the responses wire API path has a different state machine than the chat path

What version of Codex CLI is running?

codex-cli 0.132.0

What subscription do you have?

Plus

Which model were you using?

gpt-5.5

What platform is your computer?

Microsoft Windows NT 10.0.26100.0 x64

What terminal emulator and version are you using (if applicable)?

VS Code integrated terminal (PowerShell), codex running via openai.chatgpt VS Code extension

Codex doctor report

{
  "schemaVersion": 1,
  "generatedAt": "1779332610s since unix epoch",
  "overallStatus": "fail",
  "codexVersion": "0.132.0",
  "checks": {
    "app_server.status": {
      "id": "app_server.status",
      "category": "app-server",
      "status": "ok",
      "summary": "background server is not running",
      "details": {
        "control socket": "C:\\Users\\zhouj\\.codex\\app-server-control\\app-server-control.sock",
        "daemon state dir": "C:\\Users\\zhouj\\.codex\\app-server-daemon",
        "mode": "ephemeral",
        "pid file": "C:\\Users\\zhouj\\.codex\\app-server-daemon\\app-server.pid (missing)",
        "settings": "C:\\Users\\zhouj\\.codex\\app-server-daemon\\settings.json (missing)",
        "status": "not running",
        "update-loop pid file": "C:\\Users\\zhouj\\.codex\\app-server-daemon\\app-server-updater.pid (missing)"
      },
      "remediation": null,
      "durationMs": 0
    },
    "auth.credentials": {
      "id": "auth.credentials",
      "category": "auth",
      "status": "ok",
      "summary": "auth is configured",
      "details": {
        "auth file": "C:\\Users\\zhouj\\.codex\\auth.json",
        "auth storage mode": "File",
        "stored API key": "true",
        "stored ChatGPT tokens": "false",
        "stored agent identity": "false",
        "stored auth mode": "api_key"
      },
      "remediation": null,
      "durationMs": 0
    },
    "config.load": {
      "id": "config.load",
      "category": "config",
      "status": "ok",
      "summary": "config loaded",
      "details": {
        "CODEX_HOME": "C:\\Users\\zhouj\\.codex",
        "config.toml": "C:\\Users\\zhouj\\.codex\\config.toml",
        "config.toml parse": "ok",
        "enabled feature flags": "shell_tool, shell_snapshot, terminal_resize_reflow, sqlite, hooks, enable_request_compression, multi_agent, apps, tool_search, tool_suggest, plugins, plugin_hooks, in_app_browser, browser_use, browser_use_external, computer_use, plugin_sharing, image_generation, skill_mcp_dependency_install, steer, guardian_approval, goals, collaboration_modes, tool_call_mcp_elicitation, personality, fast_mode, tui_app_server, workspace_dependencies",
        "feature flag overrides": "goals=true",
        "feature flags enabled": "28",
        "log dir": "C:\\Users\\zhouj\\.codex\\log",
        "mcp servers": "0",
        "model": "gpt-5.5",
        "model provider": "custom",
        "sqlite home": "C:\\Users\\zhouj\\.codex"
      },
      "remediation": null,
      "durationMs": 0
    },
    "installation": {
      "id": "installation",
      "category": "install",
      "status": "ok",
      "summary": "installation looks consistent",
      "details": {
        "PATH codex #1": "C:\\Users\\zhouj\\AppData\\Roaming\\npm\\codex",
        "PATH codex #2": "C:\\Users\\zhouj\\AppData\\Roaming\\npm\\codex.cmd",
        "PATH codex entries": "2",
        "current executable": "C:\\Users\\zhouj\\AppData\\Roaming\\npm\\node_modules\\@openai\\codex\\node_modules\\@openai\\codex-win32-x64\\vendor\\x86_64-pc-windows-msvc\\codex\\codex.exe",
        "install context": "npm",
        "managed by bun": "false",
        "managed by npm": "true",
        "managed package root": "C:\\Users\\zhouj\\AppData\\Roaming\\npm\\node_modules\\@openai\\codex",
        "npm update target": "C:\\Users\\zhouj\\AppData\\Roaming\\npm\\node_modules\\@openai\\codex"
      },
      "remediation": null,
      "durationMs": 358
    },
    "mcp.config": {
      "id": "mcp.config",
      "category": "mcp",
      "status": "ok",
      "summary": "no MCP servers configured",
      "details": {},
      "remediation": null,
      "durationMs": 0
    },
    "network.env": {
      "id": "network.env",
      "category": "network",
      "status": "ok",
      "summary": "network-related environment looks readable",
      "details": {
        "proxy env vars present": "NO_PROXY, no_proxy"
      },
      "remediation": null,
      "durationMs": 0
    }
  }
}

What issue are you seeing?

Codex CLI periodically stalls for exactly 300 seconds (5 minutes) during a turn, showing "Working (Xm Ys)" with no network activity. After the 300s timeout, it logs stream disconnected - retrying sampling request (1/5) and resumes normally. This happens multiple times per session (5 occurrences across ~15 hours), making long-running agentic tasks unreliable.

From TUI log (~/.codex/log/codex-tui.log):

Instance 1 (exact 300.003s gap):

2026-05-20T12:21:44.542667Z INFO codex_core::client: close time.busy=10.9ms time.idle=40.2s
*** 300 seconds of silence: no client:new, no ToolCall ***
2026-05-20T12:26:44.545840Z WARN codex_core::session::turn: stream disconnected - retrying sampling request (1/5 in 197ms)...
2026-05-20T12:26:44.746956Z INFO codex_core::client: new <- retry succeeds immediately

Instance 2 (exact 300.021s gap):

2026-05-20T13:12:50.025022Z INFO codex_core::client: close time.busy=25.9ms time.idle=58.0s
*** 300 seconds of silence ***
2026-05-20T13:17:50.046571Z WARN codex_core::session::turn: stream disconnected - retrying sampling request (1/5 in 218ms)...
2026-05-20T13:17:50.267074Z INFO codex_core::client: new <- retry succeeds immediately

All 5 occurrences in this session:

  • 2026-05-20T12:26:44 (retry in 197ms)
  • 2026-05-20T13:17:50 (retry in 218ms)
  • 2026-05-20T15:42:53 (retry in 210ms)
  • 2026-05-20T15:52:04 (retry in 180ms)
  • 2026-05-21T00:56:48 (retry in 196ms)

Each retry is dispatched after the logged backoff of under 250ms and then proceeds normally, which indicates the network path is healthy and points to a stall inside codex_core.
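The gaps quoted above can be re-derived from the log timestamps. The following is a minimal sketch, assuming the log lines keep the exact shape of the excerpts in this report (leading RFC 3339 timestamp, then level, then target). The function names and the 60-second reporting threshold are illustrative choices, not part of Codex.

```python
import re
from datetime import datetime

# Leading timestamp as it appears in the excerpts, e.g. 2026-05-20T12:21:44.542667Z
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z\s")


def parse_ts(line):
    """Return the line's timestamp, or None if the line has no timestamp prefix."""
    m = TS.match(line)
    if m is None:
        return None
    # Keep at most microsecond precision, which is all strptime's %f accepts.
    return datetime.strptime(m.group(1)[:26], "%Y-%m-%dT%H:%M:%S.%f")


def find_stalls(lines, min_gap_s=60.0):
    """Pair each 'stream disconnected' warning with the latest preceding
    'client: close' line and return (close_time, warn_time, gap_seconds).
    min_gap_s is an arbitrary threshold chosen for this sketch."""
    last_close = None
    stalls = []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if "codex_core::client: close" in line:
            last_close = ts
        elif "stream disconnected" in line and last_close is not None:
            gap = (ts - last_close).total_seconds()
            if gap >= min_gap_s:
                stalls.append((last_close, ts, gap))
            last_close = None  # do not measure later retries from the same close
    return stalls


# The two instances quoted in this report.
SAMPLE = [
    "2026-05-20T12:21:44.542667Z INFO codex_core::client: close time.busy=10.9ms time.idle=40.2s",
    "2026-05-20T12:26:44.545840Z WARN codex_core::session::turn: stream disconnected - retrying sampling request (1/5 in 197ms)...",
    "2026-05-20T12:26:44.746956Z INFO codex_core::client: new",
    "2026-05-20T13:12:50.025022Z INFO codex_core::client: close time.busy=25.9ms time.idle=58.0s",
    "2026-05-20T13:17:50.046571Z WARN codex_core::session::turn: stream disconnected - retrying sampling request (1/5 in 218ms)...",
    "2026-05-20T13:17:50.267074Z INFO codex_core::client: new",
]

for close_ts, warn_ts, gap in find_stalls(SAMPLE):
    print(f"{close_ts.time()} -> {warn_ts.time()} gap={gap:.6f}s")
```

To scan a real log, pass an open file object as `lines`, for example `open(path, encoding="utf-8", errors="replace")`.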

Additional context:

  • app-server event consumer lagged; dropping ignored events skipped=93/135/231 appears in the same session
  • The same local proxy (127.0.0.1:15721) handles Claude Code traffic without any stalls
  • Average time.idle across all 583 requests in this session is 85s (normal model thinking time)
  • Large time.idle values up to 1262s occur during active HTTP streams WITHOUT triggering "stream disconnected", confirming the 300s watchdog only fires between requests
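The time.idle figures cited above could be reproduced with a small parser. This is a sketch only: the unit suffixes it accepts (ns, µs/us, ms, s) are an assumption about how the logger formats durations, since only `ms` and `s` appear in the excerpts, and the helper names are hypothetical.

```python
import re
from statistics import mean

# Assumed unit suffixes; only "ms" and "s" are confirmed by the excerpts.
UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}
IDLE = re.compile(r"time\.idle=([0-9.]+)(ns|µs|us|ms|s)\b")


def idle_seconds(line):
    """Extract time.idle from a 'client: close' line, converted to seconds."""
    m = IDLE.search(line)
    if m is None:
        return None
    return float(m.group(1)) * UNIT_SECONDS[m.group(2)]


def summarize(lines):
    """Return (count, average, maximum) of time.idle in seconds."""
    values = [v for v in map(idle_seconds, lines) if v is not None]
    if not values:
        return 0, None, None
    return len(values), mean(values), max(values)


# time.idle values taken from the excerpts in this report.
SAMPLE_IDLE = [
    "codex_core::client: close time.busy=10.9ms time.idle=40.2s",
    "codex_core::client: close time.busy=25.9ms time.idle=58.0s",
    "client: close (model responded, time.idle=22.4s)",
]

count, avg, longest = summarize(SAMPLE_IDLE)
print(count, avg, longest)
```

Comparing the maximum against 300s on a full log would show whether long idle periods inside an active stream really pass without a disconnect, as the report states.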

What steps can reproduce the bug?

  1. Configure codex with a custom provider endpoint:

    model_provider = "custom"
    model = "gpt-5.5"
    model_reasoning_effort = "xhigh"
    sandbox_mode = "danger-full-access"
    
    [model_providers.custom]
    name = "custom"
    wire_api = "responses"
    supports_websockets = false
    requires_openai_auth = true
    base_url = "http://127.0.0.1:15721/v1"
  2. Start a codex session with a complex task that requires multiple tool calls (file reads, shell commands, code edits)

  3. Let it run for 30+ minutes

  4. Observe periodic 300-second gaps where no HTTP requests are sent to the model endpoint, followed by "stream disconnected - retrying sampling request (1/5)"

Detailed sequence before first stall:

12:20:40 client: close (model returned tool_call response, time.idle=29.5s)
12:20:40 ToolCall: shell_command (read test_supervisor.py)
12:20:40 ToolCall: shell_command (read test_session_adapter.py)
12:20:40 ToolCall: shell_command (git diff)
12:21:03 client: close (model responded, time.idle=22.4s)
12:21:04 ToolCall: shell_command (Get-Content SKILL.md) <- executes in ~200ms
12:21:04 client: new <- sends tool result back to model
12:21:44 client: close time.idle=40.2s <- model responds normally
         *** 300 seconds of silence: no client:new, no ToolCall ***
12:26:44 stream disconnected - retrying (1/5 in 197ms)
12:26:44 client: new <- retry works instantly

The model's response at 12:21:44 completes normally. After client: close, codex_core should either issue the next client: new or complete the turn. Instead, nothing happens for exactly 300 seconds.

No thread ID available (custom provider endpoint).

What is the expected behavior?

After client: close, the turn should immediately either:

  1. Dispatch the next tool call from the response, or
  2. Send the next client: new request with tool results, or
  3. Complete the turn

There should be no 300-second idle gap between a successful response and the next action. The 300s value is suspiciously precise (300.003s and 300.021s), suggesting a hardcoded timeout constant in codex_core::session::turn that fires when the turn state machine fails to advance after receiving a complete response.

Additional information

Analysis:

The 300-second value appears to be a hardcoded stream-activity watchdog in codex_core::session::turn. The pattern suggests a race condition where the turn state machine sometimes fails to advance after receiving a complete model response. It sits idle until the watchdog fires and forces a retry.
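The hypothesized failure mode can be illustrated with a toy model. The sketch below is not codex_core code (which is Rust) and makes no claim about its internals. It only shows how a lost wakeup combined with a watchdog timeout would produce a fixed-length stall followed by a retry. `WATCHDOG_S` and `run_turn` are hypothetical names, and the timeout is shortened from 300s so the example finishes quickly.

```python
import asyncio

WATCHDOG_S = 0.05  # stand-in for the suspected 300 s constant (assumption)


async def run_turn(notify_next_step):
    """Wait for the 'next step' signal; fall back to a retry on timeout."""
    next_step = asyncio.Event()
    if notify_next_step:
        # Healthy path: response completion wakes the turn immediately.
        next_step.set()
    # Otherwise the wakeup is lost, as the race-condition hypothesis supposes.
    try:
        await asyncio.wait_for(next_step.wait(), timeout=WATCHDOG_S)
        return "advanced"
    except asyncio.TimeoutError:
        return "watchdog fired: retrying sampling request"


async def main():
    return [await run_turn(True), await run_turn(False)]


results = asyncio.run(main())
print(results)
```

In this model the stall length always equals the watchdog timeout, which would match the near-identical 300.003s and 300.021s gaps if a similar mechanism were at work.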

This is NOT a network issue because:

  • The same local proxy handles Claude Code traffic without any stalls
  • The retry after 300s succeeds in <250ms every time
  • time.idle values on normal requests (22-134s) show the model endpoint is responsive

Possibly related:

  • app-server event consumer lagged; dropping ignored events skipped=93/135/231 — TUI event loop under pressure, may indicate backpressure blocking the turn from advancing
  • Multiple codex_core::tools::router: error=Exit code: 1 errors in the same session (tool execution failures)
  • model_reasoning_effort = "xhigh" produces longer model responses and more tool calls per turn, possibly increasing the chance of hitting the race condition

Suggested investigation areas:

  1. The 300s constant in codex_core::session::turn — is this a stream keepalive/watchdog timer?
  2. Possible race condition between the response parser completing and the turn state machine's next-step dispatch
  3. Whether app-server event consumer lagged indicates backpressure that blocks the turn from advancing
  4. Whether the responses wire API path has a different state machine than the chat path

Log grep patterns for reproduction:

grep "stream disconnected" ~/.codex/log/codex-tui.log
grep "client: close\|client: new" ~/.codex/log/codex-tui.log
grep "app-server event consumer lagged" ~/.codex/log/codex-tui.log

Session stats:

Log spans: 2026-05-05 to 2026-05-21
Total client new/close pairs: 1255
Stream disconnects: 5
Average time.idle: 85s
Max time.idle (without triggering disconnect): 5463s
