claude-code - 💡(How to fix) Fix [codex plugin 1.0.2] Broker IPC hangs on Windows (~20% failure rate) — skill dispatch never completes [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#48520Fetched 2026-04-16 06:57:53
View on GitHub
Comments
1
Participants
2
Timeline
6
Reactions
0
Timeline (top)
labeled ×5commented ×1

Invoking /codex:rescue, /codex:review, /codex:adversarial-review through the skill dispatch hangs indefinitely about 20% of the time. The turn never completes — no output, no error, no timeout. Direct invocation of codex exec from Git Bash always works.

Error Message

Invoking /codex:rescue, /codex:review, /codex:adversarial-review through the skill dispatch hangs indefinitely about 20% of the time. The turn never completes — no output, no error, no timeout. Direct invocation of codex exec from Git Bash always works.

  1. Treat waitForBrokerEndpoint() timeout as fatal — surface an error to Claude Code instead of hanging Promise forever For users running Claude Code on Windows, this silently burns 20% of codex-related interactions. Hard to diagnose without plugin internals, and no user-visible error means users often blame their own prompts.

Root Cause

Tracing through the plugin source (paths under ~/.claude/plugins/cache/openai-codex/codex/1.0.2/):

User /codex:rescue
  → skill wrapper (rescue.md)
    → node codex-companion.mjs task
      → withAppServer(cwd, fn)                    [codex.mjs:607]
        → CodexAppServerClient.connect(cwd)        [codex.mjs:610]
          → ensureBrokerSession(cwd)               [broker-lifecycle.mjs:113]
            → spawnBrokerProcess()                 [broker-lifecycle.mjs:59-70]
            → waitForBrokerEndpoint(2000ms)        [broker-lifecycle.mjs:149]
              ⨯ HANGS HERE on Windows

Fix Action

Workaround

Bypass the skill entirely and call codex exec directly from bash. Example wrapper:

```bash

direct codex exec + zombie cleanup

shopt -s nullglob for dir in /tmp/cxc-*; do PID=$(cat "$dir/broker.pid" 2>/dev/null) if [[ -n "$PID" ]] && kill -0 "$PID" 2>/dev/null; then continue; fi AGE_MIN=$(( ( $(date +%s) - $(stat -c %Y "$dir") ) / 60 )) [[ $AGE_MIN -ge 30 ]] && rm -rf "$dir" done codex exec "$PROMPT" ```

This is stable across 100+ invocations, confirming the issue is in the broker/IPC layer, not codex CLI itself.

Code Example

User /codex:rescue
  → skill wrapper (rescue.md)
    → node codex-companion.mjs task
withAppServer(cwd, fn)                    [codex.mjs:607]
CodexAppServerClient.connect(cwd)        [codex.mjs:610]
ensureBrokerSession(cwd)               [broker-lifecycle.mjs:113]
spawnBrokerProcess()                 [broker-lifecycle.mjs:59-70]
waitForBrokerEndpoint(2000ms)        [broker-lifecycle.mjs:149]
HANGS HERE on Windows
RAW_BUFFERClick to expand / collapse

Environment

  • OS: Windows 11 Pro 10.0.26200
  • Claude Code: 2.1.109
  • Shell: Git Bash
  • Node: v24.14.0
  • codex plugin: openai-codex/codex v1.0.2 (installed 2026-04-04)
  • codex CLI: 0.118.0 (npm global)

Summary

Invoking /codex:rescue, /codex:review, /codex:adversarial-review through the skill dispatch hangs indefinitely about 20% of the time. The turn never completes — no output, no error, no timeout. Direct invocation of codex exec from Git Bash always works.

Observed Behavior

  • Skill dispatch: ~80% succeed normally, ~20% hang for >20 minutes with no output
  • No stderr, no recovery — user must abandon the turn
  • Happens across all three commands (rescue, review, adversarial-review)
  • Over one day of normal use: 80+ invocations, 18+ hangs

Forensic Evidence

After a hang, the following zombie state is left behind:

  1. Stale broker directory /tmp/cxc-<id>/ persists with:
    • broker.pid containing a PID that is no longer alive
    • broker.log containing only deprecation warnings, no IPC activity
    • No *.sock file (socket never bound, or cleaned on crash)
  2. Empty state directory /tmp/codex-companion/<workspace-hash>/ exists but jobs/ was never created — indicating writeJobFile() was never reached.
  3. PID in broker.pid is a genuine zombiekill -0 $PID returns failure, but directory is not cleaned.

Root Cause Analysis

Tracing through the plugin source (paths under ~/.claude/plugins/cache/openai-codex/codex/1.0.2/):

User /codex:rescue
  → skill wrapper (rescue.md)
    → node codex-companion.mjs task
      → withAppServer(cwd, fn)                    [codex.mjs:607]
        → CodexAppServerClient.connect(cwd)        [codex.mjs:610]
          → ensureBrokerSession(cwd)               [broker-lifecycle.mjs:113]
            → spawnBrokerProcess()                 [broker-lifecycle.mjs:59-70]
            → waitForBrokerEndpoint(2000ms)        [broker-lifecycle.mjs:149]
              ⨯ HANGS HERE on Windows

Hypotheses (Windows-specific)

  1. Unix socket path emulation: brokerEndpoint at /tmp/cxc-*/*.sock is created under Windows' emulated /tmp. Under Node.js on Windows, unix socket connection is unreliable — especially when the broker child process is spawned with detached: true, stdio: ["ignore", logFd, logFd] (broker-lifecycle.mjs:59-70). File descriptor writes are not atomic on Windows.
  2. Line buffering + CRLF: handleLine() in app-server.mjs:117-128 splits JSON-RPC messages on \n. If the broker emits \r\n on Windows, upstream parsers may drop notifications — including the completion signal.
  3. Race condition in scheduleInferredCompletion() (codex.mjs:367-388): uses a 250ms setTimeout to detect "inferred completion" against state.pendingCollaborations and state.activeSubagentTurns. A missed notification + stale set membership → turn never completes → captureTurn() (codex.mjs:553-605) never resolves.
  4. Broker race on cold start: first connection after a new broker spawn is where the 20% failure clusters. Warm broker (previous task succeeded) tends to stay stable.

Minimal Repro Attempt

No reliable repro — it's intermittent. Rough steps:

  1. Leave codex plugin idle for 30+ minutes (broker dies)
  2. Invoke /codex:rescue "any prompt"
  3. ~1 in 5 invocations hangs indefinitely; rest complete normally

Workaround

Bypass the skill entirely and call codex exec directly from bash. Example wrapper:

```bash

direct codex exec + zombie cleanup

shopt -s nullglob for dir in /tmp/cxc-*; do PID=$(cat "$dir/broker.pid" 2>/dev/null) if [[ -n "$PID" ]] && kill -0 "$PID" 2>/dev/null; then continue; fi AGE_MIN=$(( ( $(date +%s) - $(stat -c %Y "$dir") ) / 60 )) [[ $AGE_MIN -ge 30 ]] && rm -rf "$dir" done codex exec "$PROMPT" ```

This is stable across 100+ invocations, confirming the issue is in the broker/IPC layer, not codex CLI itself.

Requested Fix Direction

  1. Treat waitForBrokerEndpoint() timeout as fatal — surface an error to Claude Code instead of hanging Promise forever
  2. Add CRLF normalization in handleLine() before JSON-RPC parse
  3. On Windows, fall back from unix socket to named pipe (\.\pipe\codex-*) which is native Windows IPC and better supported by Node.js
  4. Add pre-flight cleanup of stale /tmp/cxc-* directories in ensureBrokerSession()

Impact

For users running Claude Code on Windows, this silently burns 20% of codex-related interactions. Hard to diagnose without plugin internals, and no user-visible error means users often blame their own prompts.

extent analysis

TL;DR

The most likely fix for the intermittent hanging issue with the codex plugin on Windows is to modify the waitForBrokerEndpoint() function to treat timeouts as fatal and surface an error, and to add CRLF normalization in handleLine().

Guidance

  1. Modify waitForBrokerEndpoint(): Treat timeouts as fatal by rejecting the promise with an error, allowing the issue to be surfaced to the user instead of hanging indefinitely.
  2. Add CRLF normalization: In handleLine(), normalize line endings to \n before parsing JSON-RPC messages to prevent issues with Windows' \r\n line endings.
  3. Consider using named pipes: On Windows, consider falling back from unix sockets to named pipes (\.\pipe\codex-*) for IPC, which is native to Windows and better supported by Node.js.
  4. Implement pre-flight cleanup: Add a cleanup mechanism in ensureBrokerSession() to remove stale /tmp/cxc-* directories before spawning a new broker process.

Example

No code example is provided as the issue is complex and requires modifications to the existing codebase. However, the suggested changes can be implemented by modifying the relevant functions in the codex plugin.

Notes

The provided workaround using codex exec directly from bash is stable, but a proper fix is needed to resolve the issue within the plugin. The suggested changes aim to address the root causes of the issue, including the unreliable unix socket connection on Windows and the potential race condition in scheduleInferredCompletion().

Recommendation

Apply the workaround using codex exec directly from bash until a proper fix is implemented in the codex plugin. The recommended changes to waitForBrokerEndpoint() and handleLine() should be prioritized to address the intermittent hanging issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING