codex: multi_agent_v1.close_agent can hang for hours when closing an unresponsive subagent



Summary

A parent Codex Desktop thread appeared stuck for more than 8 hours after attempting to close an unresponsive subagent. The blocking point appears to be multi_agent_v1.close_agent.

This was observed while working in a local project. The project task itself had already completed; the stuck task was a follow-up code-review subagent.

Environment

  • macOS 26.3.1 arm64
  • Codex Desktop
  • Parent session cli_version: 0.131.0-alpha.9
  • Subagent session cli_version: 0.133.0-alpha.1
  • Commit involved in reviewed workspace: 61a66df7e829de9fab98f783c2f6d7bb56c6d92f

What Happened

A parent thread spawned a worker subagent for a code review task:

  • Agent id: 019e5a75-f6ef-7c23-9f26-418094bd282b
  • Thread name: Review Task 5 iOS charts

The parent then called wait_agent twice with timeout_ms=600000. Both calls returned:

{"status":{},"timed_out":true}

The parent then called multi_agent_v1.close_agent with:

{"target":"019e5a75-f6ef-7c23-9f26-418094bd282b"}

Actual Result

close_agent blocked for 29235.5s and only returned after the user interrupted the turn:

aborted by user after 29235.5s

This made the parent Codex thread look dead/frozen for roughly 8h 7m. The user eventually created a new worktree/session to continue.
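As a quick check of the figures above, the reported 29235.5s can be converted to hours and minutes and compared with the two wait_agent timeouts that preceded it. This is only arithmetic on numbers taken from the report; the helper name `format_duration` is made up for illustration.

```python
def format_duration(seconds: float) -> str:
    """Render a duration in seconds as 'Hh Mm S.Ss'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{int(hours)}h {int(minutes)}m {secs:.1f}s"


# Duration reported for the blocked close_agent call.
print(format_duration(29235.5))  # 8h 7m 15.5s

# Two wait_agent calls with timeout_ms=600000 each, converted to seconds.
print(format_duration(2 * 600000 / 1000))  # 0h 20m 0.0s
```

The parent therefore spent about 20 minutes in bounded waits and more than 8 hours in the single unbounded close call.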

Expected Result

close_agent should return promptly, or at least fail with a bounded timeout, especially when the subagent has returned no status and has already been interrupted or aborted. It should not block the parent turn for hours.

Evidence Available Locally

Parent session log:

~/.codex/sessions/2026/05/18/rollout-2026-05-18T20-38-23-019e3b18-36e0-7a02-adb4-1362ca3d426d.jsonl

Relevant lines:

  • 8797: first wait_agent
  • 8798: timed out with empty status
  • 8807: second wait_agent
  • 8808: timed out with empty status
  • 8813: close_agent call
  • 8815: close_agent returned after 29235.5s
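Anyone with a similar log can pull out the cited lines for inspection with a few lines of Python. The sketch below makes no assumption about the JSONL schema and returns the raw text of each requested line; `read_lines` is a hypothetical helper, not part of Codex.

```python
from typing import Dict, Iterable


def read_lines(path: str, wanted: Iterable[int]) -> Dict[int, str]:
    """Return {line_number: raw_text} for the 1-indexed line numbers requested."""
    targets = set(wanted)
    if not targets:
        return {}
    last = max(targets)
    found: Dict[int, str] = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if number in targets:
                found[number] = line.rstrip("\n")
            if number >= last:
                break  # nothing further is needed, so stop reading
    return found
```

For the parent log in this report, the call would be `read_lines(path, [8797, 8798, 8807, 8808, 8813, 8815])`, with `path` set to the expanded session file location.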

Subagent session log:

~/.codex/sessions/2026/05/24/rollout-2026-05-24T22-49-00-019e5a75-f6ef-7c23-9f26-418094bd282b.jsonl

The subagent log had only 4 lines: session metadata, task_started, user interruption, and turn_aborted. There was no assistant output or tool call output from the subagent before it was aborted.

Session index entry:

~/.codex/session_index.jsonl:56
{"id":"019e5a75-f6ef-7c23-9f26-418094bd282b","thread_name":"Review Task 5 iOS charts","updated_at":"2026-05-24T14:49:20.571331Z"}

Additional Observation

A process listing after the incident showed multiple long-lived Codex/MCP child processes, including node_repl, xcodebuildmcp, and mcp-server-mobile, some running for 9+ hours. This is consistent with, but does not by itself prove, a cleanup or resource leak in the subagent/MCP lifecycle.
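A listing like this can be reproduced by filtering `ps` output on elapsed time. The sketch below is one way to do it, assuming a POSIX `ps` that supports `-e` and `-o` with the `pid`, `etime`, and `command` columns (as on macOS and Linux). The function names are illustrative; the process names come from the observation above.

```python
import subprocess
from typing import List, Sequence, Tuple


def parse_etime(etime: str) -> int:
    """Convert a ps elapsed time of the form [[dd-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(piece) for piece in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def filter_rows(ps_output: str, names: Sequence[str],
                min_seconds: int) -> List[Tuple[int, int, str]]:
    """Keep (pid, elapsed_seconds, command) rows matching a name and minimum age."""
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, etime, command = fields
        elapsed = parse_etime(etime)
        if elapsed >= min_seconds and any(name in command for name in names):
            rows.append((int(pid), elapsed, command))
    return rows


def long_lived(names: Sequence[str], min_seconds: int) -> List[Tuple[int, int, str]]:
    """Run ps and return matching processes older than min_seconds."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etime=", "-o", "command="],
        capture_output=True, text=True, check=True,
    )
    return filter_rows(result.stdout, names, min_seconds)
```

Calling `long_lived(["node_repl", "xcodebuildmcp", "mcp-server-mobile"], 9 * 3600)` would list matching processes that have been running for at least 9 hours. A match only shows that such processes exist, not which session spawned them.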

Suggested Fix

  • Add a hard timeout to close_agent.
  • Make close_agent idempotent for already-aborted or unresponsive agents.
  • Ensure MCP/node_repl child processes are detached or cleaned up without blocking the parent thread indefinitely.
  • Consider returning a structured failure status instead of blocking the parent turn while waiting for shutdown.
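The first, second, and fourth suggestions can be sketched together in a few lines. This is an illustration of the pattern only, not the Codex implementation: `AgentHandle`, `CloseResult`, `close_agent_bounded`, and the `force_kill` step are all hypothetical names, and the real tool is not written in Python.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class CloseResult:
    closed: bool
    timed_out: bool
    detail: str


class AgentHandle:
    """Hypothetical stand-in for a subagent whose shutdown may stall."""

    def __init__(self, shutdown_seconds: float) -> None:
        self._shutdown_seconds = shutdown_seconds
        self.closed = False

    async def shutdown(self) -> None:
        # Simulates a graceful shutdown that can take arbitrarily long.
        await asyncio.sleep(self._shutdown_seconds)
        self.closed = True

    def force_kill(self) -> None:
        # Simulates a non-blocking forced teardown.
        self.closed = True


async def close_agent_bounded(agent: AgentHandle, timeout_s: float = 5.0) -> CloseResult:
    """Close an agent with a hard timeout and report the outcome as data."""
    if agent.closed:
        # Idempotent: closing an already-closed agent returns immediately.
        return CloseResult(True, False, "already closed")
    try:
        await asyncio.wait_for(agent.shutdown(), timeout=timeout_s)
        return CloseResult(True, False, "closed gracefully")
    except asyncio.TimeoutError:
        # Bounded: stop waiting, tear down forcibly, and say so.
        agent.force_kill()
        return CloseResult(True, True, f"graceful shutdown exceeded {timeout_s}s; forced")
```

With this shape, a caller such as `asyncio.run(close_agent_bounded(AgentHandle(60), timeout_s=0.05))` gets a result with `timed_out=True` after about 50 ms instead of blocking for the full shutdown. The parent can then decide whether to retry, log, or move on.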
