codex: MultiAgentV2 spawn_agent initial task is recorded as assistant/commentary and parallel child prompts leak

codex, 2026-05-31 22:21:29


This appears to be a regression or extension of #20543, with an additional no-fork and parallel-spawn isolation failure.

In Codex Desktop / MultiAgentV2, spawn_agent(..., fork_turns: "none") creates the child thread, but the initial message is recorded in the child rollout as an assistant/commentary JSON envelope instead of a user/task message. The child often treats the assigned task as context or prior assistant output, acknowledges workspace instructions, or executes unintended orchestration.

In addition, when two child agents are spawned in parallel in the same turn, one child's rollout can include the sibling child's prompt envelope before the child produces its final answer, even with fork_turns: "none".

After additional follow-up probes, I want to make the scope unambiguous:

Confirmed: no-fork child prompts are recorded as assistant / commentary JSON envelopes instead of active child tasks.
Confirmed: no-fork children can complete with a generic AGENTS/project acknowledgement instead of following the exact assigned prompt.
Confirmed: same-turn parallel child prompts can leak into sibling child rollouts.
Confirmed consequence: leaked or inherited context can lead to unexpected tool calls or nested spawn_agent calls despite instructions like "do not use tools".
Strongly explained: fork_turns: "all" can inherit the large parent conversation and continue parent orchestration, which can surface as timeout/running/nested-spawn behavior.
Observed but not independently proven: wait_agent / close_agent lifecycle reporting may look wrong from the parent, but I do not yet have evidence of a clean child completing the correct task while wait_agent loses the notification.
Also not proven: I am not claiming a confirmed process crash. The observed "silent" behavior is currently explainable by generic completion, abort, timeout, or wrong-context continuation.

Plain-language boundary: fork_turns: "all" can inherit the large parent conversation. No evidence yet of a clean child completing correctly while wait_agent loses the notification.

Environment

Variant: Codex Desktop
Session originator in rollout metadata: Codex Desktop
CLI version in rollout metadata: 0.135.0-alpha.1
Platform: macOS
Date reproduced: 2026-06-01 KST
Subscription/model: ChatGPT/Codex Desktop session using GPT-5 family models

Related Issues

Related/possibly regressed from: #20543
Related fork/default behavior issue: #20077
Related spawned-agent context prompt issue: #17323
Related inherited parent-context behavior: #24150

I do not think this is an exact duplicate of #24150 because the no-fork prompt-delivery failure reproduces with fork_turns: "none", where the child should not inherit the parent thread history.

Expected Behavior

For spawn_agent with fork_turns: "none":

The child should receive the message argument as its initial task/user-equivalent instruction.
The child should not see sibling agent prompt envelopes from other spawn_agent calls.
Parallel child agents should have isolated initial task messages unless explicit shared/forked context is requested.
A child asked to reply with an exact token should either reply with that token or report a real failure, not return a generic workspace acknowledgement.

For spawn_agent with fork_turns: "all":

The child may receive parent history, but the newly supplied message should still be the active task boundary, not just another assistant/commentary envelope inside inherited context.

Actual Behavior

Observed child rollouts show:

The task passed as message is recorded as:

{
  "type": "response_item",
  "payload": {
    "type": "message",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "{\"author\":\"/root\",\"recipient\":\"/root/exp1_seq_b\",\"other_recipients\":[],\"content\":\"BLACKBOX_PROBE_B. Reply exactly: EXP1_SEQ_B_ONLY. Do not use tools. Do not mention any other prompt.\",\"trigger_turn\":true}"
      }
    ],
    "phase": "commentary"
  }
}

In a parallel fork_turns: "none" run, child A receives child B's prompt envelope before its final response:

{
  "type": "response_item",
  "payload": {
    "type": "message",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "{\"author\":\"/root\",\"recipient\":\"/root/exp2_parallel_b\",\"other_recipients\":[],\"content\":\"BLACKBOX_PARALLEL_B. Reply exactly: EXP2_PARALLEL_B_ONLY. Do not use tools. Do not mention BLACKBOX_PARALLEL_A.\",\"trigger_turn\":true}"
      }
    ],
    "phase": "commentary"
  }
}
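
The two items above carry the task as a JSON string nested inside the `text` field of an assistant message. A minimal Python sketch of decoding that double-encoded envelope follows. The item shape is copied from the snippets in this report; `parse_envelope` and `ENVELOPE_KEYS` are hypothetical helper names, not part of Codex.

```python
import json
from typing import Optional

# Keys seen in the envelopes quoted in this report (an assumption about the format).
ENVELOPE_KEYS = {"author", "recipient", "content"}


def parse_envelope(item: dict) -> Optional[dict]:
    """Return role, phase, and the decoded envelope for a rollout item, or None."""
    payload = item.get("payload")
    if item.get("type") != "response_item" or not isinstance(payload, dict):
        return None
    if payload.get("type") != "message":
        return None
    for part in payload.get("content") or []:
        text = part.get("text") if isinstance(part, dict) else None
        try:
            decoded = json.loads(text)  # the envelope is JSON inside a JSON string
        except (TypeError, json.JSONDecodeError):
            continue  # plain text such as a final answer, not an envelope
        if isinstance(decoded, dict) and ENVELOPE_KEYS.issubset(decoded):
            return {
                "role": payload.get("role"),
                "phase": payload.get("phase"),
                "envelope": decoded,
            }
    return None


# Sample item rebuilt from the first snippet above.
sample = {
    "type": "response_item",
    "payload": {
        "type": "message",
        "role": "assistant",
        "content": [
            {
                "type": "output_text",
                "text": json.dumps(
                    {
                        "author": "/root",
                        "recipient": "/root/exp1_seq_b",
                        "other_recipients": [],
                        "content": "BLACKBOX_PROBE_B. Reply exactly: EXP1_SEQ_B_ONLY. "
                        "Do not use tools. Do not mention any other prompt.",
                        "trigger_turn": True,
                    }
                ),
            }
        ],
        "phase": "commentary",
    },
}

found = parse_envelope(sample)
print(found["role"], found["phase"], found["envelope"]["recipient"])
```

A task envelope that comes back with `role` set to `assistant` and `phase` set to `commentary` is the symptom described in this report.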

The child then often returns a generic workspace/AGENTS acknowledgement instead of following the assigned task. In one worker-role run, the child saw the sibling prompt and called spawn_agent itself despite being instructed not to use tools.
In a fork_turns: "all" follow-up probe, the child inherited thousands of prior rollout lines, saw the new task prompt only very late as assistant / commentary, then continued the parent investigation flow, used tools, created/continued goal state, and spawned a nested child. The parent wait_agent timed out and close_agent saw the child as still running. This looks like wrong-context continuation rather than a proven independent wait/close lifecycle bug.

Minimal Reproduction Shape

From a Codex Desktop session with subagents enabled, run these probes.

Probe 1: single child, no fork

Call:

{
  "task_name": "exp1_seq_b",
  "fork_turns": "none",
  "message": "BLACKBOX_PROBE_B. Reply exactly: EXP1_SEQ_B_ONLY. Do not use tools. Do not mention any other prompt."
}

Observed child rollout sequence:

L2 event_msg task_started
L3 response_message role=developer phase=None ...
L4 response_message role=user phase=None ... workspace/project instructions
L6 response_message role=assistant phase=commentary text={\"author\":\"/root\",\"recipient\":\"/root/exp1_seq_b\",...\"content\":\"BLACKBOX_PROBE_B...\"}
L7 event_msg agent_message EXP1_SEQ_B_ONLY
L8 response_message role=assistant phase=commentary text=EXP1_SEQ_B_ONLY
L10 event_msg agent_message generic workspace acknowledgement
L11 response_message role=assistant phase=final_answer generic workspace acknowledgement
L13 event_msg task_complete

The important part is that the task envelope appears as role=assistant, phase=commentary, not as a user/task message.

Probe 2: parallel children, no fork

Spawn two children in the same parent turn. In my reproduction, these two spawn_agent calls were issued from the parent in the same assistant turn through the parallel tool wrapper (multi_tool_use.parallel), not as two separate user turns:

[
  {
    "task_name": "exp2_parallel_a",
    "fork_turns": "none",
    "message": "BLACKBOX_PARALLEL_A. Reply exactly: EXP2_PARALLEL_A_ONLY. Do not use tools. Do not mention BLACKBOX_PARALLEL_B."
  },
  {
    "task_name": "exp2_parallel_b",
    "fork_turns": "none",
    "message": "BLACKBOX_PARALLEL_B. Reply exactly: EXP2_PARALLEL_B_ONLY. Do not use tools. Do not mention BLACKBOX_PARALLEL_A."
  }
]
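
For repeated runs, the probe arguments can be generated rather than typed by hand. The sketch below only builds the argument objects in the shape shown above and does not call `spawn_agent`. `make_probe` is a hypothetical helper name.

```python
import json
from typing import Sequence


def make_probe(
    task_name: str,
    marker: str,
    reply_token: str,
    forbidden: Sequence[str] = (),
) -> dict:
    """Build one no-fork probe in the argument shape used in this report."""
    parts = [f"{marker}.", f"Reply exactly: {reply_token}.", "Do not use tools."]
    # Each sibling marker the child must not mention gets its own sentence.
    parts += [f"Do not mention {name}." for name in forbidden]
    return {
        "task_name": task_name,
        "fork_turns": "none",
        "message": " ".join(parts),
    }


probes = [
    make_probe("exp2_parallel_a", "BLACKBOX_PARALLEL_A", "EXP2_PARALLEL_A_ONLY",
               ["BLACKBOX_PARALLEL_B"]),
    make_probe("exp2_parallel_b", "BLACKBOX_PARALLEL_B", "EXP2_PARALLEL_B_ONLY",
               ["BLACKBOX_PARALLEL_A"]),
]
print(json.dumps(probes, indent=2))
```

Distinct markers and reply tokens per child make it easy to tell from a rollout which prompt a child actually saw.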

Observed child A rollout sequence:

L2 event_msg task_started
L3 response_message role=developer phase=None ...
L4 response_message role=user phase=None ... workspace/project instructions
L6 response_message role=assistant phase=commentary text={\"author\":\"/root\",\"recipient\":\"/root/exp2_parallel_a\",...\"content\":\"BLACKBOX_PARALLEL_A...\"}
L7 event_msg agent_message {\"author\":\"/root\",\"recipient\":\"/root/exp2_parallel_b\",...\"content\":\"BLACKBOX_PARALLEL_B...\"}
L8 response_message role=assistant phase=commentary text={\"author\":\"/root\",\"recipient\":\"/root/exp2_parallel_b\",...\"content\":\"BLACKBOX_PARALLEL_B...\"}
L10 event_msg agent_message generic workspace acknowledgement
L11 response_message role=assistant phase=final_answer generic workspace acknowledgement
L13 event_msg task_complete

This shows sibling prompt contamination in child A despite fork_turns: "none".
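
The contamination can be checked mechanically by comparing each envelope's `recipient` with the child's own agent path. The following is a rough sketch, assuming the rollout is JSONL with the `response_item` shape quoted in this report. `foreign_envelopes` and `envelope_line` are hypothetical helpers, and the sample lines are synthetic stand-ins for a real rollout.

```python
import json
from typing import Iterable, Iterator, Tuple


def foreign_envelopes(
    rollout_lines: Iterable[str], own_path: str
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, recipient) for envelopes addressed to another agent."""
    for lineno, line in enumerate(rollout_lines, start=1):
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        payload = item.get("payload") if isinstance(item, dict) else None
        if not isinstance(payload, dict):
            continue
        for part in payload.get("content") or []:
            text = part.get("text") if isinstance(part, dict) else None
            try:
                envelope = json.loads(text)
            except (TypeError, json.JSONDecodeError):
                continue  # ordinary text, not an envelope
            if not isinstance(envelope, dict):
                continue
            recipient = envelope.get("recipient")
            if recipient is not None and recipient != own_path:
                yield lineno, recipient


def envelope_line(recipient: str, content: str) -> str:
    """Build a synthetic rollout line in the shape shown in this report."""
    inner = json.dumps({
        "author": "/root",
        "recipient": recipient,
        "other_recipients": [],
        "content": content,
        "trigger_turn": True,
    })
    return json.dumps({
        "type": "response_item",
        "payload": {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": inner}],
            "phase": "commentary",
        },
    })


child_a_rollout = [
    envelope_line("/root/exp2_parallel_a", "BLACKBOX_PARALLEL_A. Reply exactly: EXP2_PARALLEL_A_ONLY."),
    envelope_line("/root/exp2_parallel_b", "BLACKBOX_PARALLEL_B. Reply exactly: EXP2_PARALLEL_B_ONLY."),
]
leaks = list(foreign_envelopes(child_a_rollout, "/root/exp2_parallel_a"))
print(leaks)
```

An isolated no-fork child should produce an empty list. Any entry means the child's rollout contains a prompt addressed to a different agent.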

Probe 3: fresh follow-up probes

Fresh follow-up probes used exact-token prompts such as:

BUGPROBE_SINGLE_NONE_1001. Reply exactly: BUGPROBE_SINGLE_NONE_DONE. Do not use tools.
BUGPROBE_REVIEWER_NONE_1001. Minimal audit probe only. Reply exactly: BUGPROBE_REVIEWER_NONE_DONE. Do not use tools.
BUGPROBE_ALL_1001. Reply exactly: BUGPROBE_ALL_DONE. Do not use tools.

Observed results:

single default, fork_turns none:
  prompt line: assistant/commentary JSON envelope
  final answer: generic AGENTS/project acknowledgement, not requested exact token

reviewer role, fork_turns none:
  prompt line: assistant/commentary JSON envelope
  final answer: generic AGENTS/project acknowledgement, not requested exact token

default, fork_turns all:
  inherited large parent context
  new task prompt appeared very late as assistant/commentary
  child continued parent workflow, used tools, and spawned a nested subagent
  parent wait timed out; close reported previous_status=running
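
Because every probe asks for an exact token, pass or fail can be decided without reading the answer by hand. Below is a small sketch of that check, assuming prompts follow the "Reply exactly: TOKEN." pattern used above. `expected_token` and `followed_prompt` are hypothetical names, and the acknowledgement string is a made-up placeholder rather than real child output.

```python
import re
from typing import Optional

# Matches the token in prompts of the form "... Reply exactly: TOKEN. ..."
REPLY_RE = re.compile(r"Reply exactly: (\S+?)\.(?:\s|$)")


def expected_token(prompt: str) -> Optional[str]:
    """Extract the exact reply token requested by a probe prompt, if any."""
    match = REPLY_RE.search(prompt)
    return match.group(1) if match else None


def followed_prompt(prompt: str, final_answer: str) -> bool:
    """True only when the final answer is exactly the requested token."""
    token = expected_token(prompt)
    return token is not None and final_answer.strip() == token


prompt = "BUGPROBE_SINGLE_NONE_1001. Reply exactly: BUGPROBE_SINGLE_NONE_DONE. Do not use tools."
# Placeholder text standing in for a generic acknowledgement.
generic_ack = "I have read the project instructions and am ready to help."

print(expected_token(prompt))
print(followed_prompt(prompt, "BUGPROBE_SINGLE_NONE_DONE"))
print(followed_prompt(prompt, generic_ack))
```

The check is deliberately strict: any answer other than the bare token counts as a failure, which matches the expectation that a child either replies with the token or reports a real failure.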

Additional Matrix

I tested a small black-box matrix:

| Case | Role | fork_turns | Result |
| --- | --- | --- | --- |
| Sequential single child | default | `none` | Task prompt recorded as assistant/commentary; final answer may be generic acknowledgement |
| Fresh single child | default | `none` | Exact-token prompt ignored; generic AGENTS/project acknowledgement |
| Parallel children | default | `none` | Sibling prompt envelope visible in another child rollout |
| Parallel children | worker | `none` | Same contamination; one worker spawned a nested child from the sibling prompt |
| Reviewer child | `codex-ultrawork-reviewer` | `none` | Task also recorded as assistant/commentary; fresh exact-token probe returned generic acknowledgement |
| Parallel children | default | `all` | Inherits large parent history and can continue parent orchestration instead of the new task |

Impact

This makes multi-agent workflows unreliable:

Review agents can ignore the actual review assignment and only acknowledge context.
Parallel review lanes can see each other's task prompts and produce cross-contaminated findings.
A child can execute unintended orchestration based on a sibling prompt or inherited parent context.
fork_turns: "all" can produce timeout/running behavior because the child continues the parent workflow instead of treating the new message as the active task.
wait_agent/close_agent reporting can appear misleading from the parent's perspective when the child has completed the wrong task or is still running wrong-context work.

This is especially risky for code review / security review / QA workflows that depend on independent subagents.

What I Am And Am Not Claiming

Confirmed by black-box rollout evidence:

The initial spawn message is repeatedly model-visible as an assistant/commentary JSON envelope.
Fresh no-fork probes can ignore exact-token instructions and return generic acknowledgements.
Same-turn parallel spawned children can see sibling prompt envelopes.
Inherited context can cause unexpected tool use and nested spawning.

Not confirmed from black-box evidence alone:

I do not know the exact internal function or line that converts/routes the spawn message.
I do not have evidence yet of a clean child completing the correct task while wait_agent loses the notification.
I am not claiming a confirmed process crash.

Notes

I avoided attaching full rollout JSONL files because they contain private workspace paths and conversation context. The snippets above are reduced to the relevant event shape, token probes, roles/phases, and line order.

I can provide further redacted rollout excerpts if that would help narrow the implementation path.
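
For anyone preparing similar excerpts, a rough sketch of path redaction is below. It assumes private paths live under `/Users` or `/home`, and it deliberately leaves agent paths such as `/root/exp1_seq_b` untouched. `redact_paths` is a hypothetical helper and the sample path is invented.

```python
import re

# Home-directory style paths; stops at whitespace, quotes, or a backslash
# so JSON-escaped quotes (\") after a path stay intact.
HOME_PATH_RE = re.compile(r"""(?:/Users|/home)/[^\s"'\\]+""")


def redact_paths(text: str, placeholder: str = "<REDACTED_PATH>") -> str:
    """Replace home-directory paths in a rollout line with a placeholder."""
    return HOME_PATH_RE.sub(placeholder, text)


# Invented example line; the path is not from a real rollout.
line = 'cwd="/Users/alice/work/repo" recipient=/root/exp1_seq_b'
print(redact_paths(line))
```

A pattern-based pass like this is only a first filter. Conversation content still needs manual review before sharing.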

The most suspicious common denominator is the same as described in #20543: the initial spawn message appears to be delivered through an inter-agent communication envelope that becomes model-visible as an assistant-role JSON message rather than as the child's first task/user input.
