claude-code - ✅(Solved) Fix [BUG] (Async) Subagents stopping early [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#47936Fetched 2026-04-15 06:38:07
View on GitHub
Comments
0
Participants
1
Timeline
5
Reactions
0
Participants
Timeline (top)
labeled ×5

Error Message

Error Messages/Logs

Root Cause

This is a recurring pattern, seems to happen in about 14-30% of agent runs. I do not think this is a prompting issue because it always happens after some tool results - the subagent uses a tool, gets the result, and just stops without producing any output.

PR fix notes

PR #541: fix(mcp): disable llm_chat background=true (upstream bug #47936)

Description (problem / solution / changelog)

Summary

llm_chat / llm_chat-replybackground=true 옵션을 서버 레벨에서 block. 호출 시 descriptive error throw.

배경 — Opus 4.7의 잘못된 일반화

mcp__llm__chat(background:true) 호출 후 parent agent가 턴을 종료하고 결과를 영영 안 기다리는 문제. 근본 원인:

도구완료 알림 경로
`Bash(run_in_background:true)`Claude Code harness가 `<task-notification>` 를 parent에게 자동 송출 → 턴 종료해도 안전
`mcp__llm__chat(background:true)`없음. MCP 프로토콜상 server-initiated wakeup 표준 부재. 클라이언트가 `status`/`result`로 pull 해야 함

Opus 4.7이 두 도구의 `background` 키워드를 동일시해서 "깨워줄 것"으로 잘못 일반화 → 턴 종료. 결과적으로 subagent 작업이 abandoned.

Upstream 대응 전까지 MCP background 경로 자체를 봉쇄해서 혼동의 근원 제거.

변경 내용

  • `handleChat`: `args.background === true` 면 descriptive error throw (upstream #47936 참조 포함)
  • `handleChatReply`: 동일
  • Tool schema description 갱신 — `background` 가 DISABLED 임을 명시, 이유/근거 링크 포함

Verification (self-proof)

stdio JSON-RPC 로 실제 서버 띄워서 확인:

``` {"id":2,"method":"tools/call","params":{"name":"chat","arguments":{"prompt":"hello","background":true}}} ```

→ `{"result":{"isError":true,"content":[{"text":"Error: llm chat background=true is DISABLED. ..."}]}}`

`chat-reply` 동일 확인. `initialize` 응답 정상.

Rollback

Upstream #47936 해결되거나 MCP notification-based wakeup 도입되면:

  • `handleChat` / `handleChatReply` 의 `if (background) throw` 블록 제거
  • Schema description 원복

Refs

  • anthropics/claude-code#47936 — [BUG] (Async) Subagents stopping early
  • anthropics/claude-code#47518 — Feature request: visibility into scheduled wakeups

Test plan

  • TypeScript typecheck 통과 (해당 파일 에러 0)
  • stdio MCP 서버 실행 후 `background:true` 요청 → `isError:true` + 명확한 메시지 (직접 관측)
  • `background:false` / 생략 시 기존 동작 유지 (코드 경로 변경 없음)

🤖 Generated with Claude Code

Co-Authored-By: Zhuge [email protected]

Changed files

  • mcp-servers/llm/llm-mcp-server.ts (modified, +28/-4)

Code Example

<task-notification>
<task-id>aaa583525301d9735</task-id>
<tool-use-id>toolu_011KhgSzp2v9Cd9iANWVUUYf</tool-use-id>
<output-file>/tmp/claude-1000/-home-user/tasks/aaa583525301d9735.output</output-file>
<status>completed</status>
<summary>Agent "Research TERN-501 and TERN-801" completed</summary>
<usage><total_tokens>64000</total_tokens><tool_uses>44</tool_uses><duration_ms>143711</duration_ms></usage>
</task-notification>

---

<task-notification>
<task-id>a9969619664ec9d75</task-id>
<tool-use-id>toolu_01HTpSYExfZarv9benv4oFAE</tool-use-id>
<output-file>/tmp/claude-1000/-home-user/tasks/a9969619664ec9d75.output</output-file>
<status>completed</status>
<summary>Agent "Research ORX142 and ORX489" completed</summary>
<result>
All 61 citations passed verification with zero errors and zero warnings. The research notes file is complete and verified.
The research findings have been written to `/home/user/working/research_notes/orexia_assets.md`.
</result>
</task-notification>

---

[assistant] stop_reason: None
TEXT: "Let me read the S-1 for patent expiry details..."
[assistant] stop_reason: None  
TOOL: mcp__local__read_source(source_id: src189)
[assistant] stop_reason: None
TOOL: mcp__local__read_source(source_id: src185)
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Subagents spawned via the Task tool with run_in_background: true can stop executing before completing their work, and the Claude Agent SDK reports them as <status>completed</status> to the parent agent. The parent has no reliable way to distinguish a subagent that finished successfully from one that was terminated prematurely.

I see subagents that:

  • Make anywhere from 5-40 tool calls over 2-10 minutes
  • Were still actively making tool calls when execution stopped (stop_reason: None on final messages)
  • Never reached the final step of their instructions (writing output to a file)
  • Were reported to the parent as <status>completed</status>

Essentially, the agent stopped mid-work, and the SDK told the parent it "completed."

This is a recurring pattern, seems to happen in about 14-30% of agent runs. I do not think this is a prompting issue because it always happens after some tool results - the subagent uses a tool, gets the result, and just stops without producing any output.

What Should Happen?

  1. The SDK should not report completed when a subagent was terminated before it chose to stop. The stop_reason: None on the subagent's final messages indicates the agent did not choose end_turn -- something external ended the session. The task notification should reflect this (e.g., <status>terminated</status> or <status>interrupted</status>).

  2. The task notification should include a stop_reason or termination_reason field so the parent agent can more easily determine what happened and decide whether to retry.

Error Messages/Logs

The task notification for the prematurely stopped subagent:

<task-notification>
<task-id>aaa583525301d9735</task-id>
<tool-use-id>toolu_011KhgSzp2v9Cd9iANWVUUYf</tool-use-id>
<output-file>/tmp/claude-1000/-home-user/tasks/aaa583525301d9735.output</output-file>
<status>completed</status>
<summary>Agent "Research TERN-501 and TERN-801" completed</summary>
<usage><total_tokens>64000</total_tokens><tool_uses>44</tool_uses><duration_ms>143711</duration_ms></usage>
</task-notification>

Compare to a subagent that actually finished its work (also reports completed, but includes a <result> block):

<task-notification>
<task-id>a9969619664ec9d75</task-id>
<tool-use-id>toolu_01HTpSYExfZarv9benv4oFAE</tool-use-id>
<output-file>/tmp/claude-1000/-home-user/tasks/a9969619664ec9d75.output</output-file>
<status>completed</status>
<summary>Agent "Research ORX142 and ORX489" completed</summary>
<result>
All 61 citations passed verification with zero errors and zero warnings. The research notes file is complete and verified.
The research findings have been written to `/home/user/working/research_notes/orexia_assets.md`.
</result>
</task-notification>

The only observable difference is the presence/absence of <result>. The <status> is identical in both cases.

Subagent's final messages before it stopped (from the subagent transcript research-assistant_aaa583525301d9735.jsonl):

The subagent's last three assistant messages all have stop_reason: None (not end_turn), and the agent was actively issuing tool calls:

[assistant] stop_reason: None
TEXT: "Let me read the S-1 for patent expiry details..."
[assistant] stop_reason: None  
TOOL: mcp__local__read_source(source_id: src189)
[assistant] stop_reason: None
TOOL: mcp__local__read_source(source_id: src185)

The agent never chose to stop. It was still working when execution ended.

Steps to Reproduce

  1. Create a parent agent th1at spawns a subagent via the Task tool with run_in_background: true.
  2. Give the subagent a prompt that requires many sequential tool calls before producing output. For example: "Search for and read 15 source documents about [topic], then write a comprehensive summary with citations to /home/user/working/research_notes/output.md."
  3. The subagent will begin actively researching (search, read, search, read...). At some point, execution stops.
  4. The parent receives a task notification with <status>completed</status> and no <result> block.
  5. The output file was never created. The parent discovers this only by checking the filesystem.

Note that it's a relatively transient issue, just one that causes a lot of waste.

Claude Model

Opus

Is this a regression?

No, this never worked

Last Working Version

No response

Claude Code Version

2.1.104

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Non-interactive/CI environment

Additional Information

No response

extent analysis

TL;DR

The Claude Agent SDK should be modified to report a terminated or interrupted status when a subagent is stopped prematurely, rather than reporting it as completed.

Guidance

  • Review the subagent's final messages to determine if it was actively working when execution stopped, indicated by stop_reason: None and recent tool calls.
  • Modify the task notification to include a stop_reason or termination_reason field to help the parent agent determine what happened and decide whether to retry.
  • Compare the task notifications for completed and terminated subagents to identify differences, such as the presence or absence of a <result> block.
  • Test the subagent with a prompt that requires many sequential tool calls to reproduce the issue and verify the fix.

Example

No code snippet is provided as the issue is related to the Claude Agent SDK's behavior and not a specific code implementation.

Notes

The issue is transient and occurs in about 14-30% of agent runs, making it challenging to reproduce and debug. The fix should focus on modifying the SDK to report the correct status and provide more information about the termination reason.

Recommendation

Apply a workaround by modifying the task notification to include a stop_reason or termination_reason field, allowing the parent agent to better handle terminated subagents. This will help mitigate the issue until a permanent fix is implemented in the Claude Agent SDK.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING