claude-code - ✅(Solved) Fix [BUG] (Async) Subagents stopping early [1 pull requests, 1 participants]

rudra-sett · 2026-04-14T14:02:42Z

GitHub stats

anthropics/claude-code#47936•Fetched 2026-04-15 06:38:07

Author

rudra-sett

Participants

rudra-sett

Timeline (top)

labeled ×5

Root Cause

This is a recurring pattern that seems to occur in roughly 14-30% of agent runs. I do not think it is a prompting issue, because it always happens right after a tool result: the subagent calls a tool, receives the result, and then simply stops without producing any output.

PR fix notes

PR #541: fix(mcp): disable llm_chat background=true (upstream bug #47936)

Repository: 2lab-ai/soma-work
Author: zhuge-liang-bot[bot]
State: closed | merged: False
Link: https://github.com/2lab-ai/soma-work/pull/541

Description (problem / solution / changelog)

Summary

Blocks the background=true option of llm_chat / llm_chat-reply at the server level. Calling with it throws a descriptive error.

Background: Opus 4.7's incorrect generalization

After calling mcp__llm__chat(background:true), the parent agent ends its turn and never waits for the result. Root cause:

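| Tool | Completion notification path |
|------|------------------------------|
| `Bash(run_in_background:true)` | The Claude Code harness automatically sends a `<task-notification>` to the parent, so ending the turn is safe |
| `mcp__llm__chat(background:true)` | None. The MCP protocol has no standard server-initiated wakeup; the client must pull via `status`/`result` |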

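Opus 4.7 treats the `background` keyword of the two tools as equivalent and wrongly generalizes that it "will be woken up", so it ends the turn. As a result, the subagent's work is abandoned.

Until upstream responds, the MCP background path itself is blocked to remove the source of the confusion.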
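The pull model described for `mcp__llm__chat(background:true)` can be sketched as a polling loop. This is a minimal Python sketch under stated assumptions, not the soma-work implementation: `get_status` and `get_result` are hypothetical callables standing in for the server's `status`/`result` calls, and the `"done"` status string is assumed.

```python
import time
from typing import Any, Callable


def poll_until_done(
    get_status: Callable[[], str],
    get_result: Callable[[], Any],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll a background job until it reports done, then fetch its result.

    get_status / get_result are hypothetical stand-ins for the server's
    `status` / `result` calls; the "done" status string is an assumption.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_status() == "done":
            return get_result()
        sleep(interval_s)
    raise TimeoutError("background job did not finish before the timeout")
```

The point of the sketch is that the caller must stay in the loop: nothing wakes it up if it stops polling.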

Changes

`handleChat`: throws a descriptive error when `args.background === true` (includes a reference to upstream #47936)
`handleChatReply`: same
Tool schema description updated: states that `background` is DISABLED, with the reason and a link to the evidence

Verification (self-proof)

Verified by launching the actual server over stdio JSON-RPC:

```
{"id":2,"method":"tools/call","params":{"name":"chat","arguments":{"prompt":"hello","background":true}}}
```

→ `{"result":{"isError":true,"content":[{"text":"Error: llm chat background=true is DISABLED. ..."}]}}`

`chat-reply` was confirmed the same way. The `initialize` response is normal.
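The verification exchange can be replayed offline as a small check. The sketch below only builds the request line and inspects the quoted response; it does not start the server. `is_tool_error` is a hypothetical helper name, and the error text stays elided exactly as in the PR.

```python
import json

# Request copied from the verification step.
request = {
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "chat",
        "arguments": {"prompt": "hello", "background": True},
    },
}

# Response as quoted in the PR; the message text is elided there ("...").
response_line = (
    '{"result":{"isError":true,"content":'
    '[{"text":"Error: llm chat background=true is DISABLED. ..."}]}}'
)


def is_tool_error(raw: str) -> bool:
    """Return True when a tools/call response reports isError."""
    result = json.loads(raw).get("result", {})
    return bool(result.get("isError"))


# Compact separators reproduce the single-line form sent over stdio.
request_line = json.dumps(request, separators=(",", ":"))
print(request_line)
print(is_tool_error(response_line))
```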

Rollback

Once upstream #47936 is resolved or MCP notification-based wakeup is introduced:

Remove the `if (background) throw` block from `handleChat` / `handleChatReply`
Restore the schema description

Refs

anthropics/claude-code#47936 — [BUG] (Async) Subagents stopping early
anthropics/claude-code#47518 — Feature request: visibility into scheduled wakeups

Test plan

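TypeScript typecheck passes (0 errors in the affected file)
Ran the stdio MCP server and sent a `background:true` request → `isError:true` plus a clear message (observed directly)
`background:false` or omitted: existing behavior preserved (no code path changed)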

🤖 Generated with Claude Code

Co-Authored-By: Zhuge

Changed files

mcp-servers/llm/llm-mcp-server.ts (modified, +28/-4)

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

Subagents spawned via the Task tool with run_in_background: true can stop executing before completing their work, and the Claude Agent SDK reports them as <status>completed</status> to the parent agent. The parent has no reliable way to distinguish a subagent that finished successfully from one that was terminated prematurely.

I see subagents that:

Make anywhere from 5-40 tool calls over 2-10 minutes
Were still actively making tool calls when execution stopped (stop_reason: None on final messages)
Never reached the final step of their instructions (writing output to a file)
Were reported to the parent as <status>completed</status>

Essentially, the agent stopped mid-work, and the SDK told the parent it "completed."

What Should Happen?

The SDK should not report completed when a subagent was terminated before it chose to stop. The stop_reason: None on the subagent's final messages indicates the agent did not choose end_turn -- something external ended the session. The task notification should reflect this (e.g., <status>terminated</status> or <status>interrupted</status>).
The task notification should include a stop_reason or termination_reason field so the parent agent can more easily determine what happened and decide whether to retry.

Error Messages/Logs

The task notification for the prematurely stopped subagent:

<task-notification>
<task-id>aaa583525301d9735</task-id>
<tool-use-id>toolu_011KhgSzp2v9Cd9iANWVUUYf</tool-use-id>
<output-file>/tmp/claude-1000/-home-user/tasks/aaa583525301d9735.output</output-file>
<status>completed</status>
<summary>Agent "Research TERN-501 and TERN-801" completed</summary>
<usage><total_tokens>64000</total_tokens><tool_uses>44</tool_uses><duration_ms>143711</duration_ms></usage>
</task-notification>

Compare to a subagent that actually finished its work (also reports completed, but includes a <result> block):

<task-notification>
<task-id>a9969619664ec9d75</task-id>
<tool-use-id>toolu_01HTpSYExfZarv9benv4oFAE</tool-use-id>
<output-file>/tmp/claude-1000/-home-user/tasks/a9969619664ec9d75.output</output-file>
<status>completed</status>
<summary>Agent "Research ORX142 and ORX489" completed</summary>
<result>
All 61 citations passed verification with zero errors and zero warnings. The research notes file is complete and verified.
The research findings have been written to `/home/user/working/research_notes/orexia_assets.md`.
</result>
</task-notification>

The only observable difference is the presence/absence of <result>. The <status> is identical in both cases.
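Since `<result>` is the only distinguishing signal, a parent could use it as a heuristic. The following is a minimal sketch, not an SDK feature: `classify_notification` and the `"suspect"` label are hypothetical names, and it assumes the notification arrives as well-formed XML like the two samples in the report.

```python
import xml.etree.ElementTree as ET

# Notification for the prematurely stopped subagent (copied from the report).
stopped = """
<task-notification>
<task-id>aaa583525301d9735</task-id>
<tool-use-id>toolu_011KhgSzp2v9Cd9iANWVUUYf</tool-use-id>
<output-file>/tmp/claude-1000/-home-user/tasks/aaa583525301d9735.output</output-file>
<status>completed</status>
<summary>Agent "Research TERN-501 and TERN-801" completed</summary>
<usage><total_tokens>64000</total_tokens><tool_uses>44</tool_uses><duration_ms>143711</duration_ms></usage>
</task-notification>
"""

# Notification for the subagent that actually finished (copied from the report).
finished = """
<task-notification>
<task-id>a9969619664ec9d75</task-id>
<tool-use-id>toolu_01HTpSYExfZarv9benv4oFAE</tool-use-id>
<output-file>/tmp/claude-1000/-home-user/tasks/a9969619664ec9d75.output</output-file>
<status>completed</status>
<summary>Agent "Research ORX142 and ORX489" completed</summary>
<result>
All 61 citations passed verification with zero errors and zero warnings. The research notes file is complete and verified.
The research findings have been written to `/home/user/working/research_notes/orexia_assets.md`.
</result>
</task-notification>
"""


def classify_notification(xml_text: str) -> str:
    """Classify a <task-notification> by status and presence of <result>.

    Heuristic from the report: a "completed" status without a <result>
    block suggests the subagent stopped before finishing.
    """
    root = ET.fromstring(xml_text.strip())
    status = (root.findtext("status") or "").strip()
    has_result = root.find("result") is not None
    if status != "completed":
        return status or "unknown"
    return "completed" if has_result else "suspect"


print(classify_notification(stopped))
print(classify_notification(finished))
```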

Subagent's final messages before it stopped (from the subagent transcript research-assistant_aaa583525301d9735.jsonl):

The subagent's last three assistant messages all have stop_reason: None (not end_turn), and the agent was actively issuing tool calls:

[assistant] stop_reason: None
TEXT: "Let me read the S-1 for patent expiry details..."
[assistant] stop_reason: None  
TOOL: mcp__local__read_source(source_id: src189)
[assistant] stop_reason: None
TOOL: mcp__local__read_source(source_id: src185)

The agent never chose to stop. It was still working when execution ended.
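The same check can be run against the subagent transcript. The transcript schema is not documented in this report, so the sketch below assumes one JSON object per line shaped like `{"type": "assistant", "message": {"stop_reason": ...}}`; that layout and both function names are assumptions to adapt to the real file.

```python
import json
from typing import Iterable, Optional


def last_stop_reason(jsonl_lines: Iterable[str]) -> Optional[str]:
    """Return the stop_reason of the last assistant entry, or None.

    Assumed (not confirmed) layout: one JSON object per line, shaped like
    {"type": "assistant", "message": {"stop_reason": ...}}.
    """
    reason = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("type") == "assistant":
            # Overwrite on every assistant entry so only the last one counts.
            reason = entry.get("message", {}).get("stop_reason")
    return reason


def chose_to_stop(jsonl_lines: Iterable[str]) -> bool:
    """True only if the final assistant message ended with end_turn."""
    return last_stop_reason(jsonl_lines) == "end_turn"
```

With a real transcript, this would be called as `chose_to_stop(open(path, encoding="utf-8"))`.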

Steps to Reproduce

Create a parent agent that spawns a subagent via the Task tool with run_in_background: true.
Give the subagent a prompt that requires many sequential tool calls before producing output. For example: "Search for and read 15 source documents about [topic], then write a comprehensive summary with citations to /home/user/working/research_notes/output.md."
The subagent will begin actively researching (search, read, search, read...). At some point, execution stops.
The parent receives a task notification with <status>completed</status> and no <result> block.
The output file was never created. The parent discovers this only by checking the filesystem.
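The filesystem check in the last step can be automated rather than discovered by hand. A minimal sketch follows, assuming the subagent was told to write a single output file; `output_was_written` is a hypothetical helper and the demo uses a temporary directory rather than the path from the report.

```python
import tempfile
from pathlib import Path


def output_was_written(path: str, min_bytes: int = 1) -> bool:
    """Return True if the expected output file exists and is non-empty."""
    p = Path(path)
    return p.is_file() and p.stat().st_size >= min_bytes


# Demo in a temporary directory: check before and after the file is written.
with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp, "output.md"))
    missing = output_was_written(target)
    Path(target).write_text("# notes\n", encoding="utf-8")
    present = output_was_written(target)

print(missing, present)
```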

Note that the issue is intermittent, but it wastes a lot of work when it occurs.

Claude Model

Opus

Is this a regression?

No, this never worked

Last Working Version

No response

Claude Code Version

2.1.104

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Non-interactive/CI environment

Additional Information

No response

extent analysis

TL;DR

The Claude Agent SDK should be modified to report a terminated or interrupted status when a subagent is stopped prematurely, rather than reporting it as completed.

Guidance

Review the subagent's final messages to determine if it was actively working when execution stopped, indicated by stop_reason: None and recent tool calls.
Modify the task notification to include a stop_reason or termination_reason field to help the parent agent determine what happened and decide whether to retry.
Compare the task notifications for completed and terminated subagents to identify differences, such as the presence or absence of a <result> block.
Test the subagent with a prompt that requires many sequential tool calls to reproduce the issue and verify the fix.

Example

No code snippet is provided as the issue is related to the Claude Agent SDK's behavior and not a specific code implementation.

Notes

The issue is transient and occurs in about 14-30% of agent runs, making it challenging to reproduce and debug. The fix should focus on modifying the SDK to report the correct status and provide more information about the termination reason.

Recommendation

The parent cannot change the task notification itself, so adding a stop_reason or termination_reason field has to happen in the Claude Agent SDK. Until then, a workable mitigation is for the parent to treat a completed notification that has no <result> block, or whose expected output file is missing, as a likely premature stop and decide whether to retry.
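One way a parent could act on a suspect run is a bounded retry. This is a sketch only: `run_task` and `looks_complete` are hypothetical hooks supplied by the caller (for example, spawning the subagent and checking for a `<result>` block), not SDK functions.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    run_task: Callable[[], T],
    looks_complete: Callable[[T], bool],
    max_attempts: int = 3,
) -> T:
    """Re-run a task until its outcome passes the completeness check.

    run_task and looks_complete are hypothetical caller-supplied hooks.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        outcome = run_task()
        if looks_complete(outcome):
            return outcome
    raise RuntimeError(f"task still incomplete after {max_attempts} attempts")
```

Capping the attempts matters here because each rerun repeats the subagent's tool calls and token spend.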
