claude-code - 💡(How to fix) Fix [BUG] Sub-agent task-notification fires repeated <status>completed</status> events as the transcript grows after termination

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

Error Messages/Logs

No error — the failure mode is structural, not an error condition. Sample of what the orchestrator sees: <result>API Error: Overloaded</result>

Root Cause

When a background sub-agent (Agent tool with run_in_background: true) is declared completed by the harness watchdog OR exits gracefully, the harness continues to tail the sub-agent's JSONL transcript file. Every time the file grows (because the underlying subprocess is still running and emitting turns, or because the harness's own polling re-renders content), the orchestrator receives a fresh <task-notification> system message with the same <task-id> and <status>completed</status> but a new <result> body summarizing the latest content.

Fix Action

Fix / Workaround

  1. Dispatch any background sub-agent: `Agent({ subagent_type: 'general-purpose', run_in_background: true, prompt: '<some task>' })`. Note the returned `agentId`.
  2. Wait for the first `<status>completed</status>` notification. Capture its `<result>` content.
  3. Manually append to the JSONL file: `echo '{"type":"assistant","timestamp":"2026-05-24T11:00:00Z","message":{"content":[{"type":"text","text":"fabricated content"}]}}' >> ~/.claude/projects/<hash>/subagents/agent-<id>.jsonl`
  4. Expected: harness ignores the append; orchestrator receives no further notifications for that agent ID.
  5. Actual: orchestrator receives a fresh `<task-notification>` with `<status>completed</status>` and the fabricated content.
  • Related issue: #61877 (sub-agent generation loops — the trigger for the appends). The bug I'm reporting here is independent: even if the agent exits cleanly, the harness has no defense against late writes to the transcript appearing as fresh completion events. I added an evidence comment to that issue: https://github.com/anthropics/claude-code/issues/61877#issuecomment-4528291470
  • Related issue: #50272 (stale symlink mtime). Adjacent but different — that's about diagnosing liveness; this is about the notification side.
  • Project-side postmortem with full timeline: https://github.com/metazen11/psde-os/blob/develop/reports/2026-05-24-runaway-subagent-postmortem.md
  • Local mitigation: a `detect_runaway_subagents.sh` shell script that scans for sub-agent JSONL files with recent mtimes and no process holding them open. Symptom-catcher; the harness fix would make it unnecessary.
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

When a background sub-agent (Agent tool with run_in_background: true) is declared completed by the harness watchdog OR exits gracefully, the harness continues to tail the sub-agent's JSONL transcript file. Every time the file grows (because the underlying subprocess is still running and emitting turns, or because the harness's own polling re-renders content), the orchestrator receives a fresh <task-notification> system message with the same <task-id> and <status>completed</status> but a new <result> body summarizing the latest content.

The result: the orchestrator receives multiple "completed" notifications for the same agent, each with different content, often hours apart. Each subsequent body looks like a fresh, independent completion. There is no way for the orchestrator to know whether <status>completed</status> means "this is the real one" or "this is the seventh fake one."

In my incident, a single sub-agent fired at least 8 separate <status>completed</status> notifications across a 4-hour window. The first one was the real watchdog termination; the rest were the runaway agent's continued output (separately filed in the runaway / generation-loop class, #61877). Even after the agent process exited at hour 21, the harness fired one more "completed" notification a day later when something — re-render? cache miss? — touched the file's mtime.

What Should Happen?

After a sub-agent transitions to a terminal status (completed, stopped, failed, timed_out), the harness should:

  1. Stop tailing the JSONL transcript. No more notifications from that agent ID, ever.
  2. Refuse new appends. If the underlying subprocess somehow keeps writing (e.g., the watchdog killed the stream but the process didn't die — see related issues #61877, #50272), those appends should be ignored, not surfaced.
  3. Distinguish termination causes in the status. Today <status>completed</status> covers both "model emitted terminal turn" and "watchdog gave up waiting." These have very different reliability implications for the orchestrator. Suggested: completed_gracefully, terminated_by_watchdog, terminated_by_kill. The orchestrator can then refuse to act on body content from terminated_by_watchdog.

Error Messages/Logs

No error — the failure mode is structural, not an error condition. Sample of what the orchestrator sees:

``` <task-notification> <task-id>aa4481de51c90f2ed</task-id> <status>completed</status>

<summary>Agent "Fix #701 forward-auth OIDC loop" completed</summary> <result>API Error: Overloaded</result> </task-notification> \`\`\`

(That first one fires within ~10 minutes — the legitimate stream termination.)

Then ~4 hours later:

``` <task-notification> <task-id>aa4481de51c90f2ed</task-id> <status>completed</status>

<summary>Agent "Fix #701 forward-auth OIDC loop" completed</summary> <result>I don't see a \`/plan\` skill in the available-skills list…</result> </task-notification> \`\`\`

Same agent ID. Same status. Completely different (and never-requested) "work." The orchestrator burned context trying to figure out what was happening.

Steps to Reproduce

This is hard to reproduce on demand because it requires a sub-agent that goes into a generation loop (see #61877). The reproducible part is the harness behavior:

  1. Dispatch any background sub-agent: `Agent({ subagent_type: 'general-purpose', run_in_background: true, prompt: '<some task>' })`. Note the returned `agentId`.
  2. Wait for the first `<status>completed</status>` notification. Capture its `<result>` content.
  3. Manually append to the JSONL file: `echo '{"type":"assistant","timestamp":"2026-05-24T11:00:00Z","message":{"content":[{"type":"text","text":"fabricated content"}]}}' >> ~/.claude/projects/<hash>/subagents/agent-<id>.jsonl`
  4. Expected: harness ignores the append; orchestrator receives no further notifications for that agent ID.
  5. Actual: orchestrator receives a fresh `<task-notification>` with `<status>completed</status>` and the fabricated content.

I haven't yet tested step 4 — it's a clean reproduction harness, marking as TODO to confirm. The natural reproduction in my incident was a runaway agent doing the appends itself; if step 4's manual append doesn't reproduce, the harness's notification trigger is something other than mtime change (perhaps file size delta, or perhaps the inner JSONL parser re-runs on poll and noises out duplicates).

Claude Model

Opus (claude-opus-4-7[1m])

Is this a regression?

I don't know

Claude Code Version

2.1.133

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

iTerm2

Additional Information

  • Related issue: #61877 (sub-agent generation loops — the trigger for the appends). The bug I'm reporting here is independent: even if the agent exits cleanly, the harness has no defense against late writes to the transcript appearing as fresh completion events. I added an evidence comment to that issue: https://github.com/anthropics/claude-code/issues/61877#issuecomment-4528291470
  • Related issue: #50272 (stale symlink mtime). Adjacent but different — that's about diagnosing liveness; this is about the notification side.
  • Project-side postmortem with full timeline: https://github.com/metazen11/psde-os/blob/develop/reports/2026-05-24-runaway-subagent-postmortem.md
  • Local mitigation: a `detect_runaway_subagents.sh` shell script that scans for sub-agent JSONL files with recent mtimes and no process holding them open. Symptom-catcher; the harness fix would make it unnecessary.

The load-bearing fix is: `<status>completed</status>` should be terminal — no more notifications, no more appends acted on, ever. Everything else is defense in depth.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING