claude-code - ✅(Solved) Fix claude -p exits before Task(run_in_background=true) subagents complete, truncating output [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#52917Fetched 2026-04-25 06:17:17
View on GitHub
Comments
0
Participants
1
Timeline
8
Reactions
0
Author
Participants
Timeline (top)
cross-referenced ×3labeled ×3referenced ×2

When a skill running under claude -p (print mode, non-interactive) launches sub-agents via Task(run_in_background=true), the parent agent emits its immediate text output and exits with a valid stream-json result before the background sub-agents finish. Downstream tools that capture the print-mode output see a truncated transcript — typically 5-10× shorter than what the same skill produces interactively.

This is distinct from the interactive/TUI manifestations tracked in #50572, #48657, #28221, #52856, and #44075, all of which are open. None of them specifically name the non-interactive claude -p shape, where there is no turn loop to poll or hooks to fire after the print-mode process has already exited with exit 0.

Error Message

This is the failure mode we hit in clauditor (an evaluator for Claude Code skills). We added a detect-and-warn path (clauditor#97) that loudly surfaces the truncation to users, but any skill that legitimately uses run_in_background=true for parallel research fan-out currently cannot be evaluated end-to-end. Our Tier 2 research spike (clauditor ADR) concluded that a clauditor-side fix is infeasible without upstream support, because:

  • Stream-json events for background-task lifecycle (dispatch / complete / error) so downstream parsers can detect that result isn't the true end-of-work.

Root Cause

This is the failure mode we hit in clauditor (an evaluator for Claude Code skills). We added a detect-and-warn path (clauditor#97) that loudly surfaces the truncation to users, but any skill that legitimately uses run_in_background=true for parallel research fan-out currently cannot be evaluated end-to-end. Our Tier 2 research spike (clauditor ADR) concluded that a clauditor-side fix is infeasible without upstream support, because:

Fix Action

Fix / Workaround

Observed: result message arrives shortly after the parent dispatches the background tasks. The sub-agents' findings never make it into the stream; the final synthesis message in the skill's SKILL.md never fires.

  • A --wait-for-background-tasks flag on claude -p that keeps the process alive until all dispatched Task background jobs complete (or a timeout fires).
  • A headless claude status --json (as proposed in #52856) that external tools can poll for state == idle and drained task queue.
  • Stream-json events for background-task lifecycle (dispatch / complete / error) so downstream parsers can detect that result isn't the true end-of-work.
  • A PostTask hook in print mode (as proposed in #28221 / #48657), firable before the process exits.

PR fix notes

PR #116: feat(runner): add --sync-tasks flag to force synchronous Task spawns (Tier 1.5 of #103)

Description (problem / solution / changelog)

Summary

Adds --sync-tasks to validate / grade / capture / run and a sibling EvalSpec.sync_tasks spec field. Both set CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1 in the claude -p subprocess env (a documented Anthropic control), forcing Task(run_in_background=true) spawns to run synchronously. This resolves the output-truncation failure mode for skills that use background sub-agents for parallel fanout — without modifying the skill.

Tier 1.5 of #103. Research + go/no-go decision recorded in docs/adr/transport-research-103.md; upstream feature request filed at anthropics/claude-code#52917. Closes beads issue clauditor-dqz.

What changed

  • Helper: env_with_sync_tasks() in runner.py — non-mutating per .claude/rules/non-mutating-scrub.md, composes cleanly with env_without_api_key.
  • Spec field: EvalSpec.sync_tasks: bool = False with load-time bool-guard validator per .claude/rules/constant-with-type-info.md.
  • Precedence: SkillSpec.run(sync_tasks_override=...) resolves CLI > spec > default in one place per .claude/rules/spec-cli-precedence.md. When effective, mutates the outgoing env dict to add the var; composes with any prior env_override (e.g. from --no-api-key).
  • CLI: --sync-tasks on all four skill-invoking commands. grade --baseline composes the env var into the baseline arm's run_raw call so both sides of the delta share sync mode.
  • #97 suppression: the background-task: warning is suppressed when the env var is set — spawning Task under CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1 forces it sync, so warning the user is spurious.
  • Docs: new Skill Compatibility section in skill-usage.md (matrix + fidelity caveats + --sync-tasks opt-in guidance + refactoring recipes); flag entry in cli-reference.md; spec-field entry in eval-spec-reference.md.

Fidelity caveats (called out in docs)

--sync-tasks resolves the capture side of #103 for the parallel-fanout sub-case. It does NOT close the async-fidelity gap — under the flag, you evaluate a different execution model than production ships. Async-specific logic (race conditions, late-arriving results, progress-while-async branches) stays untested, and timing/cost metrics skew. Full rationale in docs/adr/transport-research-103.md; the true async-fidelity fix still depends on upstream.

Test plan

  • uv run ruff check src/ tests/ — clean
  • uv run pytest --cov=clauditor — 2550 passed, 98.41% coverage (gate 80%)
  • New tests cover: env helper non-mutation + composition, spec field parsing + round-trip + bool-guard, SkillSpec.run precedence (CLI-only / spec-only / CLI-over-spec / env-compose / explicit-False), CLI flags on all four commands (including grade --baseline threading to run_raw), #97 warning suppression under the env var, and the negative case (env var set to "0" does NOT suppress).
  • Manual smoke against a skill with Task(run_in_background=true) to confirm end-to-end behavior matches the test-suite expectations.

Related

  • Issue: #103 (Tier 1.5 scope)
  • ADR: docs/adr/transport-research-103.md
  • Upstream: anthropics/claude-code#52917
  • Depends on existing env_override plumbing from #64 / #95.

Changed files

  • docs/cli-reference.md (modified, +2/-1)
  • docs/eval-spec-reference.md (modified, +12/-0)
  • docs/skill-usage.md (modified, +82/-0)
  • src/clauditor/cli/capture.py (modified, +20/-2)
  • src/clauditor/cli/grade.py (modified, +32/-1)
  • src/clauditor/cli/run.py (modified, +20/-2)
  • src/clauditor/cli/validate.py (modified, +16/-0)
  • src/clauditor/runner.py (modified, +35/-0)
  • src/clauditor/schemas.py (modified, +35/-0)
  • src/clauditor/spec.py (modified, +20/-2)
  • tests/test_cli.py (modified, +167/-0)
  • tests/test_runner.py (modified, +117/-0)
  • tests/test_schemas.py (modified, +47/-0)
  • tests/test_spec.py (modified, +114/-0)

Code Example

# A skill that fans out 2-3 Task(run_in_background=true) sub-agents,
# then synthesizes their findings in a final message.
claude -p --output-format stream-json --verbose "/my-parallel-research-skill"
RAW_BUFFERClick to expand / collapse

Summary

When a skill running under claude -p (print mode, non-interactive) launches sub-agents via Task(run_in_background=true), the parent agent emits its immediate text output and exits with a valid stream-json result before the background sub-agents finish. Downstream tools that capture the print-mode output see a truncated transcript — typically 5-10× shorter than what the same skill produces interactively.

This is distinct from the interactive/TUI manifestations tracked in #50572, #48657, #28221, #52856, and #44075, all of which are open. None of them specifically name the non-interactive claude -p shape, where there is no turn loop to poll or hooks to fire after the print-mode process has already exited with exit 0.

Repro

# A skill that fans out 2-3 Task(run_in_background=true) sub-agents,
# then synthesizes their findings in a final message.
claude -p --output-format stream-json --verbose "/my-parallel-research-skill"

Observed: result message arrives shortly after the parent dispatches the background tasks. The sub-agents' findings never make it into the stream; the final synthesis message in the skill's SKILL.md never fires.

Expected: either (a) the print-mode process waits for background tasks before emitting result, or (b) the stream-json schema exposes an event that downstream tools can poll on to know background work is still pending.

Context: evaluation tooling

This is the failure mode we hit in clauditor (an evaluator for Claude Code skills). We added a detect-and-warn path (clauditor#97) that loudly surfaces the truncation to users, but any skill that legitimately uses run_in_background=true for parallel research fan-out currently cannot be evaluated end-to-end. Our Tier 2 research spike (clauditor ADR) concluded that a clauditor-side fix is infeasible without upstream support, because:

  1. claude -p --help exposes no --wait / --poll / --wait-for-background flag.
  2. Stream-json emits no background_task_pending or equivalent event.
  3. Attempting to inject a synthetic follow-up turn via --resume is unreliable: #50572 documents that background shells are reaped on turn end, and #40692 notes completion notifications arrive after the stream has closed.

Suggested fixes (any one would unblock)

  • A --wait-for-background-tasks flag on claude -p that keeps the process alive until all dispatched Task background jobs complete (or a timeout fires).
  • A headless claude status --json (as proposed in #52856) that external tools can poll for state == idle and drained task queue.
  • Stream-json events for background-task lifecycle (dispatch / complete / error) so downstream parsers can detect that result isn't the true end-of-work.
  • A PostTask hook in print mode (as proposed in #28221 / #48657), firable before the process exits.

Priority from our side

P3 nice-to-have, but it's the single blocker for a category of skills (parallel research fan-out) under any automated evaluation tooling that shells out to claude -p. Happy to test any of the above against our skill fixtures if a PR goes up.

extent analysis

TL;DR

Adding a --wait-for-background-tasks flag to claude -p or introducing stream-json events for background-task lifecycle could resolve the issue of truncated transcripts in non-interactive mode.

Guidance

  • Investigate the feasibility of adding a --wait-for-background-tasks flag to claude -p to keep the process alive until all background tasks complete.
  • Consider introducing stream-json events for background-task lifecycle (dispatch, complete, error) to allow downstream parsers to detect when result isn't the true end-of-work.
  • Evaluate the proposal of a headless claude status --json for external tools to poll for state == idle and a drained task queue.
  • Review the suggested PostTask hook in print mode, as proposed in related issues, to determine its applicability to this specific problem.

Example

No specific code snippet is provided due to the nature of the issue, which focuses on the behavior of the claude command and its interaction with background tasks.

Notes

The issue highlights a specific failure mode in non-interactive mode (claude -p) that is distinct from interactive/TUI manifestations. Any solution should consider the implications for automated evaluation tooling and the ability to handle parallel research fan-out skills.

Recommendation

Apply a workaround by introducing stream-json events for background-task lifecycle, as this approach seems to offer a more comprehensive solution by providing a way for downstream tools to detect the completion of background tasks without requiring significant changes to the claude command's flags or behavior.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - ✅(Solved) Fix claude -p exits before Task(run_in_background=true) subagents complete, truncating output [1 pull requests, 1 participants]