claude-code - ✅(Solved) Fix claude -p exits before Task(run_in_background=true) subagents complete, truncating output [1 pull requests, 1 participants]

wjduenow · 2026-04-24T15:55:14Z

[claude-code] When a skill running under claude -p print mode, non-interactive launches sub-agents via Task run in background=true , the parent agent emits its… When a skill running under `claude -p` (print mode, non-interactive) launches sub-agents via `Task(run_in_background=true)`, the parent agent emits its immediate text output and exits with a valid stream-json `result` **before the background sub-agents finish**. Downstream tools that capture the print-mode output see a truncated transcript — typically 5-10× shorter than what the same skill produces interactively. This is distinct from the interactive/TUI manifestations tracked in #50572, #48657, #28221, #52856, and #44075, all of which are open. None of them specifically name the non-interactive `claude -p` shape, where there is no turn loop to poll or hooks to fire *after* the print-mode process has already exited with exit 0. # PR #116: feat(runner): add --sync-tasks flag to force synchronous Task spawns (Tier 1.5 of #103) - Repository: wjduenow/clauditor - Author: wjduenow - State: closed | merged: True - Link: https://github.com/wjduenow/clauditor/pull/116 ## Description (problem / solution / changelog) ## Summary Adds `--sync-tasks` to `validate` / `grade` / `capture` / `run` and a sibling `EvalSpec.sync_tasks` spec field. Both set `CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1` in the `claude -p` subprocess env (a [documented Anthropic control](https://docs.claude.com/en/docs/claude-code/sub-agents)), forcing `Task(run_in_background=true)` spawns to run synchronously. This resolves the output-truncation failure mode for skills that use background sub-agents for parallel fanout — without modifying the skill. Tier 1.5 of #103. Research + go/no-go decision recorded in `docs/adr/transport-research-103.md`; upstream feature request filed at [anthropics/claude-code#52917](https://github.com/anthropics/claude-code/issues/52917). Closes beads issue `clauditor-dqz`. ## What changed - **Helper**: `env_with_sync_tasks()` in `runner.py` — non-mutating per `.claude/rules/non-mutating-scrub.md`, composes cleanly with `env_without_api_key`. - **Spec field**: `EvalSpec.sync_tasks: bool = False` with load-time bool-guard validator per `.claude/rules/constant-with-type-info.md`. - **Precedence**: `SkillSpec.run(sync_tasks_override=...)` resolves CLI > spec > default in one place per `.claude/rules/spec-cli-precedence.md`. When effective, mutates the outgoing env dict to add the var; composes with any prior `env_override` (e.g. from `--no-api-key`). - **CLI**: `--sync-tasks` on all four skill-invoking commands. `grade --baseline` composes the env var into the baseline arm's `run_raw` call so both sides of the delta share sync mode. - **#97 suppression**: the `background-task:` warning is suppressed when the env var is set — spawning Task under `CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1` forces it sync, so warning the user is spurious. - **Docs**: new Skill Compatibility section in `skill-usage.md` (matrix + fidelity caveats + `--sync-tasks` opt-in guidance + refactoring recipes); flag entry in `cli-reference.md`; spec-field entry in `eval-spec-reference.md`. ## Fidelity caveats (called out in docs) `--sync-tasks` resolves the *capture* side of #103 for the parallel-fanout sub-case. It does NOT close the async-fidelity gap — under the flag, you evaluate a different execution model than production ships. Async-specific logic (race conditions, late-arriving results, progress-while-async branches) stays untested, and timing/cost metrics skew. Full rationale in `docs/adr/transport-research-103.md`; the true async-fidelity fix still depends on upstream. ## Test plan - [x] `uv run ruff check src/ tests/` — clean - [x] `uv run pytest --cov=clauditor` — 2550 passed, 98.41% coverage (gate 80%) - [x] New tests cover: env helper non-mutation + composition, spec field parsing + round-trip + bool-guard, SkillSpec.run precedence (CLI-only / spec-only / CLI-over-spec / env-compose / explicit-False), CLI flags on all four commands (including `grade --baseline` threading to `run_raw`), #97 warning suppression under the env var, and the negative case (env var set to `"0"` does NOT suppress). - [ ] Manual smoke against a skill with `Task(run_in_background=true)` to confirm end-to-end behavior matches the test-suite expectations. ## Related - Issue: #103 (Tier 1.5 scope) - ADR: `docs/adr/transport-research-103.md` - Upstream: [anthropics/claude-code#52917](https://github.com/anthropics/claude-code/issues/52917) - Depends on existing `env_override` plumbing from #64 / #95. ## Changed files - `docs/cli-reference.md` (modified, +2/-1) - `docs/eval-spec-reference.md` (modified, +12/-0) - `docs/skill-usage.md` (modified, +82/-0) - `src/clauditor/cli/capture.py` (modified, +20/-2) - `src/clauditor/cli/grade.py` (modified, +32/-1) - `src/clauditor/cli/run.py` (modified, +20/-2) - `src/clauditor/cli/validate.py` (modified, +16/-0) - `src/clauditor/runner.py` (modified, +35/-0) - `src/clauditor/schema

claude-code2026-04-24 15:55:14

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#52917•Fetched 2026-04-25 06:17:17

View on GitHub

Comments

Participants

Timeline

Reactions

Author

wjduenow

Participants

wjduenow

Timeline (top)

cross-referenced ×3labeled ×3referenced ×2

When a skill running under claude -p (print mode, non-interactive) launches sub-agents via Task(run_in_background=true), the parent agent emits its immediate text output and exits with a valid stream-json result before the background sub-agents finish. Downstream tools that capture the print-mode output see a truncated transcript — typically 5-10× shorter than what the same skill produces interactively.

This is distinct from the interactive/TUI manifestations tracked in #50572, #48657, #28221, #52856, and #44075, all of which are open. None of them specifically name the non-interactive claude -p shape, where there is no turn loop to poll or hooks to fire after the print-mode process has already exited with exit 0.

Error Message

This is the failure mode we hit in clauditor (an evaluator for Claude Code skills). We added a detect-and-warn path (clauditor#97) that loudly surfaces the truncation to users, but any skill that legitimately uses run_in_background=true for parallel research fan-out currently cannot be evaluated end-to-end. Our Tier 2 research spike (clauditor ADR) concluded that a clauditor-side fix is infeasible without upstream support, because:

Stream-json events for background-task lifecycle (dispatch / complete / error) so downstream parsers can detect that result isn't the true end-of-work.

Root Cause

Fix Action

Fix / Workaround

Observed: result message arrives shortly after the parent dispatches the background tasks. The sub-agents' findings never make it into the stream; the final synthesis message in the skill's SKILL.md never fires.

A --wait-for-background-tasks flag on claude -p that keeps the process alive until all dispatched Task background jobs complete (or a timeout fires).
A headless claude status --json (as proposed in #52856) that external tools can poll for state == idle and drained task queue.
Stream-json events for background-task lifecycle (dispatch / complete / error) so downstream parsers can detect that result isn't the true end-of-work.
A PostTask hook in print mode (as proposed in #28221 / #48657), firable before the process exits.

PR fix notes

PR #116: feat(runner): add --sync-tasks flag to force synchronous Task spawns (Tier 1.5 of #103)

Repository: wjduenow/clauditor
Author: wjduenow
State: closed | merged: True
Link: https://github.com/wjduenow/clauditor/pull/116

Description (problem / solution / changelog)

Summary

Adds --sync-tasks to validate / grade / capture / run and a sibling EvalSpec.sync_tasks spec field. Both set CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1 in the claude -p subprocess env (a documented Anthropic control), forcing Task(run_in_background=true) spawns to run synchronously. This resolves the output-truncation failure mode for skills that use background sub-agents for parallel fanout — without modifying the skill.

Tier 1.5 of #103. Research + go/no-go decision recorded in docs/adr/transport-research-103.md; upstream feature request filed at anthropics/claude-code#52917. Closes beads issue clauditor-dqz.

What changed

Helper: env_with_sync_tasks() in runner.py — non-mutating per .claude/rules/non-mutating-scrub.md, composes cleanly with env_without_api_key.
Spec field: EvalSpec.sync_tasks: bool = False with load-time bool-guard validator per .claude/rules/constant-with-type-info.md.
Precedence: SkillSpec.run(sync_tasks_override=...) resolves CLI > spec > default in one place per .claude/rules/spec-cli-precedence.md. When effective, mutates the outgoing env dict to add the var; composes with any prior env_override (e.g. from --no-api-key).
CLI: --sync-tasks on all four skill-invoking commands. grade --baseline composes the env var into the baseline arm's run_raw call so both sides of the delta share sync mode.
#97 suppression: the background-task: warning is suppressed when the env var is set — spawning Task under CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1 forces it sync, so warning the user is spurious.
Docs: new Skill Compatibility section in skill-usage.md (matrix + fidelity caveats + --sync-tasks opt-in guidance + refactoring recipes); flag entry in cli-reference.md; spec-field entry in eval-spec-reference.md.

Fidelity caveats (called out in docs)

--sync-tasks resolves the capture side of #103 for the parallel-fanout sub-case. It does NOT close the async-fidelity gap — under the flag, you evaluate a different execution model than production ships. Async-specific logic (race conditions, late-arriving results, progress-while-async branches) stays untested, and timing/cost metrics skew. Full rationale in docs/adr/transport-research-103.md; the true async-fidelity fix still depends on upstream.

Test plan

uv run ruff check src/ tests/ — clean
uv run pytest --cov=clauditor — 2550 passed, 98.41% coverage (gate 80%)
New tests cover: env helper non-mutation + composition, spec field parsing + round-trip + bool-guard, SkillSpec.run precedence (CLI-only / spec-only / CLI-over-spec / env-compose / explicit-False), CLI flags on all four commands (including grade --baseline threading to run_raw), #97 warning suppression under the env var, and the negative case (env var set to "0" does NOT suppress).
Manual smoke against a skill with Task(run_in_background=true) to confirm end-to-end behavior matches the test-suite expectations.

Issue: #103 (Tier 1.5 scope)
ADR: docs/adr/transport-research-103.md
Upstream: anthropics/claude-code#52917
Depends on existing env_override plumbing from #64 / #95.

Changed files

docs/cli-reference.md (modified, +2/-1)
docs/eval-spec-reference.md (modified, +12/-0)
docs/skill-usage.md (modified, +82/-0)
src/clauditor/cli/capture.py (modified, +20/-2)
src/clauditor/cli/grade.py (modified, +32/-1)
src/clauditor/cli/run.py (modified, +20/-2)
src/clauditor/cli/validate.py (modified, +16/-0)
src/clauditor/runner.py (modified, +35/-0)
src/clauditor/schemas.py (modified, +35/-0)
src/clauditor/spec.py (modified, +20/-2)
tests/test_cli.py (modified, +167/-0)
tests/test_runner.py (modified, +117/-0)
tests/test_schemas.py (modified, +47/-0)
tests/test_spec.py (modified, +114/-0)

Code Example

# A skill that fans out 2-3 Task(run_in_background=true) sub-agents,
# then synthesizes their findings in a final message.
claude -p --output-format stream-json --verbose "/my-parallel-research-skill"

RAW_BUFFERClick to expand / collapse

Summary

Repro

# A skill that fans out 2-3 Task(run_in_background=true) sub-agents,
# then synthesizes their findings in a final message.
claude -p --output-format stream-json --verbose "/my-parallel-research-skill"

Expected: either (a) the print-mode process waits for background tasks before emitting result, or (b) the stream-json schema exposes an event that downstream tools can poll on to know background work is still pending.

Context: evaluation tooling

claude -p --help exposes no --wait / --poll / --wait-for-background flag.
Stream-json emits no background_task_pending or equivalent event.
Attempting to inject a synthetic follow-up turn via --resume is unreliable: #50572 documents that background shells are reaped on turn end, and #40692 notes completion notifications arrive after the stream has closed.

Suggested fixes (any one would unblock)

A --wait-for-background-tasks flag on claude -p that keeps the process alive until all dispatched Task background jobs complete (or a timeout fires).
A headless claude status --json (as proposed in #52856) that external tools can poll for state == idle and drained task queue.
Stream-json events for background-task lifecycle (dispatch / complete / error) so downstream parsers can detect that result isn't the true end-of-work.
A PostTask hook in print mode (as proposed in #28221 / #48657), firable before the process exits.

Priority from our side

P3 nice-to-have, but it's the single blocker for a category of skills (parallel research fan-out) under any automated evaluation tooling that shells out to claude -p. Happy to test any of the above against our skill fixtures if a PR goes up.

extent analysis

TL;DR

Adding a --wait-for-background-tasks flag to claude -p or introducing stream-json events for background-task lifecycle could resolve the issue of truncated transcripts in non-interactive mode.

Guidance

Investigate the feasibility of adding a --wait-for-background-tasks flag to claude -p to keep the process alive until all background tasks complete.
Consider introducing stream-json events for background-task lifecycle (dispatch, complete, error) to allow downstream parsers to detect when result isn't the true end-of-work.
Evaluate the proposal of a headless claude status --json for external tools to poll for state == idle and a drained task queue.
Review the suggested PostTask hook in print mode, as proposed in related issues, to determine its applicability to this specific problem.

Example

No specific code snippet is provided due to the nature of the issue, which focuses on the behavior of the claude command and its interaction with background tasks.

Notes

The issue highlights a specific failure mode in non-interactive mode (claude -p) that is distinct from interactive/TUI manifestations. Any solution should consider the implications for automated evaluation tooling and the ability to handle parallel research fan-out skills.

Recommendation

Apply a workaround by introducing stream-json events for background-task lifecycle, as this approach seems to offer a more comprehensive solution by providing a way for downstream tools to detect the completion of background tasks without requiring significant changes to the claude command's flags or behavior.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #autograd error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

claude-code - ✅(Solved) Fix claude -p exits before Task(run_in_background=true) subagents complete, truncating output [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #116: feat(runner): add --sync-tasks flag to force synchronous Task spawns (Tier 1.5 of #103)

Description (problem / solution / changelog)

Summary

What changed

Fidelity caveats (called out in docs)

Test plan

Related

Changed files

Code Example

Summary

Repro

Context: evaluation tooling

Suggested fixes (any one would unblock)

Priority from our side

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING