claude-code - 💡(How to fix) Fix [FEATURE] Restore unbounded additionalContext for SubagentStart hooks (silent 50KB cap on hook output causes 33× cost penalty for large-payload subagent dispatch) [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#52628Fetched 2026-04-24 06:02:02
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×3renamed ×1

Root Cause

  1. Switch to a cheaper model — Sonnet 4.6 is 8× cheaper than Opus 4.7, but quality degrades on complex semantic judgments. We measured on the same 20-edge canary batch:
    • Sonnet vs Opus exact-type match: 4/20 (20%)
    • Category-level match: 17/20 (85%)
    • Sonnet required 4 retry passes (60→6→1→0 errors) vs Opus 1 retry — partly because Sonnet hallucinated 4 type names not in the canonical set on first attempt.

Fix Action

Fix / Workaround

TL;DR: CC 2.1.91 capped hook output at 50KB (spillover to disk + 2KB preview). This silently broke high-throughput subagent dispatch patterns that depend on injecting large pre-assembled context (50-700KB of evidence dossiers, structured prompts, knowledge graph slices) into subagents. The workaround — chunked Read calls — multiplies token costs by 3-8× and breaks anti-hallucination guarantees that depended on a single pre-injected payload.

We run a knowledge graph extraction pipeline on a corpus of ~333 chunks. Each subagent dispatch (matcher, updater, edge classifier) needs:

Total per-dispatch context: 150-700KB. This is structured evidence that the subagent must reason over to produce a single structured output (typically <10KB). It is NOT exploration material — the subagent should NOT need to discover or fetch anything.

Code Example

## The breaking change

The relevant documented change in the public changelog is **CC 2.1.51**:

> "Tool results larger than 50K characters are now persisted to disk (previously 100K).
>  This reduces context window usage and improves conversation longevity."

This change explicitly targets **tool results**. However, on current CC 2.1.119 the same persistence behavior empirically applies to **hook `additionalContext` output** as well — the spillover file naming pattern (`tool-results/hook-{uuid}-{N}-additionalContext.txt`) and the "Output too large (XXX KB). Full output saved to: ... Preview (first 2KB):" message format come directly from `src/utils/toolResultStorage.ts` (`DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000`, `PREVIEW_SIZE_BYTES = 2000`).

I could not find an explicit changelog entry documenting the extension of this 50K cap to hook attachment delivery. The cap appears to have been added silently in some version between 2.1.51 and 2.1.119 — likely a code consolidation that routed hook attachment persistence through the same `persistToolResult` path used for tool results.

Source paths in CC 2.1.88 source map (which we have access to):
- `src/utils/sessionStart.ts:148``additionalContexts.push(...hookResult.additionalContexts)` (collection)
- `src/tools/AgentTool/runAgent.ts:546-555` — wraps additionalContexts as `hook_additional_context` attachment for the subagent
- `src/utils/messages.ts:4117-4128``wrapInSystemReminder(content)` → user message **with NO truncation visible in this 2.1.88 path**

In 2.1.88, hook `additionalContext` did not visibly route through `persistToolResult`. By 2.1.119, it does. The wiring change happened in code we don't have access to.

**Empirical confirmation on CC 2.1.119**: a SubagentStart hook emitting 723KB `additionalContext` causes the receiving subagent to see only:

The subagent must then call Read on the spillover file to access the actual content — defeating the entire purpose of pre-injection.

---

## Use case (concrete production system)

We run a knowledge graph extraction pipeline on a corpus of ~333 chunks. Each subagent dispatch (matcher, updater, edge classifier) needs:

1. **Full prompt template** (5-15KB)
2. **Entry roster** (cross-reference index, ~50KB)
3. **Source chunk text** (50-200KB)
4. **High-level + low-level dossiers for referenced nodes** (200-500KB)

Total per-dispatch context: **150-700KB**. This is structured evidence that the subagent must reason over to produce a single structured output (typically <10KB). It is NOT exploration material — the subagent should NOT need to discover or fetch anything.

### Pre-2.1.91 workflow (worked)

---

### Post-2.1.91 workflow (forced fallback)

---

**Empirical comparison from our system** (same content, same model, same task):

| Mode | Tool calls | cache_read | Total tokens | Cost (Opus 4.7) |
|---|---|---|---|---|
| Pre-2.1.91 inline injection (200KB content) | 3 | ~0 | 92K | ~$1.40 |
| Post-2.1.91 chunked-Read fallback (200KB content) | 27 | 1.8M | 4.2M | ~$8 |
| Post-2.1.91 chunked-Read fallback (700KB content) | 47 | 18M | 19M | ~$42 |

**At fleet scale** (1571 batches × 700KB each):
- Hypothetical inline injection: ~$2K total
- Current forced chunked-Read: ~$67K total
- **33× cost penalty for the same workload**

---

## Why pre-injection is the right primitive for this use case

1. **Determinism**: The subagent has exactly the context the orchestrator intended. No partial reads, no offset drift, no skipped sections.

2. **Anti-hallucination**: We've measured fabrication rates spike when subagents must orchestrate their own chunked Reads. They paraphrase, skip middle chunks, and invent quotes. With pre-injection, the content is verbatim in their context — no chunking decisions to get wrong.

3. **Cost efficiency**: A single cache_creation event (cached for an hour) instead of N×M cache_reads of accumulating prefix.

4. **Single-shot semantics**: Read-then-Write becomes Write-only. Fewer turns = less orchestrator overhead = less context burned.

5. **Trust boundary**: The orchestrator KNOWS what the subagent should see. Forcing the subagent to discover content via Read undermines this — the subagent might decide to read other things, exceed scope, or fail to read what it should.

---

## Why chunked-Read is the wrong substitute

The official "fix" for >50KB injection is to have the subagent Read the spillover file (or split parts). This:

- **Inflates turn count 5-15×** — every Read is +1 round-trip.
- **Inflates cache_read 8-30×** — every turn re-fetches the cached prefix.
- **Breaks per-tool budget** (`MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000` in `constants/toolLimits.ts`) for parallel tool use scenarios.
- **Forces compromise on chunk size** — 60KB max per part to fit `MAX_TOOL_RESULT_TOKENS = 100_000` Read result cap. For 700KB content, that's 12 parts → 12 Reads12 cache_read events of accumulating prefix.

For exploration agents this is fine — they're discovering. For pre-prepared evidence dispatch (matcher, updater, classifier patterns), this is purely waste.

---

## Workarounds we tried — none viable (verified against 2.1.88 source)

1. **`updatedInput.prompt` from PreToolUse:Agent** — verified silently ignored by CC for the Agent tool. Subagent transcript inspection confirmed: receives original unmodified prompt regardless of what `updatedInput.prompt` contained. No consumption path in `src/tools/AgentTool/runAgent.ts` for prompt mutation on subagent dispatch.

2. **`initialUserMessage` field**in hook output schema (`src/types/hooks.ts:86,270`) but only consumed by `src/main.tsx:3474` for **main session**. `runAgent.ts` ignores it entirely for subagents — no `pendingInitialUserMessage` analog and no equivalent of `sessionStart.ts:150-152`'s side-channel handoff. Adding subagent-side honoring would unlock the use case immediately.

3. **GrowthBook flag `tengu_satin_quoll`** — does NOT apply to hook output. Verified: the flag is read only from `src/utils/toolResultStorage.ts:55-79` (`getPersistenceThreshold(toolName, ...)`) and `src/utils/mcpValidation.ts:37` (`mcp_tool` key for MCP-tool token cap). Both call sites are tool-result code paths. Empty grep result for `getPersistenceThreshold` in `sessionStart.ts`, `runAgent.ts`, `hooks.ts`, `messages.ts` — hook additionalContext bypasses this flag entirely. Additionally, the flag is keyed by `toolName` which has no natural mapping to hook events (`SubagentStart`, etc.).

4. **Env vars** — searched 2.1.88 source for any env var that overrides hook output cap. None found. `CLAUDE_CODE_SAVE_HOOK_ADDITIONAL_CONTEXT` (`src/utils/sessionStorage.ts:4360`) controls transcript persistence for `/resume`, not subagent delivery. `CLAUDE_CODE_MAX_CONTEXT_TOKENS` (`src/utils/context.ts:61-63`) is for the conversation context window, not per-message hook output.

5. **Smaller batches** — splits work into 4-6× more dispatches. Each dispatch still pays the orchestrator overhead and the duplicate prompt/rubric setup cost. Total fleet cost roughly the same; latency multiplies.

6. **Switch to a cheaper model**Sonnet 4.6 is 8× cheaper than Opus 4.7, but quality degrades on complex semantic judgments. We measured on the same 20-edge canary batch:
   - Sonnet vs Opus exact-type match: 4/20 (20%)
   - Category-level match: 17/20 (85%)
   - Sonnet required 4 retry passes (60610 errors) vs Opus 1 retry — partly because Sonnet hallucinated 4 type names not in the canonical set on first attempt.

---

## Proposed solutions (in order of preference)

### Option 1 (preferred — lowest API surface change): Honor `initialUserMessage` for SubagentStart

The field already exists in the hook output schema (`src/types/hooks.ts:86`). Main session already honors it via `sessionStart.ts:150-152``pendingInitialUserMessage``main.tsx:3474`. Symmetric addition in `runAgent.ts:546-555`:

---

Real user messages are not subject to the 50K hook output cap (verified: the 2.1.91 change-note scoped to "hook output" specifically). This unlocks the use case without any new fields or flags.

Risk: buggy hooks can now inject arbitrary text directly into the subagent's first user message. Mitigation: require `isMeta: true` on the injected message so it's clearly hook-originated and discoverable in traces.

### Option 2: Per-hook size opt-in via `hookSpecificOutput.additionalContextOverride.maxChars`

---

The override is a contract: integrator asserts they understand the cost. Cap the override at a hard ceiling (e.g. 1MB) to prevent accidental 10MB injections. Requires minor schema addition and a single conditional in the persistence path.

### Option 3: GrowthBook flag extending `tengu_satin_quoll`

Add hook-event keys to the existing flag map:

---

Requires Anthropic-side flag flipping per integrator — high friction but zero public API change.

### Option 4: Separate `largeContext` field (most explicit)

---

Treats it as a real user message at the subagent boundary (not subject to 50K cap). The separate field signals intent — use `additionalContext` for small reminders (current behavior), `largeContext` for pre-assembled evidence payloads.

---

## Why this matters for the ecosystem

This isn't a hypothetical. Anyone running a non-trivial pipeline on top of CC subagents — knowledge graph extractors, code refactor pipelines, document analysis fleets, agentic workflows that delegate to specialized subagents — runs into this 50K wall the moment their evidence-per-dispatch exceeds trivial size. The cost penalty is real and measurable (we have ~$60K of unnecessary cost lined up if we don't solve it).

The 2.1.91 change was probably defensive — preventing accidental context bloat from buggy hooks. But it took away the only zero-overhead delivery path for legitimate large-payload subagent patterns.

An opt-in escape hatch (any of the 4 options above) would restore this primitive without re-introducing the bloat risk it was guarding against.

---

## Reference: empirical evidence files (available on request)

- Production agent metrics showing 33× cost penalty (1571-batch fleet projection)
- Pre-2.1.91 vs post-2.1.91 same-content same-model A/B traces
- 4-way canary across Haiku 4.5 / Sonnet 4.6 / Opus 4.7 with full token breakdown
- Source code paths in 2.1.88 (pre-cap) showing the unbounded delivery path that no longer works

Happy to provide reproductions or detailed measurements.


### Proposed Solution

## Ideal user experience

When dispatching a subagent that needs a large pre-assembled context payload, my orchestrator should be able to inject 200-700KB of structured evidence (chunk text + dossiers + prompts) into the subagent's first user message in ONE round-trip — no Read calls required.

## Recommended implementation: honor `initialUserMessage` for SubagentStart hooks

The field already exists in the hook output schema (`src/types/hooks.ts:86`). Main session already honors it via `sessionStart.ts:150-152``pendingInitialUserMessage``main.tsx:3474`. The symmetric addition needed in `runAgent.ts:546-555`:
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing requests and this feature hasn't been requested yet
  • This is a single feature request (not multiple features)

Problem Statement

Feature Request: Restore unbounded additionalContext injection for SubagentStart hooks (or provide opt-in escape hatch)

TL;DR: CC 2.1.91 capped hook output at 50KB (spillover to disk + 2KB preview). This silently broke high-throughput subagent dispatch patterns that depend on injecting large pre-assembled context (50-700KB of evidence dossiers, structured prompts, knowledge graph slices) into subagents. The workaround — chunked Read calls — multiplies token costs by 3-8× and breaks anti-hallucination guarantees that depended on a single pre-injected payload.

Requesting: an explicit opt-in escape hatch for additionalContext over 50KB, similar to the MCP override added in 2.1.85 (_meta[anthropic/maxResultSizeChars] up to 500K).


The breaking change

## The breaking change

The relevant documented change in the public changelog is **CC 2.1.51**:

> "Tool results larger than 50K characters are now persisted to disk (previously 100K).
>  This reduces context window usage and improves conversation longevity."

This change explicitly targets **tool results**. However, on current CC 2.1.119 the same persistence behavior empirically applies to **hook `additionalContext` output** as well — the spillover file naming pattern (`tool-results/hook-{uuid}-{N}-additionalContext.txt`) and the "Output too large (XXX KB). Full output saved to: ... Preview (first 2KB):" message format come directly from `src/utils/toolResultStorage.ts` (`DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000`, `PREVIEW_SIZE_BYTES = 2000`).

I could not find an explicit changelog entry documenting the extension of this 50K cap to hook attachment delivery. The cap appears to have been added silently in some version between 2.1.51 and 2.1.119 — likely a code consolidation that routed hook attachment persistence through the same `persistToolResult` path used for tool results.

Source paths in CC 2.1.88 source map (which we have access to):
- `src/utils/sessionStart.ts:148` — `additionalContexts.push(...hookResult.additionalContexts)` (collection)
- `src/tools/AgentTool/runAgent.ts:546-555` — wraps additionalContexts as `hook_additional_context` attachment for the subagent
- `src/utils/messages.ts:4117-4128` — `wrapInSystemReminder(content)` → user message **with NO truncation visible in this 2.1.88 path**

In 2.1.88, hook `additionalContext` did not visibly route through `persistToolResult`. By 2.1.119, it does. The wiring change happened in code we don't have access to.

**Empirical confirmation on CC 2.1.119**: a SubagentStart hook emitting 723KB `additionalContext` causes the receiving subagent to see only:

The subagent must then call Read on the spillover file to access the actual content — defeating the entire purpose of pre-injection.

---

## Use case (concrete production system)

We run a knowledge graph extraction pipeline on a corpus of ~333 chunks. Each subagent dispatch (matcher, updater, edge classifier) needs:

1. **Full prompt template** (5-15KB)
2. **Entry roster** (cross-reference index, ~50KB)
3. **Source chunk text** (50-200KB)
4. **High-level + low-level dossiers for referenced nodes** (200-500KB)

Total per-dispatch context: **150-700KB**. This is structured evidence that the subagent must reason over to produce a single structured output (typically <10KB). It is NOT exploration material — the subagent should NOT need to discover or fetch anything.

### Pre-2.1.91 workflow (worked)
  1. Orchestrator (main session): runs assemble_context.py → builds {chunk_id}_updater_context.md (200-700KB combined)
  2. SubagentStart hook (exploration_context_injector.js): → reads pre-assembled file → emits as additionalContext → subagent receives full content as system reminder
  3. Subagent: zero Read calls, single Write call, replies DONE. Token cost: ~$0.50 per dispatch (Opus 4.7), ~3 turns total.

### Post-2.1.91 workflow (forced fallback)
  1. Same assemble_context.py → 200-700KB file
  2. SubagentStart hook still emits 200KB additionalContext → CC silently spills to disk → 2KB preview reaches subagent
  3. Subagent sees Output too large (200KB)... Preview (first 2KB)...
  4. Subagent must Read the spillover file via chunked Reads (Read tool result also caps at ~25K tokens / ~100KB per call)
  5. For 700KB content: 7-12 Read calls × full prefix re-fetched per turn = ~14-24 turns, ~18M cache_read tokens Token cost: ~$42 per dispatch (Opus 4.7), 47-100 tool calls

**Empirical comparison from our system** (same content, same model, same task):

| Mode | Tool calls | cache_read | Total tokens | Cost (Opus 4.7) |
|---|---|---|---|---|
| Pre-2.1.91 inline injection (200KB content) | 3 | ~0 | 92K | ~$1.40 |
| Post-2.1.91 chunked-Read fallback (200KB content) | 27 | 1.8M | 4.2M | ~$8 |
| Post-2.1.91 chunked-Read fallback (700KB content) | 47 | 18M | 19M | ~$42 |

**At fleet scale** (1571 batches × 700KB each):
- Hypothetical inline injection: ~$2K total
- Current forced chunked-Read: ~$67K total
- **33× cost penalty for the same workload**

---

## Why pre-injection is the right primitive for this use case

1. **Determinism**: The subagent has exactly the context the orchestrator intended. No partial reads, no offset drift, no skipped sections.

2. **Anti-hallucination**: We've measured fabrication rates spike when subagents must orchestrate their own chunked Reads. They paraphrase, skip middle chunks, and invent quotes. With pre-injection, the content is verbatim in their context — no chunking decisions to get wrong.

3. **Cost efficiency**: A single cache_creation event (cached for an hour) instead of N×M cache_reads of accumulating prefix.

4. **Single-shot semantics**: Read-then-Write becomes Write-only. Fewer turns = less orchestrator overhead = less context burned.

5. **Trust boundary**: The orchestrator KNOWS what the subagent should see. Forcing the subagent to discover content via Read undermines this — the subagent might decide to read other things, exceed scope, or fail to read what it should.

---

## Why chunked-Read is the wrong substitute

The official "fix" for >50KB injection is to have the subagent Read the spillover file (or split parts). This:

- **Inflates turn count 5-15×** — every Read is +1 round-trip.
- **Inflates cache_read 8-30×** — every turn re-fetches the cached prefix.
- **Breaks per-tool budget** (`MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000` in `constants/toolLimits.ts`) for parallel tool use scenarios.
- **Forces compromise on chunk size** — 60KB max per part to fit `MAX_TOOL_RESULT_TOKENS = 100_000` Read result cap. For 700KB content, that's 12 parts → 12 Reads → 12 cache_read events of accumulating prefix.

For exploration agents this is fine — they're discovering. For pre-prepared evidence dispatch (matcher, updater, classifier patterns), this is purely waste.

---

## Workarounds we tried — none viable (verified against 2.1.88 source)

1. **`updatedInput.prompt` from PreToolUse:Agent** — verified silently ignored by CC for the Agent tool. Subagent transcript inspection confirmed: receives original unmodified prompt regardless of what `updatedInput.prompt` contained. No consumption path in `src/tools/AgentTool/runAgent.ts` for prompt mutation on subagent dispatch.

2. **`initialUserMessage` field** — in hook output schema (`src/types/hooks.ts:86,270`) but only consumed by `src/main.tsx:3474` for **main session**. `runAgent.ts` ignores it entirely for subagents — no `pendingInitialUserMessage` analog and no equivalent of `sessionStart.ts:150-152`'s side-channel handoff. Adding subagent-side honoring would unlock the use case immediately.

3. **GrowthBook flag `tengu_satin_quoll`** — does NOT apply to hook output. Verified: the flag is read only from `src/utils/toolResultStorage.ts:55-79` (`getPersistenceThreshold(toolName, ...)`) and `src/utils/mcpValidation.ts:37` (`mcp_tool` key for MCP-tool token cap). Both call sites are tool-result code paths. Empty grep result for `getPersistenceThreshold` in `sessionStart.ts`, `runAgent.ts`, `hooks.ts`, `messages.ts` — hook additionalContext bypasses this flag entirely. Additionally, the flag is keyed by `toolName` which has no natural mapping to hook events (`SubagentStart`, etc.).

4. **Env vars** — searched 2.1.88 source for any env var that overrides hook output cap. None found. `CLAUDE_CODE_SAVE_HOOK_ADDITIONAL_CONTEXT` (`src/utils/sessionStorage.ts:4360`) controls transcript persistence for `/resume`, not subagent delivery. `CLAUDE_CODE_MAX_CONTEXT_TOKENS` (`src/utils/context.ts:61-63`) is for the conversation context window, not per-message hook output.

5. **Smaller batches** — splits work into 4-6× more dispatches. Each dispatch still pays the orchestrator overhead and the duplicate prompt/rubric setup cost. Total fleet cost roughly the same; latency multiplies.

6. **Switch to a cheaper model** — Sonnet 4.6 is 8× cheaper than Opus 4.7, but quality degrades on complex semantic judgments. We measured on the same 20-edge canary batch:
   - Sonnet vs Opus exact-type match: 4/20 (20%)
   - Category-level match: 17/20 (85%)
   - Sonnet required 4 retry passes (60→6→1→0 errors) vs Opus 1 retry — partly because Sonnet hallucinated 4 type names not in the canonical set on first attempt.

---

## Proposed solutions (in order of preference)

### Option 1 (preferred — lowest API surface change): Honor `initialUserMessage` for SubagentStart

The field already exists in the hook output schema (`src/types/hooks.ts:86`). Main session already honors it via `sessionStart.ts:150-152` → `pendingInitialUserMessage` → `main.tsx:3474`. Symmetric addition in `runAgent.ts:546-555`:

```typescript
// In src/tools/AgentTool/runAgent.ts, after line 543
let hookInitialUserMessage: string | undefined
for await (const hookResult of executeSubagentStartHooks(...)) {
  if (hookResult.additionalContexts?.length) {
    additionalContexts.push(...hookResult.additionalContexts)
  }
  if (hookResult.initialUserMessage) {
    hookInitialUserMessage = hookResult.initialUserMessage  // NEW
  }
}

// After the additionalContexts attachment block, before agent query:
if (hookInitialUserMessage) {
  initialMessages.push(createUserMessage({ content: hookInitialUserMessage }))
}

Real user messages are not subject to the 50K hook output cap (verified: the 2.1.91 change-note scoped to "hook output" specifically). This unlocks the use case without any new fields or flags.

Risk: buggy hooks can now inject arbitrary text directly into the subagent's first user message. Mitigation: require isMeta: true on the injected message so it's clearly hook-originated and discoverable in traces.

Option 2: Per-hook size opt-in via hookSpecificOutput.additionalContextOverride.maxChars

process.stdout.write(JSON.stringify({
  hookSpecificOutput: {
    hookEventName: "SubagentStart",
    additionalContext: contentUpTo500KB,
    additionalContextOverride: {
      maxChars: 500_000  // explicit opt-in, cap at e.g. 256K tokens
    }
  }
}));

The override is a contract: integrator asserts they understand the cost. Cap the override at a hard ceiling (e.g. 1MB) to prevent accidental 10MB injections. Requires minor schema addition and a single conditional in the persistence path.

Option 3: GrowthBook flag extending tengu_satin_quoll

Add hook-event keys to the existing flag map:

{
  "tengu_satin_quoll": {
    "SubagentStart_hook_additional_context": 500000,
    "PostToolUse_hook_additional_context": 100000
  }
}

Requires Anthropic-side flag flipping per integrator — high friction but zero public API change.

Option 4: Separate largeContext field (most explicit)

hookSpecificOutput: {
  hookEventName: "SubagentStart",
  additionalContext: "small system reminder text",  // capped at 50K (existing)
  largeContext: hugePreAssembledPayload             // explicit large channel, no cap (or higher cap)
}

Treats it as a real user message at the subagent boundary (not subject to 50K cap). The separate field signals intent — use additionalContext for small reminders (current behavior), largeContext for pre-assembled evidence payloads.


Why this matters for the ecosystem

This isn't a hypothetical. Anyone running a non-trivial pipeline on top of CC subagents — knowledge graph extractors, code refactor pipelines, document analysis fleets, agentic workflows that delegate to specialized subagents — runs into this 50K wall the moment their evidence-per-dispatch exceeds trivial size. The cost penalty is real and measurable (we have ~$60K of unnecessary cost lined up if we don't solve it).

The 2.1.91 change was probably defensive — preventing accidental context bloat from buggy hooks. But it took away the only zero-overhead delivery path for legitimate large-payload subagent patterns.

An opt-in escape hatch (any of the 4 options above) would restore this primitive without re-introducing the bloat risk it was guarding against.


Reference: empirical evidence files (available on request)

  • Production agent metrics showing 33× cost penalty (1571-batch fleet projection)
  • Pre-2.1.91 vs post-2.1.91 same-content same-model A/B traces
  • 4-way canary across Haiku 4.5 / Sonnet 4.6 / Opus 4.7 with full token breakdown
  • Source code paths in 2.1.88 (pre-cap) showing the unbounded delivery path that no longer works

Happy to provide reproductions or detailed measurements.

Proposed Solution

Ideal user experience

When dispatching a subagent that needs a large pre-assembled context payload, my orchestrator should be able to inject 200-700KB of structured evidence (chunk text + dossiers + prompts) into the subagent's first user message in ONE round-trip — no Read calls required.

Recommended implementation: honor initialUserMessage for SubagentStart hooks

The field already exists in the hook output schema (src/types/hooks.ts:86). Main session already honors it via sessionStart.ts:150-152pendingInitialUserMessagemain.tsx:3474. The symmetric addition needed in runAgent.ts:546-555:

// In src/tools/AgentTool/runAgent.ts, after line 543
let hookInitialUserMessage: string | undefined
for await (const hookResult of executeSubagentStartHooks(...)) {
  if (hookResult.additionalContexts?.length) {
    additionalContexts.push(...hookResult.additionalContexts)
  }
  if (hookResult.initialUserMessage) {
    hookInitialUserMessage = hookResult.initialUserMessage  // NEW
  }
}

// After the additionalContexts attachment block, before agent query:
if (hookInitialUserMessage) {
  initialMessages.push(createUserMessage({ content: hookInitialUserMessage }))
}

Real user messages aren't subject to the 50K hook output cap (verified: 2.1.91 change-note scoped specifically to "hook output").

Hook author API (no new public surface)

// In a SubagentStart hook
process.stdout.write(JSON.stringify({
  hookSpecificOutput: {
    hookEventName: "SubagentStart",
    initialUserMessage: largePreAssembledPayload  // up to e.g. 1MB, no cap or higher cap
  }
}));

Subagent receives the payload as its first user message (with isMeta: true for trace discoverability), then the orchestrator's normal dispatch prompt as the second user message.

### Alternative Solutions

If initialUserMessage repurposing isn't desirable, equivalent solutions:

Option B: New schema field additionalContextOverride.maxChars:
hookSpecificOutput: {
  hookEventName: "SubagentStart",
  additionalContext: hugePayload,
  additionalContextOverride: { maxChars: 500_000 }  // explicit opt-in contract
}

Option C: New separate largeContext field with no cap (or higher cap), routed to user message instead of system reminder:
hookSpecificOutput: {
  hookEventName: "SubagentStart",
  additionalContext: "small system reminder text",  // 50K cap (existing)
  largeContext: hugePreAssembledPayload             // explicit large channel
}

Option D: Extend existing tengu_satin_quoll GrowthBook flag with hook-event keys (zero public API change, but requires Anthropic-side flag-flipping per integrator).

### Priority

High - Significant impact on productivity

### Feature Category

Developer tools/SDK

### Use Case Example

## Concrete walkthrough: knowledge graph batch classification

**Setup**: I have 1,571 batches to process. Each batch is 20 edges that need re-classification against a 55-type ontology. To classify each edge, the subagent needs the full ontology rubric (15KB) + complete dossiers for the source/target nodes (200-700KB combined HL+LL).

### How it works today (post-2.1.91 — broken state)

1. My orchestrator runs `assemble_context.py` to build a single 700KB context file per batch on disk
2. SubagentStart hook reads the file and emits as `additionalContext`
3. CC silently spills to disk (>50KB cap) → subagent receives only 2KB preview + spillover file path
4. Subagent must call `Read` 7-12 times to chunk-read the spillover file (each Read result capped at ~100KB)
5. Each Read = +1 round-trip = +1 cumulative cache_read of growing prefix
6. ~94 turns total, 18M cache_read tokens, ~$42/dispatch (Opus 4.7)
7. Across 1,571 batches: **~$67K total fleet cost**

### How it would work with the proposed feature

1. Same `assemble_context.py` builds same 700KB context file
2. SubagentStart hook returns `initialUserMessage: <full content>` (or equivalent escape hatch)
3. Subagent receives full context as first user message — no Read needed
4. Subagent makes 1 Write call with the 20-edge JSONL output, replies DONE
5. ~3 turns total, ~0 cache_read accumulation, ~$1.40/dispatch
6. Across 1,571 batches: **~$2K total fleet cost**

### The savings

| Metric | Today | With feature | Δ |
|---|---|---|---|
| Tool calls per batch | 47 | 1 | 47× fewer |
| Cache reads per batch | 18M tokens | ~0 | 100% reduction |
| Cost per batch (Opus 4.7) | $42.46 | ~$1.40 | 30× cheaper |
| Total fleet cost | $67K | $2K | $65K saved |
| Wall-clock per batch | ~7 min | ~30 sec | 14× faster |

### Why I can't just use a cheaper model or smaller batches

- **Cheaper model (Sonnet 4.6)**: tested same canary, only 20% exact-type agreement with Opus, hallucinated invalid type names. 85% category-level agreement is good but not equivalent quality.
- **Smaller batches (5 edges)**: 4× more dispatches × same orchestrator overhead = roughly same total cost, with 4× longer wall-clock.
- **Trim content**: I need the full dossier to make accurate semantic judgments — trimming would cause exactly the hallucination problems the pre-injection pattern was designed to avoid.

This pattern (pre-assemble large context → dispatch subagent that does single-shot reasoning + Write) is the canonical "delegation with full evidence" workflow for any non-trivial knowledge work. It worked perfectly pre-2.1.91. The 50K cap broke it silently.

### Additional Context

## Related changelog entries

- **2.1.91**: "Changed hook output over 50K characters to be saved to disk with a file path + preview instead of being injected directly into context"**the breaking change**
- **2.1.85**: "Added MCP tool result persistence override via `_meta[anthropic/maxResultSizeChars]` annotation (up to 500K)"**precedent for the escape-hatch pattern I'm requesting**

## Source code references (verified against 2.1.88, pre-cap)

- `src/utils/sessionStart.ts:148``additionalContexts.push(...hookResult.additionalContexts)` (collection)
- `src/tools/AgentTool/runAgent.ts:546-555` — wraps additionalContexts as `hook_additional_context` attachment for subagent
- `src/utils/messages.ts:4117-4128``wrapInSystemReminder(content)` → user message **with NO truncation in this path in 2.1.88**
- `src/constants/toolLimits.ts:13``DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000`
- `src/utils/toolResultStorage.ts:55-79``getPersistenceThreshold(toolName, ...)` (where 50K is enforced for tool results, but extended to hook output in 2.1.91)
- `src/types/hooks.ts:86,270``initialUserMessage` field already in schema, ready to be honored by `runAgent.ts`

## Empirical evidence available on request

- Full agent metrics traces (input/output/cache_creation/cache_read tokens per agent) for 4-way Haiku 4.5 / Sonnet 4.6 / Opus 4.7 canary on identical 20-edge content
- Production fleet cost projection (1,571 batches × $42 = $67K Opus, $5.32 = $8.4K Sonnet)
- Subagent transcript inspection proving `updatedInput.prompt` is silently dropped
- Subagent transcript proving spillover preview message at 200KB+ additionalContext

Happy to provide minimal reproduction repo or detailed measurements on request.

## Why this matters beyond my use case

Anyone running a non-trivial pipeline on top of CC subagents — knowledge graph extractors, code refactor pipelines, document analysis fleets, agentic workflows that delegate to specialized subagents — runs into this 50K wall the moment their evidence-per-dispatch exceeds trivial size. The cost penalty is real, measurable, and accumulates across the fleet.

The 2.1.91 change was probably defensive — preventing accidental context bloat from buggy hooks. But it took away the only zero-overhead delivery path for legitimate large-payload subagent patterns. An opt-in escape hatch (any of the 4 options proposed) restores this primitive without re-introducing the bloat risk.

Note: An earlier draft of this issue cited a "2.1.91 changelog entry about hook output 50K cap" — that entry doesn't actually exist in the public changelog. The cap on hook attachments appears to have been a silent code consolidation. The 2.1.51 entry above is the closest documented analog (tool results, same threshold). Apologies for any confusion.

extent analysis

TL;DR

The most likely fix for the issue with the 50KB cap on additionalContext injection for SubagentStart hooks is to honor the initialUserMessage field for SubagentStart hooks, allowing for large pre-assembled context payloads to be injected into the subagent's first user message.

Guidance

  1. Verify the issue: Confirm that the 50KB cap on additionalContext is causing the problem by checking the subagent's output and the spillover file.
  2. Check the code: Review the code changes between 2.1.88 and 2.1.119 to understand how the additionalContext cap was introduced.
  3. Implement the fix: Add the necessary code to honor the initialUserMessage field for SubagentStart hooks, as described in the proposed solution.
  4. Test the fix: Verify that the fix works by testing the subagent with a large pre-assembled context payload.

Example

// In src/tools/AgentTool/runAgent.ts, after line 543
let hookInitialUserMessage: string | undefined
for await (const hookResult of executeSubagentStartHooks(...)) {
  if (hookResult.additionalContexts?.length) {
    additionalContexts.push(...hookResult.additionalContexts)
  }
  if (hookResult.initialUserMessage) {
    hookInitialUserMessage = hookResult.initialUserMessage  // NEW
  }
}

// After the additionalContexts attachment block, before agent query:
if (hookInitialUserMessage) {
  initialMessages.push(createUserMessage({ content: hookInitialUserMessage }))
}

Notes

  • The proposed solution assumes that the initialUserMessage field is not already being used for other purposes.
  • The fix may require additional testing and validation to ensure that it does not introduce any new issues.

Recommendation

Apply the workaround by honoring the initialUserMessage

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING