claude-code - 💡(How to fix) Fix Support configurable prompt cache TTL for subagent launches

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Anthropic's API supports extended cache TTLs (up to 1 hour via cache_control.ttl), but Claude Code does not expose this setting. Users have no way to control the TTL used when Claude Code constructs API calls for subagent system prompts.

Root Cause

In measured runs, subagent cache_write is $3.52 out of $10.51 total (33% of cost) — the single largest cost line item. Most of this is re-caching identical system prompts because agent launches of the same type are spaced 6-10 minutes apart.

Code Example

{ "subagentCacheTtl": 3600 }

---

cacheTtl: 3600

---

Agent({ cacheTtl: 3600, ... })
RAW_BUFFERClick to expand / collapse

Problem

Multi-agent workflows (like Planner-Worker-Judge orchestration) launch 17+ subagents over 20-30 minutes. Same-type agents share 95%+ of their system prompt prefix, but the 5-minute prompt cache TTL forces repeated cache_creation_input_tokens charges every time the window expires between launches of the same agent type.

In measured runs, subagent cache_write is $3.52 out of $10.51 total (33% of cost) — the single largest cost line item. Most of this is re-caching identical system prompts because agent launches of the same type are spaced 6-10 minutes apart.

Context

Anthropic's API supports extended cache TTLs (up to 1 hour via cache_control.ttl), but Claude Code does not expose this setting. Users have no way to control the TTL used when Claude Code constructs API calls for subagent system prompts.

Proposed solution

A configurable cache TTL for subagent launches, either:

  1. Global setting in settings.json:

    { "subagentCacheTtl": 3600 }
  2. Per-agent setting in agent definition files (~/.claude/agents/*.md frontmatter):

    cacheTtl: 3600
  3. Per-invocation via the Agent() tool parameter:

    Agent({ cacheTtl: 3600, ... })

Option 1 is simplest. Option 2 allows fine-grained control (long TTL for frequently reused agents, default for one-off agents).

Impact

With a 30-minute TTL, same-type subagents in a typical PWJ run would get cache hits instead of cache writes. Estimated savings: ~$1.50-2.00/run (15-20% of total cost), with zero changes to workflow logic.

Cost data from production runs

MetricValue
Total run cost$10.51
Subagent cache_write$3.52 (33%)
Subagent cache_read$2.12 (20%)
Total subagent launches17-19
Agent types3 (planner, worker, judge)
Run duration~25 minutes
Avg gap between same-type launches6-10 minutes

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING