claude-code - 💡(How to fix) Fix Support configurable prompt cache TTL for subagent launches

claude-code2026-05-25 12:36:38

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Anthropic's API supports extended cache TTLs (up to 1 hour via cache_control.ttl), but Claude Code does not expose this setting. Users have no way to control the TTL used when Claude Code constructs API calls for subagent system prompts.

Root Cause

In measured runs, subagent cache_write is $3.52 out of $10.51 total (33% of cost) — the single largest cost line item. Most of this is re-caching identical system prompts because agent launches of the same type are spaced 6-10 minutes apart.

Code Example

{ "subagentCacheTtl": 3600 }

---

cacheTtl: 3600

---

Agent({ cacheTtl: 3600, ... })

RAW_BUFFERClick to expand / collapse

Problem

Multi-agent workflows (like Planner-Worker-Judge orchestration) launch 17+ subagents over 20-30 minutes. Same-type agents share 95%+ of their system prompt prefix, but the 5-minute prompt cache TTL forces repeated cache_creation_input_tokens charges every time the window expires between launches of the same agent type.

Context

Proposed solution

A configurable cache TTL for subagent launches, either:

Global setting in settings.json:
```
{ "subagentCacheTtl": 3600 }
```
Per-agent setting in agent definition files (~/.claude/agents/*.md frontmatter):
```
cacheTtl: 3600
```
Per-invocation via the Agent() tool parameter:
```
Agent({ cacheTtl: 3600, ... })
```

Option 1 is simplest. Option 2 allows fine-grained control (long TTL for frequently reused agents, default for one-off agents).

Impact

With a 30-minute TTL, same-type subagents in a typical PWJ run would get cache hits instead of cache writes. Estimated savings: ~$1.50-2.00/run (15-20% of total cost), with zero changes to workflow logic.

Cost data from production runs

Metric	Value
Total run cost	$10.51
Subagent cache_write	$3.52 (33%)
Subagent cache_read	$2.12 (20%)
Total subagent launches	17-19
Agent types	3 (planner, worker, judge)
Run duration	~25 minutes
Avg gap between same-type launches	6-10 minutes

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering