claude-code: Prompt cache miss on resume: Agent tool description enumerates sub-agents in non-deterministic order [2 comments, 3 participants]

GitHub stats
anthropics/claude-code#49038, fetched 2026-04-17 08:52:36
Comments: 2 · Participants: 3 · Timeline events: 9 · Reactions: 0
Timeline (top): labeled ×4, commented ×2, cross-referenced ×1, mentioned ×1

Summary

The built-in Agent tool's description enumerates available sub-agent types (built-in + plugin-provided + user-defined) in non-deterministic order across query() invocations. Because Agent is tools[0] in the API payload, any reshuffle invalidates the prompt-cache prefix hash for everything after it — the remaining tool definitions, the system blocks, and the messages prefix up to the cache breakpoint.

For long-running agent systems that call query({ resume: sessionId }) on each new user turn, this causes every resumed session's first API call to miss prompt cache on the full static prefix (~30–60k tokens in practice), with a proportionally large cache_creation_input_tokens bill at the 1-hour-ephemeral rate (2× base input price). This repeats on every turn.
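
To make the prefix argument concrete, here is a minimal sketch. It assumes the cache behaves like a byte-prefix match over the serialized request, which is a simplification; the actual server-side hashing is not described in this issue. The tool objects, `serializeTools`, and `commonPrefixLength` are hypothetical stand-ins, not SDK code.

```javascript
// Hypothetical stand-in for the serialized `tools` array; real payloads are far larger.
function serializeTools(agentOrder) {
  const tools = [
    {
      name: "Agent",
      description: "Available agent types:\n" + agentOrder.map((a) => `- ${a}`).join("\n"),
    },
    { name: "Bash", description: "Run a shell command" },
    { name: "Read", description: "Read a file" },
  ];
  return JSON.stringify(tools);
}

// Length of the longest shared prefix of two strings.
function commonPrefixLength(a, b) {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

const callA = serializeTools(["code-reviewer", "code-simplifier", "pr-test-analyzer"]);
const callB = serializeTools(["code-reviewer", "pr-test-analyzer", "code-simplifier"]);

const shared = commonPrefixLength(callA, callB);

// The shared prefix ends inside tools[0], so nothing after it can be reused.
console.log(shared < callA.indexOf('"Bash"')); // true
// Same content, different order: the payloads have equal length but diverge early.
console.log(callA.length === callB.length); // true
```

Under that assumption, the position of the first differing byte bounds how much of the prefix can be reused, which is why a reshuffle at tools[0] is about the worst place for it to happen.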

Environment

  • @anthropic-ai/[email protected] bundling Claude Code 2.1.101
  • Host runtime: Node.js, macOS
  • SDK mode: query({ prompt, options }) with options.resume = previousSessionId
  • settingSources: ['project', 'user'], systemPrompt: { type: 'preset', preset: 'claude_code', append: ... }

Repro

  1. Run any agent workflow that calls query() with resume: <sessionId> on every new user message.
  2. Intercept the POST to api.anthropic.com/v1/messages?beta=true for two consecutive calls on the same thread within TTL.
  3. Diff the two bodies field-by-field.

Observed: tools[0] (the Agent tool) differs between the two captures — same 32 sub-agents, different order. Everything else is byte-identical.
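
Step 3 can be sketched as follows, assuming the two request bodies have already been captured and parsed from JSON. `diffBodies` and the two sample captures are hypothetical and heavily trimmed; the comparison is deliberately byte-oriented (via `JSON.stringify`), because byte identity is what the cache prefix depends on.

```javascript
// Compare two captured request bodies field by field.
// Returns the top-level keys whose serialized values differ; for array fields
// such as `tools`, it lists the indices of the differing entries instead.
function diffBodies(a, b) {
  const report = {};
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const key of keys) {
    if (JSON.stringify(a[key]) === JSON.stringify(b[key])) continue;
    if (Array.isArray(a[key]) && Array.isArray(b[key])) {
      const length = Math.max(a[key].length, b[key].length);
      const indices = [];
      for (let i = 0; i < length; i++) {
        if (JSON.stringify(a[key][i]) !== JSON.stringify(b[key][i])) indices.push(i);
      }
      report[key] = indices;
    } else {
      report[key] = "differs";
    }
  }
  return report;
}

// Tiny hypothetical captures, shaped like the real payload but not taken from it.
const captureA = {
  model: "example-model",
  max_tokens: 1024,
  tools: [
    { name: "Agent", description: "- a\n- b" },
    { name: "Read", description: "Read a file" },
  ],
};
const captureB = {
  model: "example-model",
  max_tokens: 1024,
  tools: [
    { name: "Agent", description: "- b\n- a" },
    { name: "Read", description: "Read a file" },
  ],
};

console.log(diffBodies(captureA, captureB)); // { tools: [ 0 ] }
```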

Evidence

Two live captures 45 seconds apart on the same resumable session, otherwise idle:

field                identical
-------------------  ---------
model                yes
metadata             yes
max_tokens           yes
thinking             yes
context_management   yes
output_config        yes
stream               yes
system  (34,054 B)   yes
tools   (53,952 B)   NO — tools[0] only
messages             differs (expected)

Inside tools[0].description, the list of 32 sub-agents appears in a different order across the two captures. Example divergence (the pr-review-toolkit:* entries shuffle position):

Call A: ... code-reviewer → code-simplifier → pr-test-analyzer → silent-failure-hunter → comment-analyzer → type-design-analyzer ...
Call B: ... code-reviewer → pr-test-analyzer → silent-failure-hunter → comment-analyzer → code-simplifier → type-design-analyzer ...
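
A quick way to confirm that two descriptions contain the same agents in a different order is sketched below. It assumes one `- <agentType>: ...` line per agent, matching the format described under "Location in cli.js"; `agentTypes`, `compareOrder`, and the per-agent text in the samples are made up for illustration.

```javascript
// Pull agent type names out of a description, assuming one "- <agentType>: ..." line per agent.
// Splitting on ": " (colon + space) keeps namespaced names like "plugin:agent" intact.
function agentTypes(description) {
  return description
    .split("\n")
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2).split(": ")[0]);
}

// Classify how two agent lists relate to each other.
function compareOrder(a, b) {
  const sameSet =
    a.length === b.length && [...a].sort().join("\n") === [...b].sort().join("\n");
  if (!sameSet) return "different agents";
  return a.join("\n") === b.join("\n") ? "identical" : "same agents, different order";
}

// Hypothetical descriptions; only the agent names come from the issue.
const descA = [
  "Available agent types and the tools they have access to:",
  "- pr-review-toolkit:code-reviewer: reviews changes (Tools: *)",
  "- pr-review-toolkit:code-simplifier: simplifies code (Tools: *)",
  "- pr-review-toolkit:pr-test-analyzer: checks test coverage (Tools: *)",
].join("\n");
const descB = [
  "Available agent types and the tools they have access to:",
  "- pr-review-toolkit:code-reviewer: reviews changes (Tools: *)",
  "- pr-review-toolkit:pr-test-analyzer: checks test coverage (Tools: *)",
  "- pr-review-toolkit:code-simplifier: simplifies code (Tools: *)",
].join("\n");

console.log(compareOrder(agentTypes(descA), agentTypes(descB))); // same agents, different order
```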

Token-usage impact (measured)

First API call of a resumed session, same thread, two consecutive user turns within TTL, no tool calls in between:

                       input  cache_read  cache_create  output
pre-fix (call 1)           6       56059           209       7
pre-fix (call 2)           6           0         56296       7  <-- full prefix miss

After locally sorting the agent list deterministically before it hits the tool description:

                       input  cache_read  cache_create  output
post-fix (call 1)          6           0         56370       7  (wrote to cache)
post-fix (call 2)          6       56370            32       7  <-- full prefix hit

The cache_create count on the second call drops from 56,296 to 32 tokens, a reduction of roughly 1,760× in cache-write tokens per resume.
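
The token counts can be turned into a rough relative cost, as sketched below. The 2× multiplier for 1-hour cache writes comes from the summary above; the 0.1× multiplier for cache reads is an assumption not stated in this issue and should be checked against current pricing. Costs are expressed in "base input token" equivalents rather than currency, and `inputCost` is a hypothetical helper.

```javascript
// Relative price multipliers, as multiples of the base input-token price.
const CACHE_WRITE_1H = 2; // from the issue: 1-hour cache writes cost 2x base input
const CACHE_READ = 0.1;   // ASSUMPTION: cache reads cost 0.1x base input

// Input-side cost of one call, in base-input-token equivalents.
function inputCost({ input, cacheRead, cacheCreate }) {
  return input + cacheRead * CACHE_READ + cacheCreate * CACHE_WRITE_1H;
}

// Second call of each pair, taken from the measured tables above.
const preFixCall2 = { input: 6, cacheRead: 0, cacheCreate: 56296 };
const postFixCall2 = { input: 6, cacheRead: 56370, cacheCreate: 32 };

const before = inputCost(preFixCall2);
const after = inputCost(postFixCall2);

console.log(Math.round(before));          // 112598
console.log(Math.round(after));           // 5707
console.log((before / after).toFixed(1)); // 19.7
console.log(preFixCall2.cacheCreate / postFixCall2.cacheCreate); // 1759.25
```

Under these assumptions the cache-write token count falls by about 1,760×, while the input-side cost of the call falls by about 20×, because the post-fix call still pays the (much cheaper) read price on the cached prefix.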

Location in cli.js

In the minified bundle (v2.1.101), the tool description is assembled in function JWK:

async function JWK(q,K,_){
  let z = _ ? q.filter((D)=>_.includes(D.agentType)) : q,
  // ...
  // Y/A/O/w/$/j setup omitted
  H = j ? "..." : `Available agent types and the tools they have access to:
${z.map((D)=>L47(D)).join(`\n`)}`
}

z inherits q's order. q is populated by the caller from (apparently) an unordered iteration over plugin/user-defined agents. The L47 formatter produces the - <agentType>: ... lines we see in tools[0].description.

Suggested fix

Sort the agent array deterministically before it reaches the description. Either:

  1. Upstream of JWK — sort the caller's list once, by agentType, at the point where built-in + plugin + user agents are merged. Preferable because it fixes every consumer of that list, not just the tool description.

  2. At the formatter — sort z inside JWK before .map(L47). Narrower, but guaranteed to fix the symptom.
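
Option 1 might look roughly like the sketch below. The real merge point is inside the minified bundle and is not shown in this issue, so `mergeAgents`, `describeAgents`, and the `whenToUse` field are hypothetical. The comparator uses plain code-unit comparison instead of `localeCompare`, on the assumption that an order independent of the runtime's locale is preferable when the goal is byte-identical output across hosts.

```javascript
// Compare by UTF-16 code units so the order does not depend on locale or ICU data.
function byAgentType(a, b) {
  const left = String(a.agentType);
  const right = String(b.agentType);
  return left < right ? -1 : left > right ? 1 : 0;
}

// Hypothetical merge point: combine all sources into a new array, then sort it once.
function mergeAgents(builtIn, plugin, user) {
  return [...builtIn, ...plugin, ...user].sort(byAgentType);
}

// Hypothetical formatter standing in for the bundle's L47.
function describeAgents(agents) {
  return agents.map((agent) => `- ${agent.agentType}: ${agent.whenToUse}`).join("\n");
}

// Sample data: the plugin agents arrive in a different order on each run.
const builtIn = [{ agentType: "general-purpose", whenToUse: "general tasks" }];
const user = [{ agentType: "my-agent", whenToUse: "project-specific tasks" }];
const pluginRunA = [
  { agentType: "pr-review-toolkit:code-simplifier", whenToUse: "simplify code" },
  { agentType: "pr-review-toolkit:code-reviewer", whenToUse: "review changes" },
];
const pluginRunB = [pluginRunA[1], pluginRunA[0]];

const first = describeAgents(mergeAgents(builtIn, pluginRunA, user));
const second = describeAgents(mergeAgents(builtIn, pluginRunB, user));
console.log(first === second); // true
```

One trade-off of a single global sort is that built-in agents no longer appear first in the description; sorting within each group and concatenating would keep the grouping while still being deterministic.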

The local workaround I applied (via patch-package) is option 2:

-async function JWK(q,K,_){let z=_?q.filter((D)=>_.includes(D.agentType)):q,
+async function JWK(q,K,_){let z=(_?q.filter((D)=>_.includes(D.agentType)):q).slice().sort((a,b)=>String(a.agentType).localeCompare(String(b.agentType))),

Verified to produce byte-identical tools[0] across resumes and full cache-read hits on subsequent turns, as shown in the token table above.

Why this matters

For any production deployment that uses query({ resume }) per user turn — long-running agent systems, Discord/Slack bots, multi-day conversations — this is a silent, compounding cost. With 100+ MCP tools and plugin-registered sub-agents, the prefix is large enough that the full miss per resume is expensive, and because cache_control isn't exposed to SDK consumers (see #89), there's no workaround short of patching cli.js or dropping to the Messages API directly.

Related

  • #89 (cache control in SDK, still open) — this issue is one consequence of users not being able to place their own cache breakpoints around the volatile regions.
  • #247 (MCP + cache) — same class of problem reported from a different angle; the non-serializable in-process MCP server was suspected, but in this deployment the MCP config is stable and the divergence is specifically the agent enumeration.
  • The v2.1.90 --resume cache-miss fix is in place (bundled CC version is 2.1.101) and is distinct from this issue.

extent analysis

TL;DR

Sort the agent array deterministically before it reaches the description to fix the non-deterministic order of sub-agent types.

Guidance

  • Identify the point where built-in, plugin, and user agents are merged and sort the list by agentType to ensure a consistent order.
  • Alternatively, sort the z array inside the JWK function before mapping it to the tool description.
  • Verify the fix by checking for byte-identical tools[0] across resumes and full cache-read hits on subsequent turns.
  • Consider patching the cli.js file or dropping to the Messages API directly as a temporary workaround.

Notes

The provided fix is specific to the JWK function and may not address other potential issues with non-deterministic ordering. Additionally, the cache_control feature is not exposed to SDK consumers, which limits the ability to place custom cache breakpoints.

Recommendation

Apply the workaround by sorting the agent array deterministically, either upstream of JWK or at the formatter, to fix the symptom and reduce the cache miss cost.
