hermes - 💡(How to fix) Fix background_review fork sends wider tools[] than parent, fragments Anthropic prefix cache (~50% wasted cache-write on long sessions)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

The background skill/memory-review fork (agent/background_review.py, spawned every memory_nudge_interval / skill_nudge_interval turns) constructs its child AIAgent without propagating enabled_toolsets / disabled_toolsets from the parent. When the parent has narrowed its toolset (via hermes tools disable, config.yaml, or any other mechanism that produces a non-default tool set), the fork's default enabled_toolsets=None expands to "all registered tools" — and the fork's outbound request body sends a strictly wider tools[] array than the parent's main-turn request.

Anthropic's prompt-cache key is computed over the byte-exact tools[] array (which sits above system in the cache hierarchy). The divergence forks the cache lineage and forces a full prefix rewrite on every review nudge, even though the fork shares messages[0..N] and system with the parent byte-for-byte.

This is the same class of bug as #25322 / PR #17276 (which fixed the system-bytes invariant), one slot up the cache hierarchy.

Root Cause

agent/background_review.py, inside _spawn_background_review, constructs the review fork as:

review_agent = AIAgent(
    model=agent.model,
    max_iterations=16,
    quiet_mode=True,
    platform=agent.platform,
    provider=agent.provider,
    api_mode=_parent_api_mode,
    base_url=_parent_runtime.get("base_url") or None,
    api_key=_parent_runtime.get("api_key") or None,
    credential_pool=getattr(agent, "_credential_pool", None),
    parent_session_id=agent.session_id,
    skip_memory=True,
)

enabled_toolsets and disabled_toolsets are not passed. The AIAgent.__init__ default for enabled_toolsets is None, which toolsets.resolve_multiple_toolsets expands to the full registry. The parent's narrower configuration is silently dropped.

PR #17276 / #25322 explicitly fixed the system-bytes invariant for exactly this reason (the comments around _cached_system_prompt inheritance call this out) — the tools[] slot was simply missed in that earlier patch.

The post-construction runtime whitelist (set_thread_tool_whitelist({memory, skills, …})) installed a few lines below still gates which tools the model is allowed to dispatch, so the safety contract from #15204 is unaffected by this change — only what the request body transmits over the wire needs to be aligned.

Fix Action

Fix / Workaround

PR #17276 / #25322 explicitly fixed the system-bytes invariant for exactly this reason (the comments around _cached_system_prompt inheritance call this out) — the tools[] slot was simply missed in that earlier patch.

The post-construction runtime whitelist (set_thread_tool_whitelist({memory, skills, …})) installed a few lines below still gates which tools the model is allowed to dispatch, so the safety contract from #15204 is unaffected by this change — only what the request body transmits over the wire needs to be aligned.

Code Example

review_agent = AIAgent(
    model=agent.model,
    max_iterations=16,
    quiet_mode=True,
    platform=agent.platform,
    provider=agent.provider,
    api_mode=_parent_api_mode,
    base_url=_parent_runtime.get("base_url") or None,
    api_key=_parent_runtime.get("api_key") or None,
    credential_pool=getattr(agent, "_credential_pool", None),
    parent_session_id=agent.session_id,
    skip_memory=True,
)

---

review_agent = AIAgent(
    ...
    enabled_toolsets=getattr(agent, "enabled_toolsets", None),
    disabled_toolsets=getattr(agent, "disabled_toolsets", None),
    skip_memory=True,
)
RAW_BUFFERClick to expand / collapse

Background skill/memory review fork sends a wider tools[] than the parent — fragments Anthropic prefix cache, ~50% cache-write overhead on long sessions

Summary

The background skill/memory-review fork (agent/background_review.py, spawned every memory_nudge_interval / skill_nudge_interval turns) constructs its child AIAgent without propagating enabled_toolsets / disabled_toolsets from the parent. When the parent has narrowed its toolset (via hermes tools disable, config.yaml, or any other mechanism that produces a non-default tool set), the fork's default enabled_toolsets=None expands to "all registered tools" — and the fork's outbound request body sends a strictly wider tools[] array than the parent's main-turn request.

Anthropic's prompt-cache key is computed over the byte-exact tools[] array (which sits above system in the cache hierarchy). The divergence forks the cache lineage and forces a full prefix rewrite on every review nudge, even though the fork shares messages[0..N] and system with the parent byte-for-byte.

This is the same class of bug as #25322 / PR #17276 (which fixed the system-bytes invariant), one slot up the cache hierarchy.

Reproduction

Tested by routing the Anthropic API through a local HTTP-capture proxy and inspecting request_body.tools on every outbound /v1/messages request during a long real-world session.

In a single ~5-hour conversation with a typical user-narrowed toolset:

SlotMain-turn reqsReview-fork reqs
Tool count3045 (registry default)
tools[] hash<hash A><hash B> (differs)
Top-level keysincludes output_config, thinkingomits both
Last-user-message prefixnormal user text"Review the conversation above and update the skill library…"
cache_read_input_tokens per reqdominantsmall
cache_creation_input_tokens per reqsmalldominant

The fork-shape requests are easy to identify in any captured traffic: they (a) lack output_config and thinking (they go through the secondary completion adapter), and (b) carry the _SKILL_REVIEW_PROMPT / _MEMORY_REVIEW_PROMPT / _COMBINED_REVIEW_PROMPT constant as the last user message.

Cost-impact measurement

From the same captured session (411 /v1/messages requests total over ~5 h, on a Sonnet-class model):

QuantityMain-shape reqsReview-fork reqs
Request count32487
cache_creation_input_tokens total2.62 M2.71 M
cache_read_input_tokens total41.81 M9.61 M
Distinct fork spawns30
Avg cache_creation per spawn~90 K tokens

About 51 % of the session's total cache-write tokens were attributable to review-fork requests — the fork rewrites the prefix from scratch on every nudge instead of reading from the parent's warmed cache. Spawn cadence is governed by the nudge interval (default 5), so the per-turn overhead scales linearly with session length.

Pricing the 2.71 M wasted cache_creation tokens at public Anthropic Sonnet-4 cache-write rates gives a ballpark of ~$10 per ~5 h session; on Opus the same wire pattern costs several times more. The exact dollar figure varies with model and 5-min vs 1-h cache TTL, but the structural inefficiency — fork rewrites instead of reading — is independent of price.

Root cause

agent/background_review.py, inside _spawn_background_review, constructs the review fork as:

review_agent = AIAgent(
    model=agent.model,
    max_iterations=16,
    quiet_mode=True,
    platform=agent.platform,
    provider=agent.provider,
    api_mode=_parent_api_mode,
    base_url=_parent_runtime.get("base_url") or None,
    api_key=_parent_runtime.get("api_key") or None,
    credential_pool=getattr(agent, "_credential_pool", None),
    parent_session_id=agent.session_id,
    skip_memory=True,
)

enabled_toolsets and disabled_toolsets are not passed. The AIAgent.__init__ default for enabled_toolsets is None, which toolsets.resolve_multiple_toolsets expands to the full registry. The parent's narrower configuration is silently dropped.

PR #17276 / #25322 explicitly fixed the system-bytes invariant for exactly this reason (the comments around _cached_system_prompt inheritance call this out) — the tools[] slot was simply missed in that earlier patch.

The post-construction runtime whitelist (set_thread_tool_whitelist({memory, skills, …})) installed a few lines below still gates which tools the model is allowed to dispatch, so the safety contract from #15204 is unaffected by this change — only what the request body transmits over the wire needs to be aligned.

Proposed fix

Two-line change in _spawn_background_review: propagate enabled_toolsets / disabled_toolsets from the parent.

review_agent = AIAgent(
    ...
    enabled_toolsets=getattr(agent, "enabled_toolsets", None),
    disabled_toolsets=getattr(agent, "disabled_toolsets", None),
    skip_memory=True,
)

Symmetric inheritance — whatever the parent has, the fork has the same. When the parent's value is None (registry default), the fork's is also None and they expand identically; when the parent narrows, the fork inherits the narrowed set verbatim.

The accompanying PR also corrects an existing test (test_background_review_does_not_narrow_toolset_schema) whose stated invariant — "the fork must NOT pass enabled_toolsets" — was built on the implicit assumption that the parent always runs with the registry default. That assumption holds only when the user hasn't disabled any toolset; in practice, whenever the parent narrows, leaving the fork at None is what causes the divergence.

Verification

After applying the fix and restarting the agent:

WindowTotal reqsReview-fork reqsDistinct tools[] hashes
Pre-fix (5 h)41187 (across 30 spawns)2 (main vs fork)
Post-fix (~15 min, after first fork spawn)324 (1 spawn)1 (identical)

The post-fix fork retains its fingerprint (missing output_config / thinking, "Review the conversation above…" prompt) but its tools[] array is now byte-identical to the parent's, so its requests read from the parent's warmed cache instead of writing fresh.

Scope of the fix

  • agent/background_review.py: two added kwargs + explanatory comment.
  • Two test files updated: one new positive assertion, one inverted/renamed existing test.
  • No production code paths outside the review fork; no schema or public-API changes; safety whitelist untouched.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING