hermes - 💡(How to fix) Fix background_review fork sends wider tools[] than parent, fragments Anthropic prefix cache (~50% wasted cache-write on long sessions)

hermes2026-05-21 01:41:40

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

The background skill/memory-review fork (agent/background_review.py, spawned every memory_nudge_interval / skill_nudge_interval turns) constructs its child AIAgent without propagating enabled_toolsets / disabled_toolsets from the parent. When the parent has narrowed its toolset (via hermes tools disable, config.yaml, or any other mechanism that produces a non-default tool set), the fork's default enabled_toolsets=None expands to "all registered tools" — and the fork's outbound request body sends a strictly wider tools[] array than the parent's main-turn request.

Anthropic's prompt-cache key is computed over the byte-exact tools[] array (which sits above system in the cache hierarchy). The divergence forks the cache lineage and forces a full prefix rewrite on every review nudge, even though the fork shares messages[0..N] and system with the parent byte-for-byte.

This is the same class of bug as #25322 / PR #17276 (which fixed the system-bytes invariant), one slot up the cache hierarchy.

Root Cause

agent/background_review.py, inside _spawn_background_review, constructs the review fork as:

review_agent = AIAgent(
    model=agent.model,
    max_iterations=16,
    quiet_mode=True,
    platform=agent.platform,
    provider=agent.provider,
    api_mode=_parent_api_mode,
    base_url=_parent_runtime.get("base_url") or None,
    api_key=_parent_runtime.get("api_key") or None,
    credential_pool=getattr(agent, "_credential_pool", None),
    parent_session_id=agent.session_id,
    skip_memory=True,
)

enabled_toolsets and disabled_toolsets are not passed. The AIAgent.__init__ default for enabled_toolsets is None, which toolsets.resolve_multiple_toolsets expands to the full registry. The parent's narrower configuration is silently dropped.

PR #17276 / #25322 explicitly fixed the system-bytes invariant for exactly this reason (the comments around _cached_system_prompt inheritance call this out) — the tools[] slot was simply missed in that earlier patch.

The post-construction runtime whitelist (set_thread_tool_whitelist({memory, skills, …})) installed a few lines below still gates which tools the model is allowed to dispatch, so the safety contract from #15204 is unaffected by this change — only what the request body transmits over the wire needs to be aligned.

Fix Action

Fix / Workaround

Code Example

review_agent = AIAgent(
    model=agent.model,
    max_iterations=16,
    quiet_mode=True,
    platform=agent.platform,
    provider=agent.provider,
    api_mode=_parent_api_mode,
    base_url=_parent_runtime.get("base_url") or None,
    api_key=_parent_runtime.get("api_key") or None,
    credential_pool=getattr(agent, "_credential_pool", None),
    parent_session_id=agent.session_id,
    skip_memory=True,
)

---

review_agent = AIAgent(
    ...
    enabled_toolsets=getattr(agent, "enabled_toolsets", None),
    disabled_toolsets=getattr(agent, "disabled_toolsets", None),
    skip_memory=True,
)

RAW_BUFFERClick to expand / collapse

Background skill/memory review fork sends a wider `tools[]` than the parent — fragments Anthropic prefix cache, ~50% cache-write overhead on long sessions

Summary

This is the same class of bug as #25322 / PR #17276 (which fixed the system-bytes invariant), one slot up the cache hierarchy.

Reproduction

Tested by routing the Anthropic API through a local HTTP-capture proxy and inspecting request_body.tools on every outbound /v1/messages request during a long real-world session.

In a single ~5-hour conversation with a typical user-narrowed toolset:

Slot	Main-turn reqs	Review-fork reqs
Tool count	30	45 (registry default)
`tools[]` hash	`<hash A>`	`<hash B>` (differs)
Top-level keys	includes `output_config`, `thinking`	omits both
Last-user-message prefix	normal user text	`"Review the conversation above and update the skill library…"`
`cache_read_input_tokens` per req	dominant	small
`cache_creation_input_tokens` per req	small	dominant

The fork-shape requests are easy to identify in any captured traffic: they (a) lack output_config and thinking (they go through the secondary completion adapter), and (b) carry the _SKILL_REVIEW_PROMPT / _MEMORY_REVIEW_PROMPT / _COMBINED_REVIEW_PROMPT constant as the last user message.

Cost-impact measurement

From the same captured session (411 /v1/messages requests total over ~5 h, on a Sonnet-class model):

Quantity	Main-shape reqs	Review-fork reqs
Request count	324	87
`cache_creation_input_tokens` total	2.62 M	2.71 M
`cache_read_input_tokens` total	41.81 M	9.61 M
Distinct fork spawns	—	30
Avg `cache_creation` per spawn	—	~90 K tokens

About 51 % of the session's total cache-write tokens were attributable to review-fork requests — the fork rewrites the prefix from scratch on every nudge instead of reading from the parent's warmed cache. Spawn cadence is governed by the nudge interval (default 5), so the per-turn overhead scales linearly with session length.

Pricing the 2.71 M wasted cache_creation tokens at public Anthropic Sonnet-4 cache-write rates gives a ballpark of ~$10 per ~5 h session; on Opus the same wire pattern costs several times more. The exact dollar figure varies with model and 5-min vs 1-h cache TTL, but the structural inefficiency — fork rewrites instead of reading — is independent of price.

Root cause

agent/background_review.py, inside _spawn_background_review, constructs the review fork as:

review_agent = AIAgent(
    model=agent.model,
    max_iterations=16,
    quiet_mode=True,
    platform=agent.platform,
    provider=agent.provider,
    api_mode=_parent_api_mode,
    base_url=_parent_runtime.get("base_url") or None,
    api_key=_parent_runtime.get("api_key") or None,
    credential_pool=getattr(agent, "_credential_pool", None),
    parent_session_id=agent.session_id,
    skip_memory=True,
)

Proposed fix

Two-line change in _spawn_background_review: propagate enabled_toolsets / disabled_toolsets from the parent.

review_agent = AIAgent(
    ...
    enabled_toolsets=getattr(agent, "enabled_toolsets", None),
    disabled_toolsets=getattr(agent, "disabled_toolsets", None),
    skip_memory=True,
)

Symmetric inheritance — whatever the parent has, the fork has the same. When the parent's value is None (registry default), the fork's is also None and they expand identically; when the parent narrows, the fork inherits the narrowed set verbatim.

The accompanying PR also corrects an existing test (test_background_review_does_not_narrow_toolset_schema) whose stated invariant — "the fork must NOT pass enabled_toolsets" — was built on the implicit assumption that the parent always runs with the registry default. That assumption holds only when the user hasn't disabled any toolset; in practice, whenever the parent narrows, leaving the fork at None is what causes the divergence.

Verification

After applying the fix and restarting the agent:

Window	Total reqs	Review-fork reqs	Distinct `tools[]` hashes
Pre-fix (5 h)	411	87 (across 30 spawns)	2 (main vs fork)
Post-fix (~15 min, after first fork spawn)	32	4 (1 spawn)	1 (identical)

The post-fix fork retains its fingerprint (missing output_config / thinking, "Review the conversation above…" prompt) but its tools[] array is now byte-identical to the parent's, so its requests read from the parent's warmed cache instead of writing fresh.

Scope of the fix

agent/background_review.py: two added kwargs + explanatory comment.
Two test files updated: one new positive assertion, one inverted/renamed existing test.
No production code paths outside the review fork; no schema or public-API changes; safety whitelist untouched.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix background_review fork sends wider tools[] than parent, fragments Anthropic prefix cache (~50% wasted cache-write on long sessions)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Background skill/memory review fork sends a wider `tools[]` than the parent — fragments Anthropic prefix cache, ~50% cache-write overhead on long sessions

Summary

Reproduction

Cost-impact measurement

Root cause

Proposed fix

Verification

Scope of the fix

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix background_review fork sends wider tools[] than parent, fragments Anthropic prefix cache (~50% wasted cache-write on long sessions)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Background skill/memory review fork sends a wider tools[] than the parent — fragments Anthropic prefix cache, ~50% cache-write overhead on long sessions

Summary

Reproduction

Cost-impact measurement

Root cause

Proposed fix

Verification

Scope of the fix

Still need to ship something?

TRENDING

Background skill/memory review fork sends a wider `tools[]` than the parent — fragments Anthropic prefix cache, ~50% cache-write overhead on long sessions