hermes - ✅(Solved) Fix [Bug]: get_tool_definitions() quiet-mode cache pollution causes duplicate LCM tool schemas in Gateway [4 pull requests, 1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#17335 (fetched 2026-04-30 06:48:19)
Comments: 1
Participants: 2
Timeline: 15
Reactions: 0
Timeline (top): cross-referenced ×9, labeled ×5, commented ×1

Error Message

  • DeepSeek: Error from provider (DeepSeek): Tool names must be unique.
  • Multiple models triggered the same error (mimo-v2.5-pro, deepseek-v4-flash, kimi-k2.6, etc.) — indicating the bug is in tool schema construction, not model-specific

Root Cause

Commit #17098 (perf(tools): memoize get_tool_definitions + TTL-cache check_fn results) introduced a quiet_mode cache in model_tools.py. The cache-hit path is safe (line 278: return list(cached)), but the first uncached call (lines 280-283) returns the same list object that it stores in the cache, so any caller that mutates the returned list also mutates the cache.

Later, run_agent.py (lines 1986-1993) appends the LCM context-engine tool schemas to self.tools without checking for duplicates:

for _schema in self.context_compressor.get_tool_schemas():
    _wrapped = {"type": "function", "function": _schema}
    self.tools.append(_wrapped)  # mutates shared cached list

Note: memory tools injection (lines 1728-1748) already has dedup logic — context engine tools injection does NOT.

Why the TUI is fine: it either calls with quiet_mode=False (no caching) or runs as a short-lived process.
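The aliasing described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `_cache`, `get_tool_definitions_buggy`, `init_agent`, and the `read_file` tool are illustrative stand-ins, and only the store-and-return-the-same-object pattern is taken from the report.

```python
# Stand-in for the pre-fix quiet_mode cache; all names here are illustrative.
_cache = {}

def get_tool_definitions_buggy(quiet_mode=True):
    if quiet_mode and "tools" in _cache:
        return list(_cache["tools"])      # cache hit: fresh list (safe)
    result = [{"type": "function", "function": {"name": "read_file"}}]
    if quiet_mode:
        _cache["tools"] = result          # stored in the cache ...
    return result                         # ... and returned: same object

def init_agent():
    """Mimic an agent init that appends an LCM schema to its tool list."""
    tools = get_tool_definitions_buggy()
    tools.append({"type": "function", "function": {"name": "lcm_grep"}})
    return [t["function"]["name"] for t in tools]

first = init_agent()    # mutates the cached list through the alias
second = init_agent()   # starts from the polluted cache, appends again
print(first)
print(second)
```

In this simplified model the first init looks healthy, while every later init in the same process carries `lcm_grep` twice, which is the shape of request that uniqueness-enforcing providers reject.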

Fix Action

Workaround

Restarting the Gateway temporarily clears the in-process cache, but the bug will recur as agents accumulate LCM tool appends.

PR fix notes

PR #17337: fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335)

Description (problem / solution / changelog)

Closes #17335.

Problem

Long-lived Gateway processes (Feishu, etc.) were sending duplicate tool names to providers that enforce uniqueness:

  • DeepSeek: Tool names must be unique.
  • Xiaomi MiMo: tools contains duplicate names: lcm_expand
  • Moonshot/Kimi: function name lcm_grep is duplicated

TUI was unaffected because TUI uses quiet_mode=False and skips the cache.

Root Cause (two layered bugs)

1. model_tools.get_tool_definitions(quiet_mode=True) aliased the cached object on the first call. The cache-hit path (line 278) already returned list(cached) — safe. But the first uncached call stored and returned the same object. run_agent.py then mutates self.tools in-place (appending memory + LCM context-engine schemas), so the very first agent init in a Gateway process poisoned the cache, and every subsequent init appended LCM schemas again on top of the already-polluted list.

2. run_agent.py's context-engine injection had no dedup. Memory-tools injection (lines 1728–1748) already skips already-present names. The LCM injection right below it (lines 1986–1993) didn't. So even after fixing the cache, plugin paths that register schemas via ctx.register_tool() could still produce duplicates.

Fix (defense in depth, exactly as the issue suggested)

model_tools.py — on the uncached branch, cache the result but return list(result) to the caller, mirroring the cache-hit path:

result = _compute_tool_definitions(...)
if quiet_mode:
    _tool_defs_cache[cache_key] = result
    return list(result)
return result

run_agent.py — build _existing_tool_names from self.tools and skip already-present schemas, mirroring the memory-tools block above:

_existing_tool_names = {t.get("function", {}).get("name") for t in self.tools if isinstance(t, dict)}
for _schema in self.context_compressor.get_tool_schemas():
    _tname = _schema.get("name", "")
    if _tname and _tname in _existing_tool_names:
        continue
    ...
    _existing_tool_names.add(_tname)
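The dedup loop above can be exercised as a standalone function. This is a sketch under assumptions: `inject_unique_tools` is a hypothetical helper name, and the elided middle of the loop is assumed to wrap and append the schema as in the pre-fix code.

```python
def inject_unique_tools(tools, schemas):
    """Append wrapped schemas to tools, skipping names already present."""
    existing = {
        t.get("function", {}).get("name") for t in tools if isinstance(t, dict)
    }
    added = []
    for schema in schemas:
        name = schema.get("name", "")
        if name and name in existing:
            continue                      # already registered: skip
        tools.append({"type": "function", "function": schema})
        existing.add(name)
        added.append(name)
    return added

tools = [{"type": "function", "function": {"name": "lcm_grep"}}]
schemas = [{"name": "lcm_grep"}, {"name": "lcm_expand"}]

first_pass = inject_unique_tools(tools, schemas)
second_pass = inject_unique_tools(tools, schemas)
print(first_pass, second_pass, len(tools))
```

Keying on the tool name rather than on whole-dict equality matters here: two schemas with the same name but slightly different descriptions would still be rejected by the provider.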

Tests

New file tests/test_get_tool_definitions_cache_isolation.py:

  • test_first_uncached_call_returns_fresh_list: pins the fix; without it, the first-call alias is the entire bug.
  • test_cache_hit_returns_fresh_list: pre-existing #17098 behavior stays.
  • test_caller_mutation_does_not_poison_cache: simulates run_agent appending lcm_grep / lcm_expand to the returned list and asserts the next call doesn't see them.
  • test_repeated_caller_mutation_does_not_accumulate: reproduces the long-lived Gateway accumulation across 5 agent inits.
  • test_non_quiet_mode_does_not_use_cache: sanity check; explains why TUI was unaffected.
$ python -m pytest tests/test_get_tool_definitions_cache_isolation.py tests/test_model_tools.py -q
............................                                            [100%]
28 passed in 0.78s

5/5 new tests pass; 23/23 existing tests/test_model_tools.py still pass.

Changed files

  • model_tools.py (modified, +8/-0)
  • run_agent.py (modified, +17/-2)
  • tests/test_get_tool_definitions_cache_isolation.py (added, +94/-0)

PR #5713: feat(caching): multi-block system prompt with tiered TTLs (v2)

Description (problem / solution / changelog)

Summary

Refactor Anthropic prompt caching to use a structured multi-block system prompt with per-block cache_control markers instead of a single monolithic system message. This maximizes cache hits by isolating volatile content (timestamps, platform hints) from stable content (identity, skills, memory).

Architecture

The system prompt is now assembled as three SystemPromptBlock instances with different cache TTLs:

| Block | TTL | Contents |
|---|---|---|
| static | 1h | Soul.md / default identity, tool-aware guidance (memory, session_search, skills), Nous subscription prompt, tool-use enforcement, model-specific operational guidance (Google/OpenAI), skills system prompt |
| session | 5m | Custom system_message, memory store blocks (memory + user), external memory provider block, context files (AGENTS.md/CLAUDE.md/etc.) |
| ephemeral | none | Timestamp + session/model/provider line, Alibaba identity workaround, platform hints |

At API call time, blocks are converted to Anthropic content block format (`[{type: text, text: ..., cache_control: ...}, ...]`) and sent as the system message. Non-caching models fall through to the flat-string path unchanged.
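A rough sketch of the block-to-content conversion may help. The PR names `SystemPromptBlock` and `build_system_content_blocks`, but the field layout below (`name`, `text`, `ttl`) is an assumption, and the `cache_control` shape with a `ttl` key reflects my understanding of Anthropic's prompt-caching format; check it against the provider documentation before relying on it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SystemPromptBlock:
    # Hypothetical field layout; the real dataclass may differ.
    name: str
    text: str
    ttl: Optional[str] = None   # "1h", "5m", or None for uncached content

def build_system_content_blocks(blocks):
    """Convert tiered blocks into a list of Anthropic-style text blocks."""
    content = []
    for block in blocks:
        if not block.text:
            continue                       # skip empty tiers
        item = {"type": "text", "text": block.text}
        if block.ttl is not None:
            item["cache_control"] = {"type": "ephemeral", "ttl": block.ttl}
        content.append(item)
    return content

content = build_system_content_blocks([
    SystemPromptBlock("static", "You are Hermes.", ttl="1h"),
    SystemPromptBlock("session", "Context files go here.", ttl="5m"),
    SystemPromptBlock("ephemeral", "Current time: 12:00"),
])
print(content)
```

Keeping the volatile block last and unmarked means a changing timestamp does not invalidate the cached prefix formed by the two blocks before it.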

New public API in `agent/prompt_caching.py`

  • `SystemPromptBlock`, `CacheMetrics`, `AggregatedCacheMetrics` dataclasses
  • `build_system_content_blocks(blocks)` — convert blocks to Anthropic format
  • `apply_anthropic_cache_control_v2(messages, tools, cache_ttl, native_anthropic)` — multi-block + tool caching with budget management (max 4 breakpoints across tools + system + messages)
  • `extract_cache_metrics(usage, api_mode)` — per-call cache extraction supporting both native Anthropic (`cache_read_input_tokens`, `cache_creation_input_tokens`) and OpenRouter (`prompt_tokens_details.cached_tokens`) response formats
  • `aggregate_cache_metrics(metrics_list)` — cross-turn aggregation
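As an illustration of the two usage formats mentioned in the list above, here is a simplified sketch. The usage field names come from the PR description; the `api_mode` values, the plain-dict return type (the PR uses dataclasses), and the aggregation keys are assumptions.

```python
def extract_cache_metrics(usage, api_mode):
    """Return cache read/write token counts from a usage dict."""
    if api_mode == "anthropic":            # hypothetical mode label
        return {
            "cache_read": usage.get("cache_read_input_tokens") or 0,
            "cache_write": usage.get("cache_creation_input_tokens") or 0,
        }
    # OpenRouter-style responses nest the count under prompt_tokens_details.
    details = usage.get("prompt_tokens_details") or {}
    return {"cache_read": details.get("cached_tokens") or 0, "cache_write": 0}

def aggregate_cache_metrics(metrics_list):
    """Sum per-call metrics across turns."""
    return {
        "cache_read": sum(m["cache_read"] for m in metrics_list),
        "cache_write": sum(m["cache_write"] for m in metrics_list),
    }

native = extract_cache_metrics(
    {"cache_read_input_tokens": 900, "cache_creation_input_tokens": 100},
    "anthropic",
)
routed = extract_cache_metrics(
    {"prompt_tokens_details": {"cached_tokens": 500}}, "openrouter"
)
total = aggregate_cache_metrics([native, routed])
print(total)
```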

The v1 `apply_anthropic_cache_control` function and `_apply_cache_marker` helper are preserved unchanged for backward compatibility.

Integration in `run_agent.py`

  • New `_build_system_prompt_blocks()` method assembles the three tiered blocks and caches them on `self._cached_system_blocks`
  • The existing `_build_system_prompt()` method still returns a flat string (for backward compatibility with code paths that expect one) but now delegates to the block builder
  • Cached blocks are invalidated on context compression (`_cached_system_blocks = None` alongside `_cached_system_prompt = None`)
  • At API call time, when `_use_prompt_caching` is enabled and `_cached_system_blocks` is populated, a multi-block path builds `{role: system, content: [...]}` with cache_control markers already set per block
  • Plugin turn context (`_plugin_turn_context`) remains reserved for future system-level plugin instructions; plugin context from pre_llm_call hooks still goes into user messages (unchanged)
  • Fallback flat-string path handles non-caching models and pre-structured content correctly

Test coverage

  • `tests/agent/test_prompt_caching.py` — 46 unit tests covering v1 (preserved) and v2 functions: data structures, cache markers, content block conversion, pre-structured detection, breakpoint budgeting, metrics extraction and aggregation
  • `tests/agent/test_prompt_caching_v2.py` — 38 additional integration tests for v2 behavior (tool caching interaction with system blocks, budget with pre-structured content, backward compatibility with v1 code paths)
  • `tests/test_prompt_caching_integration.py` — 10 integration tests against `run_agent.py` block assembly (three-block structure, tier TTLs, timestamp in ephemeral block only, cache invalidation, backward-compat string return, non-caching models unaffected)

Verified: 317 tests passing (all of the above plus `tests/test_run_agent.py` regression suite).

Test plan

  • All new v2 unit tests pass (`pytest tests/agent/test_prompt_caching.py tests/agent/test_prompt_caching_v2.py`)
  • Integration tests against `run_agent.py` block assembly pass (`pytest tests/test_prompt_caching_integration.py`)
  • Full run_agent.py regression suite passes (`pytest tests/test_run_agent.py`)
  • `run_agent` imports cleanly
  • Manual: verify cache hit rate improves on a multi-turn conversation with stable context files (reviewer action)
  • Manual: verify non-caching models (e.g. local Ollama) still work via flat-string fallback (reviewer action)

Platforms tested

Linux (WSL2, Ubuntu 22.04), Python 3.11

🤖 Generated with Claude Code

Changed files

  • agent/prompt_caching.py (modified, +252/-6)
  • run_agent.py (modified, +169/-83)
  • tests/agent/test_prompt_caching.py (modified, +316/-0)
  • tests/agent/test_prompt_caching_v2.py (added, +469/-0)
  • tests/test_prompt_caching_integration.py (added, +209/-0)

PR #10006: fix: codex_responses prompt caching — session routing headers + cache_write_tokens field

Description (problem / solution / changelog)

Problem

Prompt caching does not work when using codex_responses API mode with OpenAI-compatible providers (e.g. theclawbay). Every request is a cache miss despite prompt_cache_key being set in the request body.

Root cause

Two issues:

1. Missing session routing headers (run_agent.py)

The OpenAI client is initialized without session_id or x-client-request-id headers for codex providers. These headers are required for server-side cache routing — they tell the backend to route requests to the same server that holds the cached prompt prefix.

The official Codex CLI sends these unconditionally. Hermes sets default_headers for OpenRouter, GitHub Copilot, Kimi, and Qwen — but never for Codex/theclawbay.

2. Wrong field name for cache_write_tokens (agent/usage_pricing.py)

The codex_responses branch reads cache_creation_tokens (Anthropic naming convention) instead of cache_write_tokens (OpenAI Responses API naming). This means cache write tokens are always reported as 0.

Fix

Patch 1: Session routing headers

After session_id is assigned during __init__, inject session_id and x-client-request-id into default_headers for codex_responses mode. Also applied in _apply_client_headers_for_base_url() so headers survive /model switches.
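A hedged sketch of the header injection follows. The header names and the `codex_responses` mode come from the PR text; `codex_session_headers` is a hypothetical helper, and reusing the session ID as the value of `x-client-request-id` is an assumption, since the description does not say what value the real patch sends.

```python
def codex_session_headers(session_id, api_mode, existing=None):
    """Return default headers with session routing fields added for Codex."""
    headers = dict(existing or {})         # never mutate the caller's dict
    if api_mode == "codex_responses":
        headers["session_id"] = session_id
        # Assumption: the same stable ID is reused for request routing.
        headers["x-client-request-id"] = session_id
    return headers

base = {"User-Agent": "hermes"}
codex = codex_session_headers("sess-123", "codex_responses", base)
other = codex_session_headers("sess-123", "chat_completions", base)
print(codex)
print(other)
```

Building the headers in one helper makes it straightforward to call it both at init and after a model switch, which is the second place the PR says the headers must survive.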

Patch 2: cache_write_tokens field

Read cache_write_tokens first (OpenAI naming), fall back to cache_creation_tokens for backward compatibility.
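The read-then-fall-back order can be sketched as below. Both field names are from the PR; the helper name and the assumption that they sit at the top level of a plain usage dict are illustrative only.

```python
def read_cache_write_tokens(usage):
    """Prefer the OpenAI-style field, then the Anthropic-style one."""
    for key in ("cache_write_tokens", "cache_creation_tokens"):
        value = usage.get(key)
        if value is not None:
            return int(value)
    return 0

new_style = read_cache_write_tokens({"cache_write_tokens": 42})
old_style = read_cache_write_tokens({"cache_creation_tokens": 7})
missing = read_cache_write_tokens({})
print(new_style, old_style, missing)
```

Testing for `None` rather than truthiness keeps an explicit `cache_write_tokens: 0` from silently falling through to the legacy field.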

Tests

  • test_codex_responses_reads_cache_write_tokens_field — verifies correct field is read
  • test_codex_responses_falls_back_to_cache_creation_tokens — backward compat
  • test_codex_responses_injects_session_routing_headers — verifies headers are set

Affected files

| File | Change |
|---|---|
| run_agent.py | Inject session routing headers for codex_responses mode (+22 lines) |
| agent/usage_pricing.py | Read cache_write_tokens before cache_creation_tokens (+3/-1 lines) |
| tests/agent/test_usage_pricing.py | 2 new tests |
| tests/run_agent/test_run_agent.py | 1 new test |

Changed files

  • agent/usage_pricing.py (modified, +3/-1)
  • run_agent.py (modified, +22/-0)
  • tests/agent/test_usage_pricing.py (modified, +36/-0)
  • tests/run_agent/test_run_agent.py (modified, +19/-0)

PR #17736: fix: dedup context engine tools + fix kimi aux deadcode model name

Description (problem / solution / changelog)

What does this PR do?

Two small bugfixes from local-mods:

  1. Dedup context engine tools: lcm_grep/lcm_describe/lcm_expand were injected without checking valid_tool_names. On session restore or context-engine re-init, get_tool_schemas() returns the same tools, which produces duplicate function names and a 400 from DeepSeek/Moonshot.

  2. Fix Kimi aux deadcode: _API_KEY_PROVIDER_AUX_MODELS["kimi-coding-cn"] pointed to the nonexistent model kimi-k2-turbo-preview. Changed to kimi-k2.6.

Changed files

  • agent/auxiliary_client.py (modified, +1/-1)
  • run_agent.py (modified, +5/-1)

Issue description

Symptoms

Gateway (Feishu, etc.) returns HTTP 400 on API calls with errors like:

  • DeepSeek: Error from provider (DeepSeek): Tool names must be unique.
  • Xiaomi MiMo: tools contains duplicate names: lcm_expand
  • Moonshot/Kimi: function name lcm_grep is duplicated

TUI sessions are unaffected.

Environment:

  • Hermes v0.11.0 (commit adef1f33a / cd7150a19 with #17098 merged)
  • context.engine: lcm, hermes-lcm plugin enabled
  • Multiple models triggered the same error (mimo-v2.5-pro, deepseek-v4-flash, kimi-k2.6, etc.) — indicating the bug is in tool schema construction, not model-specific

Reproduction

Verified with a local script. Clear cache, call get_tool_definitions(quiet_mode=True), mutate the returned list, call again — the mutation is visible in the cache. In a long-lived Gateway process, each agent init appends LCM tools again, causing accumulation.

Suggested Fix

Two places to harden:

  1. model_tools.py: On the first non-cached call, return a shallow copy and cache the original — same defensive pattern as the cache-hit path (line 278).

  2. run_agent.py: Add dedup logic to context engine tool injection (lines 1986-1993), mirroring the memory tools dedup (lines 1728-1748).

Either fix alone stops the symptom; both together provide defense-in-depth.

extent analysis

TL;DR

To fix the issue, modify model_tools.py to return a shallow copy of the tool definitions on the first non-cached call and add deduplication logic to context engine tool injection in run_agent.py.

Guidance

  • Modify model_tools.py to return a shallow copy of the tool definitions on the first non-cached call to prevent cache mutation.
  • Add deduplication logic to context engine tool injection in run_agent.py to prevent duplicate tool names.
  • Verify the fix by checking for duplicate tool names in the Gateway logs after applying the changes.
  • Consider implementing both suggested fixes to provide defense-in-depth against similar issues.
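One way to carry out the verification step in the list above is a small pre-flight check on the tool list before a request is sent. This is a sketch; `find_duplicate_tool_names` is a hypothetical helper, not part of Hermes.

```python
from collections import Counter

def find_duplicate_tool_names(tools):
    """Return the sorted tool names that appear more than once."""
    names = [
        t.get("function", {}).get("name") for t in tools if isinstance(t, dict)
    ]
    counts = Counter(n for n in names if n)
    return sorted(n for n, c in counts.items() if c > 1)

tools = [
    {"type": "function", "function": {"name": "read_file"}},
    {"type": "function", "function": {"name": "lcm_grep"}},
    {"type": "function", "function": {"name": "lcm_grep"}},
    {"type": "function", "function": {"name": "lcm_expand"}},
    {"type": "function", "function": {"name": "lcm_expand"}},
]
duplicates = find_duplicate_tool_names(tools)
print(duplicates)
```

Logging a non-empty result from a check like this in the Gateway would surface a regression before the provider returns a 400.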

Example

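# model_tools.py (sketch; cache_key handling is simplified)
_tool_defs_cache = {}

def get_tool_definitions(quiet_mode=False, cache_key="default"):
    if quiet_mode and cache_key in _tool_defs_cache:
        return list(_tool_defs_cache[cache_key])  # cache hit: fresh list
    result = _compute_tool_definitions()  # existing schema construction
    if quiet_mode:
        _tool_defs_cache[cache_key] = result  # store the original in the cache
        return list(result)  # give the caller a shallow copy
    return result

# run_agent.py
def inject_context_engine_tools(self):
    # Dedup by tool name; comparing whole dicts would miss near-identical schemas.
    existing = {t.get("function", {}).get("name")
                for t in self.tools if isinstance(t, dict)}
    for _schema in self.context_compressor.get_tool_schemas():
        _tname = _schema.get("name", "")
        if _tname and _tname in existing:
            continue  # already present: skip
        self.tools.append({"type": "function", "function": _schema})
        existing.add(_tname)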

Notes

The suggested fixes assume that the issue is caused by the caching logic in model_tools.py and the lack of deduplication logic in run_agent.py. If the issue persists after applying these fixes, further investigation may be necessary.

Recommendation

Apply the suggested fixes to model_tools.py and run_agent.py to prevent duplicate tool names and provide defense-in-depth against similar issues. This approach addresses the root cause of the issue and provides a more robust solution than a temporary workaround.
