hermes - ✅(Solved) Fix [Bug]: get_tool_definitions() quiet-mode cache pollution causes duplicate LCM tool schemas in Gateway [4 pull requests, 1 comments, 2 participants]

hermes2026-04-29 07:46:24

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#17335•Fetched 2026-04-30 06:48:19

View on GitHub

Comments

Participants

Timeline

Reactions

Author

alansong63

Participants

alansong63

markojak

Timeline (top)

cross-referenced ×9labeled ×5commented ×1

Error Message

DeepSeek: Error from provider (DeepSeek): Tool names must be unique.
Multiple models triggered the same error (mimo-v2.5-pro, deepseek-v4-flash, kimi-k2.6, etc.) — indicating the bug is in tool schema construction, not model-specific

Root Cause

Commit #17098 (perf(tools): memoize get_tool_definitions + TTL-cache check_fn results) introduced a quiet_mode cache in model_tools.py. The caching logic on cache-hit (line 278 return list(cached)) is safe; but the first uncached call (lines 280-283) returns the same object that gets stored into the cache. Any caller that mutates the returned list also mutates the cache.

Later, run_agent.py line 1986-1993 appends LCM context engine tool schemas to self.tools without checking for duplicates:

for _schema in self.context_compressor.get_tool_schemas():
    _wrapped = {"type": "function", "function": _schema}
    self.tools.append(_wrapped)  # mutates shared cached list

Note: memory tools injection (lines 1728-1748) already has dedup logic — context engine tools injection does NOT.

Why TUI is fine: TUI either uses quiet_mode=False (no caching) or short-lived processes.

Fix Action

Workaround

Restarting the Gateway temporarily clears the in-process cache, but the bug will recur as agents accumulate LCM tool appends.

PR fix notes

PR #17337: fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17337

Description (problem / solution / changelog)

Closes #17335.

Problem

Long-lived Gateway processes (Feishu, etc.) were sending duplicate tool names to providers that enforce uniqueness:

DeepSeek: Tool names must be unique.
Xiaomi MiMo: tools contains duplicate names: lcm_expand
Moonshot/Kimi: function name lcm_grep is duplicated

TUI was unaffected because TUI uses quiet_mode=False and skips the cache.

Root Cause (two layered bugs)

1. model_tools.get_tool_definitions(quiet_mode=True) aliased the cached object on the first call. The cache-hit path (line 278) already returned list(cached) — safe. But the first uncached call stored and returned the same object. run_agent.py then mutates self.tools in-place (appending memory + LCM context-engine schemas), so the very first agent init in a Gateway process poisoned the cache, and every subsequent init appended LCM schemas again on top of the already-polluted list.

2. run_agent.py's context-engine injection had no dedup. Memory-tools injection (lines 1728–1748) already skips already-present names. The LCM injection right below it (lines 1986–1993) didn't. So even after fixing the cache, plugin paths that register schemas via ctx.register_tool() could still produce duplicates.

Fix (defense in depth, exactly as the issue suggested)

model_tools.py — on the uncached branch, cache the result but return list(result) to the caller, mirroring the cache-hit path:

result = _compute_tool_definitions(...)
if quiet_mode:
    _tool_defs_cache[cache_key] = result
    return list(result)
return result

run_agent.py — build _existing_tool_names from self.tools and skip already-present schemas, mirroring the memory-tools block above:

_existing_tool_names = {t.get("function", {}).get("name") for t in self.tools if isinstance(t, dict)}
for _schema in self.context_compressor.get_tool_schemas():
    _tname = _schema.get("name", "")
    if _tname and _tname in _existing_tool_names:
        continue
    ...
    _existing_tool_names.add(_tname)

Tests

New file tests/test_get_tool_definitions_cache_isolation.py:

test_first_uncached_call_returns_fresh_list — pins the fix; without it, the first-call alias is the entire bug.
test_cache_hit_returns_fresh_list — pre-existing #17098 behavior stays.
test_caller_mutation_does_not_poison_cache — simulates run_agent appending lcm_grep / lcm_expand to the returned list and asserts the next call doesn't see them.
test_repeated_caller_mutation_does_not_accumulate — reproduces the long-lived Gateway accumulation across 5 agent inits.
test_non_quiet_mode_does_not_use_cache — sanity, explains why TUI was unaffected.

$ python -m pytest tests/test_get_tool_definitions_cache_isolation.py tests/test_model_tools.py -q
............................                                            [100%]
28 passed in 0.78s

5/5 new tests pass; 23/23 existing tests/test_model_tools.py still pass.

Changed files

model_tools.py (modified, +8/-0)
run_agent.py (modified, +17/-2)
tests/test_get_tool_definitions_cache_isolation.py (added, +94/-0)

PR #5713: feat(caching): multi-block system prompt with tiered TTLs (v2)

Repository: NousResearch/hermes-agent
Author: Deland78
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/5713

Description (problem / solution / changelog)

Summary

Refactor Anthropic prompt caching to use a structured multi-block system prompt with per-block cache_control markers instead of a single monolithic system message. This maximizes cache hits by isolating volatile content (timestamps, platform hints) from stable content (identity, skills, memory).

Architecture

The system prompt is now assembled as three SystemPromptBlock instances with different cache TTLs:

Block	TTL	Contents
static	1h	Soul.md / default identity, tool-aware guidance (memory, session_search, skills), Nous subscription prompt, tool-use enforcement, model-specific operational guidance (Google/OpenAI), skills system prompt
session	5m	Custom `system_message`, memory store blocks (memory + user), external memory provider block, context files (AGENTS.md/CLAUDE.md/etc.)
ephemeral	none	Timestamp + session/model/provider line, Alibaba identity workaround, platform hints

At API call time, blocks are converted to Anthropic content block format (`[{type: text, text: ..., cache_control: ...}, ...]`) and sent as the system message. Non-caching models fall through to the flat-string path unchanged.

New public API in `agent/prompt_caching.py`

`SystemPromptBlock`, `CacheMetrics`, `AggregatedCacheMetrics` dataclasses
`build_system_content_blocks(blocks)` — convert blocks to Anthropic format
`apply_anthropic_cache_control_v2(messages, tools, cache_ttl, native_anthropic)` — multi-block + tool caching with budget management (max 4 breakpoints across tools + system + messages)
`extract_cache_metrics(usage, api_mode)` — per-call cache extraction supporting both native Anthropic (`cache_read_input_tokens`, `cache_creation_input_tokens`) and OpenRouter (`prompt_tokens_details.cached_tokens`) response formats
`aggregate_cache_metrics(metrics_list)` — cross-turn aggregation

The v1 `apply_anthropic_cache_control` function and `_apply_cache_marker` helper are preserved unchanged for backward compatibility.

Integration in `run_agent.py`

New `_build_system_prompt_blocks()` method assembles the three tiered blocks and caches them on `self._cached_system_blocks`
The existing `_build_system_prompt()` method still returns a flat string (for backward compatibility with code paths that expect one) but now delegates to the block builder
Cached blocks are invalidated on context compression (`_cached_system_blocks = None` alongside `_cached_system_prompt = None`)
At API call time, when `_use_prompt_caching` is enabled and `_cached_system_blocks` is populated, a multi-block path builds `{role: system, content: [...]}` with cache_control markers already set per block
Plugin turn context (`_plugin_turn_context`) remains reserved for future system-level plugin instructions; plugin context from pre_llm_call hooks still goes into user messages (unchanged)
Fallback flat-string path handles non-caching models and pre-structured content correctly

Test coverage

`tests/agent/test_prompt_caching.py` — 46 unit tests covering v1 (preserved) and v2 functions: data structures, cache markers, content block conversion, pre-structured detection, breakpoint budgeting, metrics extraction and aggregation
`tests/agent/test_prompt_caching_v2.py` — 38 additional integration tests for v2 behavior (tool caching interaction with system blocks, budget with pre-structured content, backward compatibility with v1 code paths)
`tests/test_prompt_caching_integration.py` — 10 integration tests against `run_agent.py` block assembly (three-block structure, tier TTLs, timestamp in ephemeral block only, cache invalidation, backward-compat string return, non-caching models unaffected)

Verified: 317 tests passing (all of the above plus `tests/test_run_agent.py` regression suite).

Test plan

All new v2 unit tests pass (`pytest tests/agent/test_prompt_caching.py tests/agent/test_prompt_caching_v2.py`)
Integration tests against `run_agent.py` block assembly pass (`pytest tests/test_prompt_caching_integration.py`)
Full run_agent.py regression suite passes (`pytest tests/test_run_agent.py`)
`run_agent` imports cleanly
Manual: verify cache hit rate improves on a multi-turn conversation with stable context files (reviewer action)
Manual: verify non-caching models (e.g. local Ollama) still work via flat-string fallback (reviewer action)

Platforms tested

Linux (WSL2, Ubuntu 22.04), Python 3.11

🤖 Generated with Claude Code

Changed files

agent/prompt_caching.py (modified, +252/-6)
run_agent.py (modified, +169/-83)
tests/agent/test_prompt_caching.py (modified, +316/-0)
tests/agent/test_prompt_caching_v2.py (added, +469/-0)
tests/test_prompt_caching_integration.py (added, +209/-0)

PR #10006: fix: codex_responses prompt caching — session routing headers + cache_write_tokens field

Repository: NousResearch/hermes-agent
Author: zicochaos
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/10006

Description (problem / solution / changelog)

Problem

Prompt caching does not work when using codex_responses API mode with OpenAI-compatible providers (e.g. theclawbay). Every request is a cache miss despite prompt_cache_key being set in the request body.

Root cause

Two issues:

1. Missing session routing headers (`run_agent.py`)

The OpenAI client is initialized without session_id or x-client-request-id headers for codex providers. These headers are required for server-side cache routing — they tell the backend to route requests to the same server that holds the cached prompt prefix.

The official Codex CLI sends these unconditionally. Hermes sets default_headers for OpenRouter, GitHub Copilot, Kimi, and Qwen — but never for Codex/theclawbay.

2. Wrong field name for cache_write_tokens (`agent/usage_pricing.py`)

The codex_responses branch reads cache_creation_tokens (Anthropic naming convention) instead of cache_write_tokens (OpenAI Responses API naming). This means cache write tokens are always reported as 0.

Fix

Patch 1: Session routing headers

After session_id is assigned during __init__, inject session_id and x-client-request-id into default_headers for codex_responses mode. Also applied in _apply_client_headers_for_base_url() so headers survive /model switches.

Patch 2: cache_write_tokens field

Read cache_write_tokens first (OpenAI naming), fall back to cache_creation_tokens for backward compatibility.

Tests

test_codex_responses_reads_cache_write_tokens_field — verifies correct field is read
test_codex_responses_falls_back_to_cache_creation_tokens — backward compat
test_codex_responses_injects_session_routing_headers — verifies headers are set

Affected files

File	Change
`run_agent.py`	Inject session routing headers for `codex_responses` mode (+22 lines)
`agent/usage_pricing.py`	Read `cache_write_tokens` before `cache_creation_tokens` (+3/-1 lines)
`tests/agent/test_usage_pricing.py`	2 new tests
`tests/run_agent/test_run_agent.py`	1 new test

Changed files

agent/usage_pricing.py (modified, +3/-1)
run_agent.py (modified, +22/-0)
tests/agent/test_usage_pricing.py (modified, +36/-0)
tests/run_agent/test_run_agent.py (modified, +19/-0)

PR #17736: fix: dedup context engine tools + fix kimi aux deadcode model name

Repository: NousResearch/hermes-agent
Author: wjameswen888
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17736

Description (problem / solution / changelog)

What does this PR do?

Two small bugfixes from local-mods:

Dedup context engine tools — lcm_grep/lcm_describe/lcm_expand were injected without checking valid_tool_names. On session restore or context-engine re-init, get_tool_schemas() returns the same tools → duplicate function names → 400 from DeepSeek/Moonshot.
Fix Kimi aux deadcode — _API_KEY_PROVIDER_AUX_MODELS["kimi-coding-cn"] pointed to nonexistent model kimi-k2-turbo-preview. Changed to kimi-k2.6.

Related Issue

Fixes #

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

How to Test

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform:

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

This skill is broadly useful to most users (if bundled) — see Contributing Guide
SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

Changed files

agent/auxiliary_client.py (modified, +1/-1)
run_agent.py (modified, +5/-1)

Code Example

for _schema in self.context_compressor.get_tool_schemas():
    _wrapped = {"type": "function", "function": _schema}
    self.tools.append(_wrapped)  # mutates shared cached list

RAW_BUFFERClick to expand / collapse

Symptoms

Gateway (Feishu, etc.) returns HTTP 400 on API calls with errors like:

DeepSeek: Error from provider (DeepSeek): Tool names must be unique.
Xiaomi MiMo: tools contains duplicate names: lcm_expand
Moonshot/Kimi: function name lcm_grep is duplicated

TUI sessions are unaffected.

Environment:

Hermes v0.11.0 (commit adef1f33a / cd7150a19 with #17098 merged)
context.engine: lcm, hermes-lcm plugin enabled
Multiple models triggered the same error (mimo-v2.5-pro, deepseek-v4-flash, kimi-k2.6, etc.) — indicating the bug is in tool schema construction, not model-specific

Root Cause

Later, run_agent.py line 1986-1993 appends LCM context engine tool schemas to self.tools without checking for duplicates:

for _schema in self.context_compressor.get_tool_schemas():
    _wrapped = {"type": "function", "function": _schema}
    self.tools.append(_wrapped)  # mutates shared cached list

Note: memory tools injection (lines 1728-1748) already has dedup logic — context engine tools injection does NOT.

Why TUI is fine: TUI either uses quiet_mode=False (no caching) or short-lived processes.

Reproduction

Verified with a local script. Clear cache, call get_tool_definitions(quiet_mode=True), mutate the returned list, call again — the mutation is visible in the cache. In a long-lived Gateway process, each agent init appends LCM tools again, causing accumulation.

Suggested Fix

Two places to harden:

model_tools.py: On the first non-cached call, return a shallow copy and cache the original — same defensive pattern as the cache-hit path (line 278).
run_agent.py: Add dedup logic to context engine tool injection (lines 1986-1993), mirroring the memory tools dedup (lines 1728-1748).

Either fix alone stops the symptom; both together provide defense-in-depth.

Workaround

Restarting the Gateway temporarily clears the in-process cache, but the bug will recur as agents accumulate LCM tool appends.

extent analysis

TL;DR

To fix the issue, modify model_tools.py to return a shallow copy of the tool definitions on the first non-cached call and add deduplication logic to context engine tool injection in run_agent.py.

Guidance

Modify model_tools.py to return a shallow copy of the tool definitions on the first non-cached call to prevent cache mutation.
Add deduplication logic to context engine tool injection in run_agent.py to prevent duplicate tool names.
Verify the fix by checking for duplicate tool names in the Gateway logs after applying the changes.
Consider implementing both suggested fixes to provide defense-in-depth against similar issues.

Example

# model_tools.py
def get_tool_definitions(quiet_mode):
    # ...
    if not cached:
        result = [...]  # calculate tool definitions
        cached = result.copy()  # cache a copy of the result
        return result.copy()  # return a copy of the result

# run_agent.py
def inject_context_engine_tools(self):
    # ...
    for _schema in self.context_compressor.get_tool_schemas():
        _wrapped = {"type": "function", "function": _schema}
        if _wrapped not in self.tools:  # dedup logic
            self.tools.append(_wrapped)

Notes

The suggested fixes assume that the issue is caused by the caching logic in model_tools.py and the lack of deduplication logic in run_agent.py. If the issue persists after applying these fixes, further investigation may be necessary.

Recommendation

Apply the suggested fixes to model_tools.py and run_agent.py to prevent duplicate tool names and provide defense-in-depth against similar issues. This approach addresses the root cause of the issue and provides a more robust solution than a temporary workaround.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #API versioning #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix [Bug]: get_tool_definitions() quiet-mode cache pollution causes duplicate LCM tool schemas in Gateway [4 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Workaround

PR fix notes

PR #17337: fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335)

Description (problem / solution / changelog)

Problem

Root Cause (two layered bugs)

Fix (defense in depth, exactly as the issue suggested)

Tests

Changed files

PR #5713: feat(caching): multi-block system prompt with tiered TTLs (v2)

Description (problem / solution / changelog)

Summary

Architecture

New public API in `agent/prompt_caching.py`

Integration in `run_agent.py`

Test coverage

Test plan

Platforms tested

Changed files

PR #10006: fix: codex_responses prompt caching — session routing headers + cache_write_tokens field

Description (problem / solution / changelog)

Problem

Root cause

1. Missing session routing headers (run_agent.py)

2. Wrong field name for cache_write_tokens (agent/usage_pricing.py)

Fix

Patch 1: Session routing headers

Patch 2: cache_write_tokens field

Tests

Affected files

Changed files

PR #17736: fix: dedup context engine tools + fix kimi aux deadcode model name

Description (problem / solution / changelog)

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

For New Skills

Screenshots / Logs

Changed files

Code Example

Symptoms

Root Cause

Reproduction

Suggested Fix

Workaround

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

1. Missing session routing headers (`run_agent.py`)

2. Wrong field name for cache_write_tokens (`agent/usage_pricing.py`)