hermes - [Bug]: Kimi K2 native tool-call tokens leak as content (chat_completions transport), observed via OpenRouter with sparse toolsets





Bug Description

ChatCompletionsTransport.normalize_response (in agent/transports/chat_completions.py) extracts tool calls only from the structured message.tool_calls field. When the upstream returns Kimi K2's native tool-call format — <|tool_calls_section_begin|><|tool_call_begin|>functions.<name>:<idx><|tool_call_argument_begin|>{...}<|tool_call_end|><|tool_calls_section_end|> — as plain assistant content (no tool_calls populated), the transport returns the tokens to the agent loop verbatim. The requested tool call never executes; the leaked tokens are delivered to the user as if they were a final answer.
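To make the wire format described above concrete, here is a minimal sketch that decodes it with regular expressions. It is an illustration only, not the repository's `KimiK2ToolCallParser`; the function name `decode_k2_calls` is hypothetical, and accepting a singular `tool_call_section_end` marker is an assumption made by symmetry with the two begin-marker spellings named in this report.

```python
import json
import re

# Outer section wrapper; "calls?" accepts both begin-marker spellings.
SECTION_RE = re.compile(
    r"<\|tool_calls?_section_begin\|>(?P<body>.*?)<\|tool_calls?_section_end\|>",
    re.DOTALL,
)
# One call: functions.<name>:<idx>, then a JSON argument object.
CALL_RE = re.compile(
    r"<\|tool_call_begin\|>\s*functions\.(?P<name>[\w.\-]+):(?P<idx>\d+)\s*"
    r"<\|tool_call_argument_begin\|>(?P<args>.*?)<\|tool_call_end\|>",
    re.DOTALL,
)


def decode_k2_calls(content: str) -> list[dict]:
    """Return the tool calls embedded in `content`, in order of appearance."""
    calls = []
    for section in SECTION_RE.finditer(content):
        for m in CALL_RE.finditer(section.group("body")):
            calls.append({
                "name": m.group("name"),
                "index": int(m.group("idx")),
                # Raises json.JSONDecodeError if the arguments are malformed.
                "arguments": json.loads(m.group("args")),
            })
    return calls


leaked = (
    "<|tool_calls_section_begin|><|tool_call_begin|>functions.web_search:0"
    '<|tool_call_argument_begin|>{"query": "hermes nousresearch"}'
    "<|tool_call_end|><|tool_calls_section_end|>"
)
print(decode_k2_calls(leaked))
# [{'name': 'web_search', 'index': 0, 'arguments': {'query': 'hermes nousresearch'}}]
```

A transport that reads only `message.tool_calls` never performs this decoding step, which is why the string reaches the user unchanged.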

The repo already ships a parser for this exact format, environments/tool_call_parsers/kimi_k2_parser.py (adapted from vLLM's KimiK2ToolParser), but it is registered only in the RL environments/ registry and is not consulted by the agent runtime.

Observed reliably with moonshotai/kimi-k2.6 via OpenRouter when the exposed toolset is sparse (1–2 tools). In a multi-tenant deployment the most visible failure mode is cron jobs created with enabled_toolsets=["web","search"] (introduced by #21538) — these expose only web_search/web_extract, K2 falls into the inline-tokens emission path, and the Discord/Telegram/etc. delivery posts the raw tokens.

Steps to Reproduce

  1. Profile config: model.default: moonshotai/kimi-k2.6, model.provider: openrouter.
  2. Create a cron job that exposes only the web and search toolsets:
    hermes cron create "every 2m" \
      "Use the web_search tool to search for 'hermes nousresearch'. Respond with only the top URL." \
      --toolsets web,search \
      --deliver local
  3. Wait for the next tick; inspect ~/.hermes/cron/output/<job_id>/<timestamp>.md.
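Checking output files by hand gets tedious when several jobs are affected. The following sketch scans the cron output tree for either begin marker. It assumes the `<job_id>/<timestamp>.md` layout described in step 3; the helper name `find_leaked_outputs` is hypothetical and not part of hermes.

```python
from pathlib import Path

# Begin markers named in this report; either spelling indicates a leak.
K2_MARKERS = ("<|tool_calls_section_begin|>", "<|tool_call_section_begin|>")


def find_leaked_outputs(output_root: Path) -> list[Path]:
    """Return cron output files under `output_root` that contain a K2 marker."""
    hits = []
    # Assumed layout: <output_root>/<job_id>/<timestamp>.md
    for md in sorted(output_root.glob("*/*.md")):
        text = md.read_text(encoding="utf-8", errors="replace")
        if any(marker in text for marker in K2_MARKERS):
            hits.append(md)
    return hits


# Example call (path taken from step 3 above):
# for path in find_leaked_outputs(Path("~/.hermes/cron/output").expanduser()):
#     print(path)
```

The scan matches the marker anywhere in the file rather than only under `## Response`, so a job whose prompt quotes the marker would also be reported.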

Expected Behavior

The agent calls web_search, receives results, and the final response is the URL — same as when the job is run with the full default toolset.

Actual Behavior

The output file's ## Response section contains the literal K2 tokens, e.g.:

<|tool_calls_section_begin|><|tool_call_begin|>functions.web_search:0<|tool_call_argument_begin|>{"query": "hermes nousresearch"}<|tool_call_end|><|tool_calls_section_end|>

These are delivered to the configured destination (Discord/Telegram/etc.) verbatim. No tool call is dispatched. finish_reason="stop" is reported.

Affected Component

Agent Core (conversation loop, context compression, memory), Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

Discord

Debug Report

Captured from a reproducing profile (`kyoto`, K2.6 via OpenRouter, ran the bisection cron jobs documented in *Additional Logs / Traceback* below). **Pastes auto-delete 6 h after upload, so submit promptly.**


Report       https://paste.rs/xOfi9
agent.log    https://paste.rs/l0qoK
gateway.log  https://paste.rs/iqJgC


The cron `output/<job_id>/<timestamp>.md` file is the most directly diagnostic single artifact — its `## Response` section contains the leaked tokens verbatim. Example output is reproduced in the *Actual Behavior* section above.

Operating System

Debian 13 (Linux 6.12), running inside the locally-built hermes-agent:local Docker image (debian:13.4 base).

Python Version

3.13 (container default — ghcr.io/astral-sh/uv:0.11.6-python3.13-trixie base for the build stage)

Hermes Version

origin/main at dd0923bb8 (docs: remove public advisory page, 2026-05-12) — current main at time of report.

Additional Logs / Traceback (optional)

Bisection on a clean profile (`moonshotai/kimi-k2.6` via OpenRouter, identical prompt, only varying `enabled_toolsets`):

| Toolsets | Effective tools | Outcome |
|---|---|---|
| `["web","search"]` | `web_search`, `web_extract` | Leaks K2 tokens |
| `["search"]`       | `web_search`                | Leaks K2 tokens |
| `["web"]`          | `web_search`, `web_extract` | Other failure: model emits natural-language stub, no tool call |
| `["web","todo"]`   | `web_search`, `web_extract`, `todo` | Other failure: model hallucinates "I do not have access to a web_search tool" |
| `["web","terminal","file","todo","memory"]` | ~10 tools | Works (structured `tool_calls`) |
| `null` (full default) | ~30 tools | Works (structured `tool_calls`) |

A Claude (`anthropic/claude-sonnet-4-6`) control with the same failing `["web","search"]` toolset works. The bug is K2-specific.

(The natural-language stub and hallucinated-tool-name cases are separate K2-side issues with sparse toolsets, out of scope for this bug — this one targets only the *leakage* case.)

Root Cause Analysis (optional)

In agent/transports/chat_completions.py:509, normalize_response extracts tool calls from msg.tool_calls (lines 522–548) but has no fallback for native tool-call text formats. When upstream returns K2 tokens in msg.content and leaves msg.tool_calls empty, the tokens reach NormalizedResponse.content unchanged and the agent loop treats them as a final answer.

_is_kimi provider detection in run_agent.py:8315 is base-URL-only (api.kimi.com / moonshot.ai / moonshot.cn), so K2 routed via OpenRouter never trips the Kimi branch. The companion _model_name_is_kimi_family in agent/anthropic_adapter.py:383 does name-based detection but lives on the Anthropic-Messages path and isn't consulted for chat-completions.

The K2 text parser at environments/tool_call_parsers/kimi_k2_parser.py returns (content_before_tokens, [ChatCompletionMessageToolCall]) and is the right tool — just not wired in.
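The detection gap can be illustrated with a small sketch. Both functions below are hypothetical stand-ins written for this example, not the actual `_is_kimi` or `_model_name_is_kimi_family` code; only the host list comes from this report, and the two base URLs are example values.

```python
from urllib.parse import urlparse

# Hosts this report lists for the existing base-URL check.
KIMI_HOSTS = ("api.kimi.com", "moonshot.ai", "moonshot.cn")


def looks_like_kimi_endpoint(base_url: str) -> bool:
    """Hypothetical stand-in for a base-URL-only provider check."""
    host = urlparse(base_url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in KIMI_HOSTS)


def looks_like_kimi_model(model: str) -> bool:
    """Hypothetical name-based check, shown for contrast."""
    return "kimi" in model.lower()


print(looks_like_kimi_endpoint("https://api.moonshot.ai/v1"))    # True
print(looks_like_kimi_endpoint("https://openrouter.ai/api/v1"))  # False
print(looks_like_kimi_model("moonshotai/kimi-k2.6"))             # True
```

A URL-based check says nothing about which model an aggregator routes to, so any K2-specific handling gated on it is skipped for OpenRouter traffic.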

Proposed Fix (optional)


In ChatCompletionsTransport.normalize_response, after the existing msg.tool_calls extraction:

  • If tool_calls is empty/None, and content contains a K2 section-begin marker (<|tool_calls_section_begin|> or <|tool_call_section_begin|>), parse the content with KimiK2ToolCallParser.
  • If the parser returns calls, surface them as ToolCall entries, strip the tokens from content, and promote finish_reason from "stop" to "tool_calls" so the agent loop dispatches.
  • Detection is by marker presence in content, not by model name, so it stays robust against any upstream that passes the tokens through, regardless of model-family detection.

No detection by model name means no risk of missing the bug when a new K2 endpoint appears, and zero false-positive risk on non-K2 models (they don't emit those exact tokens).
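The three steps above could look roughly like the sketch below. It is not the actual patch: `Normalized` is a simplified stand-in for the transport's normalized response, `apply_k2_fallback` is a hypothetical name, and the parser is passed in as a callable returning `(text_before_tokens, calls)` so that the sketch makes no claim about the real `KimiK2ToolCallParser` interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

K2_SECTION_MARKERS = ("<|tool_calls_section_begin|>", "<|tool_call_section_begin|>")


@dataclass
class Normalized:
    """Simplified stand-in for the transport's normalized response."""
    content: Optional[str]
    tool_calls: list[Any] = field(default_factory=list)
    finish_reason: str = "stop"


# A parser takes raw content and returns (text_before_tokens, calls).
Parser = Callable[[str], tuple[Optional[str], list[Any]]]


def apply_k2_fallback(resp: Normalized, parse: Parser) -> Normalized:
    """Recover tool calls that arrived as K2 tokens inside `content`."""
    # Structured tool calls always win; the fallback only fills a gap.
    if resp.tool_calls or not resp.content:
        return resp
    # Cheap marker check, independent of model name or provider.
    if not any(marker in resp.content for marker in K2_SECTION_MARKERS):
        return resp
    text, calls = parse(resp.content)
    if not calls:
        # Marker present but nothing parseable: leave the response untouched.
        return resp
    # Promote "stop" so the agent loop dispatches; keep any other reason as-is.
    finish = "tool_calls" if resp.finish_reason == "stop" else resp.finish_reason
    return Normalized(content=text or None, tool_calls=list(calls), finish_reason=finish)
```

Returning the original object unchanged on every non-matching path keeps the fallback inert for responses that already carry structured tool calls or contain no marker.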

Workaround for affected users

Until the patch lands, set enabled_toolsets on the cron job to expose a richer set of tools: ["web","terminal","file","todo","memory"] (or more) was sufficient to push K2 out of the inline-tokens emission mode in our testing. To edit the per-profile cron jobs file directly:

import json, pathlib
p = pathlib.Path("~/.hermes/cron/jobs.json").expanduser()
d = json.loads(p.read_text())
for j in d["jobs"]:
    ts = set(j.get("enabled_toolsets") or [])
    if ts and ts.issubset({"web", "search"}):
        j["enabled_toolsets"] = sorted(ts | {"terminal", "file", "todo", "memory"})
p.write_text(json.dumps(d, indent=2))

Then restart the gateway. Alternatively, clear enabled_toolsets entirely (null) to let cron use the full default toolset — also reliable, at the cost of a larger system prompt.
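For the alternative of clearing `enabled_toolsets`, a sketch along the same lines follows. It assumes the same `jobs.json` structure as the script above (a top-level `"jobs"` list); the function names and the `.bak` backup step are additions made for this example.

```python
import json
import shutil
from pathlib import Path

SPARSE = {"web", "search"}


def clear_sparse_toolsets(doc: dict) -> int:
    """Set enabled_toolsets to None on sparse jobs; return how many changed."""
    changed = 0
    for job in doc.get("jobs", []):
        toolsets = set(job.get("enabled_toolsets") or [])
        if toolsets and toolsets.issubset(SPARSE):
            job["enabled_toolsets"] = None  # serialized as JSON null
            changed += 1
    return changed


def rewrite_jobs_file(path: Path) -> int:
    """Apply clear_sparse_toolsets to a jobs file, keeping a .bak copy."""
    doc = json.loads(path.read_text())
    changed = clear_sparse_toolsets(doc)
    if changed:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(json.dumps(doc, indent=2))
    return changed


# Example call:
# rewrite_jobs_file(Path("~/.hermes/cron/jobs.json").expanduser())
```

The file is rewritten only when at least one job changes, so running the script repeatedly does not overwrite the backup with an already-modified copy.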

Either workaround masks the underlying transport gap; the proposed fix above removes the gap structurally.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
