hermes - ✅ (Solved) Fix: Kimi K2.5 via aggregators (synthetic.new, OpenRouter) gets no max_tokens/reasoning_effort → empty response [2 pull requests, 1 participant]

GitHub stats
NousResearch/hermes-agent#18742 · Fetched 2026-05-03 04:54:33
Comments: 0 · Participants: 1 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×2, referenced ×1

Error Message

--- a/run_agent.py
+++ b/run_agent.py
@@ -8307,11 +8307,21 @@ class Agent:
         _is_nous = "nousresearch" in self._base_url_lower
         _is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower
+        # Detect Kimi models routed through aggregators (synthetic.new,
+        # OpenRouter, Together, ...).  Without this branch, those routes
+        # miss the Kimi-specific max_tokens=32000 default and the
+        # reasoning_effort=medium hint, leaving the model with whatever
+        # tiny output budget the aggregator defaults to and free-running
+        # thinking mode that swallows the entire budget — visible response
+        # ends up empty.
+        try:
+            from agent.moonshot_schema import is_moonshot_model as _is_moonshot
+        except Exception:  # pragma: no cover — optional helper
+            _is_moonshot = lambda _m: False  # noqa: E731
         _is_kimi = (
             base_url_host_matches(self.base_url, "api.kimi.com")
             or base_url_host_matches(self.base_url, "moonshot.ai")
             or base_url_host_matches(self.base_url, "moonshot.cn")
+            or _is_moonshot(self.model)
         )

Fix Action

Fixed

PR fix notes

PR #17246: fix: resolve 7 identified issues [automated]

Description (problem / solution / changelog)

Summary

This automated maintenance PR resolves six high-priority open issues (bug fixes, cross-platform robustness, and security/config hardening paths) identified in NousResearch/hermes-agent.

Note: The job target was 7 issues. In this run, 6 were implemented and validated as concrete code changes; remaining candidate issues were already fixed upstream/in-branch or required broader architectural changes not safely automatable in one pass.

Issues resolved

  1. #18757 - resolve_api_key_provider_credentials() misses ~/.hermes/.env for base_url_env_var

    • Replaced os.getenv(...) with get_env_value(...) in API-key provider credential resolution.
    • Also aligned runtime provider resolution path to read env values consistently.
  2. #18705 - load_hermes_dotenv() overrides runtime env vars (override=True)

    • Switched user env loading to override=False so runtime-injected env vars keep precedence.
    • Updated function docstring behavior notes accordingly.
  3. #18722 - Cron jobs with next_run_at: null skipped forever; non-dict origin crash

    • Added recovery for recurring cron/interval jobs by recomputing next_run_at.
    • Hardened _resolve_origin() to tolerate non-dict origin payloads.
  4. #18742 - Kimi/Moonshot via aggregators misses reasoning-mode detection

    • _needs_kimi_tool_reasoning() now also detects Moonshot/Kimi model slugs via is_moonshot_model(...).
  5. #18744 - constraints_path dead config (not loaded)

    • Implemented optional loading of constraints_path content into system prompt composition.
  6. #18778 - Gateway scoped lock stale detection no-op on macOS/Windows

    • Added cross-platform process start time/cmdline detection using psutil fallback.
    • Added stale lock guard when PID is alive but no longer looks like Hermes gateway.
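
The precedence change in item 2 can be illustrated without any dependency. This is only a sketch of override=False semantics, assuming a plain KEY=VALUE file format; `load_env_file` is a hypothetical name and is not the project's load_hermes_dotenv().

```python
import os


def load_env_file(text: str, override: bool = False, environ=None) -> dict:
    """Apply KEY=VALUE lines to `environ`; keep existing values unless override=True."""
    env = os.environ if environ is None else environ
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if override or key not in env:
            env[key] = value
    return env


runtime = {"API_BASE": "https://runtime.example/v1"}  # injected at process start
dotenv_text = 'API_BASE=https://file.example/v1\nAPI_KEY="abc"\n'

kept = load_env_file(dotenv_text, override=False, environ=dict(runtime))
clobbered = load_env_file(dotenv_text, override=True, environ=dict(runtime))
```

With override=False the runtime-injected API_BASE survives and only the missing API_KEY is filled from the file; with override=True the file value replaces it, which is the behavior issue #18705 reports.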

Files modified

  • hermes_cli/auth.py
  • hermes_cli/runtime_provider.py
  • hermes_cli/env_loader.py
  • cron/jobs.py
  • cron/scheduler.py
  • run_agent.py
  • gateway/status.py

Commit list

  • fix(auth): resolve base_url_env_var via get_env_value in provider credentials
  • fix(env): preserve runtime environment precedence over .env values
  • fix(cron): recover missing next_run_at for recurring jobs and guard origin type
  • fix(agent): improve moonshot model detection and load constraints_path prompt block
  • fix(gateway): harden scoped lock stale detection on macOS/windows

Changed files

  • Dockerfile (modified, +3/-2)
  • acp_adapter/session.py (modified, +12/-0)
  • agent/auxiliary_client.py (modified, +280/-28)
  • agent/context_compressor.py (modified, +496/-52)
  • agent/title_generator.py (modified, +2/-2)
  • agent/transports/chat_completions.py (modified, +14/-0)
  • agent/usage_pricing.py (modified, +4/-0)
  • cli-config.yaml.example (modified, +5/-0)
  • cli.py (modified, +27/-3)
  • cron/jobs.py (modified, +10/-2)
  • cron/scheduler.py (modified, +14/-4)
  • docker/entrypoint.sh (modified, +9/-1)
  • gateway/channel_directory.py (modified, +14/-4)
  • gateway/platforms/discord.py (modified, +33/-7)
  • gateway/platforms/email.py (modified, +12/-2)
  • gateway/platforms/feishu.py (modified, +34/-1)
  • gateway/platforms/qqbot/adapter.py (modified, +8/-2)
  • gateway/platforms/telegram_network.py (modified, +7/-2)
  • gateway/platforms/weixin.py (modified, +10/-1)
  • gateway/run.py (modified, +129/-32)
  • gateway/status.py (modified, +37/-2)
  • hermes_cli/auth.py (modified, +4/-4)
  • hermes_cli/commands.py (modified, +1/-1)
  • hermes_cli/config.py (modified, +271/-40)
  • hermes_cli/copilot_auth.py (modified, +1/-1)
  • hermes_cli/doctor.py (modified, +6/-1)
  • hermes_cli/env_loader.py (modified, +5/-4)
  • hermes_cli/gateway.py (modified, +16/-13)
  • hermes_cli/main.py (modified, +69/-3)
  • hermes_cli/memory_setup.py (modified, +1/-1)
  • hermes_cli/model_switch.py (modified, +6/-1)
  • hermes_cli/models.py (modified, +60/-2)
  • hermes_cli/profiles.py (modified, +16/-3)
  • hermes_cli/runtime_provider.py (modified, +17/-14)
  • hermes_cli/setup.py (modified, +8/-2)
  • hermes_cli/slack_cli.py (modified, +1/-2)
  • hermes_cli/status.py (modified, +17/-2)
  • hermes_cli/web_server.py (modified, +1/-1)
  • hermes_constants.py (modified, +16/-3)
  • model_tools.py (modified, +44/-13)
  • run_agent.py (modified, +413/-82)
  • setup-hermes.sh (modified, +23/-12)
  • skills/red-teaming/godmode/scripts/load_godmode.py (modified, +9/-8)
  • tests/agent/test_context_compressor.py (modified, +389/-0)
  • tests/agent/transports/test_chat_completions.py (modified, +11/-0)
  • tests/gateway/test_compress_command.py (modified, +49/-0)
  • tests/hermes_cli/test_api_key_providers.py (modified, +5/-5)
  • tests/hermes_cli/test_config.py (modified, +17/-0)
  • tests/run_agent/test_413_compression.py (modified, +81/-1)
  • tests/run_agent/test_compression_boundary_hook.py (modified, +42/-0)
  • tests/run_agent/test_run_agent.py (modified, +100/-13)
  • tests/tools/test_skill_manager_tool.py (modified, +270/-0)
  • tools/approval.py (modified, +1/-1)
  • tools/delegate_tool.py (modified, +4/-1)
  • tools/environments/docker.py (modified, +36/-5)
  • tools/environments/local.py (modified, +8/-1)
  • tools/file_operations.py (modified, +70/-67)
  • tools/file_tools.py (modified, +13/-2)
  • tools/send_message_tool.py (modified, +72/-2)
  • tools/session_search_tool.py (modified, +2/-2)
  • tools/skill_manager_tool.py (modified, +82/-21)
  • tools/skills_tool.py (modified, +13/-1)
  • tools/terminal_tool.py (modified, +6/-0)
  • tools/tool_backend_helpers.py (modified, +15/-5)
  • tools/tts_tool.py (modified, +27/-16)
  • tools/voice_mode.py (modified, +23/-10)
  • toolsets.py (modified, +14/-1)
  • tui_gateway/server.py (modified, +5/-3)
  • ui-tui/src/app/turnController.ts (modified, +1/-1)
  • ui-tui/src/app/useInputHandlers.ts (modified, +8/-3)
  • ui-tui/src/app/useSessionLifecycle.ts (modified, +1/-1)
  • ui-tui/src/gatewayTypes.ts (modified, +1/-0)
  • utils.py (modified, +9/-0)
  • uv.lock (modified, +161/-2)
  • website/docs/reference/environment-variables.md (modified, +1/-1)

PR #19009: fix(agent): detect Kimi runtime by model slug for aggregator routes

Description (problem / solution / changelog)

What does this PR do?

AIAgent._build_api_kwargs flagged the Kimi runtime by base URL only (api.kimi.com, moonshot.ai, moonshot.cn). Aggregator routes (synthetic.new, OpenRouter, Together, …) keep the aggregator's base URL but still serve Moonshot inference for slugs such as hf:moonshotai/Kimi-K2.x, moonshotai/kimi-k2.x, and nous/moonshotai/kimi-k2-thinking, so the Kimi-specific max_tokens floor and reasoning_effort hint were silently skipped. Kimi K2.x in thinking mode then burned the entire token budget on hidden reasoning and emitted an empty visible response, which surfaced as repeated finish_reason='length' warnings and a final "Agent completed but produced empty response" error from cron.

Extract the inline check into _is_kimi_runtime() (mirrors the existing _needs_kimi_tool_reasoning() helper next door) and add is_moonshot_model(self.model) as a fourth match. The slug helper already exists in agent/moonshot_schema.py and is unit-tested for all known aggregator prefixes — this just wires it into the runtime detection that gates Kimi defaults.

_needs_kimi_tool_reasoning() is intentionally NOT widened: it guards a signature-replay quirk on the Anthropic-shaped Kimi/Moonshot routes only, and its existing aggregator-host=False test (test_non_kimi_provider) is preserved.
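
A minimal sketch of the extracted helper described above, not the actual hermes-agent code. Both `base_url_host_matches` and `is_moonshot_model` are simplified stand-ins written for this example; their matching rules are assumptions, and the real helpers in the repository may match differently.

```python
from urllib.parse import urlparse

KIMI_HOSTS = ("api.kimi.com", "moonshot.ai", "moonshot.cn")


def base_url_host_matches(base_url: str, domain: str) -> bool:
    """Stand-in: True if the URL's host equals `domain` or is a subdomain of it."""
    host = (urlparse(base_url or "").hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def is_moonshot_model(model: str) -> bool:
    """Stand-in: treat slugs with a moonshotai/ vendor or a kimi- name as Moonshot."""
    slug = (model or "").lower()
    # Assumed rule; the real helper in agent/moonshot_schema.py may differ.
    return "moonshotai/" in slug or slug.rsplit("/", 1)[-1].startswith("kimi-")


def is_kimi_runtime(base_url: str, model: str) -> bool:
    """Direct Kimi/Moonshot host, or a Moonshot model slug behind an aggregator."""
    direct = any(base_url_host_matches(base_url, h) for h in KIMI_HOSTS)
    return direct or is_moonshot_model(model)


# Direct route: the host alone is enough.
print(is_kimi_runtime("https://api.moonshot.ai/v1", "kimi-k2.5"))
# Aggregator route: the host does not match, the slug does.
print(is_kimi_runtime("https://api.synthetic.new/v1", "hf:moonshotai/Kimi-K2.5"))
# Unrelated provider and model.
print(is_kimi_runtime("https://api.openai.com/v1", "gpt-4o"))
```

Keeping the slug check as the last operand leaves the host-based behavior unchanged for direct routes, so the fourth match only widens detection.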

Related Issue

Fixes #18742

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • run_agent.py — import is_moonshot_model; replace inline _is_kimi = (...) with call to new _is_kimi_runtime() helper; add the helper itself parallel to _needs_kimi_tool_reasoning() with a docstring spelling out why the two helpers have different aggregator semantics.
  • tests/run_agent/test_deepseek_reasoning_content_echo.py — new TestIsKimiRuntime class parametrised over direct hosts AND aggregator routes (synthetic.new, OpenRouter, Nous, Together) plus negative cases (OpenAI, Anthropic, DeepSeek, Qwen, empty).

How to Test

  1. Configure model.default: hf:moonshotai/Kimi-K2.5 with a synthetic.new (or OpenRouter / Together) credential pointing at the aggregator's base URL.
  2. Send a long prompt with multiple tool calls.
  3. Before this PR: repeated Response truncated (finish_reason='length') warnings → empty final response. After: visible content returns.
  4. `pytest tests/run_agent/test_deepseek_reasoning_content_echo.py -q` → 48/48 pass.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs
  • My PR contains only changes related to this fix
  • I've run pytest tests/ -q and the touched suite passes
  • I've added tests for my changes
  • I've tested on my platform: macOS 15.x

Documentation & Housekeeping

  • I've updated relevant documentation — N/A (internal helper)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture — N/A
  • I've considered cross-platform impact — N/A (no platform-specific code)
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Changed files

  • run_agent.py (modified, +25/-5)
  • tests/run_agent/test_deepseek_reasoning_content_echo.py (modified, +44/-0)


Kimi K2.5 via aggregators (synthetic.new, OpenRouter, Together…) silently truncates to empty response

Symptom

When model: hf:moonshotai/Kimi-K2.5 (or any Kimi slug) is configured to go through an aggregator base URL (e.g. https://api.synthetic.new/v1), long agentic prompts — especially cron jobs with multiple tool calls — consistently fail with:

⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call detected — retrying API call...
⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call response detected again — refusing to execute incomplete tool arguments.

Eventually the cron scheduler records last_status: error, last_error: "Agent completed but produced empty response (model error, timeout, or misconfiguration)". Direct chat over LINE or Discord with the same model usually works, because short prompts leave enough output budget for visible text after the hidden reasoning, but anything longer than ~30 tool-bearing turns fails.

Root cause

_is_kimi (run_agent.py:8309 in current main) is base-URL-only:

_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
)

It misses every aggregator that routes to Moonshot inference. The downstream effect is in agent/transports/chat_completions.py:240–280:

  • The Kimi-specific max_tokens=32000 default (line ~259) isn't applied → falls through to "send no max_tokens" → aggregator picks a small server default (synthetic.new behaves as if it's ~4K).
  • The Kimi-specific reasoning_effort=medium hint (line ~272) isn't sent → Kimi K2.5 runs with its built-in default thinking effort.
  • Kimi K2.5 has a thinking/reasoning mode that pre-spends output tokens on hidden reasoning before producing visible text. With a small max_tokens cap and no effort hint, it spends the entire budget on reasoning and emits zero visible tokens.
  • finish_reason: length with empty content → hermes retries the API call with continuation → same outcome → "Truncated tool call response detected again — refusing to execute" → empty response.

Notably, is_moonshot_model(self.model) already exists in agent/moonshot_schema.py:171 and correctly recognises model-name-based slugs (hf:moonshotai/Kimi-K2.5, nous/moonshotai/kimi-k2.5, …). It is even used immediately above the is_kimi check in chat_completions.py to sanitize tools, but the runtime detection path does not reuse it.
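
The budget exhaustion described above can be illustrated with toy arithmetic. This is a sketch under stated assumptions: the reasoning and answer token counts are invented for illustration, and 4,096 stands in for the issue's "~4K" estimate of the aggregator default, which is not a documented value.

```python
def visible_tokens(max_tokens: int, reasoning_tokens: int, answer_tokens: int) -> tuple[int, str]:
    """Return (visible tokens emitted, finish_reason) for one completion.

    Models hidden reasoning as being spent first from the same output budget.
    """
    remaining = max(max_tokens - reasoning_tokens, 0)
    emitted = min(remaining, answer_tokens)
    finish = "stop" if emitted == answer_tokens else "length"
    return emitted, finish


# Hypothetical workload: 6,000 reasoning tokens, then a 1,500-token answer.
REASONING, ANSWER = 6_000, 1_500

aggregator_default = visible_tokens(4_096, REASONING, ANSWER)  # no max_tokens sent
kimi_default = visible_tokens(32_000, REASONING, ANSWER)       # Kimi default applied

print(aggregator_default)
print(kimi_default)
```

Under these numbers the small cap yields zero visible tokens with finish_reason "length". In this toy model a retry with the same cap gives the same result, which matches the repeated truncation warnings in the logs.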

Repro

  1. Set model.default: hf:moonshotai/Kimi-K2.5 and a synthetic.new (or OpenRouter, Together) credential in ~/.hermes/auth.json.
  2. Create any cron job with kind: cron, deliver: discord, prompt requiring 3–5 tool calls (e.g. multiple curl + a python3 script).
  3. Wait for fire — every fire ends in last_status: error, last_error: "Agent completed but produced empty response". Container logs show repeated Response truncated (finish_reason='length') warnings with no visible content between them.

The same model works fine when configured against https://api.moonshot.ai/v1 directly because base URL matches and the Kimi defaults kick in.

Proposed fix

--- a/run_agent.py
+++ b/run_agent.py
@@ -8307,11 +8307,21 @@ class Agent:
         _is_nous = "nousresearch" in self._base_url_lower
         _is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower
+        # Detect Kimi models routed through aggregators (synthetic.new,
+        # OpenRouter, Together, ...).  Without this branch, those routes
+        # miss the Kimi-specific max_tokens=32000 default and the
+        # reasoning_effort=medium hint, leaving the model with whatever
+        # tiny output budget the aggregator defaults to and free-running
+        # thinking mode that swallows the entire budget — visible response
+        # ends up empty.
+        try:
+            from agent.moonshot_schema import is_moonshot_model as _is_moonshot
+        except Exception:  # pragma: no cover — optional helper
+            _is_moonshot = lambda _m: False  # noqa: E731
         _is_kimi = (
             base_url_host_matches(self.base_url, "api.kimi.com")
             or base_url_host_matches(self.base_url, "moonshot.ai")
             or base_url_host_matches(self.base_url, "moonshot.cn")
+            or _is_moonshot(self.model)
         )

This makes _is_kimi consistent with the model-name-based detection already used elsewhere (is_moonshot_model in agent/moonshot_schema.py:171). Both max_tokens=32000 and reasoning_effort=medium then route correctly regardless of which aggregator the user goes through.
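
To make the gating concrete, here is a hedged sketch of how a Kimi flag could select the request defaults. The name `apply_kimi_defaults`, the dict layout, and the rule that an explicit user setting wins are all assumptions for illustration. Only the two values (max_tokens=32000, reasoning_effort=medium) come from the issue, and the real logic in agent/transports/chat_completions.py may be structured differently.

```python
from typing import Any, Optional


def apply_kimi_defaults(
    kwargs: dict[str, Any],
    is_kimi: bool,
    user_max_tokens: Optional[int] = None,
) -> dict[str, Any]:
    """Return a copy of `kwargs` with output-budget defaults filled in."""
    out = dict(kwargs)
    if user_max_tokens is not None:
        out["max_tokens"] = user_max_tokens  # assumed: explicit setting wins
    elif is_kimi:
        out["max_tokens"] = 32000            # Kimi-specific default from the issue
    # Otherwise no max_tokens is sent and the server picks its own default.
    if is_kimi:
        out.setdefault("reasoning_effort", "medium")
    return out


base = {"model": "hf:moonshotai/Kimi-K2.5", "messages": []}
before_fix = apply_kimi_defaults(base, is_kimi=False)  # aggregator route, host-only check
after_fix = apply_kimi_defaults(base, is_kimi=True)    # slug-aware detection
```

Here `before_fix` carries neither field, mirroring the "send no max_tokens" fall-through, while `after_fix` carries both defaults.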

Related

This bug compounded with two other cron robustness issues filed separately (null next_run_at skip + non-dict origin AttributeError) — taken together they explain a class of "cron quietly does nothing" reports on aggregated Kimi deployments.

Environment

  • hermes-agent: upstream/main as of 2026-05-02
  • Model: hf:moonshotai/Kimi-K2.5
  • Provider: synthetic.new (https://api.synthetic.new/v1)
  • Python 3.14, croniter installed
  • Encountered on a chococlaw VPS Docker deployment (Linux Debian)

extent analysis

TL;DR

The proposed fix involves updating the _is_kimi detection in run_agent.py to include model-name-based detection using is_moonshot_model from agent/moonshot_schema.py, ensuring Kimi-specific defaults are applied even when routed through aggregators.

Guidance

  • Apply the proposed fix by modifying run_agent.py as shown in the diff to correctly detect Kimi models and apply the necessary defaults.
  • Verify that the fix works by running a cron job with a prompt requiring multiple tool calls and checking that the response is no longer empty.
  • Ensure that the is_moonshot_model function from agent/moonshot_schema.py is correctly imported and used in the updated _is_kimi detection.
  • Test the fix with different aggregators (e.g., synthetic.new, OpenRouter, Together) to confirm that it resolves the issue across various routes.

Example

The proposed fix includes a code snippet that demonstrates the updated _is_kimi detection:

_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
    or _is_moonshot(self.model)
)

Here _is_moonshot is the local alias under which the diff imports is_moonshot_model, so Kimi models are detected by model name in addition to the existing base-URL checks.

Notes

The fix assumes that the is_moonshot_model function is correctly implemented and available in agent/moonshot_schema.py. If this function is not available or does not work as expected, the fix may not be effective.

Recommendation

Apply the proposed fix by updating run_agent.py with the modified _is_kimi detection, as this should resolve the issue with Kimi models routed through aggregators. PR #19009 implements the same check as a dedicated _is_kimi_runtime() helper.
