hermes - ✅(Solved) Fix Kimi K2.5 via aggregators (synthetic.new, OpenRouter) gets no max_tokens/reasoning_effort → empty response [2 pull requests, 1 participants]

liyoungc · 2026-05-02T08:36:46Z

[hermes] PR 17246: fix: resolve 7 identified issues automated - Repository: NousResearch/hermes-agent - Author: Sldark23 - State: open | merged: False - Link:… # PR #17246: fix: resolve 7 identified issues [automated] - Repository: NousResearch/hermes-agent - Author: Sldark23 - State: open | merged: False - Link: https://github.com/NousResearch/hermes-agent/pull/17246 ## Description (problem / solution / changelog) ## Summary This automated maintenance PR resolves six high-priority open issues (bug fixes, cross-platform robustness, and security/config hardening paths) identified in `NousResearch/hermes-agent`. > Note: The job target was 7 issues. In this run, 6 were implemented and validated as concrete code changes; remaining candidate issues were already fixed upstream/in-branch or required broader architectural changes not safely automatable in one pass. ## Issues resolved 1. **#18757** - `resolve_api_key_provider_credentials()` misses `~/.hermes/.env` for `base_url_env_var` - Replaced `os.getenv(...)` with `get_env_value(...)` in API-key provider credential resolution. - Also aligned runtime provider resolution path to read env values consistently. 2. **#18705** - `load_hermes_dotenv()` overrides runtime env vars (`override=True`) - Switched user env loading to `override=False` so runtime-injected env vars keep precedence. - Updated function docstring behavior notes accordingly. 3. **#18722** - Cron jobs with `next_run_at: null` skipped forever; non-dict `origin` crash - Added recovery for recurring `cron/interval` jobs by recomputing `next_run_at`. - Hardened `_resolve_origin()` to tolerate non-dict origin payloads. 4. **#18742** - Kimi/Moonshot via aggregators misses reasoning-mode detection - `_needs_kimi_tool_reasoning()` now also detects Moonshot/Kimi model slugs via `is_moonshot_model(...)`. 5. **#18744** - `constraints_path` dead config (not loaded) - Implemented optional loading of `constraints_path` content into system prompt composition. 6. **#18778** - Gateway scoped lock stale detection no-op on macOS/Windows - Added cross-platform process start time/cmdline detection using `psutil` fallback. - Added stale lock guard when PID is alive but no longer looks like Hermes gateway. ## Files modified - `hermes_cli/auth.py` - `hermes_cli/runtime_provider.py` - `hermes_cli/env_loader.py` - `cron/jobs.py` - `cron/scheduler.py` - `run_agent.py` - `gateway/status.py` ## Commit list - `fix(auth): resolve base_url_env_var via get_env_value in provider credentials` - `fix(env): preserve runtime environment precedence over .env values` - `fix(cron): recover missing next_run_at for recurring jobs and guard origin type` - `fix(agent): improve moonshot model detection and load constraints_path prompt block` - `fix(gateway): harden scoped lock stale detection on macOS/windows` ## Changed files - `Dockerfile` (modified, +3/-2) - `acp_adapter/session.py` (modified, +12/-0) - `agent/auxiliary_client.py` (modified, +280/-28) - `agent/context_compressor.py` (modified, +496/-52) - `agent/title_generator.py` (modified, +2/-2) - `agent/transports/chat_completions.py` (modified, +14/-0) - `agent/usage_pricing.py` (modified, +4/-0) - `cli-config.yaml.example` (modified, +5/-0) - `cli.py` (modified, +27/-3) - `cron/jobs.py` (modified, +10/-2) - `cron/scheduler.py` (modified, +14/-4) - `docker/entrypoint.sh` (modified, +9/-1) - `gateway/channel_directory.py` (modified, +14/-4) - `gateway/platforms/discord.py` (modified, +33/-7) - `gateway/platforms/email.py` (modified, +12/-2) - `gateway/platforms/feishu.py` (modified, +34/-1) - `gateway/platforms/qqbot/adapter.py` (modified, +8/-2) - `gateway/platforms/telegram_network.py` (modified, +7/-2) - `gateway/platforms/weixin.py` (modified, +10/-1) - `gateway/run.py` (modified, +129/-32) - `gateway/status.py` (modified, +37/-2) - `hermes_cli/auth.py` (modified, +4/-4) - `hermes_cli/commands.py` (modified, +1/-1) - `hermes_cli/config.py` (modified, +271/-40) - `hermes_cli/copilot_auth.py` (modified, +1/-1) - `hermes_cli/doctor.py` (modified, +6/-1) - `hermes_cli/env_loader.py` (modified, +5/-4) - `hermes_cli/gateway.py` (modified, +16/-13) - `hermes_cli/main.py` (modified, +69/-3) - `hermes_cli/memory_setup.py` (modified, +1/-1) - `hermes_cli/model_switch.py` (modified, +6/-1) - `hermes_cli/models.py` (modified, +60/-2) - `hermes_cli/profiles.py` (modified, +16/-3) - `hermes_cli/runtime_provider.py` (modified, +17/-14) - `hermes_cli/setup.py` (modified, +8/-2) - `hermes_cli/slack_cli.py` (modified, +1/-2) - `hermes_cli/status.py` (modified, +17/-2) - `hermes_cli/web_server.py` (modified, +1/-1) - `hermes_constants.py` (modified, +16/-3) - `model_tools.py` (modified, +44/-13) - `run_agent.py` (modified, +413/-82) - `setup-hermes.sh` (modified, +23/-12) - `skills/red-teaming/godmode/scripts/load_godmode.py` (modified, +9/-8) - `tests/agent/test_context_compressor.py` (modified, +389/-0) - `tests/agent/transports/test_chat_completions.py` (modified, +11/-0) - `tests/gateway/test_compr

hermes2026-05-02 08:36:46

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#18742•Fetched 2026-05-03 04:54:33

View on GitHub

Comments

Participants

Timeline

Reactions

Author

liyoungc

Participants

liyoungc

Timeline (top)

labeled ×4cross-referenced ×2referenced ×1

Error Message

--- a/run_agent.py +++ b/run_agent.py @@ -8307,11 +8307,21 @@ class Agent: _is_nous = "nousresearch" in self._base_url_lower _is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower

   # Detect Kimi models routed through aggregators (synthetic.new,

   # OpenRouter, Together, ...).  Without this branch, those routes

   # miss the Kimi-specific max_tokens=32000 default and the

   # reasoning_effort=medium hint, leaving the model with whatever

   # tiny output budget the aggregator defaults to and free-running

   # thinking mode that swallows the entire budget — visible response

```
   # ends up empty.
```
```
   try:
```

       from agent.moonshot_schema import is_moonshot_model as _is_moonshot

   except Exception:  # pragma: no cover — optional helper

       _is_moonshot = lambda _m: False  # noqa: E731
   _is_kimi = (
       base_url_host_matches(self.base_url, "api.kimi.com")
       or base_url_host_matches(self.base_url, "moonshot.ai")
       or base_url_host_matches(self.base_url, "moonshot.cn")

```
       or _is_moonshot(self.model)
   )
```

Root Cause

_is_kimi (run_agent.py:8309 in current main) is base-URL-only:

_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
)

It misses every aggregator that routes to Moonshot inference. The downstream effect is in agent/transports/chat_completions.py:240–280:

The Kimi-specific max_tokens=32000 default (line ~259) isn't applied → falls through to "send no max_tokens" → aggregator picks a small server default (synthetic.new behaves as if it's ~4K).
The Kimi-specific reasoning_effort=medium hint (line ~272) isn't sent → Kimi K2.5 runs with its built-in default thinking effort.
Kimi K2.5 has a thinking/reasoning mode that pre-spends output tokens on hidden reasoning before producing visible text. With a small max_tokens cap and no effort hint, it spends the entire budget on reasoning and emits zero visible tokens.
finish_reason: length with empty content → hermes retries the API call with continuation → same outcome → "Truncated tool call response detected again — refusing to execute" → empty response.

The interesting thing is is_moonshot_model(self.model) already exists in agent/moonshot_schema.py:171 and correctly recognises model-name-based slugs (hf:moonshotai/Kimi-K2.5, nous/moonshotai/kimi-k2.5, …). It's even used immediately above the is_kimi check in chat_completions.py to sanitize tools. But the runtime detection path doesn't reuse it.

Code Example

⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call detected — retrying API call...
⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call response detected again — refusing to execute incomplete tool arguments.

---

_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
)

---

--- a/run_agent.py
+++ b/run_agent.py
@@ -8307,11 +8307,21 @@ class Agent:
         _is_nous = "nousresearch" in self._base_url_lower
         _is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower
+        # Detect Kimi models routed through aggregators (synthetic.new,
+        # OpenRouter, Together, ...).  Without this branch, those routes
+        # miss the Kimi-specific max_tokens=32000 default and the
+        # reasoning_effort=medium hint, leaving the model with whatever
+        # tiny output budget the aggregator defaults to and free-running
+        # thinking mode that swallows the entire budget — visible response
+        # ends up empty.
+        try:
+            from agent.moonshot_schema import is_moonshot_model as _is_moonshot
+        except Exception:  # pragma: no cover — optional helper
+            _is_moonshot = lambda _m: False  # noqa: E731
         _is_kimi = (
             base_url_host_matches(self.base_url, "api.kimi.com")
             or base_url_host_matches(self.base_url, "moonshot.ai")
             or base_url_host_matches(self.base_url, "moonshot.cn")
+            or _is_moonshot(self.model)
         )

RAW_BUFFERClick to expand / collapse

Kimi K2.5 via aggregators (synthetic.new, OpenRouter, Together…) silently truncates to empty response

Symptom

When model: hf:moonshotai/Kimi-K2.5 (or any Kimi slug) is configured to go through an aggregator base URL (e.g. https://api.synthetic.new/v1), long agentic prompts — especially cron jobs with multiple tool calls — consistently fail with:

⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call detected — retrying API call...
⚠️  Response truncated (finish_reason='length') - model hit max output tokens
⚠️  Truncated tool call response detected again — refusing to execute incomplete tool arguments.

Eventually the cron scheduler records last_status: error, last_error: "Agent completed but produced empty response (model error, timeout, or misconfiguration)". Direct LINE / Discord chat with the same model usually works because short prompts have enough budget to leak past the truncation, but anything longer than ~30 tool-bearing turns dies.

Root cause

_is_kimi (run_agent.py:8309 in current main) is base-URL-only:

_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
)

It misses every aggregator that routes to Moonshot inference. The downstream effect is in agent/transports/chat_completions.py:240–280:

The Kimi-specific max_tokens=32000 default (line ~259) isn't applied → falls through to "send no max_tokens" → aggregator picks a small server default (synthetic.new behaves as if it's ~4K).
The Kimi-specific reasoning_effort=medium hint (line ~272) isn't sent → Kimi K2.5 runs with its built-in default thinking effort.
Kimi K2.5 has a thinking/reasoning mode that pre-spends output tokens on hidden reasoning before producing visible text. With a small max_tokens cap and no effort hint, it spends the entire budget on reasoning and emits zero visible tokens.
finish_reason: length with empty content → hermes retries the API call with continuation → same outcome → "Truncated tool call response detected again — refusing to execute" → empty response.

Repro

Set model.default: hf:moonshotai/Kimi-K2.5 and a synthetic.new (or OpenRouter, Together) credential in ~/.hermes/auth.json.
Create any cron job with kind: cron, deliver: discord, prompt requiring 3–5 tool calls (e.g. multiple curl + a python3 script).
Wait for fire — every fire ends in last_status: error, last_error: "Agent completed but produced empty response". Container logs show repeated Response truncated (finish_reason='length') warnings with no visible content between them.

The same model works fine when configured against https://api.moonshot.ai/v1 directly because base URL matches and the Kimi defaults kick in.

Proposed fix

--- a/run_agent.py
+++ b/run_agent.py
@@ -8307,11 +8307,21 @@ class Agent:
         _is_nous = "nousresearch" in self._base_url_lower
         _is_nvidia = "integrate.api.nvidia.com" in self._base_url_lower
+        # Detect Kimi models routed through aggregators (synthetic.new,
+        # OpenRouter, Together, ...).  Without this branch, those routes
+        # miss the Kimi-specific max_tokens=32000 default and the
+        # reasoning_effort=medium hint, leaving the model with whatever
+        # tiny output budget the aggregator defaults to and free-running
+        # thinking mode that swallows the entire budget — visible response
+        # ends up empty.
+        try:
+            from agent.moonshot_schema import is_moonshot_model as _is_moonshot
+        except Exception:  # pragma: no cover — optional helper
+            _is_moonshot = lambda _m: False  # noqa: E731
         _is_kimi = (
             base_url_host_matches(self.base_url, "api.kimi.com")
             or base_url_host_matches(self.base_url, "moonshot.ai")
             or base_url_host_matches(self.base_url, "moonshot.cn")
+            or _is_moonshot(self.model)
         )

This makes _is_kimi consistent with the model-name-based detection already used elsewhere (is_moonshot_model in agent/moonshot_schema.py:171). Both max_tokens=32000 and reasoning_effort=medium then route correctly regardless of which aggregator the user goes through.

This bug compounded with two other cron robustness issues filed separately (null next_run_at skip + non-dict origin AttributeError) — taken together they explain a class of "cron quietly does nothing" reports on aggregated Kimi deployments.

Environment

hermes-agent: upstream/main as of 2026-05-02
Model: hf:moonshotai/Kimi-K2.5
Provider: synthetic.new (https://api.synthetic.new/v1)
Python 3.14, croniter installed
Encountered on a chococlaw VPS Docker deployment (Linux Debian)

extent analysis

TL;DR

The proposed fix involves updating the _is_kimi detection in run_agent.py to include model-name-based detection using is_moonshot_model from agent/moonshot_schema.py, ensuring Kimi-specific defaults are applied even when routed through aggregators.

Guidance

Apply the proposed fix by modifying run_agent.py as shown in the diff to correctly detect Kimi models and apply the necessary defaults.
Verify that the fix works by running a cron job with a prompt requiring multiple tool calls and checking that the response is no longer empty.
Ensure that the is_moonshot_model function from agent/moonshot_schema.py is correctly imported and used in the updated _is_kimi detection.
Test the fix with different aggregators (e.g., synthetic.new, OpenRouter, Together) to confirm that it resolves the issue across various routes.

Example

The proposed fix includes a code snippet that demonstrates the updated _is_kimi detection:

_is_kimi = (
    base_url_host_matches(self.base_url, "api.kimi.com")
    or base_url_host_matches(self.base_url, "moonshot.ai")
    or base_url_host_matches(self.base_url, "moonshot.cn")
    or _is_moonshot(self.model)
)

This code uses the is_moonshot_model function to detect Kimi models based on their name, in addition to the existing base URL checks.

Notes

The fix assumes that the is_moonshot_model function is correctly implemented and available in agent/moonshot_schema.py. If this function is not available or does not work as expected, the fix may not be effective.

Recommendation

Apply the proposed workaround by updating run_agent.py with the modified _is_kimi detection, as this should resolve the issue with Kimi models routed through aggregators.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #installation #tensor shape #autograd error #model save/load

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix Kimi K2.5 via aggregators (synthetic.new, OpenRouter) gets no max_tokens/reasoning_effort → empty response [2 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

PR fix notes

PR #17246: fix: resolve 7 identified issues [automated]

Description (problem / solution / changelog)

Summary

Issues resolved

Files modified

Commit list

Changed files

PR #19009: fix(agent): detect Kimi runtime by model slug for aggregator routes

Description (problem / solution / changelog)

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping