hermes - ✅ (Solved) Z.AI / GLM via `zai` provider never returns reasoning_content: Hermes sends `extra_body.reasoning` (OpenRouter-style) but Z.AI expects `extra_body.thinking={"type":"enabled"}` [1 pull request, 2 comments, 3 participants]

GitHub stats
NousResearch/hermes-agent#16533 (fetched 2026-04-28 06:52:45)
Comments: 2 · Participants: 3 · Timeline events: 8 · Reactions: 0
Timeline (top): labeled ×4, commented ×2, cross-referenced ×1, referenced ×1

Hermes' zai provider successfully calls Z.AI's GLM models (e.g. glm-5.1 via the coding plan endpoint https://open.bigmodel.cn/api/coding/paas/v4), but reasoning_content is always empty — even on questions that clearly trigger chain-of-thought when the same model is hit directly.

Root cause: Hermes signals "reasoning capable" for open.bigmodel.cn in _supports_reasoning_extra_body (run_agent.py:7501), which causes the chat-completions transport to inject:

extra_body["reasoning"] = {"enabled": True, "effort": "medium"}

But Z.AI's API doesn't recognize the OpenRouter-style reasoning field. It silently ignores it. Z.AI's documented thinking parameter is:

extra_body["thinking"] = {"type": "enabled"}

(same shape as the existing is_kimi branch in chat_completions.py:232-240).

Without thinking set, GLM-5.1 / GLM-4.6 / GLM-4.5 produce regular completions with no reasoning_content deltas in streaming responses, and no message.reasoning_content field in non-streaming.
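
To make the symptom concrete, here is a minimal sketch of where reasoning text would show up in each response shape. It assumes the OpenAI-compatible JSON layout implied above (`choices[0].message.reasoning_content` for non-streaming, `choices[].delta.reasoning_content` for streamed chunks). The helper name `reasoning_text` is hypothetical and not part of Hermes.

```python
from typing import Any, Dict, Iterable, Union


def reasoning_text(response: Union[Dict[str, Any], Iterable[Dict[str, Any]]]) -> str:
    """Collect reasoning text from a non-streaming response dict or from streamed chunk dicts."""
    if isinstance(response, dict):
        # Non-streaming: reasoning lives on the message object.
        choices = response.get("choices") or [{}]
        message = choices[0].get("message") or {}
        return message.get("reasoning_content") or ""

    # Streaming: reasoning arrives as deltas spread across many chunks.
    parts = []
    for chunk in response:
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            piece = delta.get("reasoning_content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```

An empty string from this helper on a prompt that normally triggers chain-of-thought is the behavior the issue reports.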

Fix Action

Fixed

PR fix notes

PR #16592: fix(agent): enable reasoning_content for Z.AI/GLM models

Description (problem / solution / changelog)

Fixes NousResearch/hermes-agent#16533

Problem

Z.AI/GLM models (glm-5.1, glm-4.7, etc.) never return reasoning_content — even on questions that clearly trigger chain-of-thought when the same model is hit directly via cURL.

Root Cause

Z.AI's API uses the thinking parameter (same format as Kimi) to enable reasoning:

extra_body["thinking"] = {"type": "enabled"}

But Hermes only injected the OpenRouter-style reasoning extra_body, which Z.AI silently ignores. There was no Z.AI-specific handling in the reasoning pipeline.

Fix

Three changes across 2 files:

  1. run_agent.py — Add _is_zai URL detection for z.ai and open.bigmodel.cn hosts, pass is_zai flag to transport
  2. run_agent.py — Add "z-ai/" to OpenRouter reasoning_model_prefixes so Z.AI models get reasoning via OpenRouter
  3. chat_completions.py — Add Z.AI thinking extra_body injection, mirroring the existing Kimi pattern
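
The host detection in step 1 can be sketched as follows. This is an illustration only: the issue names a `base_url_host_matches` helper, but its implementation is not shown on this page, so `host_matches` and `is_zai` below are hypothetical stand-ins, and treating subdomains as matches is an assumption.

```python
from urllib.parse import urlparse

# Hosts named in the PR description for Z.AI detection.
ZAI_HOSTS = ("z.ai", "open.bigmodel.cn")


def host_matches(base_url: str, domain: str) -> bool:
    """Return True if base_url's host equals domain or is a subdomain of it."""
    host = (urlparse(base_url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def is_zai(base_url: str) -> bool:
    """Hypothetical equivalent of the `_is_zai` flag described in the PR."""
    return any(host_matches(base_url, d) for d in ZAI_HOSTS)
```

Comparing parsed hostnames rather than searching for a substring avoids misclassifying unrelated hosts that merely end in the same characters.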

Files Changed

  • run_agent.py — 8 lines added (Z.AI detection + OpenRouter prefix)
  • agent/transports/chat_completions.py — 11 lines added (thinking extra_body)

Changed files

  • agent/transports/chat_completions.py (modified, +11/-0)
  • run_agent.py (modified, +6/-0)

Reproduction

Direct cURL against the same endpoint, same model, same payload:

curl -sS https://open.bigmodel.cn/api/coding/paas/v4/chat/completions \
  -H "Authorization: Bearer $GLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role":"user","content":"Xiao Ming has 3 apples at $2 each and 2 lbs of bananas at $5/lb. Walk through the math step by step."}],
    "thinking": {"type": "enabled"},
    "stream": true
  }'

→ 542 streamed `reasoning_content` chunks + 118 `content` chunks.

Same payload through Hermes' agent loop with the same model and endpoint: → 0 `reasoning_content` chunks, only `content`.

The only difference between the working cURL and Hermes' wire payload is the `thinking` field. Verified by dumping Hermes' actual `stream_kwargs` right before the SDK call and replaying it via the OpenAI Python SDK 2.32.0 — adding `extra_body={"thinking": {"type": "enabled"}}` to the same payload immediately produces hundreds of `reasoning_content` deltas.
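
The SDK replay described above could look roughly like the sketch below. It assumes the `openai` Python package is installed and that `GLM_API_KEY` is set in the environment. The function names `count_deltas` and `replay` are hypothetical, and reading `reasoning_content` with `getattr` is a defensive choice because the field is a provider extension rather than part of the SDK's typed models.

```python
import os
from typing import Any, Iterable, Tuple


def count_deltas(stream: Iterable[Any]) -> Tuple[int, int]:
    """Return (reasoning_chunks, content_chunks) for a chat-completions stream."""
    reasoning = content = 0
    for chunk in stream:
        for choice in getattr(chunk, "choices", None) or []:
            delta = getattr(choice, "delta", None)
            # reasoning_content is a provider extension, so read it defensively.
            if getattr(delta, "reasoning_content", None):
                reasoning += 1
            if getattr(delta, "content", None):
                content += 1
    return reasoning, content


def replay(thinking: bool) -> Tuple[int, int]:
    """Send the reproduction prompt, with or without the `thinking` field."""
    from openai import OpenAI  # imported lazily so count_deltas works without the SDK

    client = OpenAI(
        base_url="https://open.bigmodel.cn/api/coding/paas/v4",
        api_key=os.environ["GLM_API_KEY"],
    )
    prompt = (
        "Xiao Ming has 3 apples at $2 each and 2 lbs of bananas at $5/lb. "
        "Walk through the math step by step."
    )
    stream = client.chat.completions.create(
        model="glm-5.1",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        # extra_body is merged into the JSON request body by the SDK.
        extra_body={"thinking": {"type": "enabled"}} if thinking else {},
    )
    return count_deltas(stream)
```

Calling `replay(True)` and `replay(False)` and comparing the first element of each result is one way to confirm that the `thinking` field is the deciding factor.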

Proposed fix

Mirror the `is_kimi` branch for Z.AI in `chat_completions.py`:

  1. In `run_agent.py` (next to `_is_kimi`), compute:

     ```python
     _is_zai = base_url_host_matches(self._base_url_lower, "open.bigmodel.cn")
     ```

     Pass `is_zai=_is_zai` to `build_kwargs`.

  2. In `chat_completions.py` `build_kwargs`, after the `is_kimi` block:

     ```python
     if params.get("is_zai", False):
         _zai_thinking_enabled = True
         if reasoning_config and isinstance(reasoning_config, dict):
             if reasoning_config.get("enabled") is False:
                 _zai_thinking_enabled = False
         extra_body["thinking"] = {
             "type": "enabled" if _zai_thinking_enabled else "disabled",
         }
     ```

  3. Optionally, exclude `open.bigmodel.cn` from `_supports_reasoning_extra_body` so the no-op `reasoning` field is no longer sent (cosmetic — Z.AI ignores it anyway, but it pollutes the wire payload).

This makes `reasoning_config.enabled = False` correctly disable thinking on Z.AI, with the same on/off semantics as the Kimi branch. Effort levels could be added later; Z.AI's docs don't currently document an effort scalar for the `thinking` parameter.
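
The on/off semantics can be checked in isolation with a small standalone sketch. The function names below are hypothetical and are not the ones used in `chat_completions.py`. The `drop_reasoning` flag illustrates the optional step 3 and is likewise an assumption about how that cleanup might be expressed.

```python
from typing import Any, Dict, Optional


def zai_thinking(reasoning_config: Optional[Dict[str, Any]]) -> Dict[str, str]:
    """Map a Hermes-style reasoning_config to Z.AI's `thinking` value.

    Thinking stays enabled unless the config explicitly sets enabled=False.
    """
    disabled = isinstance(reasoning_config, dict) and reasoning_config.get("enabled") is False
    return {"type": "disabled" if disabled else "enabled"}


def apply_zai_extra_body(
    extra_body: Dict[str, Any],
    reasoning_config: Optional[Dict[str, Any]],
    drop_reasoning: bool = False,
) -> Dict[str, Any]:
    """Return a copy of extra_body with `thinking` set; optionally drop the ignored `reasoning` key."""
    result = dict(extra_body)
    result["thinking"] = zai_thinking(reasoning_config)
    if drop_reasoning:
        result.pop("reasoning", None)
    return result
```

Only an explicit `enabled=False` disables thinking; a missing config, an empty dict, or `enabled=None` all leave it on, which matches the proposed block above.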

extent analysis

TL;DR

The issue can be fixed by adding a conditional block for Z.AI in chat_completions.py that sets the thinking field in the extra_body dictionary. The OpenRouter-style reasoning field, which Z.AI ignores, can optionally be dropped as well.

Guidance

  • Verify that the thinking field is not being set in the current implementation by checking the extra_body dictionary before the API call.
  • Modify the chat_completions.py file to include a conditional block for Z.AI that sets the thinking field in the extra_body dictionary based on the reasoning_config and is_zai parameters.
  • Test the modified code with the provided cURL command to ensure that the reasoning_content chunks are being received correctly.
  • Consider excluding open.bigmodel.cn from _supports_reasoning_extra_body to prevent the reasoning field from being sent in the wire payload.
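
The first check in the list above can be sketched as a small inspection helper run on the request kwargs just before the SDK call. The name `reasoning_flags` is hypothetical, and the sketch assumes the kwargs carry an `extra_body` dict as described in the issue.

```python
from typing import Any, Dict, List


def reasoning_flags(request_kwargs: Dict[str, Any]) -> List[str]:
    """List which reasoning-related keys are present in extra_body."""
    extra_body = request_kwargs.get("extra_body") or {}
    return [key for key in ("reasoning", "thinking") if key in extra_body]
```

Under the behavior reported in the issue, a Z.AI request would show only `reasoning` before the fix, and `thinking` should appear once the Z.AI branch is in place.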

Example

if params.get("is_zai", False):
    _zai_thinking_enabled = True
    if reasoning_config and isinstance(reasoning_config, dict):
        if reasoning_config.get("enabled") is False:
            _zai_thinking_enabled = False
    extra_body["thinking"] = {
        "type": "enabled" if _zai_thinking_enabled else "disabled",
    }

Notes

The proposed fix assumes that the is_zai parameter is being passed correctly to the build_kwargs function. Additionally, the fix only addresses the issue with the thinking field and does not modify the reasoning field.

Recommendation

Apply the proposed fix to the chat_completions.py file to correctly set the thinking field for Z.AI and receive the reasoning_content chunks. This fix mirrors the existing behavior for Kimi and honors the same on/off semantics for reasoning.
