hermes - ✅(Solved) Fix Z.AI / GLM via `zai` provider never returns reasoning_content: Hermes sends `extra_body.reasoning` (OpenRouter-style) but Z.AI expects `extra_body.thinking={"type":"enabled"}` [1 pull request, 2 comments, 3 participants]

nicolas-hey · 2026-04-27T12:20:00Z



GitHub stats

NousResearch/hermes-agent#16533•Fetched 2026-04-28 06:52:45


Timeline (top)

labeled ×4, commented ×2, cross-referenced ×1, referenced ×1

Hermes' zai provider successfully calls Z.AI's GLM models (e.g. glm-5.1 via the coding plan endpoint https://open.bigmodel.cn/api/coding/paas/v4), but reasoning_content is always empty — even on questions that clearly trigger chain-of-thought when the same model is hit directly.

Root cause: Hermes signals "reasoning capable" for open.bigmodel.cn in _supports_reasoning_extra_body (run_agent.py:7501), which causes the chat-completions transport to inject:

extra_body["reasoning"] = {"enabled": True, "effort": "medium"}

But Z.AI's API doesn't recognize the OpenRouter-style reasoning field. It silently ignores it. Z.AI's documented thinking parameter is:

extra_body["thinking"] = {"type": "enabled"}

(same shape as the existing is_kimi branch in chat_completions.py:232-240).

Without thinking set, GLM-5.1 / GLM-4.6 / GLM-4.5 produce regular completions with no reasoning_content deltas in streaming responses, and no message.reasoning_content field in non-streaming.
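
To see why the field name matters on the wire, here is a minimal sketch of how `extra_body` reaches the server. The OpenAI Python SDK merges `extra_body` keys into the top level of the JSON request body, so `extra_body["thinking"]` becomes the same top-level `"thinking"` key used in the working cURL call. The helper `wire_body` is hypothetical and only models that merge; it is not Hermes or SDK code.

```python
import json


def wire_body(model, messages, extra_body=None, **params):
    """Hypothetical helper: approximate the JSON body sent for a chat completion.

    Keys from extra_body are merged into the top level of the request body.
    """
    body = {"model": model, "messages": messages, **params}
    body.update(extra_body or {})
    return body


messages = [{"role": "user", "content": "Walk through the math step by step."}]

# What Hermes currently sends: an OpenRouter-style field that Z.AI ignores.
before = wire_body(
    "glm-5.1", messages,
    extra_body={"reasoning": {"enabled": True, "effort": "medium"}},
    stream=True,
)

# What Z.AI expects: a top-level "thinking" object.
after = wire_body(
    "glm-5.1", messages,
    extra_body={"thinking": {"type": "enabled"}},
    stream=True,
)

print(json.dumps(after, indent=2))
```

Comparing `before` and `after` shows that the two requests differ only in which top-level key carries the reasoning switch.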

Root Cause

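Hermes signals "reasoning capable" for open.bigmodel.cn in _supports_reasoning_extra_body (run_agent.py:7501), which causes the chat-completions transport to inject the OpenRouter-style extra_body["reasoning"] field. Z.AI silently ignores that field and expects extra_body["thinking"] instead.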

Fix Action

Fixed

Fixed by PR: fix(agent): enable reasoning_content for Z.AI/GLM models (https://github.com/NousResearch/hermes-agent/pull/16592)

PR fix notes

PR #16592: fix(agent): enable reasoning_content for Z.AI/GLM models

Repository: NousResearch/hermes-agent
Author: vominh1919
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16592

Description (problem / solution / changelog)

Fixes NousResearch/hermes-agent#16533

Problem

Z.AI/GLM models (glm-5.1, glm-4.7, etc.) never return reasoning_content — even on questions that clearly trigger chain-of-thought when the same model is hit directly via cURL.

Root Cause

Z.AI's API uses the thinking parameter (same format as Kimi) to enable reasoning:

extra_body["thinking"] = {"type": "enabled"}

But Hermes only injected the OpenRouter-style reasoning extra_body, which Z.AI silently ignores. There was no Z.AI-specific handling in the reasoning pipeline.

Fix

Three changes across 2 files:

1. `run_agent.py`: Add `_is_zai` URL detection for `z.ai` and `open.bigmodel.cn` hosts, pass `is_zai` flag to transport
2. `run_agent.py`: Add `"z-ai/"` to OpenRouter `reasoning_model_prefixes` so Z.AI models get reasoning via OpenRouter
3. `chat_completions.py`: Add Z.AI `thinking` extra_body injection, mirroring the existing Kimi pattern
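
The host detection in the first change could look roughly like the sketch below. It is an assumption about how such matching might work, not the actual `base_url_host_matches` helper from Hermes, whose implementation is not shown here. The function names `host_matches` and `is_zai_base_url` and the `api.z.ai` URL are illustrative only.

```python
from urllib.parse import urlparse


def host_matches(base_url, domain):
    """Hypothetical stand-in for Hermes' host check.

    True when the URL's hostname is `domain` or one of its subdomains.
    """
    host = (urlparse(base_url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def is_zai_base_url(base_url):
    """Treat both hosts named in the PR description as Z.AI endpoints."""
    return host_matches(base_url, "z.ai") or host_matches(base_url, "open.bigmodel.cn")


print(is_zai_base_url("https://open.bigmodel.cn/api/coding/paas/v4"))
print(is_zai_base_url("https://notz.ai/v1"))
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives such as `notz.ai` or a path that merely contains `z.ai`.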

Files Changed

- `run_agent.py`: 8 lines added (Z.AI detection + OpenRouter prefix)
- `agent/transports/chat_completions.py`: 11 lines added (thinking extra_body)

Changed files

- `agent/transports/chat_completions.py` (modified, +11/-0)
- `run_agent.py` (modified, +6/-0)



Reproduction

Direct cURL against the same endpoint, same model, same payload:

curl -sS https://open.bigmodel.cn/api/coding/paas/v4/chat/completions \
  -H "Authorization: Bearer $GLM_API_KEY" \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role":"user","content":"Xiao Ming has 3 apples at $2 each and 2 lbs of bananas at $5/lb. Walk through the math step by step."}],
    "thinking": {"type": "enabled"},
    "stream": true
  }'

→ 542 streamed `reasoning_content` chunks + 118 `content` chunks.

Same payload through Hermes' agent loop with the same model and endpoint: → 0 `reasoning_content` chunks, only `content`.

The only difference between the working cURL and Hermes' wire payload is the `thinking` field. Verified by dumping Hermes' actual `stream_kwargs` right before the SDK call and replaying it via the OpenAI Python SDK 2.32.0 — adding `extra_body={"thinking": {"type": "enabled"}}` to the same payload immediately produces hundreds of `reasoning_content` deltas.

Proposed fix

Mirror the `is_kimi` branch for Z.AI in `chat_completions.py`:

1. In `run_agent.py` (next to `_is_kimi`), compute:

   ```python
   _is_zai = base_url_host_matches(self._base_url_lower, "open.bigmodel.cn")
   ```

   Pass `is_zai=_is_zai` to `build_kwargs`.

2. In `chat_completions.py` `build_kwargs`, after the `is_kimi` block:

   ```python
   if params.get("is_zai", False):
       _zai_thinking_enabled = True
       if reasoning_config and isinstance(reasoning_config, dict):
           if reasoning_config.get("enabled") is False:
               _zai_thinking_enabled = False
       extra_body["thinking"] = {
           "type": "enabled" if _zai_thinking_enabled else "disabled",
       }
   ```

3. Optionally, exclude `open.bigmodel.cn` from `_supports_reasoning_extra_body` so the no-op `reasoning` field is no longer sent (cosmetic: Z.AI ignores it anyway, but it pollutes the wire payload).

This makes `reasoning_config.enabled = False` correctly disable thinking on Z.AI, mirroring the Kimi behavior and honoring the same on/off semantics. Effort levels could be added later; Z.AI's docs don't currently document an effort scalar for the `thinking` parameter.
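
The on/off semantics can be checked in isolation with a small standalone version of the proposed branch. The function name `zai_thinking` is hypothetical. The logic follows the proposed block: thinking stays enabled unless `reasoning_config` explicitly sets `enabled` to `False`, and any `effort` value is ignored.

```python
def zai_thinking(reasoning_config=None):
    """Hypothetical standalone version of the proposed Z.AI branch.

    Returns the value for extra_body["thinking"].
    """
    enabled = True
    if isinstance(reasoning_config, dict) and reasoning_config.get("enabled") is False:
        enabled = False
    return {"type": "enabled" if enabled else "disabled"}


# Configurations a caller might pass, from "no config" to an explicit opt-out.
cases = [
    None,
    {},
    {"enabled": True, "effort": "medium"},
    {"enabled": False},
]
for config in cases:
    print(config, "->", zai_thinking(config))
```

Only the explicit `{"enabled": False}` case yields `{"type": "disabled"}`; a missing or empty config leaves thinking on, which matches the default-on behavior in the proposed block.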

Environment

Hermes commit: `f67a61dc` (recent main)
Provider: `zai`
Endpoint: `https://open.bigmodel.cn/api/coding/paas/v4` (coding plan)
Models tested: `glm-5.1`
OpenAI SDK: `2.32.0`


extent analysis

TL;DR

The issue can be fixed by modifying the chat_completions.py file to include a conditional block for Z.AI that sets the thinking field in the extra_body dictionary instead of the reasoning field.

Guidance

- Verify that the thinking field is not being set in the current implementation by checking the extra_body dictionary before the API call.
- Modify the chat_completions.py file to include a conditional block for Z.AI that sets the thinking field in the extra_body dictionary based on the reasoning_config and is_zai parameters.
- Test the modified code by sending the same prompt through Hermes and comparing the result with the provided cURL command, which calls the API directly, to confirm that reasoning_content chunks are now received.
- Consider excluding open.bigmodel.cn from _supports_reasoning_extra_body to prevent the reasoning field from being sent in the wire payload.
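
The first and last checks above can be automated with a small audit of the request kwargs just before the SDK call. This is a sketch under assumptions: `audit_extra_body` is a hypothetical helper, and the kwargs dictionary is assumed to have the same shape as the dumped `stream_kwargs`, with an optional `extra_body` key.

```python
def audit_extra_body(kwargs, is_zai):
    """Hypothetical check: list problems in the kwargs of a Z.AI request."""
    if not is_zai:
        return []
    extra = kwargs.get("extra_body") or {}
    problems = []
    if "thinking" not in extra:
        problems.append("missing extra_body['thinking']")
    if "reasoning" in extra:
        problems.append("extra_body['reasoning'] is sent but ignored by Z.AI")
    return problems


# Payload shape described in the issue: only the OpenRouter-style field is present.
current = {"model": "glm-5.1", "extra_body": {"reasoning": {"enabled": True, "effort": "medium"}}}
# Payload shape after the proposed fix, with the optional cleanup applied.
fixed = {"model": "glm-5.1", "extra_body": {"thinking": {"type": "enabled"}}}

print(audit_extra_body(current, is_zai=True))
print(audit_extra_body(fixed, is_zai=True))
```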

Example

if params.get("is_zai", False):
    _zai_thinking_enabled = True
    if reasoning_config and isinstance(reasoning_config, dict):
        if reasoning_config.get("enabled") is False:
            _zai_thinking_enabled = False
    extra_body["thinking"] = {
        "type": "enabled" if _zai_thinking_enabled else "disabled",
    }

Notes

The proposed fix assumes that the is_zai parameter is being passed correctly to the build_kwargs function. Additionally, the fix only addresses the issue with the thinking field and does not modify the reasoning field.

Recommendation

Apply the proposed fix to the chat_completions.py file to correctly set the thinking field for Z.AI and receive the reasoning_content chunks. This fix mirrors the existing behavior for Kimi and honors the same on/off semantics for reasoning.
