hermes - ✅(Solved) Fix [Feature]: Preserved thinking for GLM models when the inference provider supports it. [1 pull requests, 1 comments, 1 participants]

neuneu2k · 2026-04-17T08:01:43Z

# PR #11494: feat(agent): add z.ai/GLM-5 preserved thinking support

- Repository: NousResearch/hermes-agent
- Author: neuneu2k
- State: open | merged: False
- Link: https://github.com/NousResearch/hermes-agent/pull/11494

## Description (problem / solution / changelog)

Enable z.ai/Zhipu GLM-5.x and GLM-4.7 preserved thinking mode for multi-turn agent loops.

Three changes in `run_agent.py`:

1. `_is_zai_direct()` helper: detects the `zai` provider or known z.ai/bigmodel endpoint URLs (`api.z.ai`, `open.bigmodel.cn`).
2. `_build_api_kwargs()`: injects the `thinking` parameter in `extra_body` for GLM-5/4.7 models:
   - Default: `{type: enabled, compact_history: false}` (preserved thinking)
   - `reasoning_config.enabled=false` → `{type: disabled}`
   - GLM-4.6/4.5 excluded (they auto-determine thinking)
3. Message sanitization: re-injects `reasoning_content` on assistant messages for z.ai so multi-turn reasoning continuity works with `compact_history=false`.

Response-side extraction was already handled by the generic `_extract_reasoning()` method (it checks the `reasoning_content` field).

Tests: 19 new tests covering detection, parameter injection, config gating, and multi-turn passthrough.

## What does this PR do?

The GLM 5 family, and to a lesser degree the 4.7 line, has been trained on preserved interleaved thinking. It is supposed to improve chained tool calling by keeping the reasoning steps in context as a short-term memory. This PR enables preserved thinking mode on z.ai models if and only if they are served directly from z.ai's inference endpoints.

## Related Issue

Fixes [Preserved thinking for GLM models when the inference provider supports it.](https://github.com/NousResearch/hermes-agent/issues/11483)

## Type of Change

- [ ] 🐛 Bug fix (non-breaking change that fixes an issue)
- [x] ✨ New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] ✅ Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)

### Code

- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: Debian GNU/Linux 12 (bookworm)

### Documentation & Housekeeping

- [ ] I've updated relevant documentation (README, `docs/`, docstrings), or N/A
- [ ] I've updated `cli-config.yaml.example` if I added/changed config keys, or N/A
- [ ] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows, or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility), or N/A
- [ ] I've updated tool descriptions/schemas if I changed tool behavior, or N/A

## Changed files

- `run_agent.py` (modified, +45/-0)
- `tests/run_agent/test_zai_thinking.py` (added, +260/-0)

## Fixed

- Fixed by PR: feat(agent): add z.ai/GLM-5 preserved thinking support (https://github.com/NousResearch/hermes-agent/pull/11494)

### Problem or Use Case

The GLM 5 series (and to a lesser degree the 4.7) has been trained in a "preserved thinking" mode for agentic use.

In that mode, interleaved reasoning traces between tool calls are retained to serve as a short-term memory that preserves intent across long tool-call chains. It is supposed to significantly improve long-session capabilities. It does increase (cached) input tokens and context use, but if it reduces the number of tool calls needed for a task, it is likely worthwhile.

While this feature is intrinsic to the model, I'd only activate it on inference providers that publicize supporting it; it may conflict with the prefix-caching hit rate on providers that don't.

### Proposed Solution

Default to preserved thinking on supported models if they are directly provided by z.ai.

### Alternatives Considered

_No response_

### Feature Type

Performance / reliability

### Scope

Small (single file, < 50 lines)

### Contribution

- [x] I'd like to implement this myself and submit a PR

### Debug Report (optional)

```shell
```
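The three changes listed in the PR description (provider detection, `thinking` parameter injection, and `reasoning_content` re-injection) can be sketched roughly as below. This is a minimal illustration and not the code from `run_agent.py`: the function names, the model-name regex, and the internal `reasoning` message key are all hypothetical, and the `thinking` payload fields are copied from the PR description without being checked against z.ai's API documentation.

```python
import re
from urllib.parse import urlparse

# Hosts named in the PR description as direct z.ai / bigmodel endpoints.
ZAI_HOSTS = ("api.z.ai", "open.bigmodel.cn")

# Hypothetical matcher: GLM-5, GLM-5.x and GLM-4.7 qualify; GLM-4.6/4.5 do not.
_PRESERVED_THINKING_MODELS = re.compile(r"glm-(5(\.\d+)?|4\.7)(\b|$)")


def is_zai_direct(provider, base_url):
    """Return True when the request goes straight to z.ai (assumed rule)."""
    if (provider or "").lower() == "zai":
        return True
    # urlparse only yields a hostname when the URL includes a scheme.
    host = urlparse(base_url or "").hostname or ""
    return host in ZAI_HOSTS


def thinking_extra_body(model, provider, base_url, reasoning_enabled=True):
    """Build the extra_body fragment, or {} when nothing should be injected."""
    if not is_zai_direct(provider, base_url):
        return {}
    if not _PRESERVED_THINKING_MODELS.search(model.lower()):
        return {}  # e.g. GLM-4.6/4.5, which decide on thinking themselves
    if not reasoning_enabled:
        return {"thinking": {"type": "disabled"}}
    # Field names as given in the PR description.
    return {"thinking": {"type": "enabled", "compact_history": False}}


def sanitize_messages(messages, keep_reasoning):
    """Copy messages for the API call, optionally passing reasoning back.

    Assumes the agent stores reasoning under a hypothetical internal
    "reasoning" key, which is never sent to the provider as-is.
    """
    sanitized = []
    for original in messages:
        msg = dict(original)  # shallow copy so the stored history is untouched
        reasoning = msg.pop("reasoning", None)
        if keep_reasoning and msg.get("role") == "assistant" and reasoning:
            msg["reasoning_content"] = reasoning
        sanitized.append(msg)
    return sanitized
```

A caller would merge the result of `thinking_extra_body(...)` into its request's `extra_body` and pass `keep_reasoning=is_zai_direct(provider, base_url)` when sanitizing history, so other providers never receive either the parameter or the reasoning traces.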

NousResearch/hermes-agent#11483•Fetched 2026-04-18 06:00:50

extent analysis

TL;DR

Enable the "preserved thinking" mode by default on supported models directly provided by z.ai to improve long session capabilities.

Guidance

Identify the models supported by z.ai that have the "preserved thinking" feature and verify their compatibility.
Determine the conditions under which the "preserved thinking" mode should be activated to avoid conflicts with prefix caching hit rates on unsupported providers.
Consider adding a configuration option to allow users to opt-in or opt-out of the "preserved thinking" mode.
Test the implementation on a small scale to ensure it does not introduce any performance issues or regressions.

Notes

The solution assumes that the "preserved thinking" mode is a feature of the GLM 5 series and 4.7 models, and that z.ai serves these models directly. The implementation should leave the mode off for providers that do not advertise support, since enabling it there may lower their prefix-caching hit rate.

Recommendation

Enable the "preserved thinking" mode by default on supported models served directly by z.ai; it is likely to improve long-session capabilities with minimal risk.
