hermes - ✅(Solved) Fix [Feature]: Preserved thinking for GLM models when the inference provider supports it. [1 pull requests, 1 comments, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#11483Fetched 2026-04-18 06:00:50
View on GitHub
Comments
1
Participants
1
Timeline
3
Reactions
0
Author
Participants
Timeline (top)
commented ×1cross-referenced ×1labeled ×1

Fix Action

Fixed

PR fix notes

PR #11494: feat(agent): add z.ai/GLM-5 preserved thinking support

Description (problem / solution / changelog)

Enable z.ai/Zhipu GLM-5.x and GLM-4.7 preserved thinking mode for multi-turn agent loops.

Three changes in run_agent.py:

  1. _is_zai_direct() helper — detects zai provider or known z.ai/bigmodel endpoint URLs (api.z.ai, open.bigmodel.cn).

  2. _build_api_kwargs() — injects thinking parameter in extra_body for GLM-5/4.7 models:

    • Default: {type: enabled, compact_history: false} (preserved thinking)
    • reasoning_config.enabled=false → {type: disabled}
    • GLM-4.6/4.5 excluded (they auto-determine thinking)
  3. Message sanitization — re-injects reasoning_content on assistant messages for z.ai so multi-turn reasoning continuity works with compact_history=false.

Response-side extraction was already handled by the generic _extract_reasoning() method (checks reasoning_content field).

Tests: 19 new tests covering detection, parameter injection, config gating, and multi-turn passthrough.

What does this PR do?

The GLM 5 family, and to a lesser degree the 4.7 line, has been trained on preserved interleaved thinking, It's supposed to improve chained tool calling by keeping the reasoning steps in context instead as a short term memory.

This PR enables preserved thinking mode on z.ai models if and only if they are served directly from their inference endpoints.

Related Issue

Fixes Preserved thinking for GLM models when the inference provider supports it.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Debian GNU/Linux 12 (bookworm)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Changed files

  • run_agent.py (modified, +45/-0)
  • tests/run_agent/test_zai_thinking.py (added, +260/-0)
RAW_BUFFERClick to expand / collapse

Problem or Use Case

The GLM 5 series (and to a lesser degree the 4.7) has been trained in a ‘preserved thinking' mode for agentic use.

In that mode, interleaved reasoning traces in between tool calls are retained to serve as a short term memory for persistence of intent in long tool call chains, it's supposed to significantly improve long session capabilities and while it does increase (cached) input tokens and context use, if it does limit the number of necessary tool calls necessary to a task, it’s likely worthwhile.

While this is a feature that is intrinsic to the model, I’d only activate it on inference providers that publicize supporting it, it may conflict with prefix caching hit rate on providers who don’t.

Proposed Solution

Default preserved thinking on supported models if they are directly provided by z.ai.

Alternatives Considered

No response

Feature Type

Performance / reliability

Scope

Small (single file, < 50 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

extent analysis

TL;DR

Enable the "preserved thinking" mode by default on supported models directly provided by z.ai to improve long session capabilities.

Guidance

  • Identify the models supported by z.ai that have the "preserved thinking" feature and verify their compatibility.
  • Determine the conditions under which the "preserved thinking" mode should be activated to avoid conflicts with prefix caching hit rates on unsupported providers.
  • Consider adding a configuration option to allow users to opt-in or opt-out of the "preserved thinking" mode.
  • Test the implementation on a small scale to ensure it does not introduce any performance issues or regressions.

Notes

The solution assumes that the "preserved thinking" mode is a feature of the GLM 5 series and 4.7 models, and that z.ai provides these models. The implementation should be careful not to conflict with prefix caching hit rates on unsupported providers.

Recommendation

Apply workaround: Enable the "preserved thinking" mode by default on supported models directly provided by z.ai, as it is likely to improve long session capabilities with minimal risks.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING