hermes - ✅(Solved) Fix [Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+) [1 pull requests, 2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#19003Fetched 2026-05-03 04:52:55
View on GitHub
Comments
2
Participants
2
Timeline
7
Reactions
0
Author
Participants
Timeline (top)
labeled ×4commented ×2cross-referenced ×1

context_compressor.py reads only response.choices[0].message.content (line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in the reasoning field with content as an empty string — especially when max_tokens is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.

Root Cause

Ollama 0.22.x changed how thinking models return responses. The reasoning field is now always populated for models with a thinking renderer (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.), even without think: true. With limited max_tokens, the model's reasoning tokens consume the entire budget, and content comes back empty.

Hermes already has extract_content_or_reasoning() in auxiliary_client.py (line 3561) that handles this exact case — it checks content first, then falls back to reasoning, reasoning_content, and reasoning_details. But context_compressor.py doesn't use it.

Fix Action

Fix

One-line change + one import:

# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning

# Line 871:
content = extract_content_or_reasoning(response)

The extract_content_or_reasoning() function already exists and handles content, reasoning, reasoning_content, and reasoning_details with appropriate fallback logic and inline think-tag stripping.

PR fix notes

PR #19007: fix(compression): use extract_content_or_reasoning for thinking models

Description (problem / solution / changelog)

Summary

Fixes #19003

Context compressor ignores the reasoning field when extracting summary content from the auxiliary LLM response. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.) via Ollama 0.22+, these models return their output in the reasoning field with content as an empty string — especially when max_tokens is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.

Bug

agent/context_compressor.py line 871:

content = response.choices[0].message.content  # BUG: ignores reasoning field

When a thinking model returns content="" and reasoning="...", this line produces an empty string. The compressor then falls back to: "Summary generation failed — inserting static fallback context marker"

In our deployment, 3 of 25 compression events (12%) produced fallback markers.

Fix

Replace raw .message.content access with the existing extract_content_or_reasoning() helper from auxiliary_client.py, which checks content first, then falls back to reasoning, reasoning_content, and reasoning_details.

from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)

The extract_content_or_reasoning() function already exists and is used by the main agent loop — this just applies the same reasoning-field handling to the compression path.

Testing

  • Tested with Ollama 0.22.1+ and deepseek-v4-flash:cloud as compression auxiliary
  • Before fix: content="", reasoning held the actual summary → fallback marker inserted
  • After fix: extract_content_or_reasoning() returns the reasoning content → proper summary

Impact

  • One-line change + one import addition
  • No behavior change for models that return content normally (the common path)
  • Fixes 12% compression failure rate with thinking models

Changed files

  • agent/context_compressor.py (modified, +2/-2)

Code Example

content = response.choices[0].message.content  # BUG: ignores reasoning field

---

from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)

---

# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning

# Line 871:
content = extract_content_or_reasoning(response)
RAW_BUFFERClick to expand / collapse

Bug: Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+)

Summary

context_compressor.py reads only response.choices[0].message.content (line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in the reasoning field with content as an empty string — especially when max_tokens is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.

Root Cause

Ollama 0.22.x changed how thinking models return responses. The reasoning field is now always populated for models with a thinking renderer (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.), even without think: true. With limited max_tokens, the model's reasoning tokens consume the entire budget, and content comes back empty.

Hermes already has extract_content_or_reasoning() in auxiliary_client.py (line 3561) that handles this exact case — it checks content first, then falls back to reasoning, reasoning_content, and reasoning_details. But context_compressor.py doesn't use it.

Affected Code

agent/context_compressor.py line 871:

content = response.choices[0].message.content  # BUG: ignores reasoning field

Should be:

from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)

Reproduction

  1. Configure Hermes with Ollama 0.22.1+ and a thinking model (e.g., deepseek-v4-flash:cloud or glm-5.1:cloud) as the compression auxiliary
  2. Have a conversation long enough to trigger context compression
  3. Observe that the compression summary is empty — the model's output went entirely to reasoning while content is ""
  4. Compressor falls back to a static context marker: "Summary generation failed — inserting static fallback context marker"

Evidence

In our deployment, 3 of 25 compression events produced fallback markers (12% failure rate). After upgrading to Ollama 0.22.1, testing with deepseek-v4-flash:cloud and max_tokens: 100 returns:

  • content: "" (empty)
  • reasoning: "..." (928 chars of actual summary content)

With max_tokens: 300, content returns 75 chars while reasoning takes 929 chars — the reasoning field holds the actual compressed content the compressor needs.

Impact

  • All thinking models used for compression produce empty summaries when max_tokens is insufficient to cover both reasoning and content
  • The Caveman Compressor plugin also inherits this bug via super()._generate_summary()
  • No recovery path — the fallback marker is static text, not a summary

Fix

One-line change + one import:

# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning

# Line 871:
content = extract_content_or_reasoning(response)

The extract_content_or_reasoning() function already exists and handles content, reasoning, reasoning_content, and reasoning_details with appropriate fallback logic and inline think-tag stripping.

Environment

Machine 1 (Hazel):

  • Hermes Agent: 0b76d23 (2026-04-30), 39 commits behind upstream main
  • OS: Ubuntu 25.10 (Questing Quokka), kernel 6.17.0-23-generic, x86_64
  • CPU: 12th Gen Intel i7-12700H (14 cores: 6P+8E)
  • GPU: NVIDIA GeForce RTX 3060 Laptop 6GB, driver 595.58.03
  • RAM: 32 GB (2x16 GB DDR5, 30.5 GiB usable after kernel reservation)
  • Python: 3.13.7
  • Model: glm-5.1:cloud via custom provider (Ollama, http://127.0.0.1:11434/v1)
  • Ollama: 0.22.1
  • Models affected in testing: deepseek-v4-flash:cloud, glm-5.1:cloud, qwen3.5-397b-cn-think:latest

Machine 2 (Ember/Nova):

  • Hermes Agent: 0b76d23 (2026-04-30), 39 commits behind upstream main
  • OS: Ubuntu 26.04 LTS, kernel 7.0.0-15-generic, x86_64
  • CPU: AMD Ryzen 9 5900XT 16-Core
  • GPU: NVIDIA RTX PRO 4000 Blackwell 24GB, driver 595.58.03
  • RAM: 64 GB (60.7 GiB usable after kernel reservation)
  • Python: 3.14.4
  • Model: glm-5.1:cloud via custom provider (Ollama, http://127.0.0.1:11434/v1)
  • Ollama: 0.22.1
  • Same models affected

Both machines have the local fix applied and verified. The bug is reproducible on stock Hermes Agent without the patch.

Related Issues

  • #9344 — Broader "thinking model reasoning tokens exhaust output budget" bug in the main agent loop. This issue (#19003) is a specific instance of that pattern in the compressor subsystem.

extent analysis

TL;DR

The bug in the context compressor can be fixed by using the existing extract_content_or_reasoning() function from auxiliary_client.py to handle the reasoning field returned by thinking models.

Guidance

  • Identify the affected code in context_compressor.py and replace the line content = response.choices[0].message.content with content = extract_content_or_reasoning(response).
  • Import the extract_content_or_reasoning function from auxiliary_client.py to use it in the compressor.
  • Verify that the fix works by testing the compression with thinking models and checking that the summary is no longer empty.
  • Consider updating the Caveman Compressor plugin to use the same fix, as it inherits the bug via super()._generate_summary().

Example

from agent.auxiliary_client import extract_content_or_reasoning

# ...

content = extract_content_or_reasoning(response)

Notes

The fix relies on the existing extract_content_or_reasoning() function, which handles the reasoning field and provides a fallback logic. This solution assumes that the function is correctly implemented and works as expected.

Recommendation

Apply the workaround by using the extract_content_or_reasoning() function, as it provides a straightforward fix for the issue without requiring any additional changes or upgrades.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix [Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+) [1 pull requests, 2 comments, 2 participants]