hermes - ✅(Solved) Fix [Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+) [1 pull requests, 2 comments, 2 participants]

mikronn2 · 2026-05-02T22:59:35Z

[hermes] context compressor.py reads only response.choices 0 .message.content line 871 when extracting the summary from the auxiliary LLM. When using thinking/… `context_compressor.py` reads only `response.choices[0].message.content` (line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in the `reasoning` field with `content` as an empty string — especially when `max_tokens` is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction. # PR #19007: fix(compression): use extract_content_or_reasoning for thinking models - Repository: NousResearch/hermes-agent - Author: shellybotmoyer - State: open | merged: False - Link: https://github.com/NousResearch/hermes-agent/pull/19007 ## Description (problem / solution / changelog) ## Summary Fixes #19003 Context compressor ignores the `reasoning` field when extracting summary content from the auxiliary LLM response. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.) via Ollama 0.22+, these models return their output in the `reasoning` field with `content` as an empty string — especially when `max_tokens` is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction. ## Bug `agent/context_compressor.py` line 871: ```python content = response.choices[0].message.content # BUG: ignores reasoning field ``` When a thinking model returns `content=""` and `reasoning="..."`, this line produces an empty string. The compressor then falls back to: `"Summary generation failed — inserting static fallback context marker"` In our deployment, 3 of 25 compression events (12%) produced fallback markers. ## Fix Replace raw `.message.content` access with the existing `extract_content_or_reasoning()` helper from `auxiliary_client.py`, which checks `content` first, then falls back to `reasoning`, `reasoning_content`, and `reasoning_details`. ```python from agent.auxiliary_client import call_llm, extract_content_or_reasoning # ... content = extract_content_or_reasoning(response) ``` The `extract_content_or_reasoning()` function already exists and is used by the main agent loop — this just applies the same reasoning-field handling to the compression path. ## Testing - Tested with Ollama 0.22.1+ and `deepseek-v4-flash:cloud` as compression auxiliary - Before fix: `content=""`, `reasoning` held the actual summary → fallback marker inserted - After fix: `extract_content_or_reasoning()` returns the reasoning content → proper summary ## Impact - One-line change + one import addition - No behavior change for models that return content normally (the common path) - Fixes 12% compression failure rate with thinking models ## Changed files - `agent/context_compressor.py` (modified, +2/-2) ## Fix One-line change + one import: ```python # Line 27: from agent.auxiliary_client import call_llm, extract_content_or_reasoning # Line 871: content = extract_content_or_reasoning(response) ``` The `extract_content_or_reasoning()` function already exists and handles `content`, `reasoning`, `reasoning_content`, and `reasoning_details` with appropriate fallback logic and inline think-tag stripping. ## Bug: Context compressor ignores `reasoning` field — empty summaries with thinking models (Ollama 0.22+) ### Summary `context_compressor.py` reads only `response.choices[0].message.content` (line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in the `reasoning` field with `content` as an empty string — especially when `max_tokens` is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction. ### Root Cause Ollama 0.22.x changed how thinking models return responses. The `reasoning` field is now always populated for models with a thinking renderer (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.), even without `think: true`. With limited `max_tokens`, the model's reasoning tokens consume the entire budget, and `content` comes back empty. Hermes already has `extract_content_or_reasoning()` in `auxiliary_client.py` (line 3561) that handles this exact case — it checks `content` first, then falls back to `reasoning`, `reasoning_content`, and `reasoning_details`. But `context_compressor.py` doesn't use it. ### Affected Code `agent/context_compressor.py` line 871: ```python content = response.choices[0].message.content # BUG: ignores reasoning field ``` Should be: ```python from agent.auxiliary_client import call_llm, extract_content_or_reasoning # ... content = extract_content_or_reasoning(response) ``` ### Reproduction 1. Configure Hermes with Ollama 0.22.1+ and a thinking model (e.g., `d

hermes2026-05-02 22:59:35

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#19003•Fetched 2026-05-03 04:52:55

View on GitHub

Comments

Participants

Timeline

Reactions

Author

mikronn2

Participants

alt-glitch

mikronn2

Timeline (top)

labeled ×4commented ×2cross-referenced ×1

context_compressor.py reads only response.choices[0].message.content (line 871) when extracting the summary from the auxiliary LLM. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3, etc.) via Ollama 0.22+, these models return their output in the reasoning field with content as an empty string — especially when max_tokens is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.

Root Cause

Ollama 0.22.x changed how thinking models return responses. The reasoning field is now always populated for models with a thinking renderer (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.), even without think: true. With limited max_tokens, the model's reasoning tokens consume the entire budget, and content comes back empty.

Hermes already has extract_content_or_reasoning() in auxiliary_client.py (line 3561) that handles this exact case — it checks content first, then falls back to reasoning, reasoning_content, and reasoning_details. But context_compressor.py doesn't use it.

Fix Action

Fix

One-line change + one import:

# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning

# Line 871:
content = extract_content_or_reasoning(response)

The extract_content_or_reasoning() function already exists and handles content, reasoning, reasoning_content, and reasoning_details with appropriate fallback logic and inline think-tag stripping.

PR fix notes

PR #19007: fix(compression): use extract_content_or_reasoning for thinking models

Repository: NousResearch/hermes-agent
Author: shellybotmoyer
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/19007

Description (problem / solution / changelog)

Summary

Fixes #19003

Context compressor ignores the reasoning field when extracting summary content from the auxiliary LLM response. When using thinking/reasoning models (DeepSeek v4, GLM-5.1, Qwen 3.5, etc.) via Ollama 0.22+, these models return their output in the reasoning field with content as an empty string — especially when max_tokens is constrained (which compression does). The compressor gets an empty summary and falls back to a static context marker, losing the entire compaction.

Bug

agent/context_compressor.py line 871:

content = response.choices[0].message.content  # BUG: ignores reasoning field

When a thinking model returns content="" and reasoning="...", this line produces an empty string. The compressor then falls back to: "Summary generation failed — inserting static fallback context marker"

In our deployment, 3 of 25 compression events (12%) produced fallback markers.

Fix

Replace raw .message.content access with the existing extract_content_or_reasoning() helper from auxiliary_client.py, which checks content first, then falls back to reasoning, reasoning_content, and reasoning_details.

from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)

The extract_content_or_reasoning() function already exists and is used by the main agent loop — this just applies the same reasoning-field handling to the compression path.

Testing

Tested with Ollama 0.22.1+ and deepseek-v4-flash:cloud as compression auxiliary
Before fix: content="", reasoning held the actual summary → fallback marker inserted
After fix: extract_content_or_reasoning() returns the reasoning content → proper summary

Impact

One-line change + one import addition
No behavior change for models that return content normally (the common path)
Fixes 12% compression failure rate with thinking models

Changed files

agent/context_compressor.py (modified, +2/-2)

Code Example

content = response.choices[0].message.content  # BUG: ignores reasoning field

---

from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)

---

# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning

# Line 871:
content = extract_content_or_reasoning(response)

RAW_BUFFERClick to expand / collapse

Bug: Context compressor ignores `reasoning` field — empty summaries with thinking models (Ollama 0.22+)

Summary

Root Cause

Affected Code

agent/context_compressor.py line 871:

content = response.choices[0].message.content  # BUG: ignores reasoning field

Should be:

from agent.auxiliary_client import call_llm, extract_content_or_reasoning
# ...
content = extract_content_or_reasoning(response)

Reproduction

Configure Hermes with Ollama 0.22.1+ and a thinking model (e.g., deepseek-v4-flash:cloud or glm-5.1:cloud) as the compression auxiliary
Have a conversation long enough to trigger context compression
Observe that the compression summary is empty — the model's output went entirely to reasoning while content is ""
Compressor falls back to a static context marker: "Summary generation failed — inserting static fallback context marker"

Evidence

In our deployment, 3 of 25 compression events produced fallback markers (12% failure rate). After upgrading to Ollama 0.22.1, testing with deepseek-v4-flash:cloud and max_tokens: 100 returns:

content: "" (empty)
reasoning: "..." (928 chars of actual summary content)

With max_tokens: 300, content returns 75 chars while reasoning takes 929 chars — the reasoning field holds the actual compressed content the compressor needs.

Impact

All thinking models used for compression produce empty summaries when max_tokens is insufficient to cover both reasoning and content
The Caveman Compressor plugin also inherits this bug via super()._generate_summary()
No recovery path — the fallback marker is static text, not a summary

Fix

One-line change + one import:

# Line 27:
from agent.auxiliary_client import call_llm, extract_content_or_reasoning

# Line 871:
content = extract_content_or_reasoning(response)

Environment

Machine 1 (Hazel):

Hermes Agent: 0b76d23 (2026-04-30), 39 commits behind upstream main
OS: Ubuntu 25.10 (Questing Quokka), kernel 6.17.0-23-generic, x86_64
CPU: 12th Gen Intel i7-12700H (14 cores: 6P+8E)
GPU: NVIDIA GeForce RTX 3060 Laptop 6GB, driver 595.58.03
RAM: 32 GB (2x16 GB DDR5, 30.5 GiB usable after kernel reservation)
Python: 3.13.7
Model: glm-5.1:cloud via custom provider (Ollama, http://127.0.0.1:11434/v1)
Ollama: 0.22.1
Models affected in testing: deepseek-v4-flash:cloud, glm-5.1:cloud, qwen3.5-397b-cn-think:latest

Machine 2 (Ember/Nova):

Hermes Agent: 0b76d23 (2026-04-30), 39 commits behind upstream main
OS: Ubuntu 26.04 LTS, kernel 7.0.0-15-generic, x86_64
CPU: AMD Ryzen 9 5900XT 16-Core
GPU: NVIDIA RTX PRO 4000 Blackwell 24GB, driver 595.58.03
RAM: 64 GB (60.7 GiB usable after kernel reservation)
Python: 3.14.4
Model: glm-5.1:cloud via custom provider (Ollama, http://127.0.0.1:11434/v1)
Ollama: 0.22.1
Same models affected

Both machines have the local fix applied and verified. The bug is reproducible on stock Hermes Agent without the patch.

Related Issues

#9344 — Broader "thinking model reasoning tokens exhaust output budget" bug in the main agent loop. This issue (#19003) is a specific instance of that pattern in the compressor subsystem.

extent analysis

TL;DR

The bug in the context compressor can be fixed by using the existing extract_content_or_reasoning() function from auxiliary_client.py to handle the reasoning field returned by thinking models.

Guidance

Identify the affected code in context_compressor.py and replace the line content = response.choices[0].message.content with content = extract_content_or_reasoning(response).
Import the extract_content_or_reasoning function from auxiliary_client.py to use it in the compressor.
Verify that the fix works by testing the compression with thinking models and checking that the summary is no longer empty.
Consider updating the Caveman Compressor plugin to use the same fix, as it inherits the bug via super()._generate_summary().

Example

from agent.auxiliary_client import extract_content_or_reasoning

# ...

content = extract_content_or_reasoning(response)

Notes

The fix relies on the existing extract_content_or_reasoning() function, which handles the reasoning field and provides a fallback logic. This solution assumes that the function is correctly implemented and works as expected.

Recommendation

Apply the workaround by using the extract_content_or_reasoning() function, as it provides a straightforward fix for the issue without requiring any additional changes or upgrades.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#prompt formatting #chain error #conversation history #tool integration #LLM response

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix [Bug] Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+) [1 pull requests, 2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix

PR fix notes

PR #19007: fix(compression): use extract_content_or_reasoning for thinking models

Description (problem / solution / changelog)

Summary

Bug

Fix

Testing

Impact

Changed files

Code Example

Bug: Context compressor ignores reasoning field — empty summaries with thinking models (Ollama 0.22+)

Summary

Root Cause

Affected Code

Reproduction

Evidence

Impact

Fix

Environment

Related Issues

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

Bug: Context compressor ignores `reasoning` field — empty summaries with thinking models (Ollama 0.22+)