hermes - 💡(How to fix) Fix [Bug]: Compression / Summarize In general Destroy Sessions? [1 comments, 1 participants]

hermes · 2026-04-21 15:00:36


GitHub stats

NousResearch/hermes-agent#13576•Fetched 2026-04-22 08:05:41


Author

BluePeer

Participants

BluePeer

Timeline (top)

closed ×1 · commented ×1 · labeled ×1



Bug Description

I don't know why nobody else has noticed this. With compression enabled, or in any other case where Hermes hits its limits and falls back to summarizing, it produces completely useless sessions. The reason is this:

"max_tokens": 2600

More than 64,000 tokens cannot be compressed into 2,600, especially with a reasoning model, which uses up that budget on reasoning alone.

In my case I tried to use Hermes with LM Studio and wondered why it starts out perfectly and then mostly ends up in horribly dumb situations. After monitoring the requests, I found that it always adds a max_tokens of 2600 to the summarize action.

Steps to Reproduce

2026-04-21 16:38:44 [DEBUG]
 Received request: POST to /v1/chat/completions with body  {
  "messages": [
    {
      "role": "user",
      "content": "You are a summarization agent creating a context c... <Truncated in logs> ...mmary body. Do not include any preamble or prefix."
    }
  ],
  "model": "qwen3.6-35b-a3b",
  "max_tokens": 2600
}

This then results in output like this:

2026-04-21 16:42:36  [INFO]
 [qwen3.6-35b-a3b] Generated prediction:  {
  "id": "chatcmpl-rs35uhriykpjlbkt2oero",
  "object": "chat.completion",
  "created": 1776782324,
  "model": "qwen3.6-35b-a3b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "reasoning_content": "Here's a thinking process:\n\n1.  **Analyze User Input:**\n 
..........................................
cutted
..........................................
I'll use: \"Core-Typen im Source Code lesen und mit der MD vergleichen\" as the active task, but I'll format it exactly as requested. Actually, I'll just write: \"Vergleich der Core-Typen im Source Code (`AIAgent`, `AgentSession`, `ChatClientAgent`) mit der Dokumentation (`AGENTS.md`)\" to be safe.\n\n   Let's",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 1231,
    "completion_tokens": 2600,
    "total_tokens": 3831,
    "completion_tokens_details": {
      "reasoning_tokens": 2599
    }
  },
  "stats": {},
  "system_fingerprint": "qwen3.6-35b-a3b"
}
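The failure in the response above can be detected mechanically. The following is a minimal sketch, assuming an OpenAI-compatible response shaped like the log; `is_truncated_summary` is a hypothetical helper and not part of Hermes, and the reasoning text is left out of the sample data.

```python
from typing import Any


def is_truncated_summary(response: dict[str, Any]) -> bool:
    """Return True when the completion hit the token limit or has no visible text."""
    choice = response["choices"][0]
    content = (choice["message"].get("content") or "").strip()
    hit_limit = choice.get("finish_reason") == "length"
    # An empty body is unusable regardless of why generation stopped.
    return hit_limit or not content


# Fields copied from the logged response; reasoning_content is omitted.
logged = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "", "tool_calls": []},
            "finish_reason": "length",
        }
    ],
    "usage": {
        "prompt_tokens": 1231,
        "completion_tokens": 2600,
        "completion_tokens_details": {"reasoning_tokens": 2599},
    },
}

usage = logged["usage"]
# Tokens left for the visible answer after reasoning consumed its share.
visible_tokens = usage["completion_tokens"] - usage["completion_tokens_details"]["reasoning_tokens"]
print(is_truncated_summary(logged), visible_tokens)  # True 1
```

With the logged numbers, only one of the 2600 completion tokens was not spent on reasoning, which matches the empty content field.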

Expected Behavior

Do not cut the summary off and then keep working with the resulting useless data.

Actual Behavior

It tries to compress/summarize, which fails because of the 2600-token limit. With or without a reasoning model, it ends up with a small fraction of the real summary, which is completely useless.
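One way to avoid working with a broken summary is to keep the original history whenever the summary is unusable. This is only a sketch under assumptions: `compress_history`, the `keep_last` parameter, and the wording of the summary message are all hypothetical and do not describe how Hermes actually stores its context.

```python
from typing import Any

Message = dict[str, str]


def compress_history(
    messages: list[Message], response: dict[str, Any], keep_last: int = 2
) -> list[Message]:
    """Replace older messages with a summary, but only if the summary is usable."""
    choice = response["choices"][0]
    summary = (choice["message"].get("content") or "").strip()
    if choice.get("finish_reason") == "length" or not summary:
        # Unusable summary: keep the full history instead of discarding it.
        return messages
    head = {"role": "user", "content": f"Summary of earlier conversation:\n{summary}"}
    # messages[-0:] would return everything, so handle keep_last == 0 explicitly.
    tail = messages[-keep_last:] if keep_last > 0 else []
    return [head] + tail
```

The design choice here is to prefer an over-long context, which the caller can handle some other way, over silently replacing the session with an empty or half-finished summary.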

Affected Component

Other, Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Debug Report

Report     https://paste.rs/L2fZd
  agent.log  https://paste.rs/nQizU

Operating System

Windows 10 WSL2 Debian

Python Version

No response

Hermes Version

latest

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

The max_tokens setting: its value is too small for real-world use cases.

Proposed Fix (optional)

Remove the max_tokens entry from this type of request.
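The proposed fix could look roughly like the sketch below, which builds the request body and leaves out `max_tokens` unless a limit is passed explicitly. `build_summary_request` is a hypothetical name used for illustration; the actual Hermes code that assembles this request is not shown in the issue.

```python
from typing import Any, Optional


def build_summary_request(
    model: str, prompt: str, max_tokens: Optional[int] = None
) -> dict[str, Any]:
    """Build a /v1/chat/completions body; omit max_tokens unless a limit is given."""
    body: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    return body


# Without a limit, the server's own default applies.
request = build_summary_request("qwen3.6-35b-a3b", "You are a summarization agent ...")
print("max_tokens" in request)  # False
```

Making the limit an optional parameter keeps the current behavior available for users who want it while letting local setups drop the cap.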

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

Extended analysis

TL;DR

Increase the max_tokens limit to a value that can accommodate the required summarization, or remove the max_tokens entry for summarization requests.

Guidance

The issue is caused by the max_tokens limit of 2600, which is too low for the summarization task and yields incomplete, useless summaries.
To verify the issue, check the response: in the logged example finish_reason is "length", content is empty, and the usage section reports 2600 completion_tokens, of which 2599 are reasoning_tokens.
To mitigate the issue, raise the max_tokens limit to a higher value (for example 64000, if the model's context window allows it) so that it covers both the reasoning and the summary, or remove the max_tokens entry for summarization requests.
After the change, confirm that finish_reason is no longer "length" and that content holds a complete summary.
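The mitigation and the monitoring step can be combined into a retry loop that raises the limit until the summary completes. This is a sketch only: `summarize_with_retry`, the doubling strategy, and the 32000 ceiling are assumptions chosen for illustration, and `fake_send` stands in for a real HTTP call.

```python
from typing import Any, Callable, Optional

# A callable that sends the summary request with the given max_tokens.
Send = Callable[[int], dict[str, Any]]


def summarize_with_retry(send: Send, start: int = 2600, ceiling: int = 32000) -> Optional[str]:
    """Double max_tokens until the summary completes or the ceiling is passed."""
    budget = start
    while budget <= ceiling:
        choice = send(budget)["choices"][0]
        text = (choice["message"].get("content") or "").strip()
        if text and choice.get("finish_reason") != "length":
            return text
        budget *= 2
    return None  # Caller should keep the original history in this case.


calls: list[int] = []


def fake_send(max_tokens: int) -> dict[str, Any]:
    """Stand-in server that only finishes when given at least 10000 tokens."""
    calls.append(max_tokens)
    done = max_tokens >= 10000
    message = {"content": "summary text" if done else ""}
    return {"choices": [{"message": message, "finish_reason": "stop" if done else "length"}]}


result = summarize_with_retry(fake_send)
print(result, calls)  # summary text [2600, 5200, 10400]
```

Retrying costs extra generation time on each failed attempt, so a larger starting value may be preferable when the model is known to reason at length.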

Example

The original issue provides no code snippet beyond the request and response logs, because the problem lies in the configured max_tokens limit.

Notes

The max_tokens limit may be set to prevent excessive resource usage, so increasing or removing it may have performance implications. It is essential to test and monitor the system after making changes to ensure that it can handle the increased load.
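The resource concern in the note above can be addressed by bounding the limit instead of removing it. The sketch below requests enough tokens for the summary plus a reasoning allowance, capped by what is left in the context window. `completion_budget` and the 8000-token allowance are hypothetical; suitable values depend on the model and server.

```python
def completion_budget(
    context_window: int,
    prompt_tokens: int,
    summary_tokens: int = 2600,
    reasoning_allowance: int = 8000,
) -> int:
    """Tokens to request: summary plus reasoning headroom, capped by remaining context."""
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(summary_tokens + reasoning_allowance, remaining)


# prompt_tokens taken from the logged request; the window sizes are example values.
print(completion_budget(32768, 1231))  # 10600
print(completion_budget(4096, 1231))   # 2865
```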

Recommendation

Apply a workaround by increasing the max_tokens limit or removing it for summarization requests, as the current limit is too restrictive for the required use case.


#prompt formatting #chain error #conversation history #tool integration #LLM response
