hermes - 💡(How to fix) Fix [Bug]: 128K token， cannot be used [1 participants]

hermes2026-04-27 16:58:38

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#16649•Fetched 2026-04-28 06:51:56

View on GitHub

Comments

Participants

Timeline

Reactions

Author

syser09

Participants

syser09

Timeline (top)

labeled ×4

Error Message

⚠️ API call failed (attempt 1/3): APIError 🔌 Provider: custom Model: Qwen3.6-35B-A3B-UD-Q4_K_M.gguf 🌐 Endpoint: http://192.168.1.13:8080/v1 📝 Error: Context size has been exceeded. ⚠️ Context length exceeded — stepping down: 131,072 → 128,000 tokens 🗜️ Context too large (~42,664 tokens) — compressing (1/3)...

Root Cause

Root Cause Analysis (optional)

Code Example

none

---

⚠️  API call failed (attempt 1/3): APIError
   🔌 Provider: custom  Model: Qwen3.6-35B-A3B-UD-Q4_K_M.gguf
   🌐 Endpoint: http://192.168.1.13:8080/v1
   📝 Error: Context size has been exceeded.
⚠️  Context length exceeded — stepping down: 131,072 → 128,000 tokens
🗜️ Context too large (~42,664 tokens) — compressing (1/3)...

RAW_BUFFERClick to expand / collapse

Bug Description

model ：Qwen3.6-35B-A3B-UD-Q4_K_M.gguf LLM Inference Platform ： Ollama 、 Llama.ccp tatus: The Hermes Agent and LLM Inference Platform were configured to attempt context lengths of 256K and 128K. However, each session encounters an error and fails to function properly when the context length approaches approximately 68K tokens. Error：⚠️ API call failed (attempt 1/3): APIError 🔌 Provider: custom Model: Qwen3.6-35B-A3B-UD-Q4_K_M.gguf 🌐 Endpoint: http://192.168.0.3:8080/v1 📝 Error: Context size has been exceeded. ⚠️ Context length exceeded — stepping down: 131,072 → 128,000 tokens 🗜️ Context too large (~42,664 tokens) — compressing (1/3)...

Steps to Reproduce

1.run "hermes" 2.ask agent do the job 3.almost 68k , error pop-up

Expected Behavior

nothing to do , only waitting for the agent job finish

Actual Behavior

hermes agent chat down

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Debug Report

none

Operating System

ubuntu24.04

Python Version

3.11.15

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

⚠️  API call failed (attempt 1/3): APIError
   🔌 Provider: custom  Model: Qwen3.6-35B-A3B-UD-Q4_K_M.gguf
   🌐 Endpoint: http://192.168.1.13:8080/v1
   📝 Error: Context size has been exceeded.
⚠️  Context length exceeded — stepping down: 131,072 → 128,000 tokens
🗜️ Context too large (~42,664 tokens) — compressing (1/3)...

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

extent analysis

TL;DR

Reduce the context length to below 68K tokens to prevent the API call from failing due to exceeded context size.

Guidance

Verify the current context length and adjust it to a value below 68K tokens to prevent errors.
Check the Hermes Agent and LLM Inference Platform configurations to ensure they are set to handle context lengths below 68K tokens.
Consider implementing context compression or splitting to handle larger context sizes.
Review the error logs to confirm that the issue is indeed related to context size exceeding the limit.

Example

No code snippet is provided as the issue does not imply a specific code change.

Notes

The root cause of the issue appears to be the context size exceeding the limit, but the exact solution may depend on the specific requirements of the Hermes Agent and LLM Inference Platform.

Recommendation

Apply workaround: Reduce the context length to prevent errors, as the exact fix may require further investigation into the Hermes Agent and LLM Inference Platform configurations.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #parallel task #integration issue #index setup #retrieval issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Bug]: 128K token， cannot be used [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Root Cause Analysis (optional)

Code Example

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix [Bug]: 128K token， cannot be used [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Root Cause Analysis (optional)

Code Example

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING