hermes - [Bug]: Hardcoded MINIMUM_CONTEXT_LENGTH = 64_000 deadlocks auto-compression and causes infinite tool loops on high-context models [2 pull requests]

Error Message

WARNING run_agent: Unrepairable tool_call arguments for fact_store — replaced with empty object (was: {"action": "add", "category": "user_pref", "content": "User...")

Root Cause

MINIMUM_CONTEXT_LENGTH is hardcoded to 64,000, so the context compressor cannot trigger, or compress history, below 64k tokens. The model's structured output, however, degrades well before 64k tokens: it emits truncated JSON tool calls for fact_store, which the parser replaces with an empty object and logs only as an "Unrepairable tool_call arguments" warning. The agent then repeats the call on every subsequent turn, an infinite loop that wastes tokens.

Bug Description

During long conversations where context size grows beyond ~60,000 tokens on models with large physical windows (e.g. Gemini 3.5 Flash via direct API or OpenRouter), the background review agent (review_agent) frequently fails to persist new memories.

Steps to Reproduce

1. Run a session using a model with a massive context window (e.g., gemini-3.5-flash with a 1M token window).
2. Accumulate conversation history to ~60,000 tokens.
3. Introduce a new high-priority preference (e.g., "My workstation name is GOD").
4. Complete the turn and monitor agent.log for the warning.
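
For the last step, a small script can make the loop easier to spot than reading the log by eye. The following is a minimal sketch, assuming the warning text appears in the log exactly as quoted in this report; the function name count_unrepairable_warnings and the regular expression are illustrative and not part of Hermes.

```python
import re
from collections import Counter

# Pattern derived from the warning quoted in this report; the real log format may differ.
WARNING_RE = re.compile(r"Unrepairable tool_call arguments for (\w+)")


def count_unrepairable_warnings(lines):
    """Count 'Unrepairable tool_call arguments' warnings per tool name."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# In-memory sample: the warning from this report plus an unrelated, made-up line.
sample = [
    'WARNING run_agent: Unrepairable tool_call arguments for fact_store — replaced with empty object (was: {"action": "add", "category": "user_pref", "content": "User...")',
    "INFO unrelated line",
]
print(count_unrepairable_warnings(sample))
```

To scan a real log, pass an open file handle instead of the sample list, for example the agent.log file mentioned above. A count for fact_store that keeps rising turn after turn would be consistent with the loop described in this report.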

Expected Behavior

The system should do one of the following:

1. Automatically trigger context compression when structured tool-calling reliability degrades, even if the context is still below the 64,000-token floor.
2. Escalate a malformed JSON tool call parsing error to a ContextLengthError so the agent recovers, instead of silently discarding the tool arguments and repeating the call indefinitely.

Actual Behavior

The background review agent gets trapped in a loop, wasting tokens. The log shows: WARNING run_agent: Unrepairable tool_call arguments for fact_store — replaced with empty object (was: {"action": "add", "category": "user_pref", "content": "Dieter...")

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

Debug Report

N/A - Dynamic Docker NAS Environment (Standard build logs supplied below)

Operating System

Linux (Synology DSM 7.x Docker container)

Python Version

3.13 (Native to latest nousresearch/hermes-agent:latest container)

Hermes Version

v0.14.0

Additional Logs / Traceback (optional)

WARNING run_agent: Unrepairable tool_call arguments for fact_store — replaced with empty object (was: {"action": "add", "category": "user_pref", "content": "User...")

Root Cause Analysis (optional)

1. Hardcoded Safety Floor: In agent/model_metadata.py, MINIMUM_CONTEXT_LENGTH is pinned to 64_000. The floor overrides any custom compression threshold or user-overridden model context length, because the threshold is computed as: threshold_tokens = max(int(model_context * threshold), MINIMUM_CONTEXT_LENGTH)
2. Silent Parser Failure: In run_agent.py, when tool call arguments fail to parse because the output was truncated at high context, the parser logs a warning and returns {} instead of raising a catchable exception.
3. The Deadlock: The compressor is blocked from running below 64k, but tool parsing consistently fails at ~60k, creating a loop where the same memory is retried with progressively larger contexts on every subsequent turn.
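
The effect of the floor can be checked with a few lines of arithmetic. This is a minimal sketch that reuses only the expression quoted in this analysis; the function name compression_threshold and the example values (a 1M-token window and a 5% threshold) are assumptions chosen for illustration, not values taken from the Hermes source.

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # value reported in this issue


def compression_threshold(model_context: int, threshold: float,
                          floor: int = MINIMUM_CONTEXT_LENGTH) -> int:
    """Same shape as the quoted expression: the floor wins whenever it is larger."""
    return max(int(model_context * threshold), floor)


# Hypothetical setup: a 1M-token model with compression requested at 5% of the window.
requested = int(1_000_000 * 0.05)                   # what the user asked for
effective = compression_threshold(1_000_000, 0.05)  # what the floor allows
print(requested, effective)

# With a lower floor (16_000 is the bound suggested in this report),
# the requested threshold would be honored.
print(compression_threshold(1_000_000, 0.05, floor=16_000))
```

In this example the requested 50,000-token threshold is raised to 64,000, so compression cannot start at the ~60k point where tool parsing is reported to fail.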

Proposed Fix (optional)

1. Configuration Parameter: Move MINIMUM_CONTEXT_LENGTH out of the constants in model_metadata.py and make it a configurable property in config.yaml (e.g. compression.minimum_context_floor), allowing overrides down to 16_000.
2. Truncation-Aware Exception: In run_agent.py, if a tool call argument exhibits truncation signatures (ends with ..., contains open quotes/braces, or is otherwise unrepairable), raise ValueError("Truncated tool call arguments due to context length limits") instead of returning {}. The message contains 'context length', which is intended to proactively trigger the ContextCompressor.
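
Both proposals can be sketched together in a few dozen lines. The code below is an illustration under stated assumptions, not the Hermes implementation: the names resolve_context_floor, looks_truncated, and parse_tool_arguments are hypothetical, the config is assumed to be a dict already loaded from config.yaml, and the truncation check is a simple heuristic for the signatures listed above.

```python
import json

DEFAULT_CONTEXT_FLOOR = 64_000  # current hardcoded value, per this issue
LOWEST_ALLOWED_FLOOR = 16_000   # lower bound suggested in the proposal


def resolve_context_floor(config: dict) -> int:
    """Read compression.minimum_context_floor, falling back to the default."""
    section = config.get("compression", {})
    value = section.get("minimum_context_floor", DEFAULT_CONTEXT_FLOOR)
    return max(int(value), LOWEST_ALLOWED_FLOOR)


def looks_truncated(raw: str) -> bool:
    """Heuristic: trailing '...', an unclosed string, or unclosed braces/brackets."""
    text = raw.rstrip()
    if text.endswith("..."):
        return True
    depth = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
    return in_string or depth > 0


def parse_tool_arguments(raw: str) -> dict:
    """Parse tool call arguments; raise on truncation instead of returning {}."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        if looks_truncated(raw):
            # The message deliberately contains 'context length', as proposed above.
            raise ValueError(
                "Truncated tool call arguments due to context length limits"
            ) from exc
        raise  # malformed but not truncated: leave the original error intact
    if not isinstance(parsed, dict):
        raise ValueError("Tool call arguments must be a JSON object")
    return parsed
```

Note that json.JSONDecodeError is itself a subclass of ValueError, so a caller that wants to react only to truncation would need to inspect the message (or use a dedicated exception class) rather than catch ValueError broadly. How Hermes maps a 'context length' message to the ContextCompressor is not shown in this report, so that wiring is left out of the sketch.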

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR
