hermes - 💡(How to fix) Fix [Bug]: Iteration-exhaustion summary call has no output budget when agent.max_tokens is unset

Root Cause

When a session reaches the 90/90 iteration limit and handle_max_iterations() fires, the summary call that produces the final handoff omits max_tokens whenever agent.max_tokens is None (the default). The effect depends on the provider:

Cloud providers (OpenAI, OpenRouter): default to roughly 4-8K output tokens, which is insufficient to summarize 90 tool turns, so the summary is truncated.
Local llama.cpp: no cap at all, so generation continues until the KV cache saturates, leaving the router child in an unrecoverable state that requires a full server restart.
Anthropic: rejects the call outright with a 400 error because max_tokens is mandatory on their API.

Steps to Reproduce

1. Run Hermes with the default config (agent.max_tokens not set in config.yaml).
2. Use a steer-heavy workflow that burns through the 90-turn iteration budget.
3. When the budget is exhausted, handle_max_iterations() fires to produce a final summary.
4. The summary call is made without max_tokens in the request body.
5. Result: a truncated response (cloud), router corruption (local llama.cpp), or a 400 error (Anthropic).

Expected Behavior

The summary call should always have an explicit output budget. When agent.max_tokens is None, a sensible default (e.g. 16384 tokens) should be used — enough for a coherent handoff summary without risking unbounded generation.

Actual Behavior

File: agent/chat_completion_helpers.py, function handle_max_iterations(), lines ~1375-1377:

    if agent.max_tokens is not None:
        summary_kwargs.update(agent._max_tokens_param(agent.max_tokens))
    # else: falls through, no max_tokens in the request at all

When agent.max_tokens is None, there is no else branch to supply a fallback, so the API call proceeds without an output budget.
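The missing budget can be illustrated in isolation. The following is a minimal sketch, not the Hermes source: FakeAgent and build_summary_kwargs are hypothetical stand-ins, and the assumption that _max_tokens_param returns a dict keyed by max_tokens is illustrative only (the real helper may emit a provider-specific key).

```python
from typing import Optional


class FakeAgent:
    """Hypothetical stand-in for the Hermes agent; only the fields used here."""

    def __init__(self, max_tokens: Optional[int] = None) -> None:
        self.max_tokens = max_tokens

    def _max_tokens_param(self, value: int) -> dict:
        # Assumption: the real helper maps the budget to a request parameter.
        return {"max_tokens": value}


def build_summary_kwargs(agent: FakeAgent) -> dict:
    """Mirror the reported branch: a budget is added only when configured."""
    summary_kwargs: dict = {"messages": []}
    if agent.max_tokens is not None:
        summary_kwargs.update(agent._max_tokens_param(agent.max_tokens))
    return summary_kwargs


unset = build_summary_kwargs(FakeAgent())
configured = build_summary_kwargs(FakeAgent(max_tokens=8192))
print("max_tokens" in unset)     # False
print(configured["max_tokens"])  # 8192
```

With the default configuration the request dictionary carries no output limit at all, which is the condition each provider then handles differently.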

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Collecting debug report...
Traceback (most recent call last):
  File "/Users/paulie/.hermes/hermes-agent/venv/bin/hermes", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/paulie/.hermes/hermes-agent/hermes_cli/main.py", line 14612, in main
    args.func(args)
  File "/Users/paulie/.hermes/hermes-agent/hermes_cli/main.py", line 6310, in cmd_debug
    run_debug(args)
  File "/Users/paulie/.hermes/hermes-agent/hermes_cli/debug.py", line 725, in run_debug
    run_debug_share(args)
  File "/Users/paulie/.hermes/hermes-agent/hermes_cli/debug.py", line 599, in run_debug_share
    dump_text = _capture_dump()
                ^^^^^^^^^^^^^^^
  File "/Users/paulie/.hermes/hermes-agent/hermes_cli/debug.py", line 523, in _capture_dump
    run_dump(_FakeArgs())
  File "/Users/paulie/.hermes/hermes-agent/hermes_cli/dump.py", line 329, in run_dump
    lines.append(f"  mcp_servers:        {_count_mcp_servers(config)}")
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulie/.hermes/hermes-agent/hermes_cli/dump.py", line 104, in _count_mcp_servers
    servers = mcp.get("servers", {})
              ^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'
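
The traceback shows that the debug report itself crashed: inside _count_mcp_servers, the name mcp was None when .get was called. One plausible cause, which is an assumption because dump.py is not shown here, is a config whose mcp key is present but empty, so a lookup with a default still yields None. A defensive version might look like the sketch below; the config layout and the name count_mcp_servers_safe are hypothetical.

```python
from collections.abc import Mapping
from typing import Any, Optional


def count_mcp_servers_safe(config: Optional[Mapping[str, Any]]) -> int:
    """Count configured MCP servers, tolerating missing or null sections."""
    # `or {}` covers both a missing key and a key explicitly set to None,
    # which dict.get(key, {}) alone does not.
    mcp = (config or {}).get("mcp") or {}
    if not isinstance(mcp, Mapping):
        return 0
    servers = mcp.get("servers") or {}
    # Assumption: servers is a mapping or a list of entries.
    return len(servers) if isinstance(servers, (Mapping, list)) else 0


print(count_mcp_servers_safe({"mcp": None}))                    # 0
print(count_mcp_servers_safe({"mcp": {"servers": {"a": {}}}}))  # 1
print(count_mcp_servers_safe({}))                               # 0
```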

Operating System

macOS 15.5 (arm64)

Python Version

3.11.15

Hermes Version

v0.15.1 (2026.5.29)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

File: agent/chat_completion_helpers.py, function handle_max_iterations(). Lines 1374-1377 only set max_tokens on the summary call when agent.max_tokens is configured:

    if agent.max_tokens is not None:
        summary_kwargs.update(agent._max_tokens_param(agent.max_tokens))

When agent.max_tokens is None (the default, not set in config.yaml), there is no else branch, so the summary API call goes out without an output budget.

Fix: add an else clause with a sensible default:

    if agent.max_tokens is not None:
        summary_kwargs.update(agent._max_tokens_param(agent.max_tokens))
    else:
        summary_kwargs.update(agent._max_tokens_param(16384))

16384 is 4x the _boost_base used in the mid-turn streaming continuation and well within the 32,768 cap shared by that mechanism, which should be sufficient for a coherent handoff summary across 90 tool turns.
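
The sizing argument can be checked with simple arithmetic. In this sketch the constant names are hypothetical; the boost base value is inferred from the statement that 16384 is four times it, and 32,768 is the cap quoted in the analysis.

```python
# Hypothetical constant names; values taken or inferred from the report.
SUMMARY_DEFAULT = 16_384             # proposed default for the summary call
BOOST_BASE = SUMMARY_DEFAULT // 4    # inferred: 16384 is described as 4x the base
CONTINUATION_CAP = 32_768            # cap quoted for the continuation mechanism

assert SUMMARY_DEFAULT == 4 * BOOST_BASE
assert SUMMARY_DEFAULT <= CONTINUATION_CAP
print(BOOST_BASE, CONTINUATION_CAP // SUMMARY_DEFAULT)  # 4096 2
```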

Proposed Fix (optional)

Add an else branch at agent/chat_completion_helpers.py:1377:

    if agent.max_tokens is not None:
        summary_kwargs.update(agent._max_tokens_param(agent.max_tokens))
    else:
        # Default output budget for summary call: prevents unbounded
        # generation on local models and gives cloud providers sufficient
        # room for a coherent handoff.
        summary_kwargs.update(agent._max_tokens_param(16384))

A few added lines fix all four provider categories (OpenAI, OpenRouter, local llama.cpp, Anthropic).
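
A quick way to exercise the proposed branch in isolation is a small test-style sketch. StubAgent, summary_budget_kwargs, and DEFAULT_SUMMARY_MAX_TOKENS are hypothetical names, not Hermes code, and the max_tokens key returned by the stub is an assumption about what _max_tokens_param produces.

```python
from typing import Optional

DEFAULT_SUMMARY_MAX_TOKENS = 16384  # default proposed in this report


class StubAgent:
    """Hypothetical stand-in exposing only what the branch touches."""

    def __init__(self, max_tokens: Optional[int] = None) -> None:
        self.max_tokens = max_tokens

    def _max_tokens_param(self, value: int) -> dict:
        # Assumption: the real helper returns a request-parameter dict.
        return {"max_tokens": value}


def summary_budget_kwargs(agent: StubAgent) -> dict:
    """Apply the configured budget, or fall back to the proposed default."""
    summary_kwargs: dict = {}
    if agent.max_tokens is not None:
        summary_kwargs.update(agent._max_tokens_param(agent.max_tokens))
    else:
        summary_kwargs.update(agent._max_tokens_param(DEFAULT_SUMMARY_MAX_TOKENS))
    return summary_kwargs


print(summary_budget_kwargs(StubAgent()))                 # {'max_tokens': 16384}
print(summary_budget_kwargs(StubAgent(max_tokens=2048)))  # {'max_tokens': 2048}
```

Naming the default as a module-level constant is a design choice of this sketch; it would keep the value easy to find and adjust if 16384 turns out to be too small or too large in practice.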

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

hermes - 💡(How to fix) Fix [Bug]: Iteration-exhaustion summary call has no output budget when agent.max_tokens is unset [1 pull requests]
