hermes - 💡(How to fix) Fix model.max_tokens from config.yaml is ignored in both Gateway and CLI modes [1 comments, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#11443Fetched 2026-04-18 06:01:03
View on GitHub
Comments
1
Participants
1
Timeline
2
Reactions
0
Author
Participants
Timeline (top)
closed ×1commented ×1

Fix Action

Workaround

Manually patch gateway/run.py to bridge model.max_tokens from config into runtime_kwargs and turn_route["runtime"]. This survives until the next hermes update.

Code Example

model:
  default: aws.claude-sonnet-4.6
  provider: custom:friday
  max_tokens: 128000

---

# In _resolve_runtime_agent_kwargs() or wherever config is loaded:
cfg = _load_gateway_config()
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
    max_tokens = model_cfg.get("max_tokens")
    if max_tokens is not None:
        result["max_tokens"] = int(max_tokens)
RAW_BUFFERClick to expand / collapse

Bug Description

model.max_tokens set in ~/.hermes/config.yaml has no effect — it is never read or passed to AIAgent in either Gateway or CLI mode.

Steps to Reproduce

  1. Set model.max_tokens in config.yaml:
model:
  default: aws.claude-sonnet-4.6
  provider: custom:friday
  max_tokens: 128000
  1. Use Hermes via Gateway (API server / messaging platforms) or CLI (hermes chat)

  2. The upstream API receives no max_tokens parameter in the request

Expected Behavior

The configured model.max_tokens should be passed to the AIAgent constructor and included in API requests as max_tokens (or max_completion_tokens for direct OpenAI).

Actual Behavior

  • Gateway mode (gateway/run.py): _resolve_runtime_agent_kwargs() only extracts provider-related fields (api_key, base_url, provider, api_mode, etc.) — max_tokens is not included.
  • CLI mode (cli.py): AIAgent(...) is constructed without max_tokens at line ~2872. No code reads model.max_tokens from config.
  • AIAgent.__init__ defaults to max_tokens=None (line 799 of run_agent.py)
  • When max_tokens is None, _build_api_kwargs() skips adding it to the request (line ~6563)

Impact

Most providers default to a reasonable output limit when max_tokens is omitted. However, some providers (e.g., AWS Bedrock proxied through custom endpoints) default to very low values like 1024 tokens, causing:

  • Tool call arguments getting truncated (finish_reason: length)
  • Long responses cut short (e.g., multi-step analysis, structured JSON output)
  • The agent entering a truncation → retry → give up loop, returning "Response truncated due to output length limit"

Affected Code Paths

  1. gateway/run.py:_resolve_runtime_agent_kwargs() (line ~319) — does not read model.max_tokens from config
  2. gateway/run.py:_resolve_turn_agent_config() (line ~967) — primary dict and route["runtime"] do not include max_tokens
  3. cli.py (line ~2872) — AIAgent(...) constructor call missing max_tokens=
  4. All other AIAgent(...) instantiation points in gateway/run.py (lines ~5690, ~5871, ~8572)

Suggested Fix

Read model.max_tokens from config and pass it through to AIAgent:

# In _resolve_runtime_agent_kwargs() or wherever config is loaded:
cfg = _load_gateway_config()
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
    max_tokens = model_cfg.get("max_tokens")
    if max_tokens is not None:
        result["max_tokens"] = int(max_tokens)

And ensure max_tokens flows through resolve_turn_route()route["runtime"]AIAgent(max_tokens=...).

Environment

  • Hermes v0.9.0 (2026.4.13)
  • Provider: custom:friday (AWS Bedrock proxy via https://aigc.sankuai.com/v1/openai/native)
  • Model: aws.claude-sonnet-4.6
  • OS: macOS (arm64)

Workaround

Manually patch gateway/run.py to bridge model.max_tokens from config into runtime_kwargs and turn_route["runtime"]. This survives until the next hermes update.

extent analysis

TL;DR

The most likely fix is to read model.max_tokens from the config file and pass it to the AIAgent constructor.

Guidance

  • Verify that the model.max_tokens value is correctly set in the ~/.hermes/config.yaml file.
  • Check the gateway/run.py and cli.py files to ensure that the max_tokens parameter is being read from the config file and passed to the AIAgent constructor.
  • Apply the suggested fix by reading model.max_tokens from config and passing it through to AIAgent, as shown in the provided code snippet.
  • Ensure that the max_tokens value flows through resolve_turn_route()route["runtime"]AIAgent(max_tokens=...).

Example

# In _resolve_runtime_agent_kwargs() or wherever config is loaded:
cfg = _load_gateway_config()
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
    max_tokens = model_cfg.get("max_tokens")
    if max_tokens is not None:
        result["max_tokens"] = int(max_tokens)

Notes

The provided fix assumes that the model.max_tokens value is correctly set in the config file and that the AIAgent constructor is being called with the correct parameters. If the issue persists, further debugging may be necessary.

Recommendation

Apply the workaround by manually patching gateway/run.py to bridge model.max_tokens from config into runtime_kwargs and turn_route["runtime"], as this will survive until the next hermes update.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING