hermes: model.max_tokens from config.yaml is ignored in both Gateway and CLI modes [1 comment, 1 participant]

hermes · 2026-04-17 06:19:38

GitHub stats

NousResearch/hermes-agent#11443 • Fetched 2026-04-18 06:01:03

Author

pydaxing

Participants

pydaxing

Timeline (top)

closed ×1, commented ×1

Bug Description

model.max_tokens set in ~/.hermes/config.yaml has no effect — it is never read or passed to AIAgent in either Gateway or CLI mode.

Steps to Reproduce

Set model.max_tokens in config.yaml:

model:
  default: aws.claude-sonnet-4.6
  provider: custom:friday
  max_tokens: 128000

Use Hermes via the Gateway (API server / messaging platforms) or the CLI (hermes chat).
Result: the request sent to the upstream API contains no max_tokens parameter.

Expected Behavior

The configured model.max_tokens should be passed to the AIAgent constructor and included in API requests as max_tokens (or max_completion_tokens for direct OpenAI).
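
The parameter-name choice described above can be sketched as follows. This is a minimal illustration rather than Hermes code: the function name token_limit_kwargs and the is_direct_openai flag are hypothetical and do not appear in the issue.

```python
from typing import Any, Dict, Optional


def token_limit_kwargs(max_tokens: Optional[int], is_direct_openai: bool) -> Dict[str, Any]:
    """Return the request field that carries the output limit, or {} when unset.

    Hypothetical helper: the name and the is_direct_openai flag are assumptions.
    """
    if max_tokens is None:
        # Nothing configured: leave the limit to the provider's default.
        return {}
    key = "max_completion_tokens" if is_direct_openai else "max_tokens"
    return {key: int(max_tokens)}


# Value taken from the config in this issue.
proxy_kwargs = token_limit_kwargs(128000, is_direct_openai=False)
openai_kwargs = token_limit_kwargs(128000, is_direct_openai=True)
```

How Hermes actually decides whether a provider counts as "direct OpenAI" is not shown in the issue, so the boolean flag here only marks where that decision would plug in.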

Actual Behavior

Gateway mode (gateway/run.py): _resolve_runtime_agent_kwargs() only extracts provider-related fields (api_key, base_url, provider, api_mode, etc.) — max_tokens is not included.
CLI mode (cli.py): AIAgent(...) is constructed without max_tokens at line ~2872. No code reads model.max_tokens from config.
AIAgent.__init__ defaults to max_tokens=None (line 799 of run_agent.py)
When max_tokens is None, _build_api_kwargs() skips adding it to the request (line ~6563)
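
The chain above can be reproduced in miniature. The sketch below uses a hypothetical stand-in class (SketchAgent) that models only the max_tokens handling described in this issue; it is not the real AIAgent and the method name build_api_kwargs is simplified for illustration.

```python
from typing import Any, Dict, Optional


class SketchAgent:
    """Hypothetical stand-in for AIAgent; only max_tokens handling is modeled."""

    def __init__(self, model: str, max_tokens: Optional[int] = None) -> None:
        self.model = model
        self.max_tokens = max_tokens  # defaults to None, as described in the issue

    def build_api_kwargs(self) -> Dict[str, Any]:
        kwargs: Dict[str, Any] = {"model": self.model}
        # A None limit is skipped, so the field never reaches the request.
        if self.max_tokens is not None:
            kwargs["max_tokens"] = self.max_tokens
        return kwargs


config = {"model": {"default": "aws.claude-sonnet-4.6", "max_tokens": 128000}}

# The bug in miniature: the config holds a limit, but the constructor call
# never receives it, so the outgoing request omits max_tokens entirely.
agent = SketchAgent(model=config["model"]["default"])
request = agent.build_api_kwargs()
```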

Impact

Most providers default to a reasonable output limit when max_tokens is omitted. However, some providers (e.g., AWS Bedrock proxied through custom endpoints) default to very low values like 1024 tokens, causing:

Tool call arguments getting truncated (finish_reason: length)
Long responses cut short (e.g., multi-step analysis, structured JSON output)
The agent entering a truncation → retry → give up loop, returning "Response truncated due to output length limit"
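
One way to confirm that a low provider default is the cause is to check finish_reason on the raw response. The sketch below assumes an OpenAI-compatible chat-completions response shape (a choices list whose entries carry finish_reason); the helper name is_truncated and the sample payload are hypothetical.

```python
from typing import Any, Dict


def is_truncated(response: Dict[str, Any]) -> bool:
    """True when the first choice stopped because the output limit was hit."""
    choices = response.get("choices") or []
    return bool(choices) and choices[0].get("finish_reason") == "length"


# Hypothetical payload: a JSON answer cut off mid-structure.
sample = {
    "choices": [
        {"finish_reason": "length", "message": {"content": '{"steps": ['}}
    ]
}
truncated = is_truncated(sample)
```

If this check is true on responses far shorter than the configured 128000 tokens, the limit is most likely coming from the provider default rather than from config.yaml.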

Affected Code Paths

gateway/run.py:_resolve_runtime_agent_kwargs() (line ~319) — does not read model.max_tokens from config
gateway/run.py:_resolve_turn_agent_config() (line ~967) — primary dict and route["runtime"] do not include max_tokens
cli.py (line ~2872) — AIAgent(...) constructor call missing max_tokens=
All other AIAgent(...) instantiation points in gateway/run.py (lines ~5690, ~5871, ~8572)

Suggested Fix

Read model.max_tokens from config and pass it through to AIAgent:

# In _resolve_runtime_agent_kwargs() or wherever config is loaded:
cfg = _load_gateway_config()
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
    max_tokens = model_cfg.get("max_tokens")
    if max_tokens is not None:
        result["max_tokens"] = int(max_tokens)

And ensure max_tokens flows through resolve_turn_route() → route["runtime"] → AIAgent(max_tokens=...).
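
The end-to-end flow can be sketched as below. Everything here is a simplified stand-in under stated assumptions: SketchAgent replaces AIAgent, resolve_runtime_kwargs mirrors the snippet above, and the route dict only imitates the route["runtime"] structure named in the issue. The real functions take more arguments and carry more fields.

```python
from typing import Any, Dict, Optional


class SketchAgent:
    """Hypothetical stand-in for AIAgent."""

    def __init__(self, model: str, max_tokens: Optional[int] = None, **_: Any) -> None:
        self.model = model
        self.max_tokens = max_tokens


def resolve_runtime_kwargs(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Collect runtime kwargs, including model.max_tokens when configured."""
    result: Dict[str, Any] = {}
    model_cfg = cfg.get("model", {})
    if isinstance(model_cfg, dict):
        max_tokens = model_cfg.get("max_tokens")
        if max_tokens is not None:
            result["max_tokens"] = int(max_tokens)
    return result


config = {"model": {"default": "aws.claude-sonnet-4.6", "max_tokens": 128000}}

# The route carries the runtime kwargs forward to the constructor call.
route = {
    "model": config["model"]["default"],
    "runtime": resolve_runtime_kwargs(config),
}
agent = SketchAgent(model=route["model"], **route["runtime"])
```

Unpacking route["runtime"] into the constructor means any instantiation point that already forwards the runtime dict picks up max_tokens without a separate keyword at each call site.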

Environment

Hermes v0.9.0 (2026.4.13)
Provider: custom:friday (AWS Bedrock proxy via https://aigc.sankuai.com/v1/openai/native)
Model: aws.claude-sonnet-4.6
OS: macOS (arm64)

Workaround

Manually patch gateway/run.py to bridge model.max_tokens from config into runtime_kwargs and turn_route["runtime"]. The patch lasts only until the next Hermes update, which will overwrite it.

extent analysis

TL;DR

The most likely fix is to read model.max_tokens from the config file and pass it to the AIAgent constructor.

Guidance

Verify that the model.max_tokens value is correctly set in the ~/.hermes/config.yaml file.
Check the gateway/run.py and cli.py files to ensure that the max_tokens parameter is being read from the config file and passed to the AIAgent constructor.
Apply the suggested fix by reading model.max_tokens from config and passing it through to AIAgent, as shown in the provided code snippet.
Ensure that the max_tokens value flows through resolve_turn_route() → route["runtime"] → AIAgent(max_tokens=...).

Notes

The provided fix assumes that the model.max_tokens value is correctly set in the config file and that the AIAgent constructor is being called with the correct parameters. If the issue persists, further debugging may be necessary.
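
Because the suggested snippet calls int(max_tokens) directly, a malformed value such as "128k" would raise ValueError when the config is loaded. A defensive variant might look like the sketch below; parse_max_tokens is a hypothetical helper, and treating bad values as "unset" (rather than raising) is a design assumption, not something the issue specifies.

```python
from typing import Any, Optional


def parse_max_tokens(raw: Any) -> Optional[int]:
    """Return a positive int, or None when the value is unset or unusable."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if raw is None or isinstance(raw, bool):
        return None
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    return value if value > 0 else None


from_int = parse_max_tokens(128000)
from_str = parse_max_tokens("128000")
from_bad = parse_max_tokens("128k")
```

Returning None keeps the existing behavior (the provider default applies) for bad input; logging a warning in that branch would make a misconfigured value easier to spot.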

Recommendation

Apply the workaround by manually patching gateway/run.py to bridge model.max_tokens from config into runtime_kwargs and turn_route["runtime"]. Note that the patch lasts only until the next Hermes update.


