hermes - ✅(Solved) Fix max_tokens config from custom_providers is not passed to AIAgent [3 pull requests, 1 participant]

chengoak · 2026-05-05T02:48:36Z



NousResearch/hermes-agent#20004•Fetched 2026-05-06 06:39:17


Author

chengoak

Participants

chengoak

Timeline (top)

labeled ×4, cross-referenced ×3

Fix Action

Fix / Workaround

This is fixed by PR #19991, which:

Adds max_tokens to _KNOWN_KEYS in config normalization
Passes max_tokens through runtime provider resolution
Updates gateway to read max_tokens from runtime config first, then fallback to model config

PR fix notes

PR #19991: fix: properly pass model.max_tokens config to AIAgent in gateway

Repository: NousResearch/hermes-agent
Author: chengoak
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/19991

Description (problem / solution / changelog)

What does this PR do?

Fixes the issue where model.max_tokens from config.yaml was not being passed to AIAgent when running via the gateway (Feishu, QQBot, etc.), causing model responses to be truncated by conservative default output limits.

Changes:

[…]: Import and use […] to read […] from config
[…]: Include […] in the runtime dict passed to AIAgent
[…]: Include […] in fallback provider resolution
[…]: Add […] parameter with config priority: CLI args > config file > model default

Why is this needed?

For custom providers like ByteDance Ark, the model's default output token limit is quite conservative. When max_tokens is configured but not passed through, users see warnings on platforms like Feishu.

Testing

Verified that the config path correctly reads […]
All changes are backward compatible (None is passed when config is not set)
Gateway routes correctly unpack the runtime dict including max_tokens

Changed files

cli.py (modified, +9/-0)
gateway/run.py (modified, +16/-1)
hermes_cli/config.py (modified, +5/-0)
hermes_cli/runtime_provider.py (modified, +15/-0)

PR #20121: fix(gateway): honor custom_providers max_tokens when constructing AIAgent

Repository: NousResearch/hermes-agent
Author: konsisumer
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20121

Description (problem / solution / changelog)

Per-provider max_tokens set under custom_providers (or the new-style providers dict) was dropped during config normalization and never reached AIAgent, so the gateway always used provider transport defaults regardless of the user's cap.

What changed and why

hermes_cli/config.py: add max_tokens to _KNOWN_KEYS in _normalize_custom_provider_entry and preserve positive int values in the normalized entry — without this, the key was dropped (and a spurious "unknown config keys" warning was logged).
hermes_cli/runtime_provider.py: propagate max_tokens from _get_named_custom_provider (legacy list, v12 dict, and credential-pool branches) and from _resolve_named_custom_runtime so the resolved runtime dict carries the cap.
gateway/run.py: include max_tokens in _resolve_runtime_agent_kwargs and the fallback-provider helper (with a fallback to top-level model.max_tokens), and forward it through _resolve_turn_agent_config so AIAgent(**turn_route["runtime"]) receives the value.
tests/hermes_cli/test_custom_provider_max_tokens.py: 10 new tests covering normalization (positive int, zero/negative rejection, non-int rejection, no spurious unknown-key warning), runtime propagation through the legacy list and v12 dict paths, omission semantics, and gateway precedence (runtime wins, falls back to model.max_tokens, returns None when neither is set).
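
The normalization step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `KNOWN_KEYS`, `normalize_entry`, and the logger name are hypothetical stand-ins, the real `_KNOWN_KEYS` set likely holds more keys, and excluding `bool` is an assumption on top of the "non-int rejection" the PR mentions.

```python
import logging

logger = logging.getLogger("config_sketch")

# Hypothetical key set; the real _KNOWN_KEYS in hermes_cli/config.py may differ.
KNOWN_KEYS = {"name", "base_url", "api_key", "max_tokens"}


def normalize_entry(entry: dict) -> dict:
    """Keep known keys, warn about unknown ones, keep max_tokens only if valid."""
    unknown = sorted(set(entry) - KNOWN_KEYS)
    if unknown:
        logger.warning("unknown config keys: %s", ", ".join(unknown))

    # Copy every known key except max_tokens, which needs validation first.
    normalized = {
        key: value
        for key, value in entry.items()
        if key in KNOWN_KEYS and key != "max_tokens"
    }

    value = entry.get("max_tokens")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        normalized["max_tokens"] = value
    return normalized
```

Because `max_tokens` is in the known-key set, a valid entry no longer triggers the unknown-keys warning, and an invalid value is simply left out of the normalized entry.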

Precedence is now: custom_providers[].max_tokens (carried on the runtime dict) → model.max_tokens (global) → None (provider transport default).
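
A short sketch of that precedence chain, assuming the resolved runtime dict and the top-level `model` config section are both plain dicts. The function name `resolve_max_tokens` is hypothetical and does not appear in the PR.

```python
from typing import Optional


def resolve_max_tokens(runtime: dict, model_cfg: dict) -> Optional[int]:
    """Per-provider cap first, then the global cap, then None."""
    for candidate in (runtime.get("max_tokens"), model_cfg.get("max_tokens")):
        if candidate is not None:
            return candidate
    # None tells the caller to use the provider transport default.
    return None
```

Returning `None` rather than a hardcoded number leaves the final default to the transport layer, which matches the third step of the chain.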

How to test

pytest tests/hermes_cli/test_custom_provider_max_tokens.py -q (10 passed locally)
pytest tests/hermes_cli/test_runtime_provider_resolution.py tests/hermes_cli/test_custom_provider_context_length.py tests/hermes_cli/test_config.py -q (173 passed)
Broader sweep: pytest tests/hermes_cli/ tests/gateway/ -q shows only pre-existing platform/flaky failures (systemd D-Bus on macOS, whatsapp/discord adapter tests, an SSE-keepalive timing test) that also fail on main.
Manual: set custom_providers: [{name: ark, base_url: ..., max_tokens: 131072}] and confirm via gateway logs that the agent's max_tokens is 131072 instead of the provider default.

What platforms tested on

macOS on darwin-arm64 (local)

Fixes #20004

Changed files

gateway/run.py (modified, +33/-0)
hermes_cli/config.py (modified, +5/-1)
hermes_cli/runtime_provider.py (modified, +18/-0)
tests/hermes_cli/test_custom_provider_max_tokens.py (added, +219/-0)

PR #20149: fix(gateway): honor max_tokens from custom_providers / providers entries (#20004)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20149

Description (problem / solution / changelog)

Closes #20004.

A max_tokens value set on a custom_providers (or providers) entry was silently dropped:

_normalize_custom_provider_entry discarded the field as an unknown key.
Runtime resolution (_get_named_custom_provider, _resolve_named_custom_runtime) never lifted it onto the runtime dict.
The gateway's _resolve_runtime_agent_kwargs only read model.max_tokens.

Result: an explicit per-endpoint output cap was overridden by either model.max_tokens (also ignored if absent) or the transport-layer hardcoded default (4096 for Anthropic Bedrock, 16384 for NVIDIA NIM, etc.).

Fix — three layers

Normalization (hermes_cli/config.py): add max_tokens to _KNOWN_KEYS; copy positive int values onto the normalized entry. Drop bogus values (zero/negative/string/bool) silently.
Runtime lift (hermes_cli/runtime_provider.py): new _attach_custom_provider_max_tokens helper centralises the validation. Called from all four lookup paths — providers-dict-by-key, providers-dict-by-display-name, legacy custom_providers list, and pool-backed resolution — so they can't drift.
Gateway resolution (gateway/run.py): documented priority chain in _resolve_runtime_agent_kwargs and _try_resolve_fallback_provider:
1. runtime['max_tokens'] — from the matched custom-provider entry
2. model.max_tokens — top-level config.yaml fallback
3. None → AIAgent / transport picks a provider-appropriate default

A tiny _coerce_max_tokens helper enforces the positive-int contract so a misconfigured max_tokens: 64K falls through cleanly instead of crashing the constructor.
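
One way such a positive-int check could look, as a sketch under assumptions: the names `coerce_max_tokens` and `pick_max_tokens` are illustrative, and the actual `_coerce_max_tokens` in the PR may differ in signature and behavior.

```python
from typing import Any, Optional


def coerce_max_tokens(value: Any) -> Optional[int]:
    """Return value if it is a positive int (and not a bool), else None."""
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value if value > 0 else None


def pick_max_tokens(runtime: dict, model_cfg: dict) -> Optional[int]:
    """Apply the priority chain, skipping any value that fails validation."""
    # coerce_max_tokens only returns a positive int or None, so `or` is safe here.
    return coerce_max_tokens(runtime.get("max_tokens")) or coerce_max_tokens(
        model_cfg.get("max_tokens")
    )
```

With this shape, a string such as `"64K"` on the provider entry is treated as absent, so resolution falls through to `model.max_tokens` or to `None` instead of passing a bad value to the constructor.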

Test

17 new cases in tests/hermes_cli/test_custom_provider_max_tokens.py covering:

normalization accept/reject for positive int / zero / negative / string / missing key,
the _attach_custom_provider_max_tokens helper across all input shapes (positive, zero, negative, string, None, missing, doesn't-overwrite),
end-to-end through _get_named_custom_provider for legacy custom_providers list, providers-dict-by-key, and providers-dict-by-display-name lookup paths.
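
The input shapes listed for the helper suggest behavior along the following lines. This is a hedged sketch: `attach_max_tokens` is a hypothetical name, and reading "doesn't-overwrite" as "an existing value on the target dict is kept" is an assumption about the PR's intent.

```python
from typing import Any


def attach_max_tokens(target: dict, entry: dict) -> dict:
    """Copy a valid max_tokens from entry onto target, keeping any existing value."""
    value: Any = entry.get("max_tokens")
    valid = isinstance(value, int) and not isinstance(value, bool) and value > 0
    if valid and "max_tokens" not in target:
        target["max_tokens"] = value
    return target
```

Routing every lookup path through a single helper like this keeps the validation rule in one place, which is the stated reason the PR centralises it.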

Pre-existing related suites stay green:

tests/hermes_cli/test_runtime_provider_resolution.py (109 cases)
tests/hermes_cli/test_custom_provider_context_length.py (12 cases)
tests/hermes_cli/test_provider_config_validation.py (17 cases)

Notes

No call-site signature changes — value flows through turn_route['runtime'] via the existing **runtime splat into AIAgent.
Same chain applied in primary and auth-fallback resolution so a fallback kick-in doesn't silently change the output cap.
Related to #19991, but reimplemented from scratch against current main.
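
The first note relies on Python's keyword-argument unpacking. A minimal sketch of why no signature change is needed, using a stand-in class: `AgentStub` and the model name `ark-model` are hypothetical, and the real `AIAgent` constructor takes more parameters than shown.

```python
from typing import Optional


class AgentStub:
    """Stand-in for AIAgent, reduced to the two fields relevant here."""

    def __init__(self, model: str, max_tokens: Optional[int] = None, **_ignored):
        self.model = model
        self.max_tokens = max_tokens


# Once resolution puts max_tokens on the runtime dict, the existing splat carries it.
turn_route = {"runtime": {"model": "ark-model", "max_tokens": 131072}}
agent = AgentStub(**turn_route["runtime"])
```

If the key is absent from the runtime dict, the constructor default of `None` applies, so call sites behave as before when nothing is configured.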

Changed files

gateway/run.py (modified, +46/-1)
hermes_cli/config.py (modified, +8/-1)
hermes_cli/runtime_provider.py (modified, +28/-0)
tests/hermes_cli/test_custom_provider_max_tokens.py (added, +233/-0)

Code Example

custom_providers:
   - name: ark
     max_tokens: 131072
     # ...


Describe the bug

The max_tokens configuration set in custom_providers is not being properly passed to the AIAgent instance. Instead, the gateway always uses model.max_tokens from the global config, or falls back to provider-specific defaults (like 32k for Kimi).

To Reproduce

Set max_tokens in a custom_providers entry, e.g.:

custom_providers:
- name: ark
  max_tokens: 131072
  # ...

Remove or don't set model.max_tokens globally
Start a conversation through the gateway
Observe that max_tokens uses the provider default instead of 131072

Expected behavior

The max_tokens value from custom_providers should be used, falling back to model.max_tokens if no provider-specific value is set.

Fix

This is fixed by PR #19991, which:

Adds max_tokens to _KNOWN_KEYS in config normalization
Passes max_tokens through runtime provider resolution
Updates gateway to read max_tokens from runtime config first, then fallback to model config

Relates to #19991

extent analysis

TL;DR

The issue can be fixed by applying the changes from PR #19991, which updates the configuration normalization and provider resolution to properly pass the max_tokens value from custom_providers to the AIAgent instance.

Guidance

Review the changes in PR #19991 to understand the necessary updates for fixing the issue.
Apply the three steps outlined in the Fix section: add max_tokens to _KNOWN_KEYS, pass max_tokens through runtime provider resolution, and update the gateway to read max_tokens from runtime config first.
Verify the fix by setting max_tokens in a custom provider entry, removing or not setting model.max_tokens globally, and observing that the correct max_tokens value is used.
If PR #19991 is not yet merged, consider applying the changes manually or waiting for an official release.

Example

No code snippet is provided as the issue already references a specific PR with the necessary changes.

Notes

The fix relies on the changes introduced in PR #19991, which may not be available in all versions. Ensure that the changes are compatible with the current version being used.

Recommendation

Until an official release is available, manually applying the changes from PR #19991 is the most direct way to resolve the issue.
