hermes - ✅ (Solved) Fix max_tokens config from custom_providers is not passed to AIAgent [3 pull requests, 1 participant]

GitHub stats
NousResearch/hermes-agent#20004 (fetched 2026-05-06 06:39:17)
Comments: 0 · Participants: 1 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×3

Fix / Workaround

This is fixed by PR #19991, which:

  1. Adds max_tokens to _KNOWN_KEYS in config normalization
  2. Passes max_tokens through runtime provider resolution
  3. Updates the gateway to read max_tokens from the runtime config first, then fall back to the model config
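The three steps can be pictured as a small pipeline. The sketch below is a minimal, hypothetical model of that flow on plain dicts: `normalize_entry`, `resolve_runtime`, `agent_kwargs` and the key subset are illustrative assumptions, not hermes-agent's actual functions, and the `base_url` is a placeholder.

```python
from typing import Any, Optional

# Illustrative subset of keys the normalizer keeps (assumption); the real
# _KNOWN_KEYS set in hermes_cli/config.py has other provider fields as well.
KNOWN_KEYS = {"name", "base_url", "api_key", "max_tokens"}


def normalize_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Step 1: keep known keys, so max_tokens survives normalization."""
    return {key: value for key, value in entry.items() if key in KNOWN_KEYS}


def resolve_runtime(providers: list[dict[str, Any]], name: str) -> dict[str, Any]:
    """Step 2: lift max_tokens from the matched entry onto the runtime dict."""
    for entry in map(normalize_entry, providers):
        if entry.get("name") == name:
            runtime: dict[str, Any] = {"base_url": entry.get("base_url")}
            if "max_tokens" in entry:
                runtime["max_tokens"] = entry["max_tokens"]
            return runtime
    raise KeyError(f"no custom provider named {name!r}")


def agent_kwargs(runtime: dict[str, Any], model_cfg: dict[str, Any]) -> dict[str, Any]:
    """Step 3: runtime value first, then model.max_tokens, else None."""
    max_tokens: Optional[int] = runtime.get("max_tokens")
    if max_tokens is None:
        max_tokens = model_cfg.get("max_tokens")
    return {**runtime, "max_tokens": max_tokens}


# Mirrors the reproduction: a per-provider cap and no global model.max_tokens.
config = {
    "model": {},
    "custom_providers": [
        {"name": "ark", "base_url": "https://example.invalid/v1", "max_tokens": 131072},
    ],
}
kwargs = agent_kwargs(resolve_runtime(config["custom_providers"], "ark"), config["model"])
print(kwargs["max_tokens"])  # 131072
```

Before the fix, step 1 discarded the key, so steps 2 and 3 never saw it and the agent fell back to the provider default.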

PR fix notes

PR #19991: fix: properly pass model.max_tokens config to AIAgent in gateway

Description (problem / solution / changelog)

What does this PR do?

Fixes the issue where model.max_tokens from config.yaml was not being passed to AIAgent when running via the gateway (Feishu, QQBot, etc.), causing model responses to be truncated by conservative default output limits.

Changes:

  1. Import and use … to read max_tokens from config
  2. Include max_tokens in the runtime dict passed to AIAgent
  3. Include max_tokens in fallback provider resolution
  4. Add a max_tokens parameter with config priority: CLI args > config file > model default
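The priority order in the last item amounts to a first-set-wins lookup. A minimal sketch, assuming each source yields an int or None; `pick_max_tokens` is a hypothetical name, and the PR's actual parameter handling may differ.

```python
from typing import Optional


def pick_max_tokens(
    cli_value: Optional[int],
    config_value: Optional[int],
    model_default: Optional[int],
) -> Optional[int]:
    """Return the first value that is set, in the order
    CLI args > config file > model default (hypothetical helper)."""
    for candidate in (cli_value, config_value, model_default):
        if candidate is not None:
            return candidate
    # Nothing configured anywhere: None keeps the old behavior (backward compatible).
    return None
```

Returning None when nothing is set matches the PR's note that the change is backward compatible.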

Why is this needed?

For custom providers like ByteDance Ark, the model's default output token limit is quite conservative. When max_tokens is configured but not passed through, users see warnings on platforms like Feishu.

Testing

  • Verified that the config path correctly reads max_tokens
  • All changes are backward compatible (None is passed when config is not set)
  • Gateway routes correctly unpack the runtime dict including max_tokens

Changed files

  • cli.py (modified, +9/-0)
  • gateway/run.py (modified, +16/-1)
  • hermes_cli/config.py (modified, +5/-0)
  • hermes_cli/runtime_provider.py (modified, +15/-0)

PR #20121: fix(gateway): honor custom_providers max_tokens when constructing AIAgent

Description (problem / solution / changelog)

Per-provider max_tokens set under custom_providers (or the new-style providers dict) was dropped during config normalization and never reached AIAgent, so the gateway always used provider transport defaults regardless of the user's cap.

What changed and why

  • hermes_cli/config.py: add max_tokens to _KNOWN_KEYS in _normalize_custom_provider_entry and preserve positive int values in the normalized entry — without this, the key was dropped (and a spurious "unknown config keys" warning was logged).
  • hermes_cli/runtime_provider.py: propagate max_tokens from _get_named_custom_provider (legacy list, v12 dict, and credential-pool branches) and from _resolve_named_custom_runtime so the resolved runtime dict carries the cap.
  • gateway/run.py: include max_tokens in _resolve_runtime_agent_kwargs and the fallback-provider helper (with a fallback to top-level model.max_tokens), and forward it through _resolve_turn_agent_config so AIAgent(**turn_route["runtime"]) receives the value.
  • tests/hermes_cli/test_custom_provider_max_tokens.py: 10 new tests covering normalization (positive int, zero/negative rejection, non-int rejection, no spurious unknown-key warning), runtime propagation through the legacy list and v12 dict paths, omission semantics, and gateway precedence (runtime wins, falls back to model.max_tokens, returns None when neither is set).
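The normalization change described in the first bullet can be sketched as follows. This is a minimal illustration assuming a plain-dict entry; the function name echoes the PR, but the body, the key set and the warning text are assumptions, not the project's code.

```python
import logging
from typing import Any

logger = logging.getLogger("config_sketch")

# Illustrative subset (assumption); the real _KNOWN_KEYS has other fields too.
_KNOWN_KEYS = frozenset({"name", "base_url", "api_key", "model", "max_tokens"})


def normalize_custom_provider_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for _normalize_custom_provider_entry."""
    unknown = sorted(set(entry) - _KNOWN_KEYS)
    if unknown:
        # max_tokens is now a known key, so it no longer triggers this warning.
        logger.warning("unknown config keys in custom provider %r: %s", entry.get("name"), unknown)
    normalized = {k: v for k, v in entry.items() if k in _KNOWN_KEYS and k != "max_tokens"}
    value = entry.get("max_tokens")
    # Keep only positive ints; bool is excluded because bool subclasses int.
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        normalized["max_tokens"] = value
    return normalized
```

Rejected values are simply omitted, so downstream code sees "not configured" rather than a bad value.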

Precedence is now: custom_providers[].max_tokens (carried on the runtime dict) → model.max_tokens (global) → None (provider transport default).
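The precedence rule fits in a few lines. This is a hedged sketch with a hypothetical `resolve_max_tokens` helper on plain dicts; in the PR the logic lives inside gateway functions such as _resolve_runtime_agent_kwargs. The cases mirror the gateway-precedence tests listed above.

```python
from typing import Any, Mapping, Optional


def resolve_max_tokens(
    runtime: Mapping[str, Any], model_cfg: Mapping[str, Any]
) -> Optional[int]:
    """custom_providers[].max_tokens (on the runtime dict) -> model.max_tokens -> None."""
    if runtime.get("max_tokens") is not None:
        return runtime["max_tokens"]
    # None lets the provider transport pick its own default.
    return model_cfg.get("max_tokens")


CASES = [
    ({"max_tokens": 131072}, {"max_tokens": 8192}, 131072),  # runtime wins
    ({}, {"max_tokens": 8192}, 8192),  # falls back to model.max_tokens
    ({}, {}, None),  # neither set
]
for runtime, model_cfg, expected in CASES:
    assert resolve_max_tokens(runtime, model_cfg) == expected
```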

How to test

  • pytest tests/hermes_cli/test_custom_provider_max_tokens.py -q (10 passed locally)
  • pytest tests/hermes_cli/test_runtime_provider_resolution.py tests/hermes_cli/test_custom_provider_context_length.py tests/hermes_cli/test_config.py -q (173 passed)
  • Broader sweep: pytest tests/hermes_cli/ tests/gateway/ -q shows only pre-existing platform/flaky failures (systemd D-Bus on macOS, whatsapp/discord adapter tests, an SSE-keepalive timing test) that also fail on main.
  • Manual: set custom_providers: [{name: ark, base_url: ..., max_tokens: 131072}] and confirm via gateway logs that the agent's max_tokens is 131072 instead of the provider default.

What platforms tested on

  • macOS on darwin-arm64 (local)

Fixes #20004


Changed files

  • gateway/run.py (modified, +33/-0)
  • hermes_cli/config.py (modified, +5/-1)
  • hermes_cli/runtime_provider.py (modified, +18/-0)
  • tests/hermes_cli/test_custom_provider_max_tokens.py (added, +219/-0)

PR #20149: fix(gateway): honor max_tokens from custom_providers / providers entries (#20004)

Description (problem / solution / changelog)

Closes #20004.

A max_tokens value set on a custom_providers (or providers) entry was silently dropped:

  • _normalize_custom_provider_entry discarded the field as an unknown key.
  • Runtime resolution (_get_named_custom_provider, _resolve_named_custom_runtime) never lifted it onto the runtime dict.
  • The gateway's _resolve_runtime_agent_kwargs only read model.max_tokens.

Result: an explicit per-endpoint output cap was overridden by model.max_tokens when that was set, or otherwise by the transport-layer hardcoded default (4096 for Anthropic Bedrock, 16384 for NVIDIA NIM, etc.).

Fix — three layers

  1. Normalization (hermes_cli/config.py): add max_tokens to _KNOWN_KEYS; copy positive int values onto the normalized entry. Drop bogus values (zero/negative/string/bool) silently.
  2. Runtime lift (hermes_cli/runtime_provider.py): new _attach_custom_provider_max_tokens helper centralises the validation. Called from all four lookup paths — providers-dict-by-key, providers-dict-by-display-name, legacy custom_providers list, and pool-backed resolution — so they can't drift.
  3. Gateway resolution (gateway/run.py): documented priority chain in _resolve_runtime_agent_kwargs and _try_resolve_fallback_provider:
    1. runtime['max_tokens'] — from the matched custom-provider entry
    2. model.max_tokens — top-level config.yaml fallback
    3. None → AIAgent / transport picks a provider-appropriate default
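The point of a shared helper is that every lookup path validates max_tokens the same way. A minimal sketch under stated assumptions: the function names are hypothetical, the display name is assumed to live under a `name` field, and the pool-backed path is omitted.

```python
from typing import Any, Optional


def _positive_int(value: Any) -> Optional[int]:
    """Accept only positive ints (bool is excluded because it subclasses int)."""
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return value
    return None


def attach_custom_provider_max_tokens(
    runtime: dict[str, Any], entry: dict[str, Any]
) -> dict[str, Any]:
    """Copy a valid max_tokens from a provider entry onto the runtime dict.

    A value already present on the runtime dict is never overwritten.
    """
    if "max_tokens" not in runtime:
        value = _positive_int(entry.get("max_tokens"))
        if value is not None:
            runtime["max_tokens"] = value
    return runtime


def lookup_custom_provider(config: dict[str, Any], name: str) -> dict[str, Any]:
    """Find a provider entry by one of three paths, then attach max_tokens."""
    providers: dict[str, Any] = config.get("providers", {})
    legacy: list[dict[str, Any]] = config.get("custom_providers", [])
    entry = providers.get(name)  # providers dict, by key
    if entry is None:  # providers dict, by display name
        entry = next((e for e in providers.values() if e.get("name") == name), None)
    if entry is None:  # legacy custom_providers list
        entry = next((e for e in legacy if e.get("name") == name), None)
    if entry is None:
        raise KeyError(f"no custom provider named {name!r}")
    runtime = {"base_url": entry.get("base_url")}
    # Every path funnels through the same helper, so validation cannot drift.
    return attach_custom_provider_max_tokens(runtime, entry)
```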

A tiny _coerce_max_tokens helper enforces the positive-int contract so a misconfigured max_tokens: 64K falls through cleanly instead of crashing the constructor.
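A hedged sketch of that positive-int contract and how a bad value falls through the chain. `coerce_max_tokens` and `resolve_max_tokens` are hypothetical names modeled on the description, not the PR's code.

```python
from typing import Any, Mapping, Optional


def coerce_max_tokens(value: Any) -> Optional[int]:
    """Return value if it is a positive int, otherwise None.

    Strings such as "64K", zero, negatives, floats, bools and None all map to
    None, so the caller simply moves on to the next source.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value if value > 0 else None


def resolve_max_tokens(runtime: Mapping[str, Any], model_cfg: Mapping[str, Any]) -> Optional[int]:
    """Walk the priority chain, skipping any value that fails coercion."""
    for candidate in (runtime.get("max_tokens"), model_cfg.get("max_tokens")):
        coerced = coerce_max_tokens(candidate)
        if coerced is not None:
            return coerced
    return None  # let the transport choose its own default


# A misconfigured "64K" never reaches the constructor; the global value wins.
print(resolve_max_tokens({"max_tokens": "64K"}, {"max_tokens": 8192}))  # 8192
```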

Test

17 new cases in tests/hermes_cli/test_custom_provider_max_tokens.py covering:

  • normalization accept/reject for positive int / zero / negative / string / missing key,
  • the _attach_custom_provider_max_tokens helper across all input shapes (positive, zero, negative, string, None, missing, doesn't-overwrite),
  • end-to-end through _get_named_custom_provider for legacy custom_providers list, providers-dict-by-key, and providers-dict-by-display-name lookup paths.

Pre-existing related suites stay green:

  • tests/hermes_cli/test_runtime_provider_resolution.py (109 cases)
  • tests/hermes_cli/test_custom_provider_context_length.py (12 cases)
  • tests/hermes_cli/test_provider_config_validation.py (17 cases)

Notes

  • No call-site signature changes — value flows through turn_route['runtime'] via the existing **runtime splat into AIAgent.
  • Same chain applied in primary and auth-fallback resolution so a fallback kick-in doesn't silently change the output cap.
  • Related to #19991, but reimplemented from scratch against current main.
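The first two notes can be illustrated with a stand-in agent. A minimal sketch, assuming a dataclass in place of AIAgent (whose real constructor takes many more arguments); `FakeAgent`, `resolve_route` and `build_agent` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class FakeAgent:
    """Stand-in for AIAgent (hypothetical); only the fields needed here."""
    base_url: Optional[str] = None
    max_tokens: Optional[int] = None


def resolve_route(
    base_url: str, provider_max_tokens: Optional[int], model_cfg: Mapping[str, Any]
) -> dict[str, Any]:
    """Build a turn route; primary and fallback both use this same chain."""
    max_tokens = provider_max_tokens
    if max_tokens is None:
        max_tokens = model_cfg.get("max_tokens")
    return {"runtime": {"base_url": base_url, "max_tokens": max_tokens}}


def build_agent(turn_route: Mapping[str, Any]) -> FakeAgent:
    # No signature change: max_tokens rides along inside the runtime dict splat.
    return FakeAgent(**turn_route["runtime"])


model_cfg = {"max_tokens": 8192}
primary = build_agent(resolve_route("https://primary.invalid/v1", None, model_cfg))
fallback = build_agent(resolve_route("https://fallback.invalid/v1", None, model_cfg))
# A fallback kick-in keeps the same output cap as the primary route.
assert primary.max_tokens == fallback.max_tokens == 8192
```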

Changed files

  • gateway/run.py (modified, +46/-1)
  • hermes_cli/config.py (modified, +8/-1)
  • hermes_cli/runtime_provider.py (modified, +28/-0)
  • tests/hermes_cli/test_custom_provider_max_tokens.py (added, +233/-0)

Code Example

```yaml
custom_providers:
  - name: ark
    max_tokens: 131072
    # ...
```

Describe the bug

The max_tokens configuration set in custom_providers is not being properly passed to the AIAgent instance. Instead, the gateway always uses model.max_tokens from the global config, or falls back to provider-specific defaults (like 32k for Kimi).

To Reproduce

  1. Set max_tokens in a custom_providers entry, e.g.:
     ```yaml
     custom_providers:
       - name: ark
         max_tokens: 131072
         # ...
     ```
  2. Remove or don't set model.max_tokens globally
  3. Start a conversation through the gateway
  4. Observe that the agent's max_tokens is the provider default instead of 131072

Expected behavior

The max_tokens value from custom_providers should be used, falling back to model.max_tokens when no provider-specific value is present.


Relates to #19991

extent analysis

TL;DR

The issue can be fixed by applying the changes from PR #19991, which updates the configuration normalization and provider resolution to properly pass the max_tokens value from custom_providers to the AIAgent instance.

Guidance

  • Review the changes in PR #19991 to understand the necessary updates for fixing the issue.
  • Apply the three steps outlined in the Fix section: add max_tokens to _KNOWN_KEYS, pass max_tokens through runtime provider resolution, and update the gateway to read max_tokens from runtime config first.
  • Verify the fix by setting max_tokens in a custom provider entry, removing or not setting model.max_tokens globally, and observing that the correct max_tokens value is used.
  • If PR #19991 is not yet merged, consider applying the changes manually or waiting for an official release.

Example

No code snippet is provided as the issue already references a specific PR with the necessary changes.

Notes

The fix relies on the changes introduced in PR #19991, which may not be available in all versions. Ensure that the changes are compatible with the current version being used.

Recommendation

As a workaround, apply the changes from PR #19991 manually; this is the most direct way to resolve the issue until an official release is available.
