hermes - ✅ (Solved) feat: custom_providers should support per-provider max_tokens override [1 pull request, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#28782 (fetched 2026-05-20 04:02:02)
Comments: 1 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×5, commented ×1, cross-referenced ×1

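Root Cause

model.max_tokens in config.yaml is global: it applies to every provider, including the fallback chain, and there is no per-provider override. This is problematic when:

  • A custom provider (e.g. Ark DeepSeek) needs an explicit max_tokens because auto-detection doesn't work
  • Fallback providers (e.g. MiniMax, NVIDIA) should NOT inherit that same max_tokens value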

PR fix notes

PR #28786: feat: per-provider max_tokens via custom_providers models.<model>.max_tokens

Description (problem / solution / changelog)

Summary

Adds per-provider max_tokens support to custom_providers, mirroring the existing context_length pattern. This allows users to set a provider-scoped output-token cap without affecting fallback providers.

Problem

model.max_tokens in config.yaml is global — it applies to all providers including fallbacks. There is no way to scope max_tokens to a specific provider, unlike context_length which already supports per-provider overrides.

Changes

hermes_cli/config.py

  • Adds get_custom_provider_max_tokens() — mirrors get_custom_provider_context_length() exactly. Matches by base_url + model name, returns models.<model>.max_tokens if present and valid.
  • Adds "max_tokens" to _KNOWN_KEYS in _normalize_custom_provider_entry so the top-level key is not flagged as unknown (defensive, for users who had it at top level).
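A minimal sketch of what such a lookup might look like, assuming custom_providers has already been loaded into a list of dicts shaped like the Usage example. The function name comes from the PR description; the signature, the trailing-slash normalisation of base_url, and the rule for what counts as "valid" (a positive integer) are assumptions, not the actual hermes_cli/config.py code.

```python
from typing import Any, Dict, List, Optional


def get_custom_provider_max_tokens(
    custom_providers: List[Dict[str, Any]],
    base_url: str,
    model: str,
) -> Optional[int]:
    """Return models.<model>.max_tokens for the matching provider, else None.

    Hypothetical signature: the real function may load the config itself.
    """
    wanted_url = base_url.rstrip("/")
    for entry in custom_providers:
        if not isinstance(entry, dict):
            continue
        # Match the provider by base_url (assumed: trailing slashes ignored).
        if str(entry.get("base_url", "")).rstrip("/") != wanted_url:
            continue
        models = entry.get("models")
        if not isinstance(models, dict):
            continue
        model_cfg = models.get(model)
        if not isinstance(model_cfg, dict):
            continue
        value = model_cfg.get("max_tokens")
        # Assumed validity rule: a positive int. bool is rejected explicitly
        # because bool is a subclass of int in Python.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None
```

Returning None for a missing or invalid value lets the caller fall through to whatever default applies, which is how the PR describes the context_length lookup being used.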

agent/agent_init.py

  • After the global model.max_tokens fallback (and before context_length resolution), adds a second fallback that checks custom_providers for a per-provider max_tokens when agent.max_tokens is still None.
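The resolution order described here can be sketched as a small function. The name resolve_max_tokens and its parameters are hypothetical, and the "explicit" value stands in for whatever sets agent.max_tokens before the fallbacks run; the real logic lives inline in agent/agent_init.py.

```python
from typing import Optional


def resolve_max_tokens(
    explicit: Optional[int],
    global_max_tokens: Optional[int],
    provider_max_tokens: Optional[int],
) -> Optional[int]:
    """Pick the first non-None value in the order the PR describes."""
    # 1. A value already set on the agent wins (assumed source).
    if explicit is not None:
        return explicit
    # 2. First fallback: global model.max_tokens from config.yaml.
    if global_max_tokens is not None:
        return global_max_tokens
    # 3. Second fallback: custom_providers[].models.<model>.max_tokens.
    #    May still be None, in which case max_tokens stays unset.
    return provider_max_tokens
```

If the ordering is as described, a global model.max_tokens would take precedence over the per-provider value, so the global key would need to be left unset for the provider-scoped cap to apply.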

Usage

custom_providers:
  - name: My Provider
    base_url: https://example.com/v1
    api_key: ...
    model: my-model
    api_mode: chat_completions
    models:
      my-model:
        context_length: 1000000
        max_tokens: 131072       # ← new per-provider field

Related

Closes #28782

Changed files

  • agent/agent_init.py (modified, +17/-0)
  • hermes_cli/config.py (modified, +61/-1)

Original issue (NousResearch/hermes-agent#28782)

Problem

Currently, model.max_tokens in config.yaml is a global setting. When set, it applies to all providers including the fallback chain. There is no way to specify a per-provider max_tokens override, unlike context_length which already supports per-provider overrides via custom_providers[].models.<model>.context_length.

This is problematic when:

  • A custom provider (e.g. Ark DeepSeek) needs an explicit max_tokens because auto-detection doesn't work
  • Fallback providers (e.g. MiniMax, NVIDIA) should NOT inherit that same max_tokens value

Current workaround

Putting max_tokens in model: makes it global — every provider including fallbacks sends max_tokens=131072 in every API call. The only way to avoid this today is to leave max_tokens unset entirely and accept whatever default each provider chooses.
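To make the side effect concrete, here is a toy illustration of a global value ending up in every request. build_request and the model names are hypothetical stand-ins; this is not how hermes-agent builds its API calls, only the behaviour the paragraph describes.

```python
from typing import Any, Dict, Optional


def build_request(model: str, global_max_tokens: Optional[int]) -> Dict[str, Any]:
    """Build a chat-completions-style payload (hypothetical helper)."""
    payload: Dict[str, Any] = {"model": model, "messages": []}
    # A global setting is attached regardless of which provider is called.
    if global_max_tokens is not None:
        payload["max_tokens"] = global_max_tokens
    return payload


# Primary model first, then a fallback (names are placeholders).
chain = ["primary-model", "fallback-model"]

# With model.max_tokens set globally, the fallback inherits the same cap.
with_global = [build_request(m, 131072) for m in chain]

# With it unset, no request carries max_tokens and each provider
# applies its own default.
without_global = [build_request(m, None) for m in chain]
```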

Proposed solution

Add max_tokens support to custom_providers[].models.<model>.max_tokens, following the exact same pattern as the existing context_length override:

custom_providers:
  - name: My Provider
    base_url: https://...
    api_key: ...
    model: my-model
    models:
      my-model:
        context_length: 1000000
        max_tokens: 131072       # new field, per-provider

Implementation scope

  1. hermes_cli/config.py — Add get_custom_provider_max_tokens() function parallel to get_custom_provider_context_length().
  2. agent/agent_init.py — After the existing model.max_tokens fallback (around line 1166), add a second fallback that checks custom_providers for a per-provider max_tokens when agent.max_tokens is still None.
  3. hermes_cli/main.py — Optionally update _save_custom_provider to save max_tokens into models.<model>.max_tokens.
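For the optional third step, a sketch of how a save path could write the value into the nested models.<model> mapping without disturbing existing keys such as context_length. The helper name set_model_max_tokens is hypothetical; the actual _save_custom_provider in hermes_cli/main.py is not shown in this issue and may be structured differently.

```python
from typing import Any, Dict, Optional


def set_model_max_tokens(
    entry: Dict[str, Any],
    model: str,
    max_tokens: Optional[int],
) -> Dict[str, Any]:
    """Store max_tokens under entry["models"][model], creating dicts as needed.

    Hypothetical helper; mutates and returns the provider entry.
    """
    if max_tokens is None:
        return entry  # nothing to save

    models = entry.get("models")
    if not isinstance(models, dict):
        models = {}
        entry["models"] = models

    model_cfg = models.get(model)
    if not isinstance(model_cfg, dict):
        model_cfg = {}
        models[model] = model_cfg

    # Existing keys (e.g. context_length) are left untouched.
    model_cfg["max_tokens"] = int(max_tokens)
    return entry
```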

Priority

Medium. Not a bug (everything works without it), but a missing feature that causes real confusion — users who set model.max_tokens expecting it to only affect their primary provider may inadvertently pollute their fallback API calls.
