hermes - 💡(How to fix) Fix feat(config): support per-model max_tokens in custom_providers config [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#15037Fetched 2026-04-25 06:24:51
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×4

Code Example

custom_providers:
  deepseek-v4:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-chat:
        context_length: 1000000
        max_tokens: 384000
RAW_BUFFERClick to expand / collapse

Problem

When using models with large output limits (e.g., DeepSeek V4 with 384K max_output_tokens), there is currently no way to configure max_tokens per-provider or per-model in config.yaml.

The custom_providers section already supports context_length per-model, but max_tokens is not read from config at all. The AIAgent.__init__ only accepts max_tokens as a direct parameter, and neither cli.py nor gateway/run.py pass any config-based max_tokens value through.

This forces users who want to take advantage of large output limits (DeepSeek V4 384K, Gemini 2M, etc.) to either:

  1. Accept the API server's default (which may be far lower than the model's capability)
  2. Hardcode a global max_tokens that doesn't work across providers

Technical Details

Resolution chain currently works for context_length:

  • config.yaml -> model.<provider_name>.models.<model_name>.context_length
  • run_agent.py reads from custom_providers config in _config_context_length()

But no equivalent exists for max_tokens:

  • run_agent.py:804 -- self.max_tokens only comes from __init__ parameter
  • run_agent.py:1366-1431 -- custom_providers loop only reads context_length, not max_tokens
  • run_agent.py:6644-6645 -- API call only sends max_tokens if self.max_tokens is not None
  • cli.py:2795-2922 -- no max_tokens passed to AIAgent
  • gateway/run.py:960 -- same
  • Only batch_runner.py:329 has config.get("max_tokens"), but reads root-level config, not model-level

Proposed Solution

Add max_tokens support to the custom_providers model config, similar to how context_length works:

custom_providers:
  deepseek-v4:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-chat:
        context_length: 1000000
        max_tokens: 384000

And/or add a top-level default_max_tokens config key for models without explicit config.

Additional Context

  • DeepSeek V4 (released 2026-04-24) supports up to 384K output tokens
  • Related issue: #9489 (retry-based max_tokens boost, different scope)
  • This is purely a config-reader enhancement -- no API behavior changes needed

extent analysis

TL;DR

Add max_tokens support to the custom_providers model config in config.yaml to allow per-model configuration.

Guidance

  • Update the custom_providers section in config.yaml to include max_tokens for each model, as shown in the proposed solution.
  • Modify run_agent.py to read max_tokens from the custom_providers config, similar to how context_length is handled.
  • Consider adding a top-level default_max_tokens config key for models without explicit config to provide a fallback value.
  • Review the changes to ensure that max_tokens is properly passed to the AIAgent and used in API calls.

Example

custom_providers:
  deepseek-v4:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-chat:
        context_length: 1000000
        max_tokens: 384000

Notes

The proposed solution only requires changes to the config reader and does not affect API behavior. The max_tokens value should be configurable per-model to take advantage of large output limits offered by certain providers.

Recommendation

Apply the proposed workaround by adding max_tokens support to the custom_providers model config, as it allows for per-model configuration and does not require any API behavior changes.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix feat(config): support per-model max_tokens in custom_providers config [1 participants]