hermes - 💡(How to fix) Fix feat(config): support per-model max_tokens in custom_providers config [1 participants]

uwings · 2026-04-24T09:53:01Z

[hermes] Problem When using models with large output limits e.g., DeepSeek V4 with 384K max output tokens , there is currently no way to configure max tokens p… ## Problem When using models with large output limits (e.g., DeepSeek V4 with 384K max_output_tokens), there is currently no way to configure `max_tokens` per-provider or per-model in `config.yaml`. The `custom_providers` section already supports `context_length` per-model, but `max_tokens` is not read from config at all. The `AIAgent.__init__` only accepts `max_tokens` as a direct parameter, and neither `cli.py` nor `gateway/run.py` pass any config-based `max_tokens` value through. This forces users who want to take advantage of large output limits (DeepSeek V4 384K, Gemini 2M, etc.) to either: 1. Accept the API server's default (which may be far lower than the model's capability) 2. Hardcode a global `max_tokens` that doesn't work across providers ## Technical Details Resolution chain currently works for `context_length`: - `config.yaml` -> `model. .models. .context_length` - `run_agent.py` reads from `custom_providers` config in `_config_context_length()` But no equivalent exists for `max_tokens`: - `run_agent.py:804` -- `self.max_tokens` only comes from `__init__` parameter - `run_agent.py:1366-1431` -- custom_providers loop only reads `context_length`, not `max_tokens` - `run_agent.py:6644-6645` -- API call only sends `max_tokens` if `self.max_tokens is not None` - `cli.py:2795-2922` -- no max_tokens passed to AIAgent - `gateway/run.py:960` -- same - Only `batch_runner.py:329` has `config.get("max_tokens")`, but reads root-level config, not model-level ## Proposed Solution Add `max_tokens` support to the `custom_providers` model config, similar to how `context_length` works: ```yaml custom_providers: deepseek-v4: base_url: https://api.deepseek.com/v1 models: deepseek-chat: context_length: 1000000 max_tokens: 384000 ``` And/or add a top-level `default_max_tokens` config key for models without explicit config. ## Additional Context - DeepSeek V4 (released 2026-04-24) supports up to 384K output tokens - Related issue: #9489 (retry-based max_tokens boost, different scope) - This is purely a config-reader enhancement -- no API behavior changes needed

hermes2026-04-24 09:53:01

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#15037•Fetched 2026-04-25 06:24:51

View on GitHub

Comments

Participants

Timeline

Reactions

Author

uwings

Participants

uwings

Timeline (top)

labeled ×4

Code Example

custom_providers:
  deepseek-v4:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-chat:
        context_length: 1000000
        max_tokens: 384000

RAW_BUFFERClick to expand / collapse

Problem

When using models with large output limits (e.g., DeepSeek V4 with 384K max_output_tokens), there is currently no way to configure max_tokens per-provider or per-model in config.yaml.

The custom_providers section already supports context_length per-model, but max_tokens is not read from config at all. The AIAgent.__init__ only accepts max_tokens as a direct parameter, and neither cli.py nor gateway/run.py pass any config-based max_tokens value through.

This forces users who want to take advantage of large output limits (DeepSeek V4 384K, Gemini 2M, etc.) to either:

Accept the API server's default (which may be far lower than the model's capability)
Hardcode a global max_tokens that doesn't work across providers

Technical Details

Resolution chain currently works for context_length:

config.yaml -> model.<provider_name>.models.<model_name>.context_length
run_agent.py reads from custom_providers config in _config_context_length()

But no equivalent exists for max_tokens:

run_agent.py:804 -- self.max_tokens only comes from __init__ parameter
run_agent.py:1366-1431 -- custom_providers loop only reads context_length, not max_tokens
run_agent.py:6644-6645 -- API call only sends max_tokens if self.max_tokens is not None
cli.py:2795-2922 -- no max_tokens passed to AIAgent
gateway/run.py:960 -- same
Only batch_runner.py:329 has config.get("max_tokens"), but reads root-level config, not model-level

Proposed Solution

Add max_tokens support to the custom_providers model config, similar to how context_length works:

custom_providers:
  deepseek-v4:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-chat:
        context_length: 1000000
        max_tokens: 384000

And/or add a top-level default_max_tokens config key for models without explicit config.

Additional Context

DeepSeek V4 (released 2026-04-24) supports up to 384K output tokens
Related issue: #9489 (retry-based max_tokens boost, different scope)
This is purely a config-reader enhancement -- no API behavior changes needed

extent analysis

TL;DR

Add max_tokens support to the custom_providers model config in config.yaml to allow per-model configuration.

Guidance

Update the custom_providers section in config.yaml to include max_tokens for each model, as shown in the proposed solution.
Modify run_agent.py to read max_tokens from the custom_providers config, similar to how context_length is handled.
Consider adding a top-level default_max_tokens config key for models without explicit config to provide a fallback value.
Review the changes to ensure that max_tokens is properly passed to the AIAgent and used in API calls.

Example

custom_providers:
  deepseek-v4:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-chat:
        context_length: 1000000
        max_tokens: 384000

Notes

The proposed solution only requires changes to the config reader and does not affect API behavior. The max_tokens value should be configurable per-model to take advantage of large output limits offered by certain providers.

Recommendation

Apply the proposed workaround by adding max_tokens support to the custom_providers model config, as it allows for per-model configuration and does not require any API behavior changes.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #API middleware #SSR setup #ISR setup #authentication setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix feat(config): support per-model max_tokens in custom_providers config [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Code Example

Problem

Technical Details

Proposed Solution

Additional Context

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix feat(config): support per-model max_tokens in custom_providers config [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Code Example

Problem

Technical Details

Proposed Solution

Additional Context

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING