hermes: custom_providers.models should support per-model context_length

Feature Description

When configuring multiple models under a single custom_providers entry, models often have vastly different maximum context lengths (e.g., GLM-4.7 at 128K vs MiniMax-M2.7 at 1M+).

Currently model.context_length is a global setting — it applies to all models regardless of which one is active. This forces a compromise:

  • Set it low → large-context models are underutilized
  • Set it high → switching to a small-context model mid-session risks exceeding its limit and getting a 400 error

Proposed Solution

Extend the models: dict in custom_providers to accept per-model context_length:

custom_providers:
  - name: my-api
    base_url: http://...
    model: default-model
    models:
      glm-4.7:
        context_length: 131072
      minimax-m2.7:
        context_length: 1048576

When a per-model context_length is set, it overrides the global model.context_length for that model. When the user switches via /model, the context_length follows the model.

Optionally, compression settings could also be per-model:

      glm-4.7:
        context_length: 131072
        compression:
          threshold: 0.7
          target_ratio: 0.3
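
A minimal sketch of how per-model compression settings might be layered over global ones. It assumes a top-level `compression:` section holds the global defaults and that `threshold` is the fraction of the context window at which compression starts; neither is confirmed by this issue. The function name `resolve_compression` and the global default values are hypothetical.

```python
from typing import Any


def resolve_compression(
    config: dict[str, Any], provider_name: str, model_name: str
) -> dict[str, Any]:
    """Overlay per-model compression keys on the (assumed) global ones."""
    # Assumption: global defaults live under a top-level `compression` key.
    merged = dict(config.get("compression") or {})
    for provider in config.get("custom_providers") or []:
        if provider.get("name") != provider_name:
            continue
        models = provider.get("models")
        entry = models.get(model_name) if isinstance(models, dict) else None
        if isinstance(entry, dict):
            # Per-model keys win; keys not set per model keep the global value.
            merged.update(entry.get("compression") or {})
        break
    return merged


config = {
    # Hypothetical global defaults, for illustration only.
    "compression": {"threshold": 0.85, "target_ratio": 0.5},
    "custom_providers": [
        {
            "name": "my-api",
            "models": {
                "glm-4.7": {
                    "context_length": 131072,
                    "compression": {"threshold": 0.7, "target_ratio": 0.3},
                },
                "minimax-m2.7": {"context_length": 1048576},
            },
        }
    ],
}

glm = resolve_compression(config, "my-api", "glm-4.7")
minimax = resolve_compression(config, "my-api", "minimax-m2.7")

# Under the assumed meaning of `threshold`, glm-4.7 would start
# compressing once the session reaches this many tokens.
trigger_tokens = int(131072 * glm["threshold"])
```

Merging key by key, rather than replacing the whole section, would let a model override only `threshold` while inheriting `target_ratio`.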

Motivation

I run a New API proxy that hosts 8+ models from different providers (GLM, MiniMax, Qwen, Kimi) with context windows ranging from 128K to 1M+. In a code-generation session, I frequently /model between them depending on the task. The global context_length leaves value on the table or risks request failures, depending on which way I tune it.

Alternatives Considered

  1. Profiles: works, but requires hermes profile switch instead of a simple /model, which adds friction
  2. Manual override before switching: hermes config set model.context_length ... then /reset, which loses session context
  3. Min-common-denominator: set 128K globally, which wastes 1M models on long code sessions

Implementation Notes

The change should be in the model resolution logic: when reading model.context_length for the currently active model, first check if the active provider is a custom_providers entry and if that entry's models[name] has a context_length field. If so, use it. Otherwise fall back to the global model.context_length.
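
The lookup order described above could be sketched roughly as follows. This is an illustration, not hermes' actual code: the function name `resolve_context_length`, the already-parsed config dict, the global value of 200000, and the `qwen-example` entry are all hypothetical. It also assumes `models:` is a mapping keyed by model name, as in the proposal.

```python
from typing import Any, Optional


def resolve_context_length(
    config: dict[str, Any],
    provider_name: Optional[str],
    model_name: str,
) -> Optional[int]:
    """Return the per-model context_length if set, else the global one."""
    for provider in config.get("custom_providers") or []:
        if provider.get("name") != provider_name:
            continue
        models = provider.get("models")
        # A model entry may be missing, empty, or lack context_length.
        entry = models.get(model_name) if isinstance(models, dict) else None
        if isinstance(entry, dict) and entry.get("context_length") is not None:
            return int(entry["context_length"])
        break
    # Fall back to the global model.context_length (may be unset).
    return (config.get("model") or {}).get("context_length")


config = {
    "model": {"context_length": 200000},  # hypothetical global value
    "custom_providers": [
        {
            "name": "my-api",
            "models": {
                "glm-4.7": {"context_length": 131072},
                "minimax-m2.7": {"context_length": 1048576},
                "qwen-example": None,  # listed, but no per-model override
            },
        }
    ],
}

# Simulate switching models with /model: the limit follows the model.
for name in ("glm-4.7", "minimax-m2.7", "qwen-example"):
    print(name, resolve_context_length(config, "my-api", name))
```

Calling a resolver like this whenever the active model changes, instead of caching one value per session, is what would make the limit follow a /model switch.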
