hermes - 💡(How to fix) Fix per-model or per-provider compression threshold overrides [1 participant]

GitHub stats: NousResearch/hermes-agent#18733 (fetched 2026-05-03 04:54:36) · Comments: 0 · Participants: 1 · Timeline: 4 events (labeled ×4) · Reactions: 0

Feature Description

Allow setting different compression.threshold values per model or per provider, instead of a single global value.

Motivation

With 1M+ context models becoming more common (e.g. DeepSeek V4 Flash, MiMo V2.5 Pro, Gemini 2.5 Pro), a single global threshold creates a tension:

  • Large-context models (1M): a threshold of 0.5 means compression triggers only at 500K tokens. Most conversations never get that far, so the feature is effectively unused.
  • Small-context models (128K): the same 0.5 threshold triggers at 64K tokens, which is reasonable.
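Because the threshold is a fraction of the context window, the trigger point is simply threshold × context_length. A quick Python check of the two bullets above, assuming "1M" means 1,000,000 tokens and "128K" means 128,000 tokens (real window sizes vary, e.g. 131,072):

```python
def trigger_tokens(threshold: float, context_length: int) -> int:
    """Token count at which a ratio-based compression threshold fires."""
    # round() avoids off-by-one results from float products such as 0.3 * 128_000
    return round(threshold * context_length)


for name, ctx in [("1M-context model", 1_000_000), ("128K-context model", 128_000)]:
    print(f"{name}: compression at {trigger_tokens(0.5, ctx):,} tokens")
```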

Users who switch between models (e.g. MiMo 1M for main chat, Claude 200K for coding) need different compression strategies. A global ratio that works well for 128K models is too conservative for 1M models.

Proposed Solution

Option A — Per-provider override:

compression:
  enabled: true
  threshold: 0.5          # global default
  target_ratio: 0.2
  providers:
    xiaomi:
      threshold: 0.3       # for 1M context models, compress earlier
    anthropic:
      threshold: 0.65      # keep Claude existing behavior

Option B — Per-model override:

compression:
  threshold: 0.5
  model_overrides:
    - model: mimo-v2.5-pro
      threshold: 0.3
    - model: claude-sonnet-4
      threshold: 0.65
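For Option B, a lookup could scan `model_overrides` in order and fall back to the global value. This sketch assumes exact string matching on the model name and uses a hypothetical `resolve_model_threshold` helper; real model identifiers may carry provider prefixes or date suffixes, which would call for normalization or prefix matching.

```python
from typing import Any

# Hypothetical config mirroring Option B (proposed keys, not a shipped hermes-agent schema)
COMPRESSION_CONFIG: dict[str, Any] = {
    "threshold": 0.5,
    "model_overrides": [
        {"model": "mimo-v2.5-pro", "threshold": 0.3},
        {"model": "claude-sonnet-4", "threshold": 0.65},
    ],
}


def resolve_model_threshold(config: dict[str, Any], model: str) -> float:
    """Return the first override whose `model` matches exactly, else the global threshold."""
    for entry in config.get("model_overrides", []):
        if entry.get("model") == model:
            return float(entry["threshold"])
    return float(config["threshold"])
```

Because the lookup keys on the active model, switching from the main model to a 128K fallback picks up that fallback's own threshold (or the global default) rather than the main model's.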

Option A is simpler and covers most cases (providers tend to have similar context sizes). Option B is more granular but adds config complexity.

Alternatives Considered

  • Dynamic threshold based on context size: Automatically lower the threshold ratio as context size grows (e.g. threshold = min(0.5, 100000 / context_length)). This would be zero-config but less predictable.
  • Absolute token threshold: Instead of a ratio, use a fixed token count (e.g. compress at 64K tokens regardless of model). Simpler to reason about, but loses the relative scaling.
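Both alternatives are easy to compare numerically. This sketch uses the example formula and the 64K figure from the bullets above; the function and constant names are illustrative only.

```python
def dynamic_threshold(context_length: int) -> float:
    """Alternative 1 (example formula from the issue): min(0.5, 100000 / context_length)."""
    return min(0.5, 100_000 / context_length)


ABSOLUTE_TRIGGER_TOKENS = 64_000  # Alternative 2: one fixed token count for every model

for ctx in (128_000, 200_000, 1_000_000):
    dynamic_at = round(dynamic_threshold(ctx) * ctx)
    share = ABSOLUTE_TRIGGER_TOKENS / ctx  # fraction of the window the fixed count represents
    print(
        f"{ctx:>9,} ctx | dynamic: {dynamic_threshold(ctx):.2f} -> {dynamic_at:,} tokens"
        f" | absolute: {ABSOLUTE_TRIGGER_TOKENS:,} tokens ({share:.0%} of window)"
    )
```

Multiplying through, the dynamic rule fires at min(0.5 × context_length, 100,000) tokens, so every window of 200K or larger compresses at the same 100K-token point, which is where it starts behaving like an absolute threshold.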

Context

As a user running MiMo V2.5 Pro (1M context) as my main model, I set the threshold to 0.5, but at that ratio compression almost never triggers. Lowering it to 0.3 would make more sense for 1M models, but the same 0.3 would be too aggressive for the 128K models in the fallback chain.

extent analysis

TL;DR

Implement per-provider or per-model compression threshold overrides, so that a single global threshold no longer has to serve models with very different context window sizes.

Guidance

  • Consider using Option A (per-provider override) for a simpler configuration that covers most cases, as providers tend to have similar context sizes.
  • Evaluate Option B (per-model override) for more granular control, but be aware that it adds config complexity.
  • Weigh predictability against simplicity when choosing among a dynamic threshold, an absolute token threshold, or an override approach.
  • Test the chosen override method with different models and providers to ensure the desired compression behavior.

Notes

The best approach depends on the specific use case and model configurations. It's essential to weigh the benefits of simplicity against the need for granular control.

Recommendation

Apply a per-provider override (Option A): it balances simplicity and effectiveness for most cases, giving models of different context sizes suitable compression thresholds without excessive config complexity.