hermes - 💡(How to fix) Fix Per-model or per-provider compression threshold overrides [1 participant]

hermes · 2026-05-02 08:24:45


GitHub stats

NousResearch/hermes-agent#18733 • Fetched 2026-05-03 04:54:36

Author: tizerluo

Participants: tizerluo

Timeline (top): labeled ×4

As a user running MiMo V2.5 Pro (1M context) as my main model, I set compression.threshold to 0.5, but this effectively means compression almost never triggers. Lowering it to 0.3 would make more sense for 1M models, but that same 0.3 would be too aggressive for the 128K models in my fallback chain.


Feature Description

Allow setting different compression.threshold values per model or per provider, instead of a single global value.

Motivation

With 1M+ context models becoming more common (e.g. DeepSeek V4 Flash, MiMo V2.5 Pro, Gemini 2.5 Pro), a single global threshold creates a tension:

Large-context models (1M): a threshold of 0.5 means compression only triggers at 500K tokens; most conversations never get that far, so the feature is effectively unused.
Small-context models (128K): the same 0.5 threshold triggers at 64K tokens, which is reasonable.
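A quick way to see the tension, assuming (as the numbers above imply) that compression fires once the conversation reaches threshold × context length tokens. `trigger_tokens` is a hypothetical helper for illustration, not a hermes-agent function.

```python
def trigger_tokens(context_length: int, threshold: float) -> int:
    """Token count at which compression would fire: threshold x context window (assumed rule)."""
    return round(context_length * threshold)


# Context windows from the issue: a 1M main model and a 128K fallback model.
CONTEXTS = {"1M model": 1_000_000, "128K model": 128_000}

for name, ctx in CONTEXTS.items():
    for t in (0.5, 0.3):
        print(f"{name} @ threshold {t}: compresses at {trigger_tokens(ctx, t):,} tokens")
```

This gives 500,000 and 300,000 tokens for the 1M model and 64,000 and 38,400 tokens for the 128K model: lowering the single global value to 0.3 helps the 1M model but makes the 128K fallback compress at roughly 38K tokens.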

Users who switch between models (e.g. MiMo 1M for main chat, Claude 200K for coding) need different compression strategies. A global ratio that works well for 128K models is too conservative for 1M models.

Proposed Solution

Option A — Per-provider override:

compression:
  enabled: true
  threshold: 0.5          # global default
  target_ratio: 0.2
  providers:
    xiaomi:
      threshold: 0.3       # for 1M context models, compress earlier
    anthropic:
      threshold: 0.65      # keep Claude existing behavior
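A minimal sketch of how Option A could be resolved at runtime, assuming the YAML has already been parsed into a plain dict (as PyYAML's `yaml.safe_load` would return it) and that each provider key matches whatever identifier the agent uses for the active provider. `resolve_threshold` is hypothetical, not existing hermes-agent code.

```python
from typing import Any

# Python mirror of the Option A YAML above.
COMPRESSION_CFG: dict[str, Any] = {
    "enabled": True,
    "threshold": 0.5,  # global default
    "target_ratio": 0.2,
    "providers": {
        "xiaomi": {"threshold": 0.3},
        "anthropic": {"threshold": 0.65},
    },
}


def resolve_threshold(cfg: dict[str, Any], provider: str) -> float:
    """Return the provider's override if one exists, else the global threshold."""
    # `or {}` tolerates an empty `providers:` key or an empty provider entry (parsed as None).
    override = (cfg.get("providers") or {}).get(provider) or {}
    return float(override.get("threshold", cfg["threshold"]))


print(resolve_threshold(COMPRESSION_CFG, "xiaomi"))  # provider with an override
print(resolve_threshold(COMPRESSION_CFG, "openai"))  # unlisted provider, uses global default
```

Unlisted providers fall back to the global 0.5, so only providers whose context length differs sharply from the default need an entry.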

Option B — Per-model override:

compression:
  threshold: 0.5
  model_overrides:
    - model: mimo-v2.5-pro
      threshold: 0.3
    - model: claude-sonnet-4
      threshold: 0.65
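Option B could be resolved in a similar way. This sketch assumes model names are compared as exact strings; real model identifiers may carry provider prefixes or version suffixes that would need normalizing. The function name and the (0, 1] range check are illustrative assumptions.

```python
from typing import Any

# Python mirror of the Option B YAML above.
COMPRESSION_CFG: dict[str, Any] = {
    "threshold": 0.5,
    "model_overrides": [
        {"model": "mimo-v2.5-pro", "threshold": 0.3},
        {"model": "claude-sonnet-4", "threshold": 0.65},
    ],
}


def resolve_model_threshold(cfg: dict[str, Any], model: str) -> float:
    """First exact model-name match wins; otherwise use the global threshold."""
    for entry in cfg.get("model_overrides") or []:
        if entry.get("model") == model:
            t = float(entry["threshold"])
            # Assumed sanity check: a ratio of the context window must lie in (0, 1].
            if not 0.0 < t <= 1.0:
                raise ValueError(f"threshold for {model!r} must be in (0, 1], got {t}")
            return t
    return float(cfg["threshold"])
```

Because `model_overrides` is a list, the first matching entry wins if a model is listed twice; a mapping keyed by model name would rule out duplicates, at the cost of departing from the proposed shape.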

Option A is simpler and covers most cases (providers tend to have similar context sizes). Option B is more granular but adds config complexity.

Alternatives Considered

Dynamic threshold based on context size: Automatically lower the threshold ratio as context size grows (e.g. threshold = min(0.5, 100000 / context_length)). This would be zero-config but less predictable.
Absolute token threshold: Instead of a ratio, use a fixed token count (e.g. compress at 64K tokens regardless of model). Simpler reasoning but loses the relative scaling.
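Both alternatives are easy to state in code. The sketch below takes the 0.5 cap and the 100,000 budget from the dynamic formula above and the 64K figure from the absolute example; the function names, and capping the absolute trigger at the context window, are assumptions.

```python
def dynamic_threshold(context_length: int, cap: float = 0.5, budget: int = 100_000) -> float:
    """Dynamic alternative: threshold = min(0.5, 100000 / context_length)."""
    return min(cap, budget / context_length)


def absolute_trigger(context_length: int, trigger_at: int = 64_000) -> int:
    """Absolute alternative: compress at a fixed token count.

    Capping at the window size (for models smaller than 64K) is an assumption.
    """
    return min(trigger_at, context_length)


for ctx in (128_000, 200_000, 1_000_000):
    t = dynamic_threshold(ctx)
    print(f"{ctx:>9,} ctx: dynamic threshold {t:.3f}, absolute trigger {absolute_trigger(ctx):,}")
```

With these defaults the dynamic rule gives 0.1 for a 1M window and keeps 0.5 for 128K, so its trigger point works out to min(0.5 × context_length, 100,000) tokens: every window above 200K compresses at 100K tokens.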

Context

extent analysis

TL;DR

Add per-provider or per-model overrides for compression.threshold, because a single global threshold cannot suit models whose context lengths differ as widely as 128K and 1M.

Guidance

Consider using Option A (per-provider override) for a simpler configuration that covers most cases, as providers tend to have similar context sizes.
Evaluate Option B (per-model override) for more granular control, but be aware that it adds config complexity.
Weigh predictability against simplicity when choosing among a dynamic threshold, an absolute token threshold, and explicit overrides.
Test the chosen override method with several models and providers to confirm that compression triggers at the expected token counts.
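One way to run such a test, sketched for Option A against the fallback chain described at the top of this issue (a 1M-context main model plus a 128K-context fallback). The provider key `other`, the model name `fallback-128k`, and the helper `trigger_for` are hypothetical; the check only asserts the token count at which each model in the chain would compress, assuming compression fires at threshold × context length.

```python
from typing import Any

# Option A config plus an assumed fallback chain: (provider, model, context length).
CFG: dict[str, Any] = {"threshold": 0.5, "providers": {"xiaomi": {"threshold": 0.3}}}
CHAIN = [
    ("xiaomi", "mimo-v2.5-pro", 1_000_000),
    ("other", "fallback-128k", 128_000),  # hypothetical 128K fallback
]


def trigger_for(cfg: dict[str, Any], provider: str, context_length: int) -> int:
    """Token count at which compression fires for this provider (assumed rule)."""
    override = (cfg.get("providers") or {}).get(provider) or {}
    return round(context_length * float(override.get("threshold", cfg["threshold"])))


# Expected: 0.3 x 1M for the main model, the 0.5 default x 128K for the fallback.
EXPECTED = {"mimo-v2.5-pro": 300_000, "fallback-128k": 64_000}

for provider, model, ctx in CHAIN:
    got = trigger_for(CFG, provider, ctx)
    assert got == EXPECTED[model], (model, got)
    print(f"{model}: compresses at {got:,} tokens")
```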

Example

compression:
  enabled: true
  threshold: 0.5          # global default
  target_ratio: 0.2
  providers:
    xiaomi:
      threshold: 0.3       # for 1M context models, compress earlier
    anthropic:
      threshold: 0.65      # keep Claude existing behavior

Notes

The best approach depends on which models and providers you run. Weigh the simplicity of per-provider overrides against the finer control of per-model overrides, which matters most when a single provider serves models with very different context lengths.

Recommendation

Use a per-provider override (Option A). It covers most cases with little added config, while still letting large-context providers compress earlier than small-context ones.
