hermes - 💡(How to fix) Fix feat: Support fixed token threshold and per-model compression settings in config.yaml [1 pull requests]

hermes, 2026-05-13 01:10:27

Fix Action

Fixed

Fixed by PR: feat: support fixed token threshold and per-model compression settings (https://github.com/NousResearch/hermes-agent/pull/24704)

Is your feature request related to a problem? Please describe.

Currently, compression.threshold in config.yaml only accepts a float between 0.0 and 1.0, interpreted as a fraction of the model's total context window. This causes two issues:

1. Varying behavior across models. A threshold of 0.5 means 500K tokens for deepseek-chat (1M context) but only 100K for claude-sonnet-4 (200K context). Users who switch models frequently cannot set a single threshold that feels right for all of them.
2. No way to set an absolute token limit. Users who want compression to trigger at a specific token count (e.g., 120K tokens regardless of model) are forced to calculate a percentage per model or modify source code.
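The first issue is plain arithmetic, and a short sketch makes the gap concrete. The context window sizes are the ones quoted in this issue; the helper name trigger_point is made up for illustration and is not part of hermes.

```python
# Context window sizes (in tokens) as quoted in the issue text.
CONTEXT_WINDOWS = {
    "deepseek-chat": 1_000_000,
    "claude-sonnet-4": 200_000,
}


def trigger_point(threshold: float, context_window: int) -> int:
    """Return the absolute token count at which a percentage threshold fires."""
    return int(threshold * context_window)


# The same global threshold of 0.5 lands at very different absolute points.
triggers = {
    model: trigger_point(0.5, window)
    for model, window in CONTEXT_WINDOWS.items()
}

for model, tokens in triggers.items():
    print(f"{model}: compression starts at {tokens:,} tokens")
```

With one shared percentage, the absolute trigger point differs by a factor of five between the two models.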

Describe the solution you'd like.

Extend the compression section in config.yaml to support:

threshold_tokens — an absolute token count that overrides the percentage-based threshold.
per_model — a map of model-specific overrides, each supporting both threshold (percentage) and threshold_tokens (fixed count).

Priority (highest to lowest):

1. compression.per_model.<model>.threshold_tokens
2. compression.per_model.<model>.threshold
3. compression.threshold_tokens (global fixed)
4. compression.threshold (global percentage; current behavior, backward compatible)
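The priority order above could be resolved with a single pass over the four candidates. This is a minimal sketch under assumptions, not the code from the linked PR: the function name resolve_threshold_tokens is hypothetical, the config is assumed to be already parsed into a dict, and raising when nothing is configured is an illustrative choice.

```python
from typing import Any, Mapping


def resolve_threshold_tokens(
    compression: Mapping[str, Any], model: str, context_window: int
) -> int:
    """Return the token count at which compression should trigger for `model`.

    Hypothetical helper: walks the four settings in the priority order
    listed in the issue and converts percentages to absolute token counts.
    """
    per_model = (compression.get("per_model") or {}).get(model) or {}

    # (value, is_absolute) pairs, highest priority first.
    candidates = [
        (per_model.get("threshold_tokens"), True),
        (per_model.get("threshold"), False),
        (compression.get("threshold_tokens"), True),
        (compression.get("threshold"), False),
    ]
    for value, is_absolute in candidates:
        if value is None:
            continue  # unset or null: fall through to the next level
        return int(value) if is_absolute else int(value * context_window)

    # Assumption: an entirely empty config is treated as an error here.
    raise ValueError("no compression threshold configured")


config = {
    "enabled": True,
    "threshold": 0.5,
    "threshold_tokens": None,
    "per_model": {
        "deepseek-chat": {"threshold_tokens": 120000},
        "claude-sonnet-4": {"threshold": 0.75},
    },
}

print(resolve_threshold_tokens(config, "deepseek-chat", 1_000_000))
print(resolve_threshold_tokens(config, "claude-sonnet-4", 200_000))
print(resolve_threshold_tokens(config, "gpt-4o", 128_000))
```

A model without a per_model entry falls through to the global settings, so a config that only sets compression.threshold behaves as it does today.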

Example config:

compression:
  enabled: true
  threshold: 0.5                         # global default (percentage)
  threshold_tokens: null                 # global fixed token override
  per_model:
    deepseek-chat:
      threshold_tokens: 120000           # deepseek-chat: compress at 120K tokens
    claude-sonnet-4:
      threshold: 0.75                    # claude-sonnet-4: compress at 75%
    openai/gpt-4o:
      threshold_tokens: 60000            # gpt-4o: compress at 60K tokens
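Once a config like the one above is parsed into a dict, the new fields could be sanity-checked before use. The sketch below is an assumption about what such a check might look like; validate_compression and its helpers are hypothetical names, and the accepted range for threshold follows the 0.0 to 1.0 range stated in this issue.

```python
from typing import Any, List, Mapping


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_positive_int(value: Any) -> bool:
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def _check_entry(entry: Mapping[str, Any], where: str) -> List[str]:
    """Check one level (global or a single per_model entry)."""
    errors: List[str] = []
    threshold = entry.get("threshold")
    if threshold is not None and not (
        _is_number(threshold) and 0.0 <= threshold <= 1.0
    ):
        errors.append(f"{where}.threshold must be a number between 0.0 and 1.0")
    tokens = entry.get("threshold_tokens")
    if tokens is not None and not _is_positive_int(tokens):
        errors.append(f"{where}.threshold_tokens must be a positive integer or null")
    return errors


def validate_compression(section: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in the compression section."""
    errors = _check_entry(section, "compression")
    for model, entry in (section.get("per_model") or {}).items():
        errors.extend(_check_entry(entry or {}, f"compression.per_model.{model}"))
    return errors


# The example config from the issue, written as the dict a YAML parser would produce.
EXAMPLE = {
    "enabled": True,
    "threshold": 0.5,
    "threshold_tokens": None,
    "per_model": {
        "deepseek-chat": {"threshold_tokens": 120000},
        "claude-sonnet-4": {"threshold": 0.75},
        "openai/gpt-4o": {"threshold_tokens": 60000},
    },
}

print(validate_compression(EXAMPLE))
```

Returning a list of messages rather than raising on the first problem lets a caller report every misconfigured model at once.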

Files that would need changes:

config.yaml schema / hermes_cli/config.py: add the threshold_tokens and per_model fields
agent/context_compressor.py: make __init__ and should_compress() accept either a percentage or an absolute value
run_agent.py: pass the per-model overrides to the compressor
hermes_cli/setup.py: support the new fields in the interactive hermes setup configuration
agent/auxiliary_client.py: _compression_threshold_for_model() could be replaced by the new config-driven approach
Tests for each case (global fixed, per-model percentage, per-model fixed)
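To illustrate the compressor change in the list above, here is a small stand-in class whose constructor accepts either a percentage or an absolute value. It is not the real class from agent/context_compressor.py: the name ThresholdCompressor, its signature, the 0.5 default, and the cap at the context window are all assumptions made for this sketch.

```python
from typing import Optional


class ThresholdCompressor:
    """Hypothetical stand-in showing one way to accept both threshold styles."""

    def __init__(
        self,
        context_window: int,
        threshold: float = 0.5,  # assumed default, mirrors the example config
        threshold_tokens: Optional[int] = None,
    ) -> None:
        self.context_window = context_window
        if threshold_tokens is not None:
            # An absolute limit wins over the percentage. Capping it at the
            # context window (an assumption) keeps the trigger reachable.
            self.trigger_tokens = min(threshold_tokens, context_window)
        else:
            self.trigger_tokens = int(threshold * context_window)

    def should_compress(self, used_tokens: int) -> bool:
        """Return True once the conversation reaches the trigger point."""
        return used_tokens >= self.trigger_tokens


# Percentage style: 75% of a 200K window.
by_percentage = ThresholdCompressor(200_000, threshold=0.75)
# Absolute style: 120K tokens regardless of the 1M window.
by_tokens = ThresholdCompressor(1_000_000, threshold_tokens=120_000)

print(by_percentage.trigger_tokens, by_tokens.trigger_tokens)
```

Converting both styles to a single trigger_tokens value in the constructor keeps should_compress() a one-line comparison, which also makes the three test cases named above easy to express.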

Describe alternatives you've considered.

Only add threshold_tokens globally (no per_model). Simpler but doesn't solve the model-switching use case.
Keep everything as-is and ask users to calculate percentages manually. Works but is error-prone and unfriendly.
Modify _compression_threshold_for_model() in source code for each model. Not scalable and gets overwritten on updates.

Additional context.

Originally raised by a user who primarily uses deepseek-chat (1M context) but also switches to claude-sonnet-4 (200K context) and gpt-4o (128K context). A global 0.5 threshold means compression fires at wildly different absolute points depending on the current model.
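For comparison, this is the manual calculation a user has to repeat today to approximate one fixed trigger point across these three models. The context window sizes come from the paragraph above; the 120K target is the example figure used earlier in this issue, and equivalent_percentage is a made-up helper name.

```python
# Context window sizes (in tokens) as quoted in the issue text.
CONTEXT_WINDOWS = {
    "deepseek-chat": 1_000_000,
    "claude-sonnet-4": 200_000,
    "gpt-4o": 128_000,
}

TARGET_TOKENS = 120_000  # desired absolute trigger point


def equivalent_percentage(target_tokens: int, context_window: int) -> float:
    """Return the compression.threshold value that matches a fixed token target."""
    return target_tokens / context_window


percentages = {
    model: equivalent_percentage(TARGET_TOKENS, window)
    for model, window in CONTEXT_WINDOWS.items()
}

for model, pct in percentages.items():
    print(f"{model}: threshold would need to be {pct:.4f}")
```

Because compression.threshold is a single global value, only one of these percentages can be active at a time, which is the gap the per_model map is meant to close.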
