hermes - feat: Support fixed token threshold and per-model compression settings in config.yaml [1 pull request]

Is your feature request related to a problem? Please describe.

Currently compression.threshold in config.yaml only accepts a float between 0.0 and 1.0, interpreted as a fraction of the model's total context window. This causes two problems:

  1. Varying behavior across models. A threshold of 0.5 means 500K tokens for deepseek-chat (1M context) but only 100K for claude-sonnet-4 (200K context). Users who switch models frequently cannot set a threshold that feels right for all of them.

  2. No way to set an absolute token limit. Users who want compression to trigger at a specific token count (e.g., 120K tokens regardless of model) are forced to calculate a percentage per model or modify source code.

Describe the solution you'd like.

Extend the compression section in config.yaml to support:

  1. threshold_tokens — an absolute token count that overrides the percentage-based threshold.
  2. per_model — a map of model-specific overrides, each supporting both threshold (percentage) and threshold_tokens (fixed count).
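One way to give the two new fields a typed, validated shape in hermes_cli/config.py is a pair of dataclasses. This is a minimal sketch under stated assumptions, not the Hermes schema: `ModelOverride`, `CompressionConfig`, and `from_dict` are hypothetical names, the global default of 0.5 is taken from the example config in this issue, and the input is assumed to be the already-parsed `compression:` mapping from config.yaml.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def _check_percent(value: Optional[float], key: str) -> None:
    # Percentages keep the current contract: a fraction between 0.0 and 1.0.
    if value is not None and not 0.0 <= value <= 1.0:
        raise ValueError(f"{key} must be between 0.0 and 1.0, got {value}")


def _check_tokens(value: Optional[int], key: str) -> None:
    # bool is a subclass of int, so reject it explicitly.
    if value is not None and (
        isinstance(value, bool) or not isinstance(value, int) or value <= 0
    ):
        raise ValueError(f"{key} must be a positive integer or null, got {value!r}")


@dataclass
class ModelOverride:
    """Hypothetical per_model.<model> entry: either field may be omitted."""

    threshold: Optional[float] = None
    threshold_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        _check_percent(self.threshold, "threshold")
        _check_tokens(self.threshold_tokens, "threshold_tokens")


@dataclass
class CompressionConfig:
    """Hypothetical typed view of the `compression:` section."""

    enabled: bool = True
    threshold: float = 0.5  # assumed default, taken from the example config
    threshold_tokens: Optional[int] = None  # YAML null -> None
    per_model: Dict[str, ModelOverride] = field(default_factory=dict)

    def __post_init__(self) -> None:
        _check_percent(self.threshold, "threshold")
        _check_tokens(self.threshold_tokens, "threshold_tokens")

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "CompressionConfig":
        # Unknown keys inside a per_model entry raise TypeError,
        # which surfaces typos such as "treshold_tokens" early.
        per_model = {
            name: ModelOverride(**(spec or {}))
            for name, spec in (raw.get("per_model") or {}).items()
        }
        return cls(
            enabled=raw.get("enabled", True),
            threshold=raw.get("threshold", 0.5),
            threshold_tokens=raw.get("threshold_tokens"),
            per_model=per_model,
        )
```

Keeping both fields optional on `ModelOverride` lets a model override only the percentage or only the token count, which the priority rules below rely on.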

Priority (highest to lowest):

  1. compression.per_model.<model>.threshold_tokens
  2. compression.per_model.<model>.threshold
  3. compression.threshold_tokens (global fixed)
  4. compression.threshold (global percentage, current behavior — backward compatible)

Example config:

compression:
  enabled: true
  threshold: 0.5                         # global default (percentage)
  threshold_tokens: null                 # global fixed token override
  per_model:
    deepseek-chat:
      threshold_tokens: 120000           # deepseek-chat: compress at 120K tokens
    claude-sonnet-4:
      threshold: 0.75                    # claude-sonnet-4: compress at 75%
    openai/gpt-4o:
      threshold_tokens: 60000            # gpt-4o: compress at 60K tokens
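The four-level priority list can be sketched as a single lookup that walks the candidates in order and stops at the first one that is set. This is an illustrative sketch, not Hermes code: `resolve_threshold_tokens` is a hypothetical helper, it takes the parsed `compression:` mapping as a plain dict, and it assumes the per_model key is matched exactly against the active model string.

```python
from typing import Any, Dict


def resolve_threshold_tokens(
    compression: Dict[str, Any], model: str, context_length: int
) -> int:
    """Hypothetical helper: token count at which compression fires for `model`.

    Priority, highest first:
      1. per_model.<model>.threshold_tokens
      2. per_model.<model>.threshold
      3. threshold_tokens (global fixed)
      4. threshold (global percentage, current behavior)
    """
    # Exact-match lookup on the model name (an assumption; see note below).
    override: Dict[str, Any] = (compression.get("per_model") or {}).get(model) or {}

    candidates = [
        (override.get("threshold_tokens"), False),
        (override.get("threshold"), True),
        (compression.get("threshold_tokens"), False),
        (compression.get("threshold", 0.5), True),
    ]
    for value, is_percent in candidates:
        if value is None:
            continue  # missing key or YAML null: fall through to the next level
        return int(context_length * value) if is_percent else int(value)
    raise ValueError("no compression threshold configured")
```

With the example config and the context sizes quoted in this issue, deepseek-chat resolves to 120,000 tokens, claude-sonnet-4 to 150,000 (75% of 200K), openai/gpt-4o to 60,000, and any model without an override falls back to 50% of its context window. Because a per-model percentage ranks above the global fixed value, claude-sonnet-4 keeps its 75% even if a global threshold_tokens is later set. The issue does not say how keys like openai/gpt-4o should match a bare gpt-4o model id; exact matching is only one option.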

Files that would need changes:

  • config.yaml schema / hermes_cli/config.py: add the threshold_tokens and per_model fields
  • agent/context_compressor.py: make __init__ and should_compress() accept either a percentage or an absolute token count
  • run_agent.py: pass the per-model overrides to the compressor
  • hermes_cli/setup.py: expose the new fields in the interactive hermes setup flow
  • agent/auxiliary_client.py: _compression_threshold_for_model() could be replaced by the new config-driven approach
  • Tests for each case (global fixed, per-model percentage, per-model fixed)
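For the context_compressor.py change, the compressor only needs to turn "percentage or absolute count" into one token limit. A minimal sketch, assuming run_agent.py has already picked the right per-model values; `ContextCompressorSketch` is a hypothetical stand-in, not the real Hermes class, and firing at `>=` the limit is an assumption about the comparison.

```python
from typing import Optional


class ContextCompressorSketch:
    """Hypothetical stand-in for the compressor's threshold logic.

    Accepts either a fraction of the context window or an absolute token
    count; the absolute value wins when both are given.
    """

    def __init__(
        self,
        context_length: int,
        threshold: float = 0.5,
        threshold_tokens: Optional[int] = None,
    ) -> None:
        if threshold_tokens is not None:
            if threshold_tokens <= 0:
                raise ValueError("threshold_tokens must be positive")
            self.threshold_tokens = threshold_tokens
        else:
            # Current behavior: a fraction of the model's context window.
            self.threshold_tokens = int(context_length * threshold)

    def should_compress(self, prompt_tokens: int) -> bool:
        # Assumed rule: compress once the prompt reaches the limit.
        return prompt_tokens >= self.threshold_tokens
```

Resolving the per-model values outside the compressor keeps it unaware of model names, so the existing percentage path stays backward compatible and tests for the three cases can construct it directly.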

Describe alternatives you've considered.

  1. Only add threshold_tokens globally (no per_model). Simpler but doesn't solve the model-switching use case.
  2. Keep everything as-is and ask users to calculate percentages manually. Works but is error-prone and unfriendly.
  3. Modify _compression_threshold_for_model() in source code for each model. Not scalable and gets overwritten on updates.

Additional context.

Originally raised by a user who primarily uses deepseek-chat (1M context) but also switches to claude-sonnet-4 (200K context) and gpt-4o (128K context). A global 0.5 threshold therefore fires at 500K, 100K, and 64K tokens respectively, depending on the current model.
