hermes - 💡(How to fix) Fix Enhancement: Context-Aware Compression Threshold in Setup Wizard [1 participants]

GitHub stats
NousResearch/hermes-agent#18957 (fetched 2026-05-03 04:53:21)
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 0
Timeline (top): labeled ×4


Enhancement: Context-Aware Compression Threshold in Setup Wizard

Problem

The current hermes setup wizard uses a static default of compression.threshold = 0.50 regardless of:

  • Model's context window size
  • User's max_tokens (output cap)
  • Reasoning effort / thinking budget
  • Typical tool result sizes

Real-world impact:

  • User with kimi-for-coding (262K context) and medium reasoning (8K thinking) gets compression triggered at 131K tokens, even though the response, thinking, and tool buffers together need only about 61K tokens of headroom.
  • The 50% default was designed for 32K–128K context models. On modern 256K+ models, this causes:
    • Premature context loss (model "forgets" details from early turns)
    • Unnecessary summarization overhead (extra API calls for compression)
    • Degraded performance on long-running tasks (debugging, multi-file refactors)
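
To make the arithmetic concrete, here is a small sketch of where a fixed 0.50 threshold fires for a few context window sizes. The function name `static_trigger_point` is hypothetical and is not part of hermes.

```python
def static_trigger_point(context_length: int, threshold: float = 0.50) -> int:
    """Token count at which compression starts for a fixed threshold."""
    return int(context_length * threshold)


# Compare a 32K, 128K, and 256K context window under the same static default.
for ctx in (32_768, 131_072, 262_144):
    print(f"{ctx:>7} tokens -> compresses at {static_trigger_point(ctx):>7}")
```

On a 262,144-token window the static default leaves half the window (131,072 tokens) unused as headroom, regardless of how much the response actually needs.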

Proposed Solution

Add a context-aware step to hermes setup agent (or a new dedicated step) that:

  1. Collects runtime-relevant parameters:

    • model.context_length — total context window (input + output). Auto-detected from model metadata, but user can override (e.g., for local servers with custom num_ctx, proxies without /v1/models, or to deliberately limit context)
    • model.max_tokens — maximum response length per turn
    • agent.reasoning_effort — thinking budget (none / low / medium / high)
    • display.show_reasoning — whether reasoning output is visible
    • compression.task_size_hint — optional: small/medium/large files (for tool buffer estimate)
  2. Auto-calculates recommended threshold:

    thinking_budget    = THINKING_BUDGET[reasoning_effort]  # 0 / 4K / 8K / 16K
    output_cap         = max_tokens or model_native_ceiling
    tool_buffer        = estimated_tool_return(task_size_hint)  # 5K–30K
    safety_margin      = 10% of (output_cap + thinking_budget + tool_buffer)
    
    required_buffer    = output_cap + thinking_budget + tool_buffer + safety_margin
    recommended        = 1 - (required_buffer / context_length)
    recommended        = clamp(recommended, 0.50, 0.95)
  3. Shows calculation transparently to user:

    Detected model: kimi-for-coding
    Context window (auto-detected) [262144]: 262144
    Max response length [32768]: 32768
    Reasoning effort [medium]: medium
    Estimated tool buffer: 15000 tokens
    
    Recommended compression threshold: 0.77
    (based on: 262K context - 32K response - 8K thinking - 15K tool buffer - 10% safety)
    
    Compression threshold [0.77]: _

    Press Enter on any line to accept the auto-detected/calculated default. Type a custom value to override.
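
The calculation in step 2 can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the hermes implementation: the budget values are copied from the pseudocode in this issue, and `TOOL_BUFFER_MAP`, `ThresholdBreakdown`, and `recommend_threshold` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Assumed values, mirroring the mapping in this issue's pseudocode.
THINKING_BUDGET_MAP = {
    "none": 0, "minimal": 4000, "low": 4000,
    "medium": 8000, "high": 16000, "xhigh": 32000,
}
TOOL_BUFFER_MAP = {"small": 5000, "medium": 15000, "large": 30000}


@dataclass(frozen=True)
class ThresholdBreakdown:
    output_cap: int
    thinking_budget: int
    tool_buffer: int
    safety_margin: int
    required_buffer: int
    recommended: float


def recommend_threshold(
    context_length: int,
    max_tokens: int,
    reasoning_effort: str = "medium",
    task_size_hint: str = "medium",
) -> ThresholdBreakdown:
    """Derive a compression threshold from the headroom one turn needs."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    thinking = THINKING_BUDGET_MAP.get(reasoning_effort, 8000)
    tool_buffer = TOOL_BUFFER_MAP.get(task_size_hint, 15000)
    raw = max_tokens + thinking + tool_buffer
    safety = int(raw * 0.10)  # 10% safety margin, truncated as in the pseudocode
    required = raw + safety
    # Clamp so the result never drops below the current default or above 0.95.
    recommended = max(0.50, min(0.95, 1.0 - required / context_length))
    return ThresholdBreakdown(max_tokens, thinking, tool_buffer, safety, required, recommended)


result = recommend_threshold(262_144, 32_768, "medium", "medium")
print(result.required_buffer, f"{result.recommended:.2f}")
```

Returning the intermediate terms alongside the final value keeps the wizard's transparent display and any later reuse reading from the same numbers.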

Why This Approach

| Model | Context | max_tokens | Reasoning | 50% trigger | Calculated | Improvement |
|---|---|---|---|---|---|---|
| kimi-for-coding | 262K | 32768 | medium (8K) | 131K | 196K | +50% usable context |
| claude-sonnet-4-6 | 1M | 64000 | high (16K) | 500K | 892K | +78% usable context |
| gpt-4o | 128K | 8192 | none | 64K | 112K | +75% usable context |
| local qwen:32b | 32K | 4096 | low (4K) | 16K | 22K | +37% usable context |
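
The Improvement column appears to be the relative gain in tokens available before compression fires: the calculated trigger divided by the 50% trigger, minus one. A minimal sketch checking that reading against the table's rounded figures follows; the function name `usable_context_gain` is hypothetical.

```python
def usable_context_gain(static_trigger: int, calculated_trigger: int) -> float:
    """Relative increase in tokens available before compression starts."""
    return (calculated_trigger - static_trigger) / static_trigger


# (50% trigger, calculated trigger) taken from the table, in tokens.
rows = {
    "kimi-for-coding": (131_000, 196_000),
    "claude-sonnet-4-6": (500_000, 892_000),
    "gpt-4o": (64_000, 112_000),
    "local qwen:32b": (16_000, 22_000),
}

for model, (static, calculated) in rows.items():
    print(f"{model}: +{usable_context_gain(static, calculated):.1%}")
```

Because the table's inputs are rounded to the nearest thousand, the computed gains land close to, but not always exactly on, the listed percentages.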

Implementation Sketch

Files to modify

  1. hermes_cli/setup.py

    • Add new function setup_compression_threshold(config) or extend setup_agent_settings(config)
    • Import THINKING_BUDGET from agent.anthropic_adapter (or duplicate lightweight mapping)
    • Use agent.model_metadata.get_model_context_length() for context window detection
    • Use cfg_get(config, "model", "max_tokens", default=None) for output cap
  2. hermes_cli/config.py (optional)

    • Add helper calculate_compression_threshold(context_length, max_tokens, reasoning_effort) for reuse by hermes doctor

Pseudocode

# In hermes_cli/setup.py, inside setup_agent_settings() or new function:

THINKING_BUDGET_MAP = {
    "none": 0, "minimal": 4000, "low": 4000,
    "medium": 8000, "high": 16000, "xhigh": 32000,
}

def _estimate_tool_buffer(task_hint="medium"):
    return {"small": 5000, "medium": 15000, "large": 30000}.get(task_hint, 15000)

def setup_compression_threshold(config):
    model = cfg_get(config, "model", "default", default="")
    base_url = cfg_get(config, "model", "base_url", default="")
    api_key = cfg_get(config, "model", "api_key", default="")
    
    # Auto-detect context length from model metadata
    detected_ctx = get_model_context_length(
        model, base_url=base_url, api_key=api_key,
        config_context_length=cfg_get(config, "model", "context_length", default=None),
    )
    
    # Allow user to override context length (local servers, proxies, deliberate limits)
    ctx_input = prompt("Context window (tokens)", str(detected_ctx))
    try:
        context_length = int(ctx_input)
        if context_length >= 8000:
            config.setdefault("model", {})["context_length"] = context_length
        else:
            context_length = detected_ctx
    except ValueError:
        context_length = detected_ctx
    
    max_tokens = cfg_get(config, "model", "max_tokens", default=None)
    if max_tokens is None:
        max_tokens = _get_model_native_output_ceiling(model)  # or default 128K
    
    reasoning = cfg_get(config, "agent", "reasoning_effort", default="medium")
    thinking = THINKING_BUDGET_MAP.get(reasoning, 8000)
    
    tool_buffer = _estimate_tool_buffer(
        cfg_get(config, "compression", "task_size_hint", default="medium")
    )
    
    raw_buffer = max_tokens + thinking + tool_buffer
    safety = int(raw_buffer * 0.10)
    required = raw_buffer + safety
    
    recommended = 1.0 - (required / context_length)
    recommended = max(0.50, min(0.95, recommended))
    
    # Display to user...
    # Prompt with recommended as default, allow override
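
The final "prompt with recommended as default, allow override" step is left open in the pseudocode. One possible way to handle the user's input is sketched below. It assumes the same silent fallback the pseudocode uses for the context-length prompt; a real wizard might re-prompt instead. `parse_threshold_override` is a hypothetical helper, not an existing hermes function.

```python
def parse_threshold_override(
    raw: str,
    recommended: float,
    low: float = 0.50,
    high: float = 0.95,
) -> float:
    """Return the user's threshold, or the recommended value if input is unusable."""
    text = raw.strip()
    if not text:
        # Plain Enter accepts the calculated default.
        return recommended
    try:
        value = float(text)
    except ValueError:
        return recommended
    # Reject out-of-range values; NaN also fails this comparison.
    if not (low <= value <= high):
        return recommended
    return value


print(parse_threshold_override("", 0.77))      # Enter pressed
print(parse_threshold_override("0.85", 0.77))  # valid override
print(parse_threshold_override("1.5", 0.77))   # out of range
```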

UI Flow

═══════════════════════════════════════════════════════
  Context & Compression Settings
═══════════════════════════════════════════════════════

Detected model: kimi-for-coding
Context window (auto-detected) [262144]: 262144
Max response length (tokens) [auto]: 32768
Reasoning effort [medium]: medium
Show reasoning output [yes]: yes
Typical file size you work with [medium]: medium

── Calculated compression threshold ──
  Context window:  262144
  Output cap:      32768
  Thinking budget: 8000
  Tool buffer:     15000
  Safety margin:   5576 (10%)
  ─────────────────────────────
  Required buffer: 61344
  
  Recommended threshold: 0.77
  (compresses at ~201K tokens, leaving ~61K for response)

Compression threshold [0.77]: _

Press Enter to accept, or type 0.50–0.95 to override.
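
A rough sketch of how the breakdown panel could be rendered from already-computed values. The layout only approximates the mock-up above, and `render_breakdown` is a hypothetical name. The trigger point is derived from the unrounded threshold, which is an assumption about how the "~201K" figure in the mock-up was obtained.

```python
def render_breakdown(
    context_length: int,
    output_cap: int,
    thinking_budget: int,
    tool_buffer: int,
    safety_margin: int,
    recommended: float,
) -> str:
    """Format the threshold calculation as a plain-text panel."""
    required = output_cap + thinking_budget + tool_buffer + safety_margin
    trigger = int(context_length * recommended)  # tokens used when compression fires
    remaining = context_length - trigger         # headroom left for the response
    rows = [
        ("Context window", context_length),
        ("Output cap", output_cap),
        ("Thinking budget", thinking_budget),
        ("Tool buffer", tool_buffer),
        ("Safety margin", safety_margin),
        ("Required buffer", required),
    ]
    lines = ["-- Calculated compression threshold --"]
    lines += [f"  {label + ':':<17}{value}" for label, value in rows]
    lines.append(f"  Recommended threshold: {recommended:.2f}")
    lines.append(
        f"  (compresses at ~{round(trigger / 1000)}K tokens, "
        f"leaving ~{round(remaining / 1000)}K for response)"
    )
    return "\n".join(lines)


panel = render_breakdown(262_144, 32_768, 8000, 15000, 5576, 1 - 61_344 / 262_144)
print(panel)
```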

Additional Benefits

  1. Discoverability — Users learn that compression threshold exists and is tunable
  2. Education — Transparent calculation teaches how context_length, max_tokens, and thinking interact
  3. hermes doctor integration — Same formula can warn: "Your threshold is 0.50 but recommended is 0.75"
  4. Future-proof — When models grow to 1M+ context, formula still works; static 50% does not
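
The hermes doctor check in item 3 could be as small as the sketch below. The issue does not define what counts as a significant deviation, so the 0.10 tolerance is an assumption, and `threshold_warning` is a hypothetical name.

```python
from typing import Optional


def threshold_warning(
    actual: float,
    recommended: float,
    tolerance: float = 0.10,  # assumed cut-off; not specified in the issue
) -> Optional[str]:
    """Return a warning message when the configured threshold is far from the recommendation."""
    if abs(actual - recommended) <= tolerance:
        return None
    return f"Your threshold is {actual:.2f} but recommended is {recommended:.2f}"


print(threshold_warning(0.50, 0.75))
print(threshold_warning(0.75, 0.77))
```

Returning `None` for the in-tolerance case lets the caller decide whether to print anything at all.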

Backward Compatibility

  • Existing installs: unchanged (no migration needed)
  • Fresh installs: benefit from smarter default
  • Re-running hermes setup agent: shows current value + recalculates if model/settings changed
  • Enter to skip: preserves current/default values exactly as today

Related Code References

  • agent/anthropic_adapter.py:47 (THINKING_BUDGET mapping)
  • agent/anthropic_adapter.py:112 (_ANTHROPIC_DEFAULT_OUTPUT_LIMIT = 128_000)
  • agent/model_metadata.py:1229 (get_model_context_length())
  • agent/context_compressor.py:379 (threshold_percent: float = 0.50)
  • hermes_cli/setup.py:1668 (setup_agent_settings(config))

Acceptance Criteria

  • Setup wizard shows context window auto-detected value with editable prompt (Enter = accept auto-detected, custom value = override)
  • Setup wizard allows configuring model.context_length, model.max_tokens, agent.reasoning_effort, display.show_reasoning
  • Setup wizard calculates and displays recommended compression.threshold with transparent formula
  • Pressing Enter accepts the calculated default; custom value overrides
  • hermes doctor warns when actual threshold deviates significantly from recommended
  • All defaults maintain backward compatibility (no breaking changes)

Type: Enhancement
Priority: Medium — improves UX for all new installs and reconfigures
Affected: hermes setup, hermes doctor, hermes_cli/setup.py

extent analysis

TL;DR

To address the issue, implement a context-aware compression threshold calculation in the hermes setup wizard, allowing users to override the auto-detected value.

Guidance

  1. Modify hermes_cli/setup.py: Add a new function setup_compression_threshold(config) to calculate the recommended compression threshold based on the model's context window size, max tokens, reasoning effort, and tool buffer.
  2. Implement transparent calculation: Display the calculation process to the user, showing how the recommended threshold is derived from the input parameters.
  3. Allow user override: Permit users to enter a custom compression threshold value, defaulting to the calculated recommendation if they press Enter.
  4. Update hermes_cli/config.py (optional): Consider adding a helper function calculate_compression_threshold for reuse in hermes doctor.

Example

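def setup_compression_threshold(config):
    model = cfg_get(config, "model", "default", default="")
    base_url = cfg_get(config, "model", "base_url", default="")
    api_key = cfg_get(config, "model", "api_key", default="")
    # Auto-detect context length from model metadata
    detected_ctx = get_model_context_length(model, base_url=base_url, api_key=api_key)
    # ... (rest of the calculation and user prompt code)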

Notes

The provided implementation sketch and pseudocode should be reviewed and adapted to the specific requirements of the hermes project. The calculation formula and user interface flow may need adjustments based on further discussion and testing.

Recommendation

Implement the context-aware compression threshold calculation in the hermes setup wizard. It replaces the static 0.50 default with a value derived from the model's context window, output cap, thinking budget, and tool buffer, while still letting users override the result.
