hermes - Enhancement: Context-Aware Compression Threshold in Setup Wizard [1 participant]

hermes, 2026-05-02 19:48:37


NousResearch/hermes-agent#18957•Fetched 2026-05-03 04:53:21


Author

vokasug



Enhancement: Context-Aware Compression Threshold in Setup Wizard

Problem

The current hermes setup wizard uses a static default of compression.threshold = 0.50 regardless of:

- Model's context window size
- User's max_tokens (output cap)
- Reasoning effort / thinking budget
- Typical tool result sizes

Real-world impact:

A user with kimi-for-coding (262K context) and medium reasoning (8K thinking) gets compression triggered at 131K tokens, roughly 1.5× earlier than necessary (131K instead of about 200K).
The 50% default was designed for 32K–128K context models. On modern 256K+ models, this causes:
- Premature context loss (model "forgets" details from early turns)
- Unnecessary summarization overhead (extra API calls for compression)
- Degraded performance on long-running tasks (debugging, multi-file refactors)

Proposed Solution

Add a context-aware step to hermes setup agent (or a new dedicated step) that:

Collects runtime-relevant parameters:
- model.context_length — total context window (input + output). Auto-detected from model metadata, but user can override (e.g., for local servers with custom num_ctx, proxies without /v1/models, or to deliberately limit context)
- model.max_tokens — maximum response length per turn
- agent.reasoning_effort — thinking budget (none / low / medium / high)
- display.show_reasoning — whether reasoning output is visible
- compression.task_size_hint — optional: small/medium/large files (for tool buffer estimate)

Auto-calculates recommended threshold:

thinking_budget = THINKING_BUDGET[reasoning_effort]       # 0 / 4K / 8K / 16K / 32K
output_cap      = max_tokens or model_native_ceiling
tool_buffer     = estimated_tool_return(task_size_hint)   # 5K / 15K / 30K
safety_margin   = 10% of (output_cap + thinking_budget + tool_buffer)

required_buffer = output_cap + thinking_budget + tool_buffer + safety_margin
recommended     = 1 - (required_buffer / context_length)
recommended     = clamp(recommended, 0.50, 0.95)
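
The formula above can be sketched as a small standalone Python function. This is only an illustration under stated assumptions: `recommend_threshold` and `TOOL_BUFFER_MAP` are hypothetical names, and the budget values are copied from the pseudocode later in this issue rather than from the hermes codebase.

```python
# Budgets copied from the issue's pseudocode; the names here are illustrative only.
THINKING_BUDGET_MAP = {
    "none": 0, "minimal": 4000, "low": 4000,
    "medium": 8000, "high": 16000, "xhigh": 32000,
}
TOOL_BUFFER_MAP = {"small": 5000, "medium": 15000, "large": 30000}


def recommend_threshold(context_length, max_tokens,
                        reasoning_effort="medium", task_size_hint="medium"):
    """Return the recommended compression threshold as a fraction of the context window."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    thinking = THINKING_BUDGET_MAP.get(reasoning_effort, 8000)
    tool_buffer = TOOL_BUFFER_MAP.get(task_size_hint, 15000)
    raw_buffer = max_tokens + thinking + tool_buffer
    required = raw_buffer + int(raw_buffer * 0.10)  # add the 10% safety margin
    recommended = 1.0 - required / context_length
    return max(0.50, min(0.95, recommended))  # clamp to [0.50, 0.95]


# kimi-for-coding inputs from the issue: 262144 context, 32768 max_tokens, medium effort
value = recommend_threshold(262144, 32768, "medium", "medium")
print(f"{value:.2f}")  # prints 0.77
```

With these inputs the required buffer is 61344 tokens, so compression would start at roughly 201K tokens. On a 32K window with the same medium tool buffer, the raw result falls below the floor and is clamped to 0.50.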

Shows calculation transparently to user:

Detected model: kimi-for-coding
Context window (auto-detected) [262144]: 262144
Max response length [32768]: 32768
Reasoning effort [medium]: medium
Estimated tool buffer: 15000 tokens

Recommended compression threshold: 0.77
(based on: 262K context - 32K response - 8K thinking - 15K tool buffer - 10% safety)

Compression threshold [0.77]: _

Press Enter on any line to accept the auto-detected/calculated default. Type a custom value to override.

Why This Approach

Model	Context	max_tokens	Reasoning	50% trigger	Calculated	Improvement
kimi-for-coding	262K	32768	medium (8K)	131K	196K	+50% usable context
claude-sonnet-4-6	1M	64000	high (16K)	500K	892K	+78% usable context
gpt-4o	128K	8192	none	64K	112K	+75% usable context
local qwen:32b	32K	4096	low (4K)	16K	22K	+37% usable context
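
The columns in the table relate to each other as follows: the "50% trigger" is half the context window, the "Calculated" column is the context window times the recommended threshold, and "Improvement" is the ratio of the two minus one. A minimal sketch, assuming the kimi-for-coding row corresponds to a threshold of 0.75 (196K of 262K); `trigger_comparison` is a hypothetical helper, not part of hermes.

```python
def trigger_comparison(context_length, threshold, static_threshold=0.50):
    """Return (static_trigger, calculated_trigger, improvement_percent)."""
    static_trigger = int(context_length * static_threshold)
    calculated_trigger = int(context_length * threshold)
    # Extra usable context before compression, relative to the static default
    improvement = (calculated_trigger / static_trigger - 1.0) * 100.0
    return static_trigger, calculated_trigger, improvement


static, calculated, gain = trigger_comparison(262144, 0.75)
print(static, calculated, f"+{gain:.0f}%")  # prints 131072 196608 +50%
```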

Implementation Sketch

Files to modify

hermes_cli/setup.py
- Add new function setup_compression_threshold(config) or extend setup_agent_settings(config)
- Import THINKING_BUDGET from agent.anthropic_adapter (or duplicate lightweight mapping)
- Use agent.model_metadata.get_model_context_length() for context window detection
- Use cfg_get(config, "model", "max_tokens", default=None) for output cap
hermes_cli/config.py (optional)
- Add helper calculate_compression_threshold(context_length, max_tokens, reasoning_effort) for reuse by hermes doctor

Pseudocode

# In hermes_cli/setup.py, inside setup_agent_settings() or a new function.
# cfg_get and prompt are assumed to be the helpers setup.py already uses.

from agent.model_metadata import get_model_context_length

# Thinking tokens reserved per reasoning effort level
THINKING_BUDGET_MAP = {
    "none": 0, "minimal": 4000, "low": 4000,
    "medium": 8000, "high": 16000, "xhigh": 32000,
}

def _estimate_tool_buffer(task_hint="medium"):
    # Tokens reserved for tool results; unknown hints fall back to "medium"
    return {"small": 5000, "medium": 15000, "large": 30000}.get(task_hint, 15000)

def setup_compression_threshold(config):
    model = cfg_get(config, "model", "default", default="")
    base_url = cfg_get(config, "model", "base_url", default="")
    api_key = cfg_get(config, "model", "api_key", default="")

    # Auto-detect context length from model metadata
    detected_ctx = get_model_context_length(
        model, base_url=base_url, api_key=api_key,
        config_context_length=cfg_get(config, "model", "context_length", default=None),
    )

    # Allow user to override context length (local servers, proxies, deliberate limits).
    # Non-numeric input or values below 8000 fall back to the detected value.
    ctx_input = prompt("Context window (tokens)", str(detected_ctx))
    try:
        context_length = int(ctx_input)
        if context_length >= 8000:
            config.setdefault("model", {})["context_length"] = context_length
        else:
            context_length = detected_ctx
    except ValueError:
        context_length = detected_ctx

    max_tokens = cfg_get(config, "model", "max_tokens", default=None)
    if max_tokens is None:
        # Placeholder helper (not defined in this issue); or default to 128K
        max_tokens = _get_model_native_output_ceiling(model)

    reasoning = cfg_get(config, "agent", "reasoning_effort", default="medium")
    thinking = THINKING_BUDGET_MAP.get(reasoning, 8000)

    tool_buffer = _estimate_tool_buffer(
        cfg_get(config, "compression", "task_size_hint", default="medium")
    )

    raw_buffer = max_tokens + thinking + tool_buffer
    safety = int(raw_buffer * 0.10)  # 10% safety margin
    required = raw_buffer + safety

    recommended = 1.0 - (required / context_length)
    recommended = max(0.50, min(0.95, recommended))  # clamp to 0.50–0.95

    # Display to user...
    # Prompt with recommended as default, allow override
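
The context-length override step in the pseudocode can be isolated into a pure function, which makes the fallback rules easy to test without a terminal. This is a sketch under assumptions: `resolve_context_length` is a hypothetical name, and the 8000-token minimum is taken from the pseudocode above.

```python
def resolve_context_length(raw_input, detected_ctx, minimum=8000):
    """Return (context_length, accepted) for the user's answer to the prompt.

    accepted is True only when the input parsed as an integer of at least
    `minimum`; in every other case the detected value is kept.
    """
    try:
        value = int(raw_input)
    except (TypeError, ValueError):
        return detected_ctx, False  # empty or non-numeric input
    if value < minimum:
        return detected_ctx, False  # too small to be a plausible window
    return value, True


print(resolve_context_length("131072", 262144))  # (131072, True)
print(resolve_context_length("4096", 262144))    # (262144, False)
print(resolve_context_length("", 262144))        # (262144, False)
```

In the wizard, the caller would write the value to `model.context_length` only when `accepted` is true, mirroring the `config.setdefault(...)` branch above.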

UI Flow

═══════════════════════════════════════════════════════
  Context & Compression Settings
═══════════════════════════════════════════════════════

Detected model: kimi-for-coding
Context window (auto-detected) [262144]: 262144
Max response length (tokens) [auto]: 32768
Reasoning effort [medium]: medium
Show reasoning output [yes]: yes
Typical file size you work with [medium]: medium

── Calculated compression threshold ──
  Context window:  262144
  Output cap:      32768
  Thinking budget: 8000
  Tool buffer:     15000
  Safety margin:   5577 (10%)
  ─────────────────────────────
  Required buffer: 61345
  
  Recommended threshold: 0.77
  (compresses at ~201K tokens, leaving ~61K for response)

Compression threshold [0.77]: _

Press Enter to accept, or type 0.50–0.95 to override.
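
The final prompt could be handled with a small parser like the one below. It is a sketch only: `parse_threshold` is a hypothetical name, and falling back to the default on out-of-range or non-numeric input is an assumption, since the issue does not say whether the wizard should re-prompt instead.

```python
def parse_threshold(raw_input, default, low=0.50, high=0.95):
    """Return the threshold chosen by the user, or `default` if the input is unusable."""
    text = raw_input.strip()
    if not text:
        return default  # Enter pressed: accept the calculated value
    try:
        value = float(text)
    except ValueError:
        return default  # not a number
    if not (low <= value <= high):
        return default  # outside 0.50–0.95 (this also rejects NaN)
    return value


print(parse_threshold("", 0.77))      # 0.77
print(parse_threshold("0.6", 0.77))   # 0.6
print(parse_threshold("0.99", 0.77))  # 0.77
```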

Additional Benefits

Discoverability — Users learn that compression threshold exists and is tunable
Education — Transparent calculation teaches how context_length, max_tokens, and thinking interact
hermes doctor integration — Same formula can warn: "Your threshold is 0.50 but recommended is 0.75"
Future-proof — When models grow to 1M+ context, formula still works; static 50% does not
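
The hermes doctor check mentioned in the list could be as small as the sketch below. The function name `threshold_warning` and the 0.10 tolerance are assumptions; the issue only says the warning should fire when the configured value deviates "significantly" from the recommendation.

```python
def threshold_warning(actual, recommended, tolerance=0.10):
    """Return a warning string, or None when the two values are within tolerance."""
    if abs(actual - recommended) <= tolerance:
        return None
    return f"Your threshold is {actual:.2f} but recommended is {recommended:.2f}"


print(threshold_warning(0.50, 0.75))  # Your threshold is 0.50 but recommended is 0.75
print(threshold_warning(0.75, 0.77))  # None
```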

Backward Compatibility

Existing installs: unchanged (no migration needed)
Fresh installs: benefit from smarter default
Re-running hermes setup agent: shows current value + recalculates if model/settings changed
Enter to skip: preserves current/default values exactly as today

Related Code References

agent/anthropic_adapter.py:47 — THINKING_BUDGET mapping
agent/anthropic_adapter.py:112 — _ANTHROPIC_DEFAULT_OUTPUT_LIMIT = 128_000
agent/model_metadata.py:1229 — get_model_context_length()
agent/context_compressor.py:379 — threshold_percent: float = 0.50
hermes_cli/setup.py:1668 — setup_agent_settings(config)

Acceptance Criteria

Setup wizard shows context window auto-detected value with editable prompt (Enter = accept auto-detected, custom value = override)
Setup wizard allows configuring model.context_length, model.max_tokens, agent.reasoning_effort, display.show_reasoning
Setup wizard calculates and displays recommended compression.threshold with transparent formula
Pressing Enter accepts the calculated default; custom value overrides
hermes doctor warns when actual threshold deviates significantly from recommended
All defaults maintain backward compatibility (no breaking changes)

Type: Enhancement
Priority: Medium — improves UX for all new installs and reconfigures
Affected: hermes setup, hermes doctor, hermes_cli/setup.py

extent analysis

TL;DR

To address the issue, implement a context-aware compression threshold calculation in the hermes setup wizard, allowing users to override the auto-detected value.

Guidance

Modify hermes_cli/setup.py: Add a new function setup_compression_threshold(config) to calculate the recommended compression threshold based on the model's context window size, max tokens, reasoning effort, and tool buffer.
Implement transparent calculation: Display the calculation process to the user, showing how the recommended threshold is derived from the input parameters.
Allow user override: Permit users to enter a custom compression threshold value, defaulting to the calculated recommendation if they press Enter.
Update hermes_cli/config.py (optional): Consider adding a helper function calculate_compression_threshold for reuse in hermes doctor.

Example

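def setup_compression_threshold(config):
    model = cfg_get(config, "model", "default", default="")
    base_url = cfg_get(config, "model", "base_url", default="")
    api_key = cfg_get(config, "model", "api_key", default="")
    # Auto-detect context length from model metadata
    detected_ctx = get_model_context_length(model, base_url=base_url, api_key=api_key)
    # ... (rest of the calculation and user prompt code)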

Notes

The provided implementation sketch and pseudocode should be reviewed and adapted to the specific requirements of the hermes project. The calculation formula and user interface flow may need adjustments based on further discussion and testing.

Recommendation

Implement the proposed enhancement: a context-aware compression threshold calculation in the hermes setup wizard. It improves the user experience and yields a threshold better matched to each model's context window and the user's settings.
