hermes - ✅(Solved) Fix [Bug]: update_model() leaves tail_token_budget and max_summary_tokens stale after model switch [1 pull requests, 2 comments, 3 participants]

GitHub stats
NousResearch/hermes-agent#15558 (fetched 2026-04-26 05:26:42)
Comments: 2
Participants: 3
Timeline events: 6 (labeled ×3, commented ×2, cross-referenced ×1)
Reactions: 0

Root Cause

update_model() in agent/context_compressor.py:

self.context_length = context_length
self.threshold_tokens = max(
    int(context_length * self.threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
# tail_token_budget  ← NOT updated
# max_summary_tokens ← NOT updated

__init__ computes both correctly:

target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
self.tail_token_budget = target_tokens
self.max_summary_tokens = min(int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING)

Fix Action

Fixed

PR fix notes

PR #15560: fix(compressor): re-derive tail_token_budget and max_summary_tokens in update_model()

Description (problem / solution / changelog)

What does this PR do?

ContextCompressor.update_model() correctly updates context_length and threshold_tokens when the active model changes, but does not re-derive tail_token_budget and max_summary_tokens. Both are computed from those same values in __init__ — omitting the re-derivation leaves them stale after a model switch to one with a significantly different context window.

Large → small context switch (e.g. gemini 1M → local 8K): tail_token_budget stays at ~100K. _find_tail_cut_by_tokens() treats the entire conversation as "tail" because the soft ceiling is never reached, leaving almost nothing in the middle region to summarize. Compression silently does nothing.

Small → large context switch (e.g. local 8K → claude-sonnet 200K): tail_token_budget stays at ~800 tokens. The tail is drastically under-protected; recent turns and the active task are aggressively summarized away.
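The two figures above can be checked with a little arithmetic. The sketch below is not hermes-agent code: derive_budgets and Budgets are hypothetical names, and the values of threshold_percent (0.5), summary_target_ratio (0.2) and MINIMUM_CONTEXT_LENGTH (4,000) are assumptions chosen only because they reproduce the approximate ~100K and ~800 figures quoted here. The 12,000 ceiling is the value used in the PR's own test.

```python
from typing import NamedTuple

# Assumed values, picked to match the approximate figures in the PR text.
THRESHOLD_PERCENT = 0.5
SUMMARY_TARGET_RATIO = 0.2
MINIMUM_CONTEXT_LENGTH = 4_000
SUMMARY_TOKENS_CEILING = 12_000  # value used in the PR's test


class Budgets(NamedTuple):
    threshold_tokens: int
    tail_token_budget: int
    max_summary_tokens: int


def derive_budgets(context_length: int) -> Budgets:
    """Apply the same formulas that __init__ uses, per the issue text."""
    threshold_tokens = max(
        int(context_length * THRESHOLD_PERCENT),
        MINIMUM_CONTEXT_LENGTH,
    )
    tail_token_budget = int(threshold_tokens * SUMMARY_TARGET_RATIO)
    max_summary_tokens = min(int(context_length * 0.05), SUMMARY_TOKENS_CEILING)
    return Budgets(threshold_tokens, tail_token_budget, max_summary_tokens)


big = derive_budgets(1_000_000)   # e.g. a 1M-token model
small = derive_budgets(8_000)     # e.g. a local 8K model

print(big.tail_token_budget, small.tail_token_budget)
```

Under these assumptions the tail budget differs by more than two orders of magnitude between the two models, which is why carrying either value across a model switch distorts compression.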

Fix: mirror the two __init__ assignments at the end of update_model() using the already-updated self.threshold_tokens and self.context_length. Symmetric with __init__ lines 365–369. No behavior change when the context length is unchanged.
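The PR mirrors the two assignments directly. One alternative that avoids keeping two copies in sync is to route both __init__ and update_model() through a shared helper. The sketch below illustrates that idea only: CompressorSketch and _derive_budgets are hypothetical names, the default ratios and MINIMUM_CONTEXT_LENGTH are assumed placeholder values, and the real ContextCompressor takes more arguments than shown.

```python
MINIMUM_CONTEXT_LENGTH = 4_000    # assumed placeholder value
_SUMMARY_TOKENS_CEILING = 12_000  # value used in the PR's test


class CompressorSketch:
    """Hypothetical stand-in for ContextCompressor; budget fields only."""

    def __init__(self, context_length, threshold_percent=0.5, summary_target_ratio=0.2):
        self.threshold_percent = threshold_percent        # assumed default
        self.summary_target_ratio = summary_target_ratio  # assumed default
        self._derive_budgets(context_length)

    def update_model(self, context_length):
        # Reuses the constructor's derivation, so no dependent field is skipped.
        self._derive_budgets(context_length)

    def _derive_budgets(self, context_length):
        self.context_length = context_length
        self.threshold_tokens = max(
            int(context_length * self.threshold_percent),
            MINIMUM_CONTEXT_LENGTH,
        )
        self.tail_token_budget = int(self.threshold_tokens * self.summary_target_ratio)
        self.max_summary_tokens = min(
            int(self.context_length * 0.05),
            _SUMMARY_TOKENS_CEILING,
        )


c = CompressorSketch(context_length=1_000_000)
c.update_model(context_length=8_000)
print(c.threshold_tokens, c.tail_token_budget, c.max_summary_tokens)
```

The trade-off is a larger diff than the +9 lines in this PR, in exchange for a single place where derived fields are defined.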

Related Issue

Fixes #15558

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests
  • ♻️ Refactor
  • 🎯 New skill

Changes Made

  • agent/context_compressor.py: add tail_token_budget and max_summary_tokens re-derivation to update_model() (+9 lines)

How to Test

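from unittest.mock import patch

from agent.context_compressor import ContextCompressor

# Build a compressor for a model that reports a 1M-token context window.
with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
    c = ContextCompressor(model="big/model", quiet_mode=True)

# Sanity check: __init__ derives the tail budget from threshold_tokens.
assert c.tail_token_budget == int(c.threshold_tokens * c.summary_target_ratio)

# After switching to an 8K model, both derived values must be recomputed.
c.update_model(model="small/model", context_length=8_000)
assert c.tail_token_budget == int(c.threshold_tokens * c.summary_target_ratio)
assert c.max_summary_tokens == min(int(8_000 * 0.05), 12_000)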

Checklist

Code

  • Contributing Guide read
  • Conventional Commits
  • No duplicate PR
  • Only this fix
  • pytest run
  • Platform: macOS

Documentation & Housekeeping

  • Docs updated: N/A
  • cli-config.yaml.example: N/A
  • CONTRIBUTING.md/AGENTS.md: N/A
  • Cross-platform impact: N/A
  • Tool descriptions: N/A

Changed files

  • agent/context_compressor.py (modified, +9/-0)


Bug Description

ContextCompressor.update_model() updates context_length and threshold_tokens when the active model changes, but does not re-derive tail_token_budget and max_summary_tokens. Both values are computed from threshold_tokens and context_length in __init__ — without the same re-derivation in update_model(), they go stale whenever the user switches to a model with a significantly different context window.

Impact

Large → small context switch (e.g. gemini-3-flash 1M → local model 8K): tail_token_budget stays at ~100K. _find_tail_cut_by_tokens() tries to protect 100K tokens of "tail" in an 8K context — the entire conversation is treated as tail, leaving nothing to summarize. Compression silently does nothing.

Small → large context switch (e.g. local 8K → claude-sonnet 200K): tail_token_budget stays at ~800 tokens. The tail is drastically under-protected; recent turns and the active task are aggressively summarized away.
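To see why a stale budget has these effects, consider a simplified tail cut that walks backward from the newest message until the budget is used up. This is only an illustration: find_tail_cut is a hypothetical function, and the real _find_tail_cut_by_tokens() is not shown in this issue and may differ (the PR text mentions a soft ceiling, for example).

```python
from typing import List


def find_tail_cut(message_tokens: List[int], tail_token_budget: int) -> int:
    """Return the index where the protected tail starts.

    Walks backward from the newest message, adding token counts until the
    next message would exceed the budget. Messages before the returned
    index are the candidates for summarization.
    """
    used = 0
    cut = len(message_tokens)
    for i in range(len(message_tokens) - 1, -1, -1):
        if used + message_tokens[i] > tail_token_budget:
            break
        used += message_tokens[i]
        cut = i
    return cut


# 20 messages of 400 tokens each: 8_000 tokens in total.
conversation = [400] * 20

# Stale budget carried over from a 1M-token model.
stale_cut = find_tail_cut(conversation, tail_token_budget=100_000)

# Budget re-derived for the 8K model (assumed to be 800 tokens).
fresh_cut = find_tail_cut(conversation, tail_token_budget=800)

print(stale_cut, fresh_cut)
```

With the stale 100K budget the cut lands at index 0, so every message counts as tail and nothing is left to summarize. With an 800-token budget only the last two messages are protected, which is also what a long conversation on a 200K model gets when the small budget goes stale.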

Steps to Reproduce

  1. Start a session with a large-context model (e.g. OpenRouter gemini or claude-sonnet).
  2. Fill context past the compression threshold.
  3. Switch to a model with a much smaller context window via /model.
  4. Continue chatting until the new threshold is reached.
  5. Observe: compression triggers but summarizes only 1–2 messages (tail budget exceeds total context).

Are you willing to submit a PR?

Yes, fix is ready.

extent analysis

TL;DR

Update update_model() to re-derive tail_token_budget and max_summary_tokens after changing the active model.

Guidance

  • In update_model(), recompute tail_token_budget and max_summary_tokens using the updated threshold_tokens and context_length values.
  • Verify the fix by switching between models with different context windows and checking that compression behaves as expected.
  • More generally, make update_model() refresh every field that __init__ derives from context_length or threshold_tokens, so that fields added later cannot go stale in the same way.
  • Consider adding tests to cover model switching scenarios and verify that tail_token_budget and max_summary_tokens are correctly updated.
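The testing suggestion in the last bullet could look roughly like the sketch below, which checks one invariant after switches in both directions. BuggyCompressor and FixedCompressor are hypothetical stand-ins for the behavior before and after the fix, not the real class, and the ratios and MINIMUM_CONTEXT_LENGTH are assumed values. The budgets_consistent check only reads attributes, so a similar check could be pointed at a real ContextCompressor instance, assuming the ceiling constant matches.

```python
MINIMUM_CONTEXT_LENGTH = 4_000    # assumed placeholder value
_SUMMARY_TOKENS_CEILING = 12_000  # value used in the PR's test


def budgets_consistent(c) -> bool:
    """True if the derived fields match what __init__ would compute."""
    expected_tail = int(c.threshold_tokens * c.summary_target_ratio)
    expected_summary = min(int(c.context_length * 0.05), _SUMMARY_TOKENS_CEILING)
    return c.tail_token_budget == expected_tail and c.max_summary_tokens == expected_summary


class BuggyCompressor:
    """Hypothetical stand-in reproducing the pre-fix behavior."""

    threshold_percent = 0.5     # assumed
    summary_target_ratio = 0.2  # assumed

    def __init__(self, context_length):
        self._set_thresholds(context_length)
        self._set_budgets()

    def _set_thresholds(self, context_length):
        self.context_length = context_length
        self.threshold_tokens = max(
            int(context_length * self.threshold_percent),
            MINIMUM_CONTEXT_LENGTH,
        )

    def _set_budgets(self):
        self.tail_token_budget = int(self.threshold_tokens * self.summary_target_ratio)
        self.max_summary_tokens = min(
            int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING
        )

    def update_model(self, context_length):
        self._set_thresholds(context_length)  # budgets are left stale


class FixedCompressor(BuggyCompressor):
    """Same stand-in with the re-derivation added to update_model()."""

    def update_model(self, context_length):
        super().update_model(context_length)
        self._set_budgets()


def stays_consistent(cls, start, end) -> bool:
    c = cls(start)
    c.update_model(end)
    return budgets_consistent(c)


# Large → small, small → large, and an unchanged context length.
SWITCHES = [(1_000_000, 8_000), (8_000, 200_000), (200_000, 200_000)]
results = {
    cls.__name__: [stays_consistent(cls, a, b) for a, b in SWITCHES]
    for cls in (BuggyCompressor, FixedCompressor)
}
print(results)
```

Including the unchanged-length case mirrors the PR's claim that the fix changes nothing when the context length stays the same.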

Example

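# model and context_length are the keywords used in the PR's test;
# any other parameters are elided in the source and stand in as **kwargs.
def update_model(self, model, context_length, **kwargs):
    self.context_length = context_length
    self.threshold_tokens = max(
        int(context_length * self.threshold_percent),
        MINIMUM_CONTEXT_LENGTH,
    )
    # Re-derive the values that __init__ computes from the two fields above.
    target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
    self.tail_token_budget = target_tokens
    self.max_summary_tokens = min(int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING)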

Notes

The snippet assumes that threshold_percent, summary_target_ratio, MINIMUM_CONTEXT_LENGTH and _SUMMARY_TOKENS_CEILING are already defined and in scope inside update_model(), as they are in __init__.

Recommendation

Apply the fix from PR #15560: have update_model() re-derive tail_token_budget and max_summary_tokens so that compression behaves correctly after a model switch.
