hermes - 💡(How to fix) Fix [Bug] 预检压缩的消息数量守卫在 token 已超阈值时静默跳过

Code Example

len(messages) > protect_first_n + protect_last_n + 1
# 默认 3 + 20 + 1 = 24

---

if (
    self.compression_enabled
    and len(messages) > self.context_compressor.protect_first_n
                        + self.context_compressor.protect_last_n + 1
                        # 3 + 20 + 1 = 24
):
    _preflight_tokens = estimate_request_tokens_rough(
        messages,
        system_prompt=active_system_prompt or "",
        tools=self.tools or None,
    )
    if _preflight_tokens >= self.context_compressor.threshold_tokens:
        # ... compress

---

messages: 共 8 条消息
  [0] role=system   content= 19,556 chars
  [1] role=user     content=  8,928 chars
  [4] role=user     content=  4,781 chars  (压缩摘要标记)
  [7] role=user     content=5,334,419 chars  ← 失败的压缩产物
  其余为工具调用和结果消息

总字符数：5,368,491（约 ~1.34M tokens）

---

_approx_chars = sum(len(str(m.get('content', '')) or '') for m in messages)
_approx_tokens = _approx_chars // 4

if (
    self.compression_enabled
    and (len(messages) > 24 or _approx_tokens > self.context_compressor.threshold_tokens)
):
    _preflight_tokens = estimate_request_tokens_rough(...)
    if _preflight_tokens >= self.context_compressor.threshold_tokens:
        self._compress_context(...)

摘要

run_agent.py 的预检压缩（preflight compression）入口处有一个消息数量守卫条件：

len(messages) > protect_first_n + protect_last_n + 1
# 默认 3 + 20 + 1 = 24

这个守卫是门而不是提示（gate, not a hint）——当它结果为 False 时，后续的 token 估算和压缩逻辑完全不会执行。即使会话的总 token 数已经远超配置的 compression.threshold（默认 500K token），压缩也不会触发。

这导致在一个很常见但没有被覆盖的场景下静默的上下文溢出（context overflow）：

会话只有少量消息（< 24 条），但每条消息内容极大（例如读取大文件的输出、失败的压缩尝试产生了一条 1M+ token 的消息）。

在 API Server 模式下（通过 hermes-web-ui / Open WebUI 等前端请求），每次请求创建一个新的 AIAgent 并执行一次 preflight。如果客户端发回的会话历史恰好是"少量消息 + 超大内容"，守卫就会永久阻止压缩，会话只能靠手动 /compress 或 /new 恢复。

根因分析

主要 Bug：`len(messages)` 守卫抢在 token 检查前短路

run_agent.py 第 11220 行：

if (
    self.compression_enabled
    and len(messages) > self.context_compressor.protect_first_n
                        + self.context_compressor.protect_last_n + 1
                        # 3 + 20 + 1 = 24
):
    _preflight_tokens = estimate_request_tokens_rough(
        messages,
        system_prompt=active_system_prompt or "",
        tools=self.tools or None,
    )
    if _preflight_tokens >= self.context_compressor.threshold_tokens:
        # ... compress

当 len(messages) <= 24 时，整个 if 判断为 False，token 检查和后续压缩全部跳过。

次要问题：`update_from_response` 获取的数据从未用于压缩触发

ContextCompressor.update_from_response() 方法（context_compressor.py 第 488 行）已经在每次 API 响应后接收了真实的 prompt_tokens。should_compress() 方法已实现但在主循环中从未被调用过——搜索 run_agent.py 中所有 should_compress 调用，结果为 0。

第三层问题：`estimate_request_tokens_rough()` 可能低估 token 数

这个函数使用 _CHARS_PER_TOKEN = 4。对于中文（平均 ~1.5-2 字符/token）、代码、JSON、base64 等情况，实际 token 数可能是估算值的 1.5-2 倍。

实际证据

来自 agent.log：

Preflight compression 出现次数：0
context compression started 出现次数：0
Auxiliary compression: using auto 出现次数：24 —— 辅助模型初始化

来自 request_dump 分析（6.75MB）：

messages: 共 8 条消息
  [0] role=system   content= 19,556 chars
  [1] role=user     content=  8,928 chars
  [4] role=user     content=  4,781 chars  (压缩摘要标记)
  [7] role=user     content=5,334,419 chars  ← 失败的压缩产物
  其余为工具调用和结果消息

总字符数：5,368,491（约 ~1.34M tokens）

会话只有 8 条消息（远低于 24 条的门槛），但 1.34M tokens（远超 500K 阈值）。

重现步骤

使用 Hermes 的 API Server 模式（网关通过 conversation_history 传递会话历史）
创建一个少于 24 条消息、但每条消息内容极大的会话（例如读取一个 200K token 的大文件，或一次失败的压缩产生了一条 1M+ token 的摘要消息）
观察：预检压缩从未触发
下一次 API 调用达到模型上下文限制，可能报 413 或不可重试的错误
检查 agent.log——没有 preflight compression 日志

环境

Hermes Agent 版本：在 main 分支最新提交上观察到（2026-05-17）
模型：deepseek-v4-flash（1M 上下文，threshold=500K）
运行模式：API Server（hermes-web-ui 前端）
配置：默认值（compression.enabled: true, threshold: 0.5, protect_last_n: 20）

建议修复方案

修复 1（最小改动，高收益）⭐ 推荐

在预检守卫条件中增加 or _approx_tokens > threshold_tokens：

_approx_chars = sum(len(str(m.get('content', '')) or '') for m in messages)
_approx_tokens = _approx_chars // 4

if (
    self.compression_enabled
    and (len(messages) > 24 or _approx_tokens > self.context_compressor.threshold_tokens)
):
    _preflight_tokens = estimate_request_tokens_rough(...)
    if _preflight_tokens >= self.context_compressor.threshold_tokens:
        self._compress_context(...)

修复 2（中等）

利用 LLM API 返回的真实 prompt_tokens（已通过 context_compressor.last_prompt_tokens 获取）作为每次 API 响应后的二次压缩触发条件。

修复 3（可选）

在主循环中增加中间检查：在 tool 结果追加后、下一次 API 调用前，如果 token 估算超阈值，则先压缩再发送。

关联 Issue

#6202： /compress 报告成功但实际未压缩——也提到了相同的 24 条消息守卫，但讨论的是手动 /compress 路径
#22871： 修复预检压缩 pass budget
#25921： Gateway 压缩分裂后重复使用父级会话历史
#12213： Feature request：将 compress 作为原生 tool

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Bug] 预检压缩的消息数量守卫在 token 已超阈值时静默跳过

Recommended Tools

GitHub issue graph ai analysis

Code Example

摘要

根因分析

主要 Bug：`len(messages)` 守卫抢在 token 检查前短路

次要问题：`update_from_response` 获取的数据从未用于压缩触发

第三层问题：`estimate_request_tokens_rough()` 可能低估 token 数

实际证据

来自 agent.log：

来自 request_dump 分析（6.75MB）：

重现步骤

环境

建议修复方案

修复 1（最小改动，高收益）⭐ 推荐

修复 2（中等）

修复 3（可选）

关联 Issue

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix [Bug] 预检压缩的消息数量守卫在 token 已超阈值时静默跳过

Recommended Tools

GitHub issue graph ai analysis

Code Example

摘要

根因分析

主要 Bug：len(messages) 守卫抢在 token 检查前短路

次要问题：update_from_response 获取的数据从未用于压缩触发

第三层问题：estimate_request_tokens_rough() 可能低估 token 数

实际证据

来自 agent.log：

来自 request_dump 分析（6.75MB）：

重现步骤

环境

建议修复方案

修复 1（最小改动，高收益）⭐ 推荐

修复 2（中等）

修复 3（可选）

关联 Issue

Still need to ship something?

RELATED_DISCOVERY

TRENDING

主要 Bug：`len(messages)` 守卫抢在 token 检查前短路

次要问题：`update_from_response` 获取的数据从未用于压缩触发

第三层问题：`estimate_request_tokens_rough()` 可能低估 token 数