hermes - feat(compression): integrate headroom-ai for tool output compression

Problem or Use Case

Hermes Agent's current context compression system (context_compressor.py, conversation_compression.py) works at the conversation level — it summarizes the entire context window via LLM calls when the session approaches its token limit. This approach has several known issues:

  • Premature compression triggers due to token estimation inaccuracies (#23902, #14690)
  • Compression can increase prompt size instead of reducing it (#23767)
  • Silent data loss when summary generation fails (#25585, #10719)
  • Anti-thrashing protection permanently disables compression with no recovery (#14690)
  • Preflight guard bypasses token threshold for sessions with few but huge messages (#27405)

These are fundamentally hard to solve at the conversation-summary level because the compressor operates on already-assembled context.

Headroom-ai (headroom-ai on PyPI, github.com/chopratejas/headroom, 13K+ stars) takes a different approach: it compresses individual tool outputs before they enter the context, using specialized compressors per content type (logs, grep results, JSON, code). This is complementary to the existing conversation-level compression — it reduces the rate at which context grows in the first place.

Proposed Solution

Integrate headroom-ai as an optional tool-output compression layer that sits between tool execution and context insertion. The integration would:

  1. Intercept tool outputs after execution but before they are appended to the message list
  2. Route each output to the appropriate headroom compressor based on tool name / content type
  3. Replace the original output with the compressed version (in optimize mode) or log metrics only (in audit mode)
  4. Fall back gracefully if headroom is not installed or fails
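
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the proof-of-concept code: the `compress_tool_output` signature, the `compressors` registry, and the `Compressor` type are hypothetical names, and the actual headroom-ai calls are deliberately left out. If headroom is not installed, the registry would simply be empty and every call would return `None`.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("headroom_integration")

# Hypothetical type: a compressor maps raw tool output to a shorter string.
Compressor = Callable[[str], str]


def compress_tool_output(
    tool_name: str,
    output: str,
    compressors: Dict[str, Compressor],
    mode: str = "audit",
) -> Optional[str]:
    """Return the compressed output, or None to keep the original."""
    # Step 2: route by tool name.
    compressor = compressors.get(tool_name)
    if compressor is None:
        return None  # nothing registered for this tool
    try:
        compressed = compressor(output)
    except Exception:
        # Step 4: any compressor failure falls back to the original output.
        logger.exception("compression failed for %s", tool_name)
        return None
    logger.info("%s: %d -> %d chars", tool_name, len(output), len(compressed))
    # Step 3: audit mode only logs; optimize mode replaces the output,
    # but never with something that is not actually smaller.
    if mode != "optimize" or len(compressed) >= len(output):
        return None
    return compressed
```

Returning `None` for every "keep the original" case lets the call site stay a simple `is not None` check.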

Architecture

Tool Execution → headroom compress_tool_output() → Compressed Output → Message List → LLM Context
              ContentRouter detects type:
              - terminal → LogCompressor (~93% reduction)
              - search_files → SearchCompressor (~87% reduction)
              - web_search → SmartCrusher (~2-5% reduction)
              - read_file → CodeAwareCompressor (tree-sitter, currently buggy)
              - browser_snapshot → noop (plain text, not supported yet)
              - web_extract → noop (markdown, not supported yet)

Configuration

New optional config section in config.yaml:

headroom:
  enabled: false  # opt-in, default off
  mode: audit     # audit (log only) | optimize (compress)
  threshold: 300  # minimum tokens to trigger compression
  tools:
    - terminal
    - search_files
    - web_search
    - read_file
    - browser_snapshot
    - web_extract
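
One way the section could be loaded and validated, shown as a sketch. `HeadroomConfig`, `from_dict`, and `should_process` are hypothetical names, and treating an omitted `tools` list as "all six tools" is an assumption, not something the proposal specifies. The input is the already-parsed config.yaml as a plain dict, so no YAML library is needed here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

DEFAULT_TOOLS = [
    "terminal", "search_files", "web_search",
    "read_file", "browser_snapshot", "web_extract",
]


@dataclass
class HeadroomConfig:
    enabled: bool = False          # opt-in, default off
    mode: str = "audit"            # "audit" (log only) or "optimize" (compress)
    threshold: int = 300           # minimum tokens to trigger compression
    tools: List[str] = field(default_factory=lambda: list(DEFAULT_TOOLS))

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "HeadroomConfig":
        """Build the config from the parsed config file; a missing section means defaults."""
        section = raw.get("headroom") or {}
        cfg = cls(
            enabled=bool(section.get("enabled", False)),
            mode=str(section.get("mode", "audit")),
            threshold=int(section.get("threshold", 300)),
            tools=list(section.get("tools", DEFAULT_TOOLS)),
        )
        if cfg.mode not in ("audit", "optimize"):
            raise ValueError(f"unknown headroom mode: {cfg.mode!r}")
        return cfg

    def should_process(self, tool_name: str, token_count: int) -> bool:
        """True when this tool output is eligible for compression or auditing."""
        return (
            self.enabled
            and tool_name in self.tools
            and token_count >= self.threshold
        )
```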

Proof of Concept

I've been running a working integration in production for several days. The implementation consists of:

  1. headroom_compressor.py — wrapper module that routes tool outputs to headroom's compressors
  2. sitecustomize.py — auto-patches tool_executor.py on startup via Python's sitecustomize mechanism
  3. patch_tool_executor.py — idempotent patch that injects the compression call into the tool execution pipeline
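
For readers unfamiliar with what "idempotent patch" means here, the sketch below shows one common way to get that property: write a sentinel comment together with the injected lines and skip the patch when the sentinel is already present. It is an assumption about the general technique, not the contents of patch_tool_executor.py; `MARKER` and `apply_patch` are hypothetical names.

```python
MARKER = "# headroom-patch-applied"  # hypothetical sentinel comment


def apply_patch(source: str, anchor: str, injected: str) -> str:
    """Insert `injected` before the first line containing `anchor`, at most once."""
    if MARKER in source:
        return source  # already patched: applying again is a no-op
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if anchor in line:
            # Reuse the anchor line's indentation for the injected lines.
            indent = line[: len(line) - len(line.lstrip())]
            block = [f"{indent}{MARKER}\n"]
            block += [f"{indent}{extra}\n" for extra in injected.splitlines()]
            return "".join(lines[:i] + block + lines[i:])
    raise ValueError(f"anchor not found: {anchor!r}")
```

Raising when the anchor is missing makes an upstream refactor of the target file fail loudly instead of silently skipping the patch.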

Measured results (headroom-ai v0.23.0, optimize mode):

| Tool | Compressor | Reduction | Notes |
| --- | --- | --- | --- |
| terminal | LogCompressor | ~93% | Preserves [WARN], [ERROR], [FAIL] lines; removes repetitive [INFO] |
| search_files | SearchCompressor | ~87% | Preserves matching lines; deduplicates context |
| web_search | SmartCrusher | ~2-5% | Light JSON array compression |
| read_file | CodeAwareCompressor | ~0% | tree-sitter bug in v0.23.0, falls back to generic |
| browser_snapshot | noop | 0% | plain text not supported by headroom yet |
| web_extract | noop | 0% | markdown not supported by headroom yet |
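
To make the terminal row concrete, here is a toy filter in the same spirit: it keeps every [WARN], [ERROR], and [FAIL] line and collapses everything else into a count. This is a deliberately simplified stand-in written for illustration, not headroom's LogCompressor algorithm; `compress_log` is a hypothetical name.

```python
import re

KEEP = re.compile(r"\[(WARN|ERROR|FAIL)\]")


def compress_log(text: str) -> str:
    """Keep WARN/ERROR/FAIL lines; replace each run of other lines with a count."""
    out = []
    skipped = 0

    def flush() -> None:
        # Emit a placeholder for the run of dropped lines, if any.
        nonlocal skipped
        if skipped:
            out.append(f"... {skipped} lines omitted ...")
            skipped = 0

    for line in text.splitlines():
        if KEEP.search(line):
            flush()
            out.append(line)
        else:
            skipped += 1
    flush()
    return "\n".join(out)
```

The placeholder lines keep the position of each omitted run visible, so the model can still tell where an error occurred relative to the surrounding output.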

Token savings example: A session with 50 tool calls averaging 2000 tokens each produces about 100K tokens of raw tool output. At the reduction rates above, that session would save approximately 30-40K tokens total (depending on tool mix), significantly delaying the need for conversation-level compression.
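
The estimate depends heavily on the tool mix, which the following back-of-the-envelope calculation makes explicit. The reduction rates come from the measured results; the 50-call mix is a made-up example, `estimate_savings` is a hypothetical helper, and web_search uses the midpoint of the 2-5% range.

```python
# Reduction rates from the measured results (web_search: midpoint of 2-5%).
REDUCTION = {
    "terminal": 0.93,
    "search_files": 0.87,
    "web_search": 0.035,
    "read_file": 0.0,
    "browser_snapshot": 0.0,
    "web_extract": 0.0,
}


def estimate_savings(call_counts: dict, avg_tokens: int = 2000) -> int:
    """Estimated tokens saved for a given number of calls per tool."""
    return round(
        sum(n * avg_tokens * REDUCTION[tool] for tool, n in call_counts.items())
    )


# Hypothetical mix of 50 calls; real sessions will differ.
mix = {
    "terminal": 10,
    "search_files": 10,
    "web_search": 10,
    "read_file": 10,
    "browser_snapshot": 5,
    "web_extract": 5,
}
saved = estimate_savings(mix)
print(saved)  # 36700
```

A session dominated by read_file or browser_snapshot calls would save far less, since those tools currently see roughly 0% reduction.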

Integration Points

The cleanest integration point is in agent/tool_executor.py, in the _execute_tool_calls_sequential and _execute_tool_calls_concurrent functions, right before make_tool_result_message() is called:

# Pseudocode for the integration point
function_result = execute_tool(name, args)

# NEW: Compress tool output before adding to context
if headroom_config.enabled:
    compressed = headroom_compress(name, function_result)
    if compressed is not None:
        function_result = compressed

messages.append(make_tool_result_message(name, function_result, tc.id))

Advantages Over Current Approach

  1. Complementary: Works before context compression, reducing its frequency and improving its effectiveness
  2. No LLM calls: Unlike conversation-level compression, headroom uses deterministic algorithms — no API costs, no latency, no summary quality issues
  3. Reversible (CCR): Headroom's Context Compression & Retrieval system stores originals locally; the LLM can retrieve them on demand
  4. Content-aware: Different compressors for different content types, vs. one-size-fits-all LLM summarization
  5. Opt-in: Zero impact on existing users who don't enable it
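
The reversibility point can be illustrated with a small sketch: store each original under a short key and append that key to the compressed text so it can be looked up later. This only illustrates the general idea of keeping originals retrievable; it is not headroom's CCR implementation, and `OriginalStore` and `with_retrieval_hint` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


class OriginalStore:
    """Hypothetical in-memory store for uncompressed tool outputs."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, original: str) -> str:
        # A content hash gives a stable key; identical outputs share one entry.
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._items[key] = original
        return key

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)


def with_retrieval_hint(store: OriginalStore, original: str, compressed: str) -> str:
    """Save the original and append a reference the model can ask for later."""
    key = store.put(original)
    return f"{compressed}\n[compressed; original available as ref {key}]"
```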

Dependencies

  • headroom-ai package (Apache 2.0 license, Python >= 3.10)
  • Optional: tree-sitter for code compression (already a transitive dependency of headroom-ai)

Alternatives Considered

  1. Fix the existing compression system — addresses symptoms at the conversation level but doesn't reduce context growth rate. The two approaches are complementary.
  2. Use headroom as a proxy — headroom supports proxy mode (headroom proxy --port 8787), but this intercepts all LLM traffic and is a heavier integration. Library mode is more targeted.
  3. Build custom compressors — headroom already provides well-tested, content-specific compressors. Reinventing them would duplicate effort.

Scope

Medium — new optional feature, no breaking changes, ~200-300 lines of integration code plus config schema changes.

Related Issues

  • #23902 — premature compression trigger (headroom reduces context growth rate, making this less frequent)
  • #23767 — compression can increase prompt size (headroom's deterministic compressors don't have this problem)
  • #25585 — failed summaries discard context (headroom doesn't use LLM summarization)
  • #14690 — anti-thrashing permanently disables compression (headroom operates per-tool, not per-session)
  • #27405 — preflight guard bypasses token threshold (headroom compresses before messages reach the guard)
  • #14695 — post-compression token estimate excludes tools schema (headroom reduces tool output size, making estimates more accurate)
