hermes - RFC: Pluggable type-aware output-compressor pipeline for tool results

I'd like to gauge interest in upstreaming a pluggable output-compressor pipeline that detects the type of a tool result (pytest output, git diff, grep matches, docker ps, cargo test, npm install, etc.) and applies a type-specific compression strategy — preserving signal lines (tracebacks, hunk headers, error messages) while trimming noise (passing tests, unchanged lines, repetitive headers).

I've been running this in production for ~4 weeks across multiple profiles. It is backed by ~50 golden-fixture regression files, so a compressor change can't silently destroy a whole class of output.

This is runtime tool-result compression, distinct from the trajectory compression mentioned in the README (which is for training-data export).

Why

Tool results are the largest non-skill contributor to turn context in my deployments. Typical observed sizes:

| Tool | Raw output | After type-aware compression |
|---|---|---|
| `pytest -v` (1000 tests, 3 fail) | ~80 KB | ~3 KB (keep failures + summary, drop pass lines) |
| `git diff` (medium PR) | ~25 KB | ~6 KB (keep hunk headers + changed lines, drop unchanged context beyond N) |
| `rg <pattern>` (200 matches) | ~40 KB | ~5 KB (keep first/last N matches + count, dedupe near-identical) |
| `docker ps -a` (50 containers) | ~12 KB | ~2 KB (tabulate, drop verbose mount lines) |
| `cargo test` (large workspace) | ~150 KB | ~8 KB (keep failed test detail, drop progress) |
| `pip install` (deep dep tree) | ~30 KB | ~1 KB (keep summary + errors) |
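For a sense of scale, you can compute the per-row reduction implied by the table directly: raw size divided by compressed size. The sketch below only does that arithmetic on the approximate figures above. `reduction_factor` is a hypothetical helper, not part of any existing code.

```python
# Approximate sizes in KB, copied from the table above: (raw, after compression).
TABLE_KB = {
    "pytest -v": (80, 3),
    "git diff": (25, 6),
    "rg <pattern>": (40, 5),
    "docker ps -a": (12, 2),
    "cargo test": (150, 8),
    "pip install": (30, 1),
}


def reduction_factor(raw_kb: float, compressed_kb: float) -> float:
    """How many times smaller the compressed output is (raw / compressed)."""
    return raw_kb / compressed_kb


for tool, (raw, compressed) in TABLE_KB.items():
    print(f"{tool:<14} {reduction_factor(raw, compressed):5.1f}x")
```

The rows range from roughly 4× (git diff, 25/6) to 30× (pip install, 30/1).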

Across a long working session this compounds: without compression, the same useful information pushes 5-10× more tokens through the model. Compression also improves the prompt-cache hit rate, because the noise that varies from run to run is exactly what gets stripped.
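The `git diff` row's strategy (keep hunk headers and changed lines, drop unchanged context beyond N) is simple enough to sketch. This is one possible reading of it, assuming standard unified-diff input. `compress_git_diff` and `max_context` are illustrative names, not the RFC's actual code.

```python
def compress_git_diff(diff_text: str, max_context: int = 1) -> str:
    """Keep file headers, @@ hunk headers and +/- lines; keep an unchanged
    context line only if a changed line is within `max_context` lines of it."""
    lines = diff_text.splitlines()
    # Indices of added/removed lines (the '+++'/'---' file headers excluded).
    changed = {
        i
        for i, line in enumerate(lines)
        if line.startswith(("+", "-")) and not line.startswith(("+++ ", "--- "))
    }
    kept = []
    for i, line in enumerate(lines):
        if line.startswith(" "):
            # Unchanged context line: keep it only if it sits near a change.
            near = range(i - max_context, i + max_context + 1)
            if any(j in changed for j in near):
                kept.append(line)
        else:
            # Headers, hunk headers, changed lines and anything unrecognised
            # (e.g. "\ No newline at end of file") are kept as-is.
            kept.append(line)
    return "\n".join(kept)
```

Trimming context has one consequence: the line counts in the `@@` headers no longer match the kept lines. The result is meant for the model to read, not for `git apply`.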

Design sketch

tool result
    │
    ▼
┌────────────────────────┐
│  pattern_detector()    │  ← regex + content heuristics; identifies "this looks like pytest output"
└────────────────────────┘
    │
    ▼  detected_type = "pytest"
┌────────────────────────┐
│  compressor_registry   │  ← registry of {type → compressor_fn}
└────────────────────────┘
    │
    ▼
┌────────────────────────┐
│  pytest_compressor()   │  ← keeps FAIL lines + traceback + summary; drops PASS lines
└────────────────────────┘
    │
    ▼
compressed result + metric {input_bytes, output_bytes, ratio, type}
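Below is a minimal, single-file sketch of the flow in the diagram. It assumes one regex rule per type and a plain dict as the registry. `pattern_detector`, `compressor_registry` and `pytest_compressor` reuse the diagram's names, but their bodies are assumptions. So are the `CompressionMetric` class and the choice of `ratio = output_bytes / input_bytes`. Unknown types pass through unchanged, matching the non-lossy default described under "What this is NOT".

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Illustrative detection rules: (output type, regex). A real detector would
# combine many rules with content heuristics; here the first match wins.
DETECTION_RULES = [
    ("pytest", re.compile(r"^=+ test session starts =+$", re.MULTILINE)),
]

# Verbose pytest line for a passing test, e.g. "test_x.py::test_a PASSED  [ 33%]".
PYTEST_PASS_LINE = re.compile(r"^\S+::.*\sPASSED\b")


@dataclass
class CompressionMetric:
    input_bytes: int
    output_bytes: int
    ratio: float  # output_bytes / input_bytes (assumed definition)
    type: Optional[str]  # detected type, or None when nothing matched


def pattern_detector(text: str) -> Optional[str]:
    """Return the first output type whose rule matches, else None."""
    for output_type, pattern in DETECTION_RULES:
        if pattern.search(text):
            return output_type
    return None


def pytest_compressor(text: str) -> str:
    """Drop per-test PASSED lines; keep everything else (FAILED/ERROR lines,
    the FAILURES tracebacks and the final summary)."""
    return "\n".join(line for line in text.splitlines() if not PYTEST_PASS_LINE.match(line))


compressor_registry: Dict[str, Callable[[str], str]] = {"pytest": pytest_compressor}


def compress_tool_result(text: str) -> Tuple[str, CompressionMetric]:
    detected = pattern_detector(text)
    compressor = compressor_registry.get(detected) if detected else None
    # No matching compressor: the raw output passes through unchanged.
    output = compressor(text) if compressor else text
    in_bytes = len(text.encode("utf-8"))
    out_bytes = len(output.encode("utf-8"))
    ratio = out_bytes / in_bytes if in_bytes else 1.0
    return output, CompressionMetric(in_bytes, out_bytes, ratio, detected)
```

Keeping everything except PASSED lines is deliberately conservative: any line the compressor does not recognise survives. That is the safe failure mode for signal such as tracebacks.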

Plug points:

New compressors are standalone functions registered via a decorator; no core changes are needed
Per-type config (max_lines_kept, signal_patterns, dedup_threshold) lives in cli-config.yaml
Per-tool overrides are possible (e.g., "for pytest from project X, use a different compressor")
A lossless-mode toggle (env var or per-turn flag) disables all compression for debugging, so the full raw output passes through

Fixture-driven testing:

Each compressor ships with golden fixtures: fixtures/output_compressor/<type>/<scenario>.txt (input) and a matching <type>/<scenario>.expected.txt (compressed output). A pytest suite enforces no regressions, and adding a new compressor requires a fixture pair.

I currently have ~50 fixtures covering: pytest (pass/fail/error), git (status/diff/log), grep/rg variants, ls, docker ps, cargo (pass/fail), npm/pip install, curl JSON, ESLint, mypy, coverage, PowerShell native commands, etc.
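Here is a sketch of the golden-fixture check. It assumes the directory layout described above: inputs as `<scenario>.txt` and goldens as `<scenario>.expected.txt`, both in the same `<type>/` folder. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Set, Tuple


def collect_fixture_pairs(root: Path) -> List[Tuple[str, Path, Path]]:
    """Return (type, input, expected) for every <type>/<scenario>.txt under root."""
    pairs = []
    for input_path in sorted(root.glob("*/*.txt")):
        if input_path.name.endswith(".expected.txt"):
            continue  # goldens are found from their inputs, not listed separately
        expected_path = input_path.with_name(input_path.stem + ".expected.txt")
        pairs.append((input_path.parent.name, input_path, expected_path))
    return pairs


def check_fixture(compress: Callable[[str], str], input_path: Path, expected_path: Path) -> None:
    """Fail if the golden file is missing or the compressor's output has drifted."""
    assert expected_path.exists(), f"missing golden file: {expected_path}"
    actual = compress(input_path.read_text(encoding="utf-8"))
    expected = expected_path.read_text(encoding="utf-8")
    assert actual == expected, f"compressor output changed for {input_path}"


def types_missing_fixtures(registered_types: Iterable[str], root: Path) -> Set[str]:
    """Registered compressor types with no input fixture under root."""
    covered = {output_type for output_type, _, _ in collect_fixture_pairs(root)}
    return set(registered_types) - covered
```

In a pytest suite, `collect_fixture_pairs(FIXTURE_ROOT)` could supply the argvalues of `@pytest.mark.parametrize("output_type, input_path, expected_path", ...)` (`FIXTURE_ROOT` is hypothetical). The test body would then call `check_fixture` with the compressor looked up by `output_type`. A separate test on `types_missing_fixtures` would enforce the "new compressor needs a fixture pair" rule.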

What this is NOT

Not training-data trajectory compression (different layer — that's post-hoc, this is per-turn)
Not an LLM-based summarizer (pure regex / structured rules; ~0 added latency, ~0 added cost)
Not lossy by default for unknown types — when no compressor matches, output passes through unchanged
Not opinionated about which model / transport / profile — purely a turn-result transform

Why this isn't covered by existing features

The agent's existing context window management is line-count / token-count based, not type-aware (it truncates blindly)
Anthropic's automatic context compaction kicks in only when the window fills; this prevents the window from filling so fast
LLM-based summarization on every tool result would add latency + cost; this is sub-millisecond per call

Questions before I open a PR

In scope? Pluggable pipeline under agent/ or tools/? Or better as an optional bundled skill that wraps tool calls?
Compressor registration UX — decorator + entry-point discovery, or explicit registry list in cli-config?
Fixture format — keep as raw .txt pairs (current local form) or move to a structured YAML?
Scope split — would you prefer (a) the framework + 3-5 common compressors (pytest / git / grep / ls / docker) as Phase 1, deferring the rest? Or (b) framework only, with compressors added separately by the community?

Not opening a PR yet. Related batch: #31385 (bridge), #31387 (drift hook, withdrawn), #31388 (multi-profile memory), #31392 (task relay), and a parallel SKILL-scheduling proposal I'm filing alongside this.

Thanks!
