hermes - 💡(How to fix) Fix [Feature]: Five-Layer Context Pipeline + Plan-Mode — Coordinated Core Agent Parity with Claude Code and Codex

StepCodex · 2026-05-30T12:02:19Z

[hermes] Problem or Use Case Hermes Agent today is the most feature-rich open-source agent on the market. It beats both Claude Code and Codex on provider bread… ## Fix / Workaround **Current Hermes state:** `/plan` exists as a bundled skill that writes a markdown plan to `.hermes/plans/`. There is also a `plan-execute` skill for two-stage review. But plan mode is **not a first-class agent runtime mode** — it has no effect on the approval system, the iteration budget, or tool dispatch. The agent can be in "plan mode" (having written a plan to disk) and simultaneously running destructive terminal commands, because the runtime has no concept of "only read/think in this phase, then unlock execution after plan approval." **Kanban integration:** `kanban create --require-plan` marks a task as requiring plan approval before the worker can execute write/exec tools. The dispatcher enforces this before spawning the worker. ### Problem or Use Case Hermes Agent today is the most feature-rich open-source agent on the market. It beats both Claude Code and Codex on provider breadth, messaging platforms, voice, memory, browser automation, Kanban multi-agent orchestration, and self-improvement. But there are **three interlocking gaps in the agent runtime layer itself** — not skills, not integrations — that keep Hermes behind in head-to-head coding-agent benchmarks and user trust: --- ### Gap 1 — Binary Approval System (vs. Claude Code's 7-mode graduated trust spectrum) **Current Hermes state:** `approvals.mode` is a three-position toggle: `manual` (ask every time), `smart` (regex-based heuristic classifier), `off` (YOLO). There is no concept of *scope* (session vs. command vs. tool-class vs. path glob), no structured deny-overrides-allow layering, and no plan-scoped trust (approve the plan, auto-run the tools within it). The ACP editor bridge adds `allow_once / allow_session / allow_always`, but this is isolated to editor sessions and not surfaced to the core CLI or gateway. **What this causes:** - Users working on large refactors have to choose between constant interruptions (`manual`) and fully open (`--yolo`) — there is no middle ground that says "auto-approve edits inside `src/`, still ask about `rm -rf`". - Multi-agent Kanban workers that need different trust levels (orchestrator vs. leaf worker) must share the same global `approvals.mode`. A kanban worker running a `--yolo` session inherits the same permissions as an interactive user. - CI/CD pipelines and headless cron jobs have no way to express "approve `git commit` and `pytest` forever, but always ask before `git push` to `main`". - Community contributors consistently open issues asking for scoped YOLO (e.g. "allow all file edits, block all network calls") — this is architecturally impossible today without a real permission model. **Claude Code's approach:** Seven graduated modes (`plan → default → acceptEdits → auto → dontAsk → bypassPermissions + bubble`) layered over per-tool allow/deny/ask glob patterns and an ML-based classifier. Deny-first: a broad deny always overrides a narrow allow. Permissions are never restored on session resume. **Codex's approach:** Named permission profiles selectable at launch (`codex --profile strict`), sandbox CLI profile selection, cwd-scoped controls, and per-task permission overrides in the MultiAgentV2 config. --- ### Gap 2 — Two-Layer Context Compression (vs. Claude Code's Five-Strategy Ordered Pipeline) **Current Hermes state:** Two independent compressors fire at 50% (agent) and 85% (gateway safety net). Both use the same single strategy: LLM-based lossy summarization of the middle history. The `ContextEngine` ABC exists and is pluggable, but the built-in `ContextCompressor` has only one lever (summarize). There is no graduated approach that exhausts cheap local strategies before invoking the expensive auxiliary-LLM summarization call. **What this causes:** - A 10K-token tool output that could be trimmed with a budget reducer fires a full auxiliary LLM summarization call — a $0.002–0.01 cost hit and a 2–5 second latency spike. - Context window pressure during high-frequency Kanban worker turns causes cascade compression events, destroying task context in child agents. - Users on local models (8K–32K contexts) hit the auxiliary summarization path on nearly every task, making local deployments dramatically slower than cloud ones. - Tool output snipping (trim oversized individual results before they enter history) and temporal snipping (drop old low-value turns without summarizing) are architecturally absent — every compression event is an LLM call. **Claude Code's five-layer pipeline (cheapest-first):** 1. **Budget reduction** — truncate individual tool outputs that exceed size limits (no LLM) 2. **Snip** — drop old temporally stale turns (no LLM) 3. **Microcompact** — reduce cache overhead (no LLM) 4. **Context collapse** — managed truncation for very long histories (no LLM) 5. **Auto-compact** — semantic LLM summarization (a

hermes2026-05-30 12:02:19

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Root Cause

Current Hermes state: /plan exists as a bundled skill that writes a markdown plan to .hermes/plans/. There is also a plan-execute skill for two-stage review. But plan mode is not a first-class agent runtime mode — it has no effect on the approval system, the iteration budget, or tool dispatch. The agent can be in "plan mode" (having written a plan to disk) and simultaneously running destructive terminal commands, because the runtime has no concept of "only read/think in this phase, then unlock execution after plan approval."

Fix Action

Fix / Workaround

Kanban integration: kanban create --require-plan marks a task as requiring plan approval before the worker can execute write/exec tools. The dispatcher enforces this before spawning the worker.

Code Example

@dataclass
class PermissionProfile:
    name: str  # "default" | "plan" | "accept-edits" | "auto" | "full-auto" | custom
    tool_rules: list[ToolRule]  # ordered, deny-first
    scope: Literal["session", "task", "permanent"]
    inheritable: bool  # subagents inherit a restricted copy
    
@dataclass 
class ToolRule:
    pattern: str        # glob: "terminal:rm*", "file:write:src/**", "*"
    decision: Literal["allow", "deny", "ask"]
    classifier: Literal["regex", "ml", "none"]  # existing smart mode = "ml"

---

hermes chat --permission-profile accept-edits
hermes chat --allow "file:write:src/**" --deny "terminal:git push*"
hermes chat --plan   # shorthand for --permission-profile plan
/permissions          # show current profile + active rules
/permissions accept-edits   # switch profile mid-session

---

approvals:
  mode: default           # backward-compat key — maps to profile name
  profile: default        # new key — takes precedence if set
  profiles:               # user-defined custom profiles
    my-safe-coding:
      tool_rules:
        - pattern: "file:write:src/**"
          decision: allow
        - pattern: "terminal:git push*"
          decision: ask
        - pattern: "*"
          decision: auto

---

class StrategyChain:
    """Ordered pipeline: cheapest strategies run first."""
    
    def compress(self, messages, token_budget) -> list[Message]:
        # Layer 1: Budget reduction — trim oversized individual tool outputs
        messages = self._budget_reduce(messages, max_tool_output=8192)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 2: Snip — drop temporally stale middle turns (no LLM)
        messages = self._snip_stale(messages, protect_head=3, protect_tail=20)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 3: Microcompact — deduplicate repeated tool schemas in history
        messages = self._dedup_schemas(messages)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 4: Hard truncation — drop oldest turns above protect_head
        messages = self._hard_truncate(messages)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 5: LLM summarization — existing ContextCompressor logic (last resort)
        return self._llm_summarize(messages)

---

compression:
  strategy: "pipeline"    # "pipeline" (new default) | "summarize" (legacy)
  budget_reduce_limit: 8192   # max tokens per tool output before Layer 1 trims
  snip_stale_after: 50        # turns before a middle turn is snip-eligible
  plan_compression_priority: high   # plan turns survive layers 1-4

---

─────────────────────────────────────────────
  PLAN READY — review .hermes/plans/task_abc.md
  
  [A] Approve and run   (switches to accept-edits)
  [E] Approve and run   (switches to default, ask on each tool)
  [R] Revise plan       (stay in plan mode, send feedback)
  [X] Abort
─────────────────────────────────────────────

---

RAW_BUFFERClick to expand / collapse

Problem or Use Case

Hermes Agent today is the most feature-rich open-source agent on the market. It beats both Claude Code and Codex on provider breadth, messaging platforms, voice, memory, browser automation, Kanban multi-agent orchestration, and self-improvement. But there are three interlocking gaps in the agent runtime layer itself — not skills, not integrations — that keep Hermes behind in head-to-head coding-agent benchmarks and user trust:

Gap 1 — Binary Approval System (vs. Claude Code's 7-mode graduated trust spectrum)

Current Hermes state: approvals.mode is a three-position toggle: manual (ask every time), smart (regex-based heuristic classifier), off (YOLO). There is no concept of scope (session vs. command vs. tool-class vs. path glob), no structured deny-overrides-allow layering, and no plan-scoped trust (approve the plan, auto-run the tools within it). The ACP editor bridge adds allow_once / allow_session / allow_always, but this is isolated to editor sessions and not surfaced to the core CLI or gateway.

What this causes:

Users working on large refactors have to choose between constant interruptions (manual) and fully open (--yolo) — there is no middle ground that says "auto-approve edits inside src/, still ask about rm -rf".
Multi-agent Kanban workers that need different trust levels (orchestrator vs. leaf worker) must share the same global approvals.mode. A kanban worker running a --yolo session inherits the same permissions as an interactive user.
CI/CD pipelines and headless cron jobs have no way to express "approve git commit and pytest forever, but always ask before git push to main".
Community contributors consistently open issues asking for scoped YOLO (e.g. "allow all file edits, block all network calls") — this is architecturally impossible today without a real permission model.

Claude Code's approach: Seven graduated modes (plan → default → acceptEdits → auto → dontAsk → bypassPermissions + bubble) layered over per-tool allow/deny/ask glob patterns and an ML-based classifier. Deny-first: a broad deny always overrides a narrow allow. Permissions are never restored on session resume.

Codex's approach: Named permission profiles selectable at launch (codex --profile strict), sandbox CLI profile selection, cwd-scoped controls, and per-task permission overrides in the MultiAgentV2 config.

Gap 2 — Two-Layer Context Compression (vs. Claude Code's Five-Strategy Ordered Pipeline)

Current Hermes state: Two independent compressors fire at 50% (agent) and 85% (gateway safety net). Both use the same single strategy: LLM-based lossy summarization of the middle history. The ContextEngine ABC exists and is pluggable, but the built-in ContextCompressor has only one lever (summarize). There is no graduated approach that exhausts cheap local strategies before invoking the expensive auxiliary-LLM summarization call.

What this causes:

A 10K-token tool output that could be trimmed with a budget reducer fires a full auxiliary LLM summarization call — a $0.002–0.01 cost hit and a 2–5 second latency spike.
Context window pressure during high-frequency Kanban worker turns causes cascade compression events, destroying task context in child agents.
Users on local models (8K–32K contexts) hit the auxiliary summarization path on nearly every task, making local deployments dramatically slower than cloud ones.
Tool output snipping (trim oversized individual results before they enter history) and temporal snipping (drop old low-value turns without summarizing) are architecturally absent — every compression event is an LLM call.

Claude Code's five-layer pipeline (cheapest-first):

Budget reduction — truncate individual tool outputs that exceed size limits (no LLM)
Snip — drop old temporally stale turns (no LLM)
Microcompact — reduce cache overhead (no LLM)
Context collapse — managed truncation for very long histories (no LLM)
Auto-compact — semantic LLM summarization (auxiliary call, last resort)

This architecture means 80%+ of compression events never touch the auxiliary model, cutting cost and latency dramatically on long sessions.

Gap 3 — Plan Mode Not Integrated Into the Agent Runtime

What this causes:

High-stakes operations (database migrations, large refactors, infrastructure changes) cannot safely use Hermes without either heavy manual supervision or --yolo.
Kanban orchestrators that want to require "plan first, then execute" for certain task types have no enforcement mechanism — the kanban skill's two-stage review is advisory, not enforced by the runtime.
New users attempting complex tasks frequently get Hermes deep into execution before realizing the plan was wrong, with no structured way to review-and-continue or review-and-abort.

Claude Code's approach: plan is one of the 7 permission modes. When active, only read/analysis tools fire; write/execute tools are blocked. User approves the plan → mode shifts to acceptEdits or default → execution proceeds within the approved scope.

Codex's approach: /goal mode locks the agent onto a target across turns. --approval-mode at launch controls whether the agent pauses for plan review before executing.

Why These Three Together, Not Separately

These three gaps are architecturally coupled:

Plan mode requires a permission system that can enforce "only reads" during the plan phase and then transition to a broader scope on approval.
The five-layer compression pipeline needs to know which context is "plan context" (should be preserved through compression) vs. "tool output noise" (candidate for budget reduction).
Kanban workers need to inherit scoped permissions from their parent orchestrator, which only works if permissions are composable objects, not a global config flag.

Implementing any one of these without the others produces an incomplete and partially broken feature. The right decomposition is:

Permission primitives (data model + enforcement layer in tools/approval.py and run_agent.py)
Compression pipeline (new strategy chain in agent/context_compressor.py, keeping the ContextEngine ABC unchanged)
Plan mode (new permission mode that uses the permission primitives, with /plan skill becoming a thin wrapper)

Proposed Solution

Part 1: Graduated Permission System

Data model — new PermissionProfile object in tools/approval.py:

@dataclass
class PermissionProfile:
    name: str  # "default" | "plan" | "accept-edits" | "auto" | "full-auto" | custom
    tool_rules: list[ToolRule]  # ordered, deny-first
    scope: Literal["session", "task", "permanent"]
    inheritable: bool  # subagents inherit a restricted copy
    
@dataclass 
class ToolRule:
    pattern: str        # glob: "terminal:rm*", "file:write:src/**", "*"
    decision: Literal["allow", "deny", "ask"]
    classifier: Literal["regex", "ml", "none"]  # existing smart mode = "ml"

Built-in profiles (backward-compatible — existing approvals.mode maps to these):

Profile	Replaces	Behavior
`plan`	—	Only read/analysis tools; write/exec blocked; user approves to advance
`default`	`manual`	Ask on dangerous commands (current behavior)
`accept-edits`	—	Auto-approve file edits; ask on exec/network
`auto`	`smart`	ML classifier decides; no interactive prompt
`full-auto`	`off`/`--yolo`	All tools auto-approved (YOLO, current behavior)

CLI changes:

hermes chat --permission-profile accept-edits
hermes chat --allow "file:write:src/**" --deny "terminal:git push*"
hermes chat --plan   # shorthand for --permission-profile plan
/permissions          # show current profile + active rules
/permissions accept-edits   # switch profile mid-session

Config:

approvals:
  mode: default           # backward-compat key — maps to profile name
  profile: default        # new key — takes precedence if set
  profiles:               # user-defined custom profiles
    my-safe-coding:
      tool_rules:
        - pattern: "file:write:src/**"
          decision: allow
        - pattern: "terminal:git push*"
          decision: ask
        - pattern: "*"
          decision: auto

Kanban integration: kanban create --permission-profile accept-edits pins a profile to a task. Worker processes inherit a restricted copy (same rules, inheritable: false stripped). Orchestrators keep their own profile. The gateway maps allow_session to the default profile's session scope.

Backward compatibility: Existing approvals.mode: manual/smart/off maps to default/auto/full-auto. All existing configs continue to work. --yolo continues to work as an alias for --permission-profile full-auto.

Part 2: Five-Layer Context Pipeline

Extend agent/context_compressor.py with a strategy chain. The ContextEngine ABC is unchanged — this is purely an implementation improvement to the default engine.

class StrategyChain:
    """Ordered pipeline: cheapest strategies run first."""
    
    def compress(self, messages, token_budget) -> list[Message]:
        # Layer 1: Budget reduction — trim oversized individual tool outputs
        messages = self._budget_reduce(messages, max_tool_output=8192)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 2: Snip — drop temporally stale middle turns (no LLM)
        messages = self._snip_stale(messages, protect_head=3, protect_tail=20)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 3: Microcompact — deduplicate repeated tool schemas in history
        messages = self._dedup_schemas(messages)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 4: Hard truncation — drop oldest turns above protect_head
        messages = self._hard_truncate(messages)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 5: LLM summarization — existing ContextCompressor logic (last resort)
        return self._llm_summarize(messages)

Plan context preservation: Messages tagged as plan content (written by /plan mode) are marked compression_priority: high and survive layers 1–4. Only layer 5 may summarize them, and the summarizer prompt explicitly says "preserve all plan decisions verbatim."

Config additions:

compression:
  strategy: "pipeline"    # "pipeline" (new default) | "summarize" (legacy)
  budget_reduce_limit: 8192   # max tokens per tool output before Layer 1 trims
  snip_stale_after: 50        # turns before a middle turn is snip-eligible
  plan_compression_priority: high   # plan turns survive layers 1-4

Backward compatibility: strategy: "summarize" restores legacy single-layer behavior. Default switches to "pipeline" only for new installs; existing configs with no strategy key keep the old behavior for one minor version, then migrate.

Part 3: Plan Mode as a First-Class Runtime Mode

When --plan / --permission-profile plan is active:

The agent loop in run_agent.py gates all write/exec tool calls: if current_profile == "plan" and tool.category in ("write", "exec", "network"), the tool call is blocked and the model receives: PLAN_MODE_ACTIVE: This tool is not available in plan mode. Complete your analysis and output a plan for user review.
The agent is nudged (via a system prompt layer appended in prompt_builder.py) to produce a structured plan in .hermes/plans/<task_id>.md.
After plan output, the CLI/TUI shows an approval prompt:

─────────────────────────────────────────────
  PLAN READY — review .hermes/plans/task_abc.md
  
  [A] Approve and run   (switches to accept-edits)
  [E] Approve and run   (switches to default, ask on each tool)
  [R] Revise plan       (stay in plan mode, send feedback)
  [X] Abort
─────────────────────────────────────────────

On [A] or [E], the session profile transitions atomically. The plan file is tagged compression_priority: high in the compressor.
Gateway platforms show this as an interactive button message (Telegram inline keyboard, Discord buttons, Slack block actions) — already supported by gateway/delivery.py.

/plan skill becomes a thin wrapper — it simply calls hermes chat --plan for the current context, so the skill behavior is unchanged but the enforcement is now real.

Kanban integration: kanban create --require-plan marks a task as requiring plan approval before the worker can execute write/exec tools. The dispatcher enforces this before spawning the worker.

Alternatives Considered

Alternative 1: Implement only YOLO scoping (allow/deny globs without full profiles)

Pros: Smaller scope, can be done in tools/approval.py without touching run_agent.py.
Cons: No plan mode, no session/task scope inheritance for Kanban, doesn't close the competitive gap. Still a binary system with one extra dimension.

Alternative 2: Use an external policy engine (OPA / Cedar)

Pros: Very expressive, industry-standard, auditable.
Cons: Adds a heavy runtime dependency, complex to integrate with the existing approval_callback pattern, overkill for agent-to-tool permission semantics. Maintenance burden for a project that runs on a $5 VPS.

Alternative 3: Implement plan mode as a skill only (no runtime enforcement)

Pros: No core changes, no backward compatibility risk.
Cons: Already exists (/plan skill, plan-execute skill) and already insufficient. Skill-only plan mode is advisory. Users have already reported that agents ignore it mid-task. The value of plan mode comes entirely from runtime enforcement, not from the skill wrapper.

Alternative 4: Replace the compression system with Anthropic's Context Compaction API

Pros: Simpler, server-side handling, battle-tested.
Cons: Anthropic-only. Hermes's core value is provider-agnosticism. This approach breaks all non-Anthropic providers, breaks local model users, and removes configurability. The five-layer pipeline achieves the same cost savings while remaining fully provider-agnostic.

Alternative 5: Ship the three items in separate releases

Pros: Smaller PRs, easier review.
Cons: As explained in the problem section, these are architecturally coupled. Plan mode without permission primitives is just the existing skill. The compression pipeline's plan-context preservation is useless without plan mode tagging content. Shipping separately means shipping incomplete features twice.

Feature Type

New bundled skill

Scope

Medium (few files, < 300 lines)

Contribution

I'd like to implement this myself and submit a PR

Debug Report (optional)

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Feature]: Five-Layer Context Pipeline + Plan-Mode — Coordinated Core Agent Parity with Claude Code and Codex

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Problem or Use Case

Gap 1 — Binary Approval System (vs. Claude Code's 7-mode graduated trust spectrum)

Gap 2 — Two-Layer Context Compression (vs. Claude Code's Five-Strategy Ordered Pipeline)

Gap 3 — Plan Mode Not Integrated Into the Agent Runtime

Why These Three Together, Not Separately

Proposed Solution

Part 1: Graduated Permission System

Part 2: Five-Layer Context Pipeline

Part 3: Plan Mode as a First-Class Runtime Mode

Alternatives Considered

Alternative 1: Implement only YOLO scoping (allow/deny globs without full profiles)

Alternative 2: Use an external policy engine (OPA / Cedar)

Alternative 3: Implement plan mode as a skill only (no runtime enforcement)

Alternative 4: Replace the compression system with Anthropic's Context Compaction API

Alternative 5: Ship the three items in separate releases

Feature Type

Scope

Contribution

Debug Report (optional)

Still need to ship something?

TRENDING