hermes - 💡(How to fix) Fix [Feature]: Five-Layer Context Pipeline + Plan-Mode — Coordinated Core Agent Parity with Claude Code and Codex

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

Current Hermes state: /plan exists as a bundled skill that writes a markdown plan to .hermes/plans/. There is also a plan-execute skill for two-stage review. But plan mode is not a first-class agent runtime mode — it has no effect on the approval system, the iteration budget, or tool dispatch. The agent can be in "plan mode" (having written a plan to disk) and simultaneously running destructive terminal commands, because the runtime has no concept of "only read/think in this phase, then unlock execution after plan approval."

Fix Action

Fix / Workaround

Current Hermes state: /plan exists as a bundled skill that writes a markdown plan to .hermes/plans/. There is also a plan-execute skill for two-stage review. But plan mode is not a first-class agent runtime mode — it has no effect on the approval system, the iteration budget, or tool dispatch. The agent can be in "plan mode" (having written a plan to disk) and simultaneously running destructive terminal commands, because the runtime has no concept of "only read/think in this phase, then unlock execution after plan approval."

Kanban integration: kanban create --require-plan marks a task as requiring plan approval before the worker can execute write/exec tools. The dispatcher enforces this before spawning the worker.

Code Example

@dataclass
class PermissionProfile:
    name: str  # "default" | "plan" | "accept-edits" | "auto" | "full-auto" | custom
    tool_rules: list[ToolRule]  # ordered, deny-first
    scope: Literal["session", "task", "permanent"]
    inheritable: bool  # subagents inherit a restricted copy
    
@dataclass 
class ToolRule:
    pattern: str        # glob: "terminal:rm*", "file:write:src/**", "*"
    decision: Literal["allow", "deny", "ask"]
    classifier: Literal["regex", "ml", "none"]  # existing smart mode = "ml"

---

hermes chat --permission-profile accept-edits
hermes chat --allow "file:write:src/**" --deny "terminal:git push*"
hermes chat --plan   # shorthand for --permission-profile plan
/permissions          # show current profile + active rules
/permissions accept-edits   # switch profile mid-session

---

approvals:
  mode: default           # backward-compat key — maps to profile name
  profile: default        # new key — takes precedence if set
  profiles:               # user-defined custom profiles
    my-safe-coding:
      tool_rules:
        - pattern: "file:write:src/**"
          decision: allow
        - pattern: "terminal:git push*"
          decision: ask
        - pattern: "*"
          decision: auto

---

class StrategyChain:
    """Ordered pipeline: cheapest strategies run first."""
    
    def compress(self, messages, token_budget) -> list[Message]:
        # Layer 1: Budget reduction — trim oversized individual tool outputs
        messages = self._budget_reduce(messages, max_tool_output=8192)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 2: Snip — drop temporally stale middle turns (no LLM)
        messages = self._snip_stale(messages, protect_head=3, protect_tail=20)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 3: Microcompact — deduplicate repeated tool schemas in history
        messages = self._dedup_schemas(messages)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 4: Hard truncation — drop oldest turns above protect_head
        messages = self._hard_truncate(messages)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 5: LLM summarization — existing ContextCompressor logic (last resort)
        return self._llm_summarize(messages)

---

compression:
  strategy: "pipeline"    # "pipeline" (new default) | "summarize" (legacy)
  budget_reduce_limit: 8192   # max tokens per tool output before Layer 1 trims
  snip_stale_after: 50        # turns before a middle turn is snip-eligible
  plan_compression_priority: high   # plan turns survive layers 1-4

---

─────────────────────────────────────────────
  PLAN READY — review .hermes/plans/task_abc.md
  
  [A] Approve and run   (switches to accept-edits)
  [E] Approve and run   (switches to default, ask on each tool)
  [R] Revise plan       (stay in plan mode, send feedback)
  [X] Abort
─────────────────────────────────────────────

---
RAW_BUFFERClick to expand / collapse

Problem or Use Case

Hermes Agent today is the most feature-rich open-source agent on the market. It beats both Claude Code and Codex on provider breadth, messaging platforms, voice, memory, browser automation, Kanban multi-agent orchestration, and self-improvement. But there are three interlocking gaps in the agent runtime layer itself — not skills, not integrations — that keep Hermes behind in head-to-head coding-agent benchmarks and user trust:


Gap 1 — Binary Approval System (vs. Claude Code's 7-mode graduated trust spectrum)

Current Hermes state: approvals.mode is a three-position toggle: manual (ask every time), smart (regex-based heuristic classifier), off (YOLO). There is no concept of scope (session vs. command vs. tool-class vs. path glob), no structured deny-overrides-allow layering, and no plan-scoped trust (approve the plan, auto-run the tools within it). The ACP editor bridge adds allow_once / allow_session / allow_always, but this is isolated to editor sessions and not surfaced to the core CLI or gateway.

What this causes:

  • Users working on large refactors have to choose between constant interruptions (manual) and fully open (--yolo) — there is no middle ground that says "auto-approve edits inside src/, still ask about rm -rf".
  • Multi-agent Kanban workers that need different trust levels (orchestrator vs. leaf worker) must share the same global approvals.mode. A kanban worker running a --yolo session inherits the same permissions as an interactive user.
  • CI/CD pipelines and headless cron jobs have no way to express "approve git commit and pytest forever, but always ask before git push to main".
  • Community contributors consistently open issues asking for scoped YOLO (e.g. "allow all file edits, block all network calls") — this is architecturally impossible today without a real permission model.

Claude Code's approach: Seven graduated modes (plan → default → acceptEdits → auto → dontAsk → bypassPermissions + bubble) layered over per-tool allow/deny/ask glob patterns and an ML-based classifier. Deny-first: a broad deny always overrides a narrow allow. Permissions are never restored on session resume.

Codex's approach: Named permission profiles selectable at launch (codex --profile strict), sandbox CLI profile selection, cwd-scoped controls, and per-task permission overrides in the MultiAgentV2 config.


Gap 2 — Two-Layer Context Compression (vs. Claude Code's Five-Strategy Ordered Pipeline)

Current Hermes state: Two independent compressors fire at 50% (agent) and 85% (gateway safety net). Both use the same single strategy: LLM-based lossy summarization of the middle history. The ContextEngine ABC exists and is pluggable, but the built-in ContextCompressor has only one lever (summarize). There is no graduated approach that exhausts cheap local strategies before invoking the expensive auxiliary-LLM summarization call.

What this causes:

  • A 10K-token tool output that could be trimmed with a budget reducer fires a full auxiliary LLM summarization call — a $0.002–0.01 cost hit and a 2–5 second latency spike.
  • Context window pressure during high-frequency Kanban worker turns causes cascade compression events, destroying task context in child agents.
  • Users on local models (8K–32K contexts) hit the auxiliary summarization path on nearly every task, making local deployments dramatically slower than cloud ones.
  • Tool output snipping (trim oversized individual results before they enter history) and temporal snipping (drop old low-value turns without summarizing) are architecturally absent — every compression event is an LLM call.

Claude Code's five-layer pipeline (cheapest-first):

  1. Budget reduction — truncate individual tool outputs that exceed size limits (no LLM)
  2. Snip — drop old temporally stale turns (no LLM)
  3. Microcompact — reduce cache overhead (no LLM)
  4. Context collapse — managed truncation for very long histories (no LLM)
  5. Auto-compact — semantic LLM summarization (auxiliary call, last resort)

This architecture means 80%+ of compression events never touch the auxiliary model, cutting cost and latency dramatically on long sessions.


Gap 3 — Plan Mode Not Integrated Into the Agent Runtime

Current Hermes state: /plan exists as a bundled skill that writes a markdown plan to .hermes/plans/. There is also a plan-execute skill for two-stage review. But plan mode is not a first-class agent runtime mode — it has no effect on the approval system, the iteration budget, or tool dispatch. The agent can be in "plan mode" (having written a plan to disk) and simultaneously running destructive terminal commands, because the runtime has no concept of "only read/think in this phase, then unlock execution after plan approval."

What this causes:

  • High-stakes operations (database migrations, large refactors, infrastructure changes) cannot safely use Hermes without either heavy manual supervision or --yolo.
  • Kanban orchestrators that want to require "plan first, then execute" for certain task types have no enforcement mechanism — the kanban skill's two-stage review is advisory, not enforced by the runtime.
  • New users attempting complex tasks frequently get Hermes deep into execution before realizing the plan was wrong, with no structured way to review-and-continue or review-and-abort.

Claude Code's approach: plan is one of the 7 permission modes. When active, only read/analysis tools fire; write/execute tools are blocked. User approves the plan → mode shifts to acceptEdits or default → execution proceeds within the approved scope.

Codex's approach: /goal mode locks the agent onto a target across turns. --approval-mode at launch controls whether the agent pauses for plan review before executing.


Why These Three Together, Not Separately

These three gaps are architecturally coupled:

  • Plan mode requires a permission system that can enforce "only reads" during the plan phase and then transition to a broader scope on approval.
  • The five-layer compression pipeline needs to know which context is "plan context" (should be preserved through compression) vs. "tool output noise" (candidate for budget reduction).
  • Kanban workers need to inherit scoped permissions from their parent orchestrator, which only works if permissions are composable objects, not a global config flag.

Implementing any one of these without the others produces an incomplete and partially broken feature. The right decomposition is:

  1. Permission primitives (data model + enforcement layer in tools/approval.py and run_agent.py)
  2. Compression pipeline (new strategy chain in agent/context_compressor.py, keeping the ContextEngine ABC unchanged)
  3. Plan mode (new permission mode that uses the permission primitives, with /plan skill becoming a thin wrapper)

Proposed Solution

Part 1: Graduated Permission System

Data model — new PermissionProfile object in tools/approval.py:

@dataclass
class PermissionProfile:
    name: str  # "default" | "plan" | "accept-edits" | "auto" | "full-auto" | custom
    tool_rules: list[ToolRule]  # ordered, deny-first
    scope: Literal["session", "task", "permanent"]
    inheritable: bool  # subagents inherit a restricted copy
    
@dataclass 
class ToolRule:
    pattern: str        # glob: "terminal:rm*", "file:write:src/**", "*"
    decision: Literal["allow", "deny", "ask"]
    classifier: Literal["regex", "ml", "none"]  # existing smart mode = "ml"

Built-in profiles (backward-compatible — existing approvals.mode maps to these):

ProfileReplacesBehavior
planOnly read/analysis tools; write/exec blocked; user approves to advance
defaultmanualAsk on dangerous commands (current behavior)
accept-editsAuto-approve file edits; ask on exec/network
autosmartML classifier decides; no interactive prompt
full-autooff/--yoloAll tools auto-approved (YOLO, current behavior)

CLI changes:

hermes chat --permission-profile accept-edits
hermes chat --allow "file:write:src/**" --deny "terminal:git push*"
hermes chat --plan   # shorthand for --permission-profile plan
/permissions          # show current profile + active rules
/permissions accept-edits   # switch profile mid-session

Config:

approvals:
  mode: default           # backward-compat key — maps to profile name
  profile: default        # new key — takes precedence if set
  profiles:               # user-defined custom profiles
    my-safe-coding:
      tool_rules:
        - pattern: "file:write:src/**"
          decision: allow
        - pattern: "terminal:git push*"
          decision: ask
        - pattern: "*"
          decision: auto

Kanban integration: kanban create --permission-profile accept-edits pins a profile to a task. Worker processes inherit a restricted copy (same rules, inheritable: false stripped). Orchestrators keep their own profile. The gateway maps allow_session to the default profile's session scope.

Backward compatibility: Existing approvals.mode: manual/smart/off maps to default/auto/full-auto. All existing configs continue to work. --yolo continues to work as an alias for --permission-profile full-auto.


Part 2: Five-Layer Context Pipeline

Extend agent/context_compressor.py with a strategy chain. The ContextEngine ABC is unchanged — this is purely an implementation improvement to the default engine.

class StrategyChain:
    """Ordered pipeline: cheapest strategies run first."""
    
    def compress(self, messages, token_budget) -> list[Message]:
        # Layer 1: Budget reduction — trim oversized individual tool outputs
        messages = self._budget_reduce(messages, max_tool_output=8192)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 2: Snip — drop temporally stale middle turns (no LLM)
        messages = self._snip_stale(messages, protect_head=3, protect_tail=20)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 3: Microcompact — deduplicate repeated tool schemas in history
        messages = self._dedup_schemas(messages)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 4: Hard truncation — drop oldest turns above protect_head
        messages = self._hard_truncate(messages)
        if self._fits(messages, token_budget):
            return messages
        
        # Layer 5: LLM summarization — existing ContextCompressor logic (last resort)
        return self._llm_summarize(messages)

Plan context preservation: Messages tagged as plan content (written by /plan mode) are marked compression_priority: high and survive layers 1–4. Only layer 5 may summarize them, and the summarizer prompt explicitly says "preserve all plan decisions verbatim."

Config additions:

compression:
  strategy: "pipeline"    # "pipeline" (new default) | "summarize" (legacy)
  budget_reduce_limit: 8192   # max tokens per tool output before Layer 1 trims
  snip_stale_after: 50        # turns before a middle turn is snip-eligible
  plan_compression_priority: high   # plan turns survive layers 1-4

Backward compatibility: strategy: "summarize" restores legacy single-layer behavior. Default switches to "pipeline" only for new installs; existing configs with no strategy key keep the old behavior for one minor version, then migrate.


Part 3: Plan Mode as a First-Class Runtime Mode

When --plan / --permission-profile plan is active:

  • The agent loop in run_agent.py gates all write/exec tool calls: if current_profile == "plan" and tool.category in ("write", "exec", "network"), the tool call is blocked and the model receives: PLAN_MODE_ACTIVE: This tool is not available in plan mode. Complete your analysis and output a plan for user review.
  • The agent is nudged (via a system prompt layer appended in prompt_builder.py) to produce a structured plan in .hermes/plans/<task_id>.md.
  • After plan output, the CLI/TUI shows an approval prompt:
─────────────────────────────────────────────
  PLAN READY — review .hermes/plans/task_abc.md
  
  [A] Approve and run   (switches to accept-edits)
  [E] Approve and run   (switches to default, ask on each tool)
  [R] Revise plan       (stay in plan mode, send feedback)
  [X] Abort
─────────────────────────────────────────────
  • On [A] or [E], the session profile transitions atomically. The plan file is tagged compression_priority: high in the compressor.
  • Gateway platforms show this as an interactive button message (Telegram inline keyboard, Discord buttons, Slack block actions) — already supported by gateway/delivery.py.

/plan skill becomes a thin wrapper — it simply calls hermes chat --plan for the current context, so the skill behavior is unchanged but the enforcement is now real.

Kanban integration: kanban create --require-plan marks a task as requiring plan approval before the worker can execute write/exec tools. The dispatcher enforces this before spawning the worker.


Alternatives Considered

Alternative 1: Implement only YOLO scoping (allow/deny globs without full profiles)

Pros: Smaller scope, can be done in tools/approval.py without touching run_agent.py.
Cons: No plan mode, no session/task scope inheritance for Kanban, doesn't close the competitive gap. Still a binary system with one extra dimension.

Alternative 2: Use an external policy engine (OPA / Cedar)

Pros: Very expressive, industry-standard, auditable.
Cons: Adds a heavy runtime dependency, complex to integrate with the existing approval_callback pattern, overkill for agent-to-tool permission semantics. Maintenance burden for a project that runs on a $5 VPS.

Alternative 3: Implement plan mode as a skill only (no runtime enforcement)

Pros: No core changes, no backward compatibility risk.
Cons: Already exists (/plan skill, plan-execute skill) and already insufficient. Skill-only plan mode is advisory. Users have already reported that agents ignore it mid-task. The value of plan mode comes entirely from runtime enforcement, not from the skill wrapper.

Alternative 4: Replace the compression system with Anthropic's Context Compaction API

Pros: Simpler, server-side handling, battle-tested.
Cons: Anthropic-only. Hermes's core value is provider-agnosticism. This approach breaks all non-Anthropic providers, breaks local model users, and removes configurability. The five-layer pipeline achieves the same cost savings while remaining fully provider-agnostic.

Alternative 5: Ship the three items in separate releases

Pros: Smaller PRs, easier review.
Cons: As explained in the problem section, these are architecturally coupled. Plan mode without permission primitives is just the existing skill. The compression pipeline's plan-context preservation is useless without plan mode tagging content. Shipping separately means shipping incomplete features twice.


Feature Type

New bundled skill

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Feature]: Five-Layer Context Pipeline + Plan-Mode — Coordinated Core Agent Parity with Claude Code and Codex