hermes - 💡(How to fix) Fix feat: dedupe loaded skills via active skill context registry

StepCodex · 2026-05-20T12:29:18Z

[hermes] Problem Hermes currently treats skill view output like any other tool result: the full SKILL.md content is appended into the conversation history. Thi… ## Problem Hermes currently treats `skill_view()` output like any other tool result: the full `SKILL.md` content is appended into the conversation history. This is simple and auditable, but it has poor context behavior in long sessions: - Re-loading the same skill can duplicate a large block of text in the model context. - Large skills (for example `hermes-agent`) become noisy persistent tool outputs even after they have served their purpose. - The transcript and the prompt context are coupled: preserving a faithful audit trail means also preserving bulky skill bodies in the active model input. - Mandatory skill-loading policies can repeatedly insert the same static content even when the skill has not changed. - Client-side deletion/replacement of old tool outputs risks damaging prompt-cache locality if it mutates the message prefix. This is related to broader context-management work, but skills are a special case: they are named, versionable resources that can be addressed by path/hash and are often semantically closer to "attached context" than to ordinary transient tool output. ## Proposed solution Introduce a first-class **skill context registry** / **active skill block** mechanism that decouples skill content from ordinary transcript messages. High-level model: ```text Transcript / session DB: user/assistant/tool events remain honest and auditable skill_view event can be stored as compact metadata: loaded skill hermes-agent@sha256:abc123 Prompt builder: maintains an active skill context registry: hermes-agent@sha256:abc123 -> rendered SKILL.md content github-issues@sha256:def456 -> rendered SKILL.md content Model input: system prompt memory/profile active skill blocks, deduped by skill name + hash recent conversation compacted/summarized old tool outputs ``` Behavior: 1. When `skill_view(name)` is called, compute a stable hash/version of the rendered skill content, including linked file content when requested. 2. If `name@hash` is already active in the current session, do not append the full skill body again. Return/record a compact event such as: - `Skill hermes-agent@abc123 already loaded; active context unchanged.` 3. If the skill content changed, replace the active block for that skill with the new `name@hash` content. 4. The prompt builder injects active skill blocks once, in a dedicated section, rather than relying on historical tool outputs to keep them visible. 5. The session transcript can still store full tool outputs for audit/debug if desired, but prompt assembly should be able to collapse old skill-view outputs to metadata/placeholders. ## Why this is useful - Avoids repeated giant skill bodies in the active context. - Preserves a faithful transcript without forcing every historical skill body into the model input. - Makes skill freshness explicit through hashes/versions. - Gives prompt assembly a natural place to budget skill context separately from chat history and generic tool outputs. - Aligns with the direction of on-demand context, memory, and context editing used in modern coding agents. - Helps mandatory skill policies stay safe without unnecessary token bloat. ## Possible implementation shape ### Data model Add a session-scoped structure, for example: ```python active_context = { "skills": { "hermes-agent": { "hash": "sha256:abc123", "path": "/.../SKILL.md", "content": "...rendered skill content...", "loaded_at_message_id": 123, "source": "skill_view", } } } ``` This could live: - in memory during a run; - persisted in session metadata for resume; - or reconstructed from compact skill-load events in the transcript. ### Prompt assembly Prompt builder renders a bounded block, e.g.: ```markdown ## hermes-agent @ sha256:abc123 ... ## github-issues @ sha256:def456 ... ``` Budget policy could be added later: - max total active skill tokens; - max per skill; - evict least recently used skills; - keep mandatory skills pinned; - show warnings when a skill is too large. ### Tool response behavior `skill_view()` can return compact metadata when a skill is already active: ```json { "success": true, "name": "hermes-agent", "hash": "sha256:abc123", "already_active": true, "message": "Skill already active in prompt context; full body not reinserted." } ``` When content changed: ```json { "success": true, "name": "hermes-agent", "hash": "sha256:newhash", "updated_active_context": true, "content": "..." } ``` ## Open questions - Should full skill content still be stored in the session DB for audit, but excluded from model input at prompt-build time? - Should active skill context survive `/resume`? If yes, should it be restored by hash from disk rather than from old tool output? - How should linked skill files be hashed and represented? - Should skills be evictable under context pressure, or should used skil

Code Example

Transcript / session DB:
  user/assistant/tool events remain honest and auditable
  skill_view event can be stored as compact metadata:
    loaded skill hermes-agent@sha256:abc123

Prompt builder:
  maintains an active skill context registry:
    hermes-agent@sha256:abc123 -> rendered SKILL.md content
    github-issues@sha256:def456 -> rendered SKILL.md content

Model input:
  system prompt
  memory/profile
  active skill blocks, deduped by skill name + hash
  recent conversation
  compacted/summarized old tool outputs

---

active_context = {
    "skills": {
        "hermes-agent": {
            "hash": "sha256:abc123",
            "path": "/.../SKILL.md",
            "content": "...rendered skill content...",
            "loaded_at_message_id": 123,
            "source": "skill_view",
        }
    }
}

---

<active_skills>
## hermes-agent @ sha256:abc123
...

## github-issues @ sha256:def456
...
</active_skills>

---

{
  "success": true,
  "name": "hermes-agent",
  "hash": "sha256:abc123",
  "already_active": true,
  "message": "Skill already active in prompt context; full body not reinserted."
}

---

{
  "success": true,
  "name": "hermes-agent",
  "hash": "sha256:newhash",
  "updated_active_context": true,
  "content": "..."
}

Problem

Hermes currently treats skill_view() output like any other tool result: the full SKILL.md content is appended into the conversation history. This is simple and auditable, but it has poor context behavior in long sessions:

Re-loading the same skill can duplicate a large block of text in the model context.
Large skills (for example hermes-agent) become noisy persistent tool outputs even after they have served their purpose.
The transcript and the prompt context are coupled: preserving a faithful audit trail means also preserving bulky skill bodies in the active model input.
Mandatory skill-loading policies can repeatedly insert the same static content even when the skill has not changed.
Client-side deletion/replacement of old tool outputs risks damaging prompt-cache locality if it mutates the message prefix.

This is related to broader context-management work, but skills are a special case: they are named, versionable resources that can be addressed by path/hash and are often semantically closer to "attached context" than to ordinary transient tool output.

Proposed solution

Introduce a first-class skill context registry / active skill block mechanism that decouples skill content from ordinary transcript messages.

High-level model:

Transcript / session DB:
  user/assistant/tool events remain honest and auditable
  skill_view event can be stored as compact metadata:
    loaded skill hermes-agent@sha256:abc123

Prompt builder:
  maintains an active skill context registry:
    hermes-agent@sha256:abc123 -> rendered SKILL.md content
    github-issues@sha256:def456 -> rendered SKILL.md content

Model input:
  system prompt
  memory/profile
  active skill blocks, deduped by skill name + hash
  recent conversation
  compacted/summarized old tool outputs

Behavior:

When skill_view(name) is called, compute a stable hash/version of the rendered skill content, including linked file content when requested.
If name@hash is already active in the current session, do not append the full skill body again. Return/record a compact event such as:
- Skill hermes-agent@abc123 already loaded; active context unchanged.
If the skill content changed, replace the active block for that skill with the new name@hash content.
The prompt builder injects active skill blocks once, in a dedicated section, rather than relying on historical tool outputs to keep them visible.
The session transcript can still store full tool outputs for audit/debug if desired, but prompt assembly should be able to collapse old skill-view outputs to metadata/placeholders.

Why this is useful

Avoids repeated giant skill bodies in the active context.
Preserves a faithful transcript without forcing every historical skill body into the model input.
Makes skill freshness explicit through hashes/versions.
Gives prompt assembly a natural place to budget skill context separately from chat history and generic tool outputs.
Aligns with the direction of on-demand context, memory, and context editing used in modern coding agents.
Helps mandatory skill policies stay safe without unnecessary token bloat.

Possible implementation shape

Data model

Add a session-scoped structure, for example:

active_context = {
    "skills": {
        "hermes-agent": {
            "hash": "sha256:abc123",
            "path": "/.../SKILL.md",
            "content": "...rendered skill content...",
            "loaded_at_message_id": 123,
            "source": "skill_view",
        }
    }
}

This could live:

in memory during a run;
persisted in session metadata for resume;
or reconstructed from compact skill-load events in the transcript.

Prompt assembly

Prompt builder renders a bounded block, e.g.:

<active_skills>
## hermes-agent @ sha256:abc123
...

## github-issues @ sha256:def456
...
</active_skills>

Budget policy could be added later:

max total active skill tokens;
max per skill;
evict least recently used skills;
keep mandatory skills pinned;
show warnings when a skill is too large.

Tool response behavior

skill_view() can return compact metadata when a skill is already active:

{
  "success": true,
  "name": "hermes-agent",
  "hash": "sha256:abc123",
  "already_active": true,
  "message": "Skill already active in prompt context; full body not reinserted."
}

When content changed:

{
  "success": true,
  "name": "hermes-agent",
  "hash": "sha256:newhash",
  "updated_active_context": true,
  "content": "..."
}

Open questions

Should full skill content still be stored in the session DB for audit, but excluded from model input at prompt-build time?
Should active skill context survive /resume? If yes, should it be restored by hash from disk rather than from old tool output?
How should linked skill files be hashed and represented?
Should skills be evictable under context pressure, or should used skills remain pinned until /clear/new session?
How should this interact with prompt caching? A dedicated block near the system prompt may improve dedupe but can also move cache boundaries if it changes frequently.
Should this be implemented as part of a general attached-context/resource system rather than skill-specific machinery?

Related issues

#10164 — context-aware skills prompt and system prompt budget management
#2045 — Lazy skill loading: remove skill listing from system prompt, use on-demand tool instead
#526 — Anthropic Context Editing API Integration — Server-Side, Cache-Friendly Tool/Thinking Cleanup for Claude Models
#513 — Two-Phase Context Management — Prune Tool Outputs Before Full Compaction
#415 — Insertion-Time Tool Result Trimming — Cache-Friendly Context Management
#15618 — Expose real prompt context usage and compaction metadata in API run events
#14603 — Context compaction SUMMARY_PREFIX causes cross-session task leakage
#17344 — Context compression + session resume causes model to re-execute the original first task instead of continuing from compressed state
#26730 — Kanban workers fail due to context pollution from parent session

Prior art / references

Claude Code skills docs: https://docs.anthropic.com/en/docs/claude-code/skills
- Skill bodies are on-demand context rather than always-on prompt text.
Claude Code extension overview: https://code.claude.com/docs/en/features-overview
- Separates always-on CLAUDE.md, on-demand skills, subagents for isolation, hooks, and MCP.
Anthropic context management announcement: https://www.anthropic.com/news/context-management
- Context editing removes stale tool calls/results while preserving conversation flow.
- Memory tool stores information outside the active context window.
Anthropic effective context engineering: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- Recommends the smallest high-signal context set and managing system instructions, tools, memory, external data, and history as one context state.

Acceptance criteria

A minimal version would be successful if:

Loading the same skill twice in one session does not duplicate the full skill body in model input.
Prompt assembly includes only the latest active version of each loaded skill.
Skill freshness is tracked with a content hash or equivalent version marker.
Session transcript remains auditable, but model input can collapse old skill-view outputs to metadata/placeholders.
Existing skill behavior remains backward-compatible when the feature is disabled.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering