hermes - ✅(Solved) Fix Skills: user-locked vs self-improving tiers + user-feedback-driven skill updates [1 pull requests, 1 participants]

alt-glitch · 2026-04-29T17:56:34Z



GitHub stats

NousResearch/hermes-agent#17583 • Fetched 2026-04-30 06:46:32

Author: alt-glitch
Participants: alt-glitch
Timeline (top): labeled ×3, cross-referenced ×1

Fix Action

Fix / Workaround

- The CLI already has a pattern for this with dangerous command approval; a similar UX could apply
- Gateway platforms (Telegram, Discord) would need a confirm/reject flow for skill diffs
- Could tie into the existing `skill_manage(action='patch')` path with an approval gate
- Related to broader drift-prevention work on longer running horizons

PR fix notes

PR #17609: feat(skills): add user-lock and approval gate for skill mutations

Repository: NousResearch/hermes-agent
Author: CaptainTimon
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17609

Description (problem / solution / changelog)

Related Issue

Fixes #17583

Summary

The agent could silently overwrite any skill — including user-authored ones — with no visibility or approval step. This PR adds two opt-in protections to prevent uncontrolled self-improvement on skills.

Changes

- `locked: true` frontmatter flag: marks a skill as immutable to the agent. Any edit, patch, delete, write_file, or remove_file call on a locked skill returns an error. The agent can still read and follow it.

- `skills.require_approval` config key: when enabled, every agent skill mutation pauses, shows a unified diff, and waits for user confirmation before writing. The CLI gets a y/N prompt; gateway platforms use /approve and /deny via the existing approval queue.

- `authored_by: agent` stamp: agent-created skills are automatically tagged in their frontmatter for auditability.
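
The lock check and the authorship stamp both come down to reading and writing frontmatter fields. The following is a minimal sketch of that idea, not the code from the PR: the helper names (`split_frontmatter`, `guard_mutation`, `stamp_agent_authored`, `SkillLockedError`) are hypothetical, and the parser assumes flat `key: value` frontmatter between two `---` markers.

```python
from __future__ import annotations


class SkillLockedError(Exception):
    """Raised when the agent tries to mutate a skill marked `locked: true`."""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter fields, body).

    Assumption: frontmatter is flat `key: value` lines between two `---` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return fields, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # no closing marker: treat the file as having no frontmatter


def join_frontmatter(fields: dict[str, str], body: str) -> str:
    """Rebuild a skill file from frontmatter fields and body."""
    header = "\n".join(f"{key}: {value}" for key, value in fields.items())
    return f"---\n{header}\n---\n{body}"


def guard_mutation(skill_text: str, action: str) -> None:
    """Refuse any mutating action on a locked skill; reading stays allowed."""
    fields, _ = split_frontmatter(skill_text)
    if fields.get("locked", "").lower() == "true":
        raise SkillLockedError(f"skill is locked; '{action}' refused")


def stamp_agent_authored(skill_text: str) -> str:
    """Tag an agent-created skill, leaving any existing author tag untouched."""
    fields, body = split_frontmatter(skill_text)
    fields.setdefault("authored_by", "agent")
    return join_frontmatter(fields, body)
```

In this sketch a mutating tool call would run `guard_mutation` before writing, so a locked skill fails fast with an error while read paths never touch the guard.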

Validation

141/141 tests in tests/gateway/test_api_server.py and tests/gateway/test_api_server_runs.py pass.

Changed files

- `hermes_cli/config.py` (modified, +9/-0)
- `tools/skill_manager_tool.py` (modified, +288/-19)


Problem

1. Self-improvement overrides manual instructions: when the agent updates its knowledge base / skills, it can hallucinate corrections that overwrite user-set instructions. There is no distinction between "user authored this, do not touch it" and "agent generated this, fair game to refine."
2. Self-feedback loop risk: skills updated purely from the agent's own reasoning can drift or amplify errors. Users have no approval gate on how a skill was reasoned about before it gets saved.

Proposed Solution

Two-tier skill system

- User-locked skills: authored or edited by the user, immutable to self-improvement. The agent can read and follow them but cannot modify them.
- Agent-improvable skills: generated by the agent, eligible for self-refinement but constrained by parameters/boundaries set by the user (e.g. "do not change the verification steps", "keep the API endpoint fixed").
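
One way to picture the user-set boundaries on agent-improvable skills is a check that named sections of a skill stay byte-for-byte identical across a refinement. This is only a sketch under assumptions: the issue does not say how boundaries would be expressed, so the `## Heading` section layout and the helper names `sections` and `violated_boundaries` are hypothetical.

```python
from __future__ import annotations


def sections(markdown: str) -> dict[str, str]:
    """Map each `## Heading` in a skill body to the text beneath it."""
    collected: dict[str, list[str]] = {}
    current: str | None = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    return {name: "\n".join(body).strip() for name, body in collected.items()}


def violated_boundaries(old: str, new: str, protected: list[str]) -> list[str]:
    """Return the protected section names whose content changed or vanished."""
    before, after = sections(old), sections(new)
    return [name for name in protected if before.get(name) != after.get(name)]


old_skill = "## Steps\nrun build\n## Verification\ncheck status 200\n"
refined = "## Steps\nrun build --fast\n## Verification\ncheck status 200\n"
tampered = "## Steps\nrun build\n## Verification\nskip checks\n"

# A refinement that leaves the protected section alone passes the check.
ok = violated_boundaries(old_skill, refined, ["Verification"])
# A refinement that rewrites the protected section is flagged by name.
bad = violated_boundaries(old_skill, tampered, ["Verification"])
```

A refinement would only be eligible for saving (or for the approval step) when the returned list is empty.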

User-feedback-driven skill updates

Instead of silently saving refined skills, the agent should:

1. Present the reasoning chain it used to arrive at a skill update
2. Show a diff of proposed changes
3. Wait for user approval (or allow user edits) before persisting

This keeps the human in the loop on knowledge evolution without requiring them to author everything from scratch.
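
The three steps above can be sketched as a small proposal object that carries the reasoning, renders a diff, and resolves to the text that should be persisted. The names `SkillProposal` and `resolve` and the decision strings are assumptions for illustration; only `difflib.unified_diff` is a real standard-library call.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass


@dataclass
class SkillProposal:
    """A pending skill update that has not been written yet."""

    skill_name: str
    old_text: str
    new_text: str
    reasoning: list[str]  # step 1: the chain the agent used to reach the update

    def diff(self) -> str:
        """Step 2: unified diff of the proposed change."""
        return "".join(
            difflib.unified_diff(
                self.old_text.splitlines(keepends=True),
                self.new_text.splitlines(keepends=True),
                fromfile=f"{self.skill_name} (current)",
                tofile=f"{self.skill_name} (proposed)",
            )
        )

    def render(self) -> str:
        """Text shown to the user before anything is persisted."""
        steps = "\n".join(f"{i}. {s}" for i, s in enumerate(self.reasoning, 1))
        return f"Reasoning:\n{steps}\n\nProposed changes:\n{self.diff()}"


def resolve(proposal: SkillProposal, decision: str, edited_text: str | None = None) -> str:
    """Step 3: return the text to persist for a user decision.

    "approve" keeps the agent's version, "edit" keeps the user's version,
    and anything else (including "deny") keeps the current skill unchanged.
    """
    if decision == "approve":
        return proposal.new_text
    if decision == "edit" and edited_text is not None:
        return edited_text
    return proposal.old_text
```

Defaulting to the old text for any unrecognized decision means an ambiguous reply never results in a silent write.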

Notes

- The CLI already has a pattern for this with dangerous command approval; a similar UX could apply
- Gateway platforms (Telegram, Discord) would need a confirm/reject flow for skill diffs
- Could tie into the existing `skill_manage(action='patch')` path with an approval gate
- Related to broader drift-prevention work on longer running horizons

extent analysis

TL;DR

Implement a two-tier skill system with user-locked and agent-improvable skills to prevent self-improvement overrides and introduce user-feedback-driven skill updates for approval and transparency.

Guidance

- Introduce a distinction between user-authored and agent-generated skills to prevent overwrites, using a two-tier system.
- Implement a user-feedback mechanism for agent-improvable skills, presenting the reasoning chain, proposed changes, and requiring user approval before persisting updates.
- Leverage existing patterns, such as the CLI's dangerous command approval, to inform the UX for skill update approvals.
- Consider integrating the approval gate with the existing `skill_manage(action='patch')` path for a seamless implementation.

Example

No explicit code example is provided due to the high-level nature of the issue, but the skill_manage(action='patch') path could be modified to include an approval gate, such as adding an approve parameter.
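
As a rough illustration of that idea, the sketch below gates a patch behind a pluggable approver. It does not reproduce the real `skill_manage` signature, which is not shown in the issue: `patch_skill`, `Approver`, `cli_approver`, and the in-memory `store` dict are all hypothetical, and `require_approval` merely stands in for the `skills.require_approval` config key.

```python
from __future__ import annotations

import difflib
from typing import Callable, Optional

# Hypothetical approver signature: (skill_name, old_text, new_text) -> approved?
Approver = Callable[[str, str, str], bool]


def patch_skill(
    store: dict[str, str],
    name: str,
    new_text: str,
    *,
    require_approval: bool,
    approver: Optional[Approver] = None,
) -> bool:
    """Write `new_text` for skill `name`; return True only if it was persisted."""
    old_text = store.get(name, "")
    if new_text == old_text:
        return False  # nothing to change
    if require_approval:
        # Fail closed: with no approver configured, the write is refused.
        if approver is None or not approver(name, old_text, new_text):
            return False
    store[name] = new_text
    return True


def cli_approver(name: str, old_text: str, new_text: str) -> bool:
    """Show a unified diff and ask for a y/N confirmation on the terminal."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{name} (current)",
        tofile=f"{name} (proposed)",
    )
    print("".join(diff))
    return input(f"Apply change to skill '{name}'? [y/N] ").strip().lower() == "y"
```

Keeping the approver as a callback lets the CLI pass `cli_approver` while a gateway platform could pass a function backed by its own confirm/reject flow, without changing the write path.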

Notes

The proposed solution focuses on introducing a two-tier skill system and user-feedback-driven updates, which may require adjustments to the existing skill management infrastructure and user interface, particularly on gateway platforms like Telegram and Discord.

Recommendation

Apply the proposed two-tier skill system and user-feedback-driven skill updates to address the self-improvement override and self-feedback loop risks, as this approach introduces necessary distinctions and approval gates to ensure user control and transparency.


