hermes - ✅(Solved) Fix Skills: user-locked vs self-improving tiers + user-feedback-driven skill updates [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#17583 (fetched 2026-04-30 06:46:32)
Comments: 0 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1

Fix Action

Fix / Workaround

  • The CLI already has a pattern for this with dangerous command approval — similar UX could apply
  • Gateway platforms (Telegram, Discord) would need a confirm/reject flow for skill diffs
  • Could tie into the existing skill_manage(action='patch') path with an approval gate
  • Related to broader drift-prevention work on longer running horizons
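On gateway platforms, the confirm/reject flow could be modeled as a queue of pending skill diffs that chat commands resolve. The following is a minimal sketch under assumptions: PendingSkillEdit, ApprovalQueue, submit, and handle_command are hypothetical names, not Hermes' existing approval queue. It only shows how a proposed write can be held until the user replies /approve or /deny.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PendingSkillEdit:
    """A proposed skill write awaiting a user decision (hypothetical type)."""
    skill_name: str
    diff: str
    apply: Callable[[], None]  # performs the actual write once approved


@dataclass
class ApprovalQueue:
    """Holds at most one pending edit per chat session (illustrative only)."""
    pending: Dict[str, PendingSkillEdit] = field(default_factory=dict)

    def submit(self, session_id: str, edit: PendingSkillEdit) -> str:
        # Park the edit and return the message the bot would send to the user.
        self.pending[session_id] = edit
        return (f"Proposed change to skill '{edit.skill_name}':\n{edit.diff}\n"
                "Reply /approve or /deny.")

    def handle_command(self, session_id: str, text: str) -> Optional[str]:
        command = text.strip().lower()
        if command not in ("/approve", "/deny"):
            return None  # not an approval command; normal message handling continues
        edit = self.pending.pop(session_id, None)
        if edit is None:
            return "No skill change is waiting for approval."
        if command == "/approve":
            edit.apply()  # only an explicit approval triggers the write
            return f"Skill '{edit.skill_name}' updated."
        return f"Change to skill '{edit.skill_name}' discarded."
```

Keying the queue by session means a /deny in one chat cannot discard an edit proposed in another; how Hermes actually scopes its approvals is not stated in the issue.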

PR fix notes

PR #17609: feat(skills): add user-lock and approval gate for skill mutations

Description (problem / solution / changelog)

Related Issue


Fixes #17583

Summary

The agent could silently overwrite any skill, including user-authored ones, with no visibility or approval step. This PR adds two opt-in protections against uncontrolled self-improvement of skills.

Changes

  • A locked: true frontmatter flag marks a skill as immutable to the agent. Any edit, patch, delete, write_file, or remove_file call on a locked skill returns an error; the agent can still read and follow the skill.
  • A skills.require_approval config key, when enabled, makes every skill mutation by the agent pause, show a unified diff, and wait for user confirmation before writing. The CLI shows a y/N prompt; gateway platforms use /approve and /deny through the existing approval queue.
  • An authored_by: agent stamp is added automatically to the frontmatter of agent-created skills, for auditability.
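Both the lock and the authorship stamp live in a skill's frontmatter. The sketch below assumes a skill file that begins with a ----delimited block of simple key: value lines; the helpers parse_frontmatter, check_mutation_allowed, and stamp_agent_authored are hypothetical and are not the code in tools/skill_manager_tool.py. The guarded action names are the ones listed in the PR description.

```python
from typing import Dict, Tuple

# Mutating actions named in the PR description; reads are always allowed.
MUTATING_ACTIONS = {"edit", "patch", "delete", "write_file", "remove_file"}


def parse_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split '---'-delimited 'key: value' frontmatter from the body.

    Deliberately simplified for illustration; a real implementation
    would parse the block with a YAML library.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


def check_mutation_allowed(skill_text: str, action: str) -> None:
    """Raise if the agent tries to mutate a skill marked 'locked: true'."""
    meta, _ = parse_frontmatter(skill_text)
    if action in MUTATING_ACTIONS and meta.get("locked", "").lower() == "true":
        raise PermissionError(f"Skill is locked by the user; '{action}' is not allowed.")


def stamp_agent_authored(skill_text: str) -> str:
    """Add 'authored_by: agent' unless an author is already recorded."""
    meta, body = parse_frontmatter(skill_text)
    meta.setdefault("authored_by", "agent")
    header = "".join(f"{k}: {v}\n" for k, v in meta.items())
    return f"---\n{header}---\n{body}"
```

Using setdefault means an existing authored_by value (for example, one set by the user) is never overwritten by the stamp.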

Validation

All 141 tests in tests/gateway/test_api_server.py and tests/gateway/test_api_server_runs.py pass.

Changed files

  • hermes_cli/config.py (modified, +9/-0)
  • tools/skill_manager_tool.py (modified, +288/-19)
Original issue text

Problem

  1. Self-improvement overrides manual instructions: when the agent updates its knowledge base or skills, it can hallucinate corrections that overwrite user-set instructions. There is no distinction between "the user authored this, do not touch it" and "the agent generated this, fair game to refine."

  2. Self-feedback loop risk: skills updated purely from the agent's own reasoning can drift or amplify errors. Users have no way to review or approve the agent's reasoning about a skill before the update is saved.

Proposed Solution

Two-tier skill system

  • User-locked skills: authored or edited by the user and immutable to self-improvement. The agent can read and follow them but cannot modify them.
  • Agent-improvable skills: generated by the agent and eligible for self-refinement, but constrained by parameters or boundaries set by the user (e.g. "do not change the verification steps", "keep the API endpoint fixed").
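One way to enforce user-set boundaries on agent-improvable skills is to let the user pin passages that every revision must keep verbatim. This is a sketch under an assumption the issue does not make: boundaries are expressed as a hypothetical per-skill list of pinned strings (such as a verification step or an API endpoint), checked by the illustrative helpers violated_boundaries and tier_allows_revision.

```python
from typing import List


def violated_boundaries(old_text: str, new_text: str, pinned: List[str]) -> List[str]:
    """Return pinned passages that the proposed revision removes or alters.

    'pinned' is a hypothetical per-skill list of strings fixed by the user.
    Passages absent from the current text are skipped, since the revision
    cannot be blamed for them.
    """
    return [p for p in pinned if p in old_text and p not in new_text]


def tier_allows_revision(locked: bool, old_text: str, new_text: str, pinned: List[str]) -> bool:
    """User-locked skills never change; agent-improvable ones change within bounds."""
    if locked:
        return False
    return not violated_boundaries(old_text, new_text, pinned)
```

For example, with pinned = ["https://api.example.com/v1", "Verify checksum"], a revision that switches the endpoint to /v2 is rejected, while one that only appends a new step is allowed. Pinning exact strings is the simplest possible boundary; richer rules (whole sections, semantic checks) would need a different representation.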

User-feedback-driven skill updates

Instead of silently saving refined skills, the agent should:

  1. Present the reasoning chain it used to arrive at a skill update
  2. Show a diff of proposed changes
  3. Wait for user approval (or allow user edits) before persisting

This keeps the human in the loop on knowledge evolution without requiring them to author everything from scratch.
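The three-step flow (reasoning, diff, approval) can be sketched with the standard library's difflib.unified_diff. The function propose_skill_update and the ask_user callback are hypothetical; the callback stands in for the CLI prompt or gateway message and returns None to reject, or the text to save (the proposal unchanged, or a version the user edited).

```python
import difflib
from typing import Callable, Optional


def propose_skill_update(
    name: str,
    old_text: str,
    new_text: str,
    reasoning: str,
    ask_user: Callable[[str, str], Optional[str]],
) -> Optional[str]:
    """Show the agent's reasoning and a unified diff; return the text to persist, or None."""
    diff = "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{name} (current)",
            tofile=f"{name} (proposed)",
        )
    )
    if not diff:
        return None  # nothing changed, so there is nothing to approve or save
    review = f"Why this change:\n{reasoning}\n\nProposed diff:\n{diff}"
    # None means rejected; any string is what the user agreed to save.
    return ask_user(review, new_text)
```

Letting the callback return edited text covers the "allow user edits" case without a separate code path; the caller persists whatever string comes back.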


Extent analysis

TL;DR

Implement a two-tier skill system with user-locked and agent-improvable skills to prevent self-improvement overrides and introduce user-feedback-driven skill updates for approval and transparency.

Guidance

  • Use a two-tier system that distinguishes user-authored from agent-generated skills, so the agent cannot overwrite user-authored ones.
  • Implement a user-feedback mechanism for agent-improvable skills, presenting the reasoning chain, proposed changes, and requiring user approval before persisting updates.
  • Leverage existing patterns, such as the CLI's dangerous command approval, to inform the UX for skill update approvals.
  • Consider adding the approval gate to the existing skill_manage(action='patch') path rather than building a separate mutation path.

Example

The issue is high-level and includes no code. One option is to add an approval gate to the skill_manage(action='patch') path, for example through an approve parameter.
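A minimal sketch of that option, assuming a find-and-replace style patch over an in-memory skill store. The real skill_manage signature is not given in the issue, so skill_patch, old_string, new_string, approve, and PatchResult are all hypothetical names. Without approval, the call returns the proposed text instead of writing it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PatchResult:
    status: str  # "applied", "pending_approval", or "error"
    message: str
    proposed: Optional[str] = None


def skill_patch(
    skills: Dict[str, str],
    name: str,
    old_string: str,
    new_string: str,
    approve: bool = False,
) -> PatchResult:
    """Apply one find/replace to a skill, persisting only when approve is True."""
    current = skills.get(name)
    if current is None:
        return PatchResult("error", f"Unknown skill '{name}'.")
    count = current.count(old_string)
    if count != 1:
        # Refuse missing or ambiguous matches instead of guessing which to replace.
        return PatchResult("error", f"Expected exactly one match, found {count}.")
    proposed = current.replace(old_string, new_string, 1)
    if not approve:
        return PatchResult("pending_approval", "Awaiting user approval.", proposed)
    skills[name] = proposed
    return PatchResult("applied", f"Skill '{name}' updated.", proposed)
```

Returning the proposal rather than blocking keeps the gate usable from both a synchronous CLI prompt and an asynchronous gateway reply: the caller shows the proposal, then repeats the call with approve=True once the user confirms.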

Notes

The proposed solution focuses on introducing a two-tier skill system and user-feedback-driven updates, which may require adjustments to the existing skill management infrastructure and user interface, particularly on gateway platforms like Telegram and Discord.

Recommendation

Apply the proposed two-tier skill system and user-feedback-driven skill updates to address the self-improvement override and self-feedback loop risks, as this approach introduces necessary distinctions and approval gates to ensure user control and transparency.
