hermes - ✅(Solved) Fix scan_skill_commands: unconditionally clears _skill_commands before try block [1 pull requests, 2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#18656Fetched 2026-05-03 04:55:12
View on GitHub
Comments
2
Participants
2
Timeline
10
Reactions
0
Timeline (top)
labeled ×5commented ×2cross-referenced ×2closed ×1

agent/skill_commands.py:222 unconditionally sets _skill_commands = {} before entering the try block that populates it. If any exception occurs between the clear and the first successful skill addition, all skill slash commands are silently lost.

# agent/skill_commands.py:221-227
global _skill_commands
_skill_commands = {}           # ← cleared BEFORE try
try:
    from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, ...
    from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
    disabled = _get_disabled_skill_names()
    ...
except Exception:
    pass                        # ← exception swallowed, stays empty
return _skill_commands

Error Message

agent/skill_commands.py:221-227

global _skill_commands _skill_commands = {} # ← cleared BEFORE try try: from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, ... from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files disabled = _get_disabled_skill_names() ... except Exception: pass # ← exception swallowed, stays empty return _skill_commands

Root Cause

agent/skill_commands.py:222 unconditionally sets _skill_commands = {} before entering the try block that populates it. If any exception occurs between the clear and the first successful skill addition, all skill slash commands are silently lost.

# agent/skill_commands.py:221-227
global _skill_commands
_skill_commands = {}           # ← cleared BEFORE try
try:
    from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, ...
    from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
    disabled = _get_disabled_skill_names()
    ...
except Exception:
    pass                        # ← exception swallowed, stays empty
return _skill_commands

Fix Action

Fix

Build the new dict in a local variable, assign to the global only on success:

def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    global _skill_commands
    new_commands: Dict[str, Dict[str, Any]] = {}
    try:
        ...
        # build new_commands instead of _skill_commands
    except Exception:
        pass  # keep old _skill_commands intact
    else:
        _skill_commands = new_commands
    return _skill_commands

PR fix notes

PR #18669: fix(agent): preserve skill commands on scan failure + log diagnostic warning

Description (problem / solution / changelog)

Closes #18659.

Symptom

Any exception during scan_skill_commands() silently blanks all 90+ skill slash commands (/apple-notes, /claude-code, etc.) with zero user-facing error. The user sees their skills disappear but has no log line to diagnose it. Reporter's reproducer: chmod 000 ~/.hermes/skills; hermes → all /skill-name autocompletions gone.

Root cause

agent/skill_commands.py:222 unconditionally clears the global before entering the try block:

```python _skill_commands = {} # ← cleared FIRST try: from tools.skills_tool import ... ... populate _skill_commands ... except Exception: pass # ← swallowed, global stays {} return _skill_commands ```

Any failure in the imports, outer scan loop, or get_external_skills_dirs() leaves the global at {}, and the except: pass swallows the reason.

Fix

Atomic swap — build into local new_commands, only assign to the global on success. Failure returns the last-known-good mapping.

Diagnostic log — replace except Exception: pass with logger.warning(..., exc_info=True), gated by a module-level _scan_error_logged flag so get_skill_commands() hot-loop callers don't flood the log. First failure emits a full traceback; subsequent failures stay silent until a successful scan resets the flag.

Before / after

ScenarioBeforeAfter
First scan hits unreadable skills dirglobal = {}, all slash commands gone, no logglobal unchanged at initial {}, one WARNING traceback
Scan starts healthy then dir breaks mid-sessionglobal wiped to {}, 90 skills goneglobal retains last-known-good 90 entries, one WARNING traceback
Perms restored + /reload-skillsshows 90 as "added" from empty before-snapshot (misleading diff)shows correct no-op diff when state was preserved
Persistent failure, 1000 calls to get_skill_commands()1000× traceback log flood1× traceback, then silent

Tests

5 new, 10 existing → 15/15 pass on TestScanSkillCommands (40/40 on full file):

  • test_preserves_previous_mapping_on_import_failure
  • test_logs_warning_on_scan_failure
  • test_repeated_scan_failures_do_not_flood_logs
  • test_scan_error_flag_resets_on_success
  • test_successful_scan_still_replaces_mapping

Cross-refs

@alt-glitch noted #18656 (closed, same report) and #2868 (open PR addressing the logging dimension). This PR addresses the core atomic-swap bug plus the diagnostic log dimension together with hot-loop-safe guarding; happy to rebase if #2868's logging approach lands first.

Files

  • agent/skill_commands.py: +17/-3
  • tests/agent/test_skill_commands.py: +132

Changed files

  • agent/skill_commands.py (modified, +25/-4)
  • tests/agent/test_skill_commands.py (modified, +114/-0)

Code Example

# agent/skill_commands.py:221-227
global _skill_commands
_skill_commands = {}           # ← cleared BEFORE try
try:
    from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, ...
    from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
    disabled = _get_disabled_skill_names()
    ...
except Exception:
    pass                        # ← exception swallowed, stays empty
return _skill_commands

---

def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    global _skill_commands
    new_commands: Dict[str, Dict[str, Any]] = {}
    try:
        ...
        # build new_commands instead of _skill_commands
    except Exception:
        pass  # keep old _skill_commands intact
    else:
        _skill_commands = new_commands
    return _skill_commands
RAW_BUFFERClick to expand / collapse

Description

agent/skill_commands.py:222 unconditionally sets _skill_commands = {} before entering the try block that populates it. If any exception occurs between the clear and the first successful skill addition, all skill slash commands are silently lost.

# agent/skill_commands.py:221-227
global _skill_commands
_skill_commands = {}           # ← cleared BEFORE try
try:
    from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, ...
    from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
    disabled = _get_disabled_skill_names()
    ...
except Exception:
    pass                        # ← exception swallowed, stays empty
return _skill_commands

Trigger conditions

In current production code, tools.skills_tool is already cached in sys.modules before scan_skill_commands() runs (imported during tool discovery at startup), so the specific from tools.skills_tool import ... line cannot fail. get_disabled_skill_names() and iter_skill_index_files() are also defensively written.

However, the bug can trigger when SKILLS_DIR exists but os.walk fails (permission error, broken filesystem). More importantly, it is a latent risk for any future code added between line 222 and the try block, or any refactor that changes the import order or scanning logic.

Impact

Scenario 1: startup permanently broken

  1. scan_skill_commands() called at startup -> exception -> _skill_commands = {}
  2. User runs /reload-skills -> reload_skills() snapshots before={}, scans fail again -> after={}
  3. User sees: No new skills detected. 0 skill(s) available
  4. All 90+ skill slash commands are gone with zero indication

Scenario 2: hot-reload after startup

  1. Startup succeeds -> 90 skills loaded
  2. Something causes scan_skill_commands() to fail on re-invocation
  3. /reload-skills -> before=90 skills snapshotted, after=0 -> "Removed 90 skills"
  4. User sees 90 skills removed, 0 skill(s) available -- looks intentional but cause is opaque

Fix

Build the new dict in a local variable, assign to the global only on success:

def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    global _skill_commands
    new_commands: Dict[str, Dict[str, Any]] = {}
    try:
        ...
        # build new_commands instead of _skill_commands
    except Exception:
        pass  # keep old _skill_commands intact
    else:
        _skill_commands = new_commands
    return _skill_commands

extent analysis

TL;DR

The issue can be fixed by building the new dictionary in a local variable and assigning it to the global variable only on success.

Guidance

  • Identify the potential points of failure in the try block, such as the os.walk call, and consider adding specific error handling for these cases.
  • Verify that the except block is not swallowing important exceptions, and consider logging or re-raising the exception instead of passing silently.
  • Review the code for any other places where the _skill_commands dictionary is modified, to ensure that it is not being cleared or overwritten unexpectedly.
  • Consider adding additional logging or monitoring to detect when the scan_skill_commands function fails, to help identify and diagnose issues.

Example

The provided fix code snippet is a good example of how to build the new dictionary in a local variable and assign it to the global variable only on success:

def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    global _skill_commands
    new_commands: Dict[str, Dict[str, Any]] = {}
    try:
        ...
        # build new_commands instead of _skill_commands
    except Exception:
        pass  # keep old _skill_commands intact
    else:
        _skill_commands = new_commands
    return _skill_commands

Notes

This fix assumes that the scan_skill_commands function is the only place where the _skill_commands dictionary is modified. If there are other places where the dictionary is modified, additional changes may be needed to ensure that the fix is effective.

Recommendation

Apply the provided workaround by building the new dictionary in a local variable and assigning it to the global variable only on success, as shown in the example code snippet. This will help prevent the loss of skill slash commands in case of an exception.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix scan_skill_commands: unconditionally clears _skill_commands before try block [1 pull requests, 2 comments, 2 participants]