hermes - ✅(Solved) Fix Managed/shared Hermes runtime: atomic writes recreate SKILL.md and .bundled_manifest as 0600, causing permission-denied failures [2 pull requests, 2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#14181, fetched 2026-04-23 07:46:26
Comments: 2
Participants: 2
Timeline: 8
Reactions: 0
Timeline (top): labeled ×4, commented ×2, cross-referenced ×2

Fix Action

Fix / Workaround

Reproduction sketch

  1. Use a shared HERMES_HOME with group-based access (e.g. service user + interactive user in same group).
  2. In managed/shared mode, normalize skill files and manifest to group-readable/writable (e.g. 0660 or 0640 depending on policy).
  3. Trigger one of:
    • create/edit/patch a local skill via skill_manage
    • run bundled skill sync that updates .bundled_manifest
  4. Observe that the rewritten file becomes 0600.
  5. A different process/user sharing the same runtime later fails to read it.

Local workaround used here

A local NixOS activation step was added to re-normalize:

  • /var/lib/hermes/.hermes/skills/**/SKILL.md -> 0660
  • /var/lib/hermes/.hermes/skills/.bundled_manifest -> 0660
  • runtime dirs -> 2770

Potential follow-up

If useful, I can turn this into a PR by patching:

  • tools/skill_manager_tool.py::_atomic_write_text
  • tools/skills_sync.py::_write_manifest

so they preserve existing mode and/or apply managed-mode-safe permissions after replace.

PR fix notes

PR #14280: fix(skills): preserve file modes across atomic writes

Description (problem / solution / changelog)

Summary

Preserve existing file permissions when skill writes and bundled skill manifest updates replace files atomically.

Before this change, both tools.skill_manager_tool._atomic_write_text() and tools.skills_sync._write_manifest() rewrote existing files through mkstemp() + os.replace() without restoring the original mode first. That let SKILL.md and .bundled_manifest collapse to 0600, which breaks managed/shared Hermes runtimes that expect group-readable files.

Root cause

tempfile.mkstemp() creates the temp file with owner-only permissions. os.replace() then swaps that temp file into place, so the target inherits the temp file mode unless the write path explicitly preserves the old mode.
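
The root cause can be reproduced outside Hermes. The following is a minimal sketch, assuming a POSIX system; `mode_of`, `naive_atomic_write`, and `demo_mode_collapse` are hypothetical names used only for illustration, not functions from the Hermes codebase.

```python
import os
import stat
import tempfile
from pathlib import Path


def mode_of(path: Path) -> int:
    """Return only the permission bits of a file (e.g. 0o660)."""
    return stat.S_IMODE(path.stat().st_mode)


def naive_atomic_write(target: Path, content: str) -> None:
    """Atomic write with no mode handling, mirroring the pattern described above."""
    fd, tmp = tempfile.mkstemp(dir=str(target.parent), prefix=f".{target.name}.tmp.")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content)
    # The target path now points at the temp file's inode, mode included.
    os.replace(tmp, target)


def demo_mode_collapse() -> tuple[int, int]:
    """Return (mode before rewrite, mode after rewrite) for a scratch file."""
    with tempfile.TemporaryDirectory() as scratch:
        target = Path(scratch) / "SKILL.md"
        target.write_text("v1", encoding="utf-8")
        os.chmod(target, 0o660)  # simulate the deployment's normalization step
        before = mode_of(target)
        naive_atomic_write(target, "v2")
        return before, mode_of(target)
```

Under a typical umask, the first value is the normalized 0660 and the second is the owner-only mode that mkstemp() gave the temp file.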

Fix

  • capture the existing target mode before each atomic write
  • apply that mode to the temp file before os.replace()
  • reapply the mode after replacement as a defensive follow-up
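
The three steps above can be sketched as follows. This is an illustration of the described approach, not the PR's actual diff; `atomic_write_preserving_mode` is a hypothetical name, and the cleanup branch is an assumption about reasonable error handling.

```python
import os
import stat
import tempfile
from pathlib import Path


def atomic_write_preserving_mode(target: Path, content: str) -> None:
    """Hypothetical helper: atomically rewrite `target`, keeping its existing mode."""
    # 1. Capture the mode before the replace; afterwards the old inode is gone.
    try:
        existing_mode = stat.S_IMODE(target.stat().st_mode)
    except FileNotFoundError:
        existing_mode = None  # new file: nothing to preserve

    fd, tmp = tempfile.mkstemp(dir=str(target.parent), prefix=f".{target.name}.tmp.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        # 2. Apply the captured mode to the temp file so the target never
        #    appears with the mkstemp() default, even briefly.
        if existing_mode is not None:
            os.chmod(tmp, existing_mode)
        os.replace(tmp, target)
    except BaseException:
        # Best-effort cleanup of the temp file if anything failed before the replace.
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
    # 3. Defensive follow-up: reapply the mode on the final path.
    if existing_mode is not None:
        os.chmod(target, existing_mode)
```

Setting the mode on the temp file before os.replace() matters because a chmod performed only afterwards leaves a short window in which other users sharing the runtime can still hit a permission error.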

Regression coverage

Added focused tests that verify both write paths preserve an existing 0664 mode, including the temp file mode at replace time:

  • tools.skill_manager_tool._atomic_write_text() for SKILL.md
  • tools.skills_sync._write_manifest() for .bundled_manifest

Testing

  • scripts/run_tests.sh tests/tools/test_atomic_write_permissions.py -q
  • scripts/run_tests.sh tests/tools/test_skill_manager_tool.py -q
  • scripts/run_tests.sh tests/tools/test_skills_sync.py -q

Closes #14181

Changed files

  • tests/tools/test_atomic_write_permissions.py (added, +54/-0)
  • tools/skill_manager_tool.py (modified, +6/-0)
  • tools/skills_sync.py (modified, +6/-0)

PR #14410: fix(skills): preserve modes during atomic writes

Description (problem / solution / changelog)

Summary

Fixes #14181.

This preserves file permissions when Hermes rewrites skill runtime files through atomic temp-file replacement.

Root cause

tempfile.mkstemp() creates replacement files as 0600. Both skill_manage writes and bundled skill manifest writes then used os.replace() directly, so existing group-readable/group-writable files could be silently recreated as owner-private files in managed/shared deployments.

Fix

  • Preserve the existing target file mode before replacing SKILL.md or .bundled_manifest.
  • For new files, apply the process umask-derived default file mode instead of leaving the mkstemp() default.
  • Add regressions for skill_manage atomic writes and skills_sync manifest writes preserving 0660 files.
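
For the new-file case, one way to derive a umask-based default is sketched below. How PR #14410 actually computes it is not shown on this page, so treat `umask_default_file_mode` as a hypothetical helper under that assumption.

```python
import os


def umask_default_file_mode() -> int:
    """Return the mode a plain open(..., 'w') would give a new file: 0o666 & ~umask."""
    # The umask can only be read by setting it, so set a placeholder and
    # restore the original value immediately. This is process-wide and
    # briefly racy in multithreaded programs.
    current = os.umask(0)
    os.umask(current)
    return 0o666 & ~current
```

With a umask of 027 this yields 0640, and with 002 it yields 0664, so a new SKILL.md would follow the process's normal file-creation policy rather than the owner-only mkstemp() default.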

Validation

  • /Users/stephenyu/Documents/hermes-agent/.venv/bin/python -m pytest tests/tools/test_skill_manager_tool.py::TestAtomicWriteText::test_preserves_existing_file_mode tests/tools/test_skills_sync.py::TestReadWriteManifest::test_write_manifest_preserves_existing_file_mode -q --tb=short
  • /Users/stephenyu/Documents/hermes-agent/.venv/bin/python -m pytest tests/tools/test_skill_manager_tool.py tests/tools/test_skills_sync.py -q --tb=short
  • git diff --check

Changed files

  • tests/tools/test_skill_manager_tool.py (modified, +22/-0)
  • tests/tools/test_skills_sync.py (modified, +16/-0)
  • tools/skill_manager_tool.py (modified, +13/-0)
  • tools/skills_sync.py (modified, +13/-0)


Title: Managed/shared Hermes runtime: atomic writes recreate SKILL.md and .bundled_manifest as 0600, causing permission-denied failures

Summary

In a managed/shared-runtime deployment, Hermes can recreate files inside HERMES_HOME with owner-private permissions (0600) even when the deployment expects group-shared access. This shows up most clearly for:

  • ~/.hermes/skills/**/SKILL.md written via skill_manage
  • ~/.hermes/skills/.bundled_manifest written via skills sync

On a system where the gateway runs as one user and interactive sessions may touch the same HERMES_HOME via another user in the same group, this leads to intermittent permission-denied failures when Hermes later scans or loads skills.

This is not just old drift: files are actively recreated with 0600 after being normalized.

Observed symptoms

  • Repeated permission-denied warnings when loading skills, e.g.:
    • Failed to parse skill file .../skills/smart-home/homeassistant-on-this-box/SKILL.md: [Errno 13] Permission denied
  • .bundled_manifest reappears as 0600 after normalization
  • SKILL.md files created/edited through Hermes can end up 0600

Environment

  • NixOS managed deployment
  • Shared HERMES_HOME under /var/lib/hermes/.hermes
  • Gateway/service runs as user hermes
  • Interactive SSH sessions may run Hermes as a different user but point at the same HERMES_HOME
  • Group-sharing expected via hermes:hermes ownership and group-readable/group-writable runtime files

Why this seems upstream, not just local policy

The local deployment shape is specific, but the file-creation bug is general:

  • atomic write helpers use tempfile.mkstemp(...)
  • mkstemp creates temp files with 0600
  • os.replace() preserves the temp file mode
  • result: target files silently collapse to 0600 unless chmod is explicitly restored after replace

Affected code paths

  1. tools/skill_manager_tool.py

Current atomic writer:

def _atomic_write_text(file_path: Path, content: str, encoding: str = "utf-8") -> None:
    # ... (lines elided in the issue)
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp() creates the temp file with mode 0600
    fd, temp_path = tempfile.mkstemp(
        dir=str(file_path.parent),
        prefix=f".{file_path.name}.tmp.",
        suffix="",
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(content)
        # the target now carries the temp file's mode
        os.replace(temp_path, file_path)
    # ... (rest of the function elided in the issue)

This path is used for SKILL.md writes and edits.

  2. tools/skills_sync.py

Current manifest writer:

def _write_manifest(entries: Dict[str, str]):
    # ... (lines elided in the issue)
        # mkstemp() creates the temp file with mode 0600
        fd, tmp_path = tempfile.mkstemp(
            dir=str(MANIFEST_FILE.parent),
            prefix=".bundled_manifest_",
            suffix=".tmp",
        )
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            # the manifest now carries the temp file's mode
            os.replace(tmp_path, MANIFEST_FILE)
    # ... (rest of the function elided in the issue)

This recreates .bundled_manifest as 0600.

Relevant context in config code

The config layer already acknowledges that managed installs want different permissions:

def _secure_dir(path):
    ...
    Skipped in managed mode — the NixOS module sets group-readable
    permissions (0750) so interactive users in the hermes group can
    share state with the gateway service.
...
def _secure_file(path):
    ...
    Skipped in managed mode — the NixOS activation script sets
    group-readable permissions (0640) on config files.
    ...
    if is_managed() or _is_container():
        return

So Hermes already has the concept of managed/shared runtime semantics. The atomic-write paths just do not preserve those semantics.

Reproduction sketch

  1. Use a shared HERMES_HOME with group-based access (e.g. service user + interactive user in same group).
  2. In managed/shared mode, normalize skill files and manifest to group-readable/writable (e.g. 0660 or 0640 depending on policy).
  3. Trigger one of:
    • create/edit/patch a local skill via skill_manage
    • run bundled skill sync that updates .bundled_manifest
  4. Observe that the rewritten file becomes 0600.
  5. A different process/user sharing the same runtime later fails to read it.
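
Step 4 can be checked with a small scan. This is a minimal sketch that assumes the layout described in this issue (SKILL.md files under a skills directory plus a .bundled_manifest at its root); `find_owner_private_files` is a hypothetical helper, not part of Hermes.

```python
import stat
from pathlib import Path


def find_owner_private_files(skills_dir: Path) -> list[Path]:
    """Return SKILL.md files and the manifest under skills_dir that lack group-read."""
    candidates = list(skills_dir.rglob("SKILL.md"))
    manifest = skills_dir / ".bundled_manifest"
    if manifest.exists():
        candidates.append(manifest)
    # Keep only paths whose permission bits do not include group-read.
    return sorted(
        p for p in candidates
        if not stat.S_IMODE(p.stat().st_mode) & stat.S_IRGRP
    )
```

Running it before and after a skill_manage edit or a bundled skill sync shows which files were recreated without group access.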

Expected behavior

In managed/shared installations, Hermes should preserve the deployment’s shared-runtime permission model after atomic writes. Rewritten runtime files should not silently fall back to owner-private 0600 unless that is explicitly the intended mode for that file.

Actual behavior

Atomic replacement recreates files with mkstemp’s default 0600 mode.

Suggested fix directions

Option A: make the atomic write helpers preserve target mode if the target already exists

  • stat existing file before replace
  • chmod the temp file (or final file) to match the existing mode

Option B: make atomic writes managed-aware

  • if is_managed():
    • use a managed/shared file mode policy (for example 0660 or 0640 depending on file class)
    • apply chmod after os.replace

Option C: both

  • preserve existing mode when present
  • otherwise use a managed/shared default when in managed mode
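
Option C amounts to a small decision rule. The sketch below assumes the managed default is 0660, which is a deployment policy choice rather than anything Hermes defines; `resolve_target_mode` and `MANAGED_DEFAULT_MODE` are hypothetical names, and `managed` stands in for the result of Hermes' is_managed() check.

```python
import stat
from pathlib import Path
from typing import Optional

MANAGED_DEFAULT_MODE = 0o660  # assumption: deployment policy, could equally be 0o640


def resolve_target_mode(target: Path, managed: bool) -> Optional[int]:
    """Pick the mode to apply around an atomic write, or None to keep mkstemp's default."""
    try:
        # Option A: an existing target's mode always wins.
        return stat.S_IMODE(target.stat().st_mode)
    except FileNotFoundError:
        pass
    # Option B as a fallback: new files get the shared default only in managed mode.
    return MANAGED_DEFAULT_MODE if managed else None
```

The helper must be called before os.replace(), since afterwards the target already carries the temp file's mode.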

At minimum, the following should probably stop being recreated as 0600 in managed mode:

  • skills/**/SKILL.md
  • skills/.bundled_manifest
  • similar runtime metadata written through atomic temp-file replacement

Why this matters

This breaks a valid deployment model Hermes already partially supports:

  • managed runtime
  • group-shared state
  • service user + interactive/operator access

Even if that deployment is not the default, Hermes already has managed-mode permission branches, so preserving file modes during atomic writes seems like the right invariant.

Local workaround used here

A local NixOS activation step was added to re-normalize:

  • /var/lib/hermes/.hermes/skills/**/SKILL.md -> 0660
  • /var/lib/hermes/.hermes/skills/.bundled_manifest -> 0660
  • runtime dirs -> 2770

That mitigates drift, but it is policy cleanup after the fact, not a source-level fix.
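
For deployments that are not on NixOS, the same cleanup could be approximated in Python. This is only a sketch of the workaround's effect, not the reporter's activation script; `normalize_skills_permissions` is a hypothetical name, and treating every directory under the skills directory as a "runtime dir" is an assumption.

```python
import os
from pathlib import Path

FILE_MODE = 0o660
DIR_MODE = 0o2770  # setgid so new entries inherit the directory's group


def normalize_skills_permissions(skills_dir: Path) -> None:
    """Re-apply shared-runtime modes under skills_dir."""
    os.chmod(skills_dir, DIR_MODE)
    for path in skills_dir.rglob("*"):
        if path.is_symlink():
            continue  # never chmod through a symlink
        if path.is_dir():
            os.chmod(path, DIR_MODE)
        elif path.name in ("SKILL.md", ".bundled_manifest"):
            os.chmod(path, FILE_MODE)
```

Like the activation step, this has to be re-run after every rewrite, which is why it only mitigates the problem.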

Potential follow-up

If useful, I can turn this into a PR by patching:

  • tools/skill_manager_tool.py::_atomic_write_text
  • tools/skills_sync.py::_write_manifest

so they preserve existing mode and/or apply managed-mode-safe permissions after replace.

extent analysis

TL;DR

The most likely fix is to modify the atomic write helpers in tools/skill_manager_tool.py and tools/skills_sync.py to preserve the existing file mode or apply a managed/shared file mode policy after os.replace.

Guidance

  • The affected helpers are _atomic_write_text in tools/skill_manager_tool.py and _write_manifest in tools/skills_sync.py; both replace the target with a mkstemp() temp file without restoring its mode.
  • Consider implementing one of the suggested fix directions:
    • Option A: preserve the target mode if the target already exists by statting the existing file before replace and chmodding the temp file or final file to match the existing mode.
    • Option B: make atomic writes managed-aware by using a managed/shared file mode policy when in managed mode.
    • Option C: combine both approaches to preserve existing mode when present and use a managed/shared default when in managed mode.
  • Verify the fix by checking the file modes of rewritten files, such as SKILL.md and .bundled_manifest, after atomic writes.

Example

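import os
import stat
import tempfile
from pathlib import Path

def _atomic_write_text(file_path: Path, content: str, encoding: str = "utf-8") -> None:
    # ...
    # Capture the mode before the replace; afterwards file_path points at the
    # temp file's inode and stat() would just report 0600.
    try:
        existing_mode = stat.S_IMODE(file_path.stat().st_mode)
    except FileNotFoundError:
        existing_mode = None  # new file, nothing to preserve
    fd, temp_path = tempfile.mkstemp(
        dir=str(file_path.parent),
        prefix=f".{file_path.name}.tmp.",
        suffix="",
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as f:
            f.write(content)
        # Preserve existing mode or apply managed mode policy
        if existing_mode is not None:
            os.chmod(temp_path, existing_mode)
        elif is_managed():  # Hermes' existing managed-mode check
            os.chmod(temp_path, 0o660)  # or other managed mode policy
        os.replace(temp_path, file_path)
    except BaseException:
        os.unlink(temp_path)  # do not leave a stray temp file behind
        raise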

Notes

The provided code snippets and suggestions are based on the information given in the issue and may require adjustments to fit the specific use case. It's essential to test and verify the fix to ensure it works as expected in different scenarios.

Recommendation

Fix the atomic write helpers at the source so they preserve existing file modes or apply a managed/shared file mode policy. Unlike the activation-step workaround, this addresses the root cause and aligns with the existing managed-mode permission branches in Hermes.
