hermes - ✅(Solved) Fix [Bug]: Docker agent/tool memory writes create root-owned files unreadable by gateway user [1 pull request, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#17144 · Fetched 2026-04-29 06:37:05
Comments: 1 · Participants: 2 · Timeline: 7 · Reactions: 0
Timeline (top): labeled ×5, commented ×1, cross-referenced ×1

In a Docker deployment using nousresearch/hermes-agent:latest / Hermes dashboard v0.11.0 / release 2026.4.23, Hermes persistent state under /opt/data is being written by agent/tool memory operations as root:root with restrictive permissions, while the gateway process runs as the hermes user / UID 10000.

This causes files such as /opt/data/memories/MEMORY.md to become unreadable by the gateway. Memory can appear to be written successfully by the agent/tool side, but the running gateway cannot reliably read/inject it into session context.

This appears reproducible after memory-tool writes/consolidation.

Additional Logs / Traceback (optional)

Repeated cron permission error previously observed:

IOError reading jobs.json: [Errno 13] Permission denied: '/opt/data/cron/jobs.json'

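Root Cause

Agent/tool memory writes appear to run as root / UID 0 and create or replace state files under /opt/data as root:root with mode 0600. This breaks the non-root gateway process because the gateway runs as hermes / UID 10000 and cannot read those files.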

Fix Action

Fixed

PR fix notes

PR #17221: fix(utils): chown atomic-replace target to match parent when root (#17144)

Description (problem / solution / changelog)

What does this PR do?

Containerized hermes deployments (e.g. nousresearch/hermes-agent:latest on Synology / Docker / Podman) typically have a root-init entrypoint (tini) that spawns the gateway as hermes / UID 10000 against a host-mounted persistent volume that is also owned by 10000. When a memory-tool consolidation, hook script, or any other atomic write happens to land on the root side of that boundary, the new file lands as root:root with restrictive perms — and the running gateway cannot read it. Memory updates appear to succeed (the agent gets a write-OK response) but the gateway never re-injects them on the next session, silently breaking the persistent-memory promise (#17144).

The fix adds a _chown_to_match_parent helper in utils.py and calls it from atomic_replace, so every atomic-write call site (memory tool, atomic_json_write, atomic_yaml_write, future writers) inherits the fix without touching every call site.

The helper is best-effort and POSIX-only: no-op on Windows, no-op when not running as root (where chown to a foreign UID would EPERM anyway), no-op when the parent directory is itself root-owned, and swallows OSError so a chown rejection from a FUSE mount or ACL-locked filesystem doesn't fail the user-visible write.
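A minimal sketch of the gating described above, for readers who want to see the shape of the logic. It is not the PR's code: the function names ending in `_sketch`, the `(tmp_path, target)` signature, and the boolean return value are assumptions made for illustration. Only the checks themselves (POSIX only, root only, skip root-owned parents, swallow OSError, chown the resolved real path after os.replace) come from the description.

```python
import os
import tempfile


def chown_to_match_parent_sketch(path: str) -> bool:
    """Best-effort: give `path` the owner/group of its parent directory.

    Returns True only if os.chown was called and succeeded (hypothetical
    return value, added here so the behavior is easy to observe).
    """
    if os.name != "posix":          # no-op on Windows
        return False
    if os.geteuid() != 0:           # a non-root chown to a foreign UID would EPERM
        return False
    try:
        parent = os.stat(os.path.dirname(os.path.abspath(path)))
        if parent.st_uid == 0:      # root-owned parent: nothing to match
            return False
        os.chown(path, parent.st_uid, parent.st_gid)
        return True
    except OSError:                 # e.g. FUSE mount or ACL-locked filesystem
        return False


def atomic_replace_sketch(tmp_path: str, target: str) -> None:
    """Swap `tmp_path` into place, then fix ownership on the real path."""
    real_target = os.path.realpath(target)   # follow symlinks to the real file
    os.replace(tmp_path, real_target)
    chown_to_match_parent_sketch(real_target)


# Usage: write to a temp file in the same directory, then swap it in.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "MEMORY.md")
fd, tmp = tempfile.mkstemp(dir=workdir)
with os.fdopen(fd, "w") as fh:
    fh.write("remembered fact\n")
atomic_replace_sketch(tmp, target)
```

Placing the chown inside the replace routine, rather than in each writer, is what lets every atomic-write call site pick up the behavior without being edited.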

Related Issue

Fixes #17144

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • utils.py — new private helper _chown_to_match_parent(path) with os.geteuid() == 0 gate, parent-stat read, root-parent skip, and an OSError swallow. atomic_replace calls it on the resolved real path after os.replace. Docstring updated to describe the new behavior.
  • tests/test_atomic_replace_chown_parent.py (new) — four cases pinning the gating logic by monkey-patching os.geteuid / os.stat / os.chown so we don't need real root in CI:
    1. root + non-root parent → chown propagates the parent's owner/gid
    2. non-root → no chown call
    3. root + root-owned parent → no chown call
    4. root + chown EPERM → atomic_replace still commits content
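The four cases above can be approximated without real root by faking the three os calls. The sketch below uses unittest.mock rather than pytest's monkeypatch so it runs standalone. The helper is a hypothetical stand-in, not the code in utils.py, and `run_case` is an illustrative name. Case 4 here only checks that the chown failure is swallowed; the PR's test additionally checks that atomic_replace still commits the content.

```python
import os
from types import SimpleNamespace
from unittest import mock


def chown_to_match_parent_sketch(path):
    """Hypothetical stand-in for the helper under test (not the PR's code)."""
    if os.name != "posix" or os.geteuid() != 0:
        return
    try:
        parent = os.stat(os.path.dirname(os.path.abspath(path)))
        if parent.st_uid != 0:
            os.chown(path, parent.st_uid, parent.st_gid)
    except OSError:
        pass


def run_case(euid, parent_uid, parent_gid, chown_error=None):
    """Run the helper with a faked euid and parent owner; return chown calls."""
    fake_parent = SimpleNamespace(st_uid=parent_uid, st_gid=parent_gid)
    with mock.patch.object(os, "name", "posix"), \
         mock.patch.object(os, "geteuid", return_value=euid, create=True), \
         mock.patch.object(os, "chown", side_effect=chown_error, create=True) as chown, \
         mock.patch.object(os, "stat", return_value=fake_parent):
        chown_to_match_parent_sketch("/opt/data/memories/MEMORY.md")
    return chown.call_args_list


# 1. root + non-root parent: chown propagates the parent's uid/gid
calls_root_nonroot_parent = run_case(0, 10000, 10000)
# 2. non-root: no chown call
calls_non_root = run_case(10000, 10000, 10000)
# 3. root + root-owned parent: no chown call
calls_root_parent = run_case(0, 0, 0)
# 4. root + chown EPERM: the error is swallowed, so run_case returns normally
calls_eperm = run_case(0, 10000, 10000, chown_error=PermissionError(1, "EPERM"))
```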

How to Test

  1. Run hermes inside a container where the entrypoint is root and the gateway runs as a service user (hermes/10000). The persistent volume parent should be owned by 10000:10000.
  2. From a root-side code path (e.g. a startup hook), invoke a memory write that touches ~/.hermes/memories/MEMORY.md.
  3. Before this fix: the resulting file is root:root; docker exec hermes ls -l /opt/data/memories shows root root MEMORY.md; the gateway cannot read the file on the next session reset.
  4. After this fix: the file is hermes:hermes (matches parent); the gateway re-injects it normally.

Automated:

pytest tests/test_atomic_replace_chown_parent.py tests/test_atomic_replace_symlinks.py tests/hermes_cli/test_atomic_json_write.py tests/hermes_cli/test_atomic_yaml_write.py tests/tools/test_memory_tool.py -q

Result on macOS 15.6 / Python 3.14: 62 passed. The new tests fail on origin/main because no chown call is ever made (the helper does not exist there).

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/test_atomic_replace_chown_parent.py tests/test_atomic_replace_symlinks.py tests/hermes_cli/test_atomic_json_write.py tests/hermes_cli/test_atomic_yaml_write.py tests/tools/test_memory_tool.py -q (62 passed). Full pytest tests/ -q not run; the atomic-write surface above is fully exercised.
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15.6 (Python 3.14)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A (updated atomic_replace's docstring to describe the new chown step; no user-facing docs reference atomic-write internals)
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A (N/A — no config keys touched)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A (N/A)
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A (explicit os.name != "posix" early-return on Windows; macOS and Linux both go through the same POSIX path; the helper is also a no-op when not root, so Linux/macOS users running as themselves see zero behavior change)
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A (N/A — internal helper, not a tool)

Screenshots / Logs

$ pytest tests/test_atomic_replace_chown_parent.py tests/test_atomic_replace_symlinks.py tests/hermes_cli/test_atomic_json_write.py tests/hermes_cli/test_atomic_yaml_write.py tests/tools/test_memory_tool.py -q
............                                                             [ 19%]
..................................................                       [100%]
62 passed, 62 warnings in 2.44s

Changed files

  • tests/test_atomic_replace_chown_parent.py (added, +149/-0)
  • utils.py (modified, +53/-0)


Bug Description

Summary

In a Docker deployment using nousresearch/hermes-agent:latest / Hermes dashboard v0.11.0 / release 2026.4.23, Hermes persistent state under /opt/data is being written by agent/tool memory operations as root:root with restrictive permissions, while the gateway process runs as the hermes user / UID 10000.

This causes files such as /opt/data/memories/MEMORY.md to become unreadable by the gateway. Memory can appear to be written successfully by the agent/tool side, but the running gateway cannot reliably read/inject it into session context.

This appears reproducible after memory-tool writes/consolidation.

Environment

  • Image: nousresearch/hermes-agent:latest
  • Dashboard version: 0.11.0
  • Release date shown by dashboard: 2026.4.23
  • Docker host: Synology NAS / DSM
  • Persistent host path: /volume1/docker/hermes
  • Container path: /opt/data
  • Main container command: gateway run
  • Dashboard container command: dashboard --host 0.0.0.0 --port 9119 --no-open --insecure
  • Gateway process runs as hermes / UID 10000
  • Container PID 1 runs via tini / entrypoint as root, then the gateway process runs as hermes

Steps to Reproduce

  1. Run Hermes in Docker with persistent host storage mounted to /opt/data.
  2. Confirm the gateway process runs as hermes / UID 10000:
     docker exec hermes sh -lc 'ps aux | grep -Ei "[h]ermes|[g]ateway"'
  3. Fix memory ownership manually so it is readable by the gateway:
     sudo chown 10000:10000 /volume1/docker/hermes/memories/MEMORY.md
     sudo chmod 660 /volume1/docker/hermes/memories/MEMORY.md
  4. Ask the agent to write/consolidate memory using the memory tool.
  5. Check ownership again:
     docker exec hermes sh -lc 'ls -la /opt/data/memories/MEMORY.md /opt/data/memories/USER.md'
  6. Observe that MEMORY.md has reverted to root:root with mode 0600.
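The ownership check in the steps above can also be done from Python, which is convenient inside a debugging script. This is a rough sketch: `describe_access` is a hypothetical name, and the readability test only looks at the classic owner/group/other bits. It ignores supplementary groups, ACLs, and the fact that root bypasses these checks.

```python
import os
import stat
import tempfile


def describe_access(path: str, uid: int, gid: int) -> dict:
    """Report owner, mode string, and whether uid/gid passes the read-bit check."""
    st = os.stat(path)
    if st.st_uid == uid:
        readable = bool(st.st_mode & stat.S_IRUSR)   # owner bits apply
    elif st.st_gid == gid:
        readable = bool(st.st_mode & stat.S_IRGRP)   # group bits apply
    else:
        readable = bool(st.st_mode & stat.S_IROTH)   # "other" bits apply
    return {
        "owner": (st.st_uid, st.st_gid),
        "mode": stat.filemode(st.st_mode),
        "readable": readable,
    }


# Demo on a throwaway file with the same 0600 mode seen in the report.
fd, demo = tempfile.mkstemp()
os.close(fd)
os.chmod(demo, 0o600)
st = os.stat(demo)

own_view = describe_access(demo, st.st_uid, st.st_gid)            # the owner
other_view = describe_access(demo, st.st_uid + 1, st.st_gid + 1)  # anyone else
```

In the reported deployment the call of interest would be `describe_access("/opt/data/memories/MEMORY.md", 10000, 10000)`, run inside the container.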

Expected Behavior

All Hermes persistent state written under $HERMES_HOME / /opt/data should remain readable/writable by the Hermes runtime user.

Expected:

-rw-rw---- 1 hermes hermes ... /opt/data/memories/MEMORY.md

or equivalent UID/GID 10000:10000. Memory writes should not make the file unreadable by the running gateway process.

Actual Behavior

Memory/tool writes create or replace state files as:

-rw------- 1 root root ... /opt/data/memories/MEMORY.md

This breaks the non-root gateway process because the gateway runs as hermes / UID 10000.

Observed after memory write:

-rw------- 1 root   root   2435 Apr 28 19:35 /opt/data/memories/MEMORY.md
-rw-rw---- 1 hermes hermes 1105 Apr 24 21:12 /opt/data/memories/USER.md

Earlier cron symptom:

IOError reading jobs.json: [Errno 13] Permission denied: '/opt/data/cron/jobs.json'

The issue is recurring permission drift caused by root-written files under /opt/data.

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Debug Report

Debug report generated with `docker exec -it hermes hermes debug share`.

Report:
https://paste.rs/lGUr0

Note: agent.log and gateway.log failed to upload due to paste service errors. The report link auto-deletes after 6 hours.

I reviewed the report before posting. It does not appear to include raw API keys, but it does include configured-provider names, local container paths, Docker network IPs, cron job names, skill paths, and recent log excerpts.

Additional relevant evidence from the report:
- Gateway is running in Docker.
- Hermes home is `/opt/data`.
- Dashboard/Hermes version is `0.11.0` / `2026.4.23`.
- Cron job execution showed multiple permission-denied errors reading skill files under `/opt/data/skills/...`.

Operating System

Synology DSM / Docker

Python Version

No response

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

Repeated cron permission error previously observed:


IOError reading jobs.json: [Errno 13] Permission denied: '/opt/data/cron/jobs.json'


Dashboard status after API server fix:


"gateway_running": true,
"gateway_state": "running",
"gateway_platforms": {
  "telegram": { "state": "connected" },
  "api_server": { "state": "connected" }
}


The gateway itself runs successfully, but state files created/replaced by root-running agent/tool operations can become unreadable by the gateway runtime user.

Root Cause Analysis (optional)

The apparent root cause is a UID/GID mismatch in Docker:

  • Gateway/runtime process runs as hermes / UID 10000.
  • Agent/tool/memory write operations appear to run as root / UID 0.
  • Persistent state is shared under $HERMES_HOME / /opt/data.

When the memory tool writes or replaces /opt/data/memories/MEMORY.md, the replacement file is created as root:root with mode 0600. The gateway process later needs to read the same file as hermes but cannot.

This same pattern can affect any persistent file under /opt/data — cron state, profile dirs, generated docs, skills, cache, config backups.
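To illustrate why a replace-style write can change a file's metadata, the sketch below rewrites a 0660 file through a temp file and os.replace. It assumes the memory tool uses a mkstemp-style temporary file; the report does not show the write path, only the resulting 0600 mode. It runs as any user, so it demonstrates the mode change only. Under root, the new file would also be owned by root.

```python
import os
import stat
import tempfile

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "MEMORY.md")

# Existing state file with the group-readable mode from the expected listing.
with open(target, "w") as fh:
    fh.write("old\n")
os.chmod(target, 0o660)
mode_before = stat.S_IMODE(os.stat(target).st_mode)

# Atomic rewrite: mkstemp creates a NEW file with mode 0600, owned by the
# effective user of the writing process.
fd, tmp = tempfile.mkstemp(dir=workdir)
with os.fdopen(fd, "w") as fh:
    fh.write("new\n")
os.replace(tmp, target)   # the old file, with its owner and mode, is discarded
mode_after = stat.S_IMODE(os.stat(target).st_mode)

print(oct(mode_before), "->", oct(mode_after))
```

Because the target name now points at the temp file's inode, any manual chown/chmod applied to the old file is lost on the next write, which matches the "reverted" behavior in the reproduction steps.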

Proposed Fix (optional)

Ensure all writes under $HERMES_HOME / /opt/data are performed as the Hermes runtime UID/GID, or normalize ownership/mode after writes.

Preferred behavior:

  • Memory/config/cron/state writes should be owned by hermes:hermes / UID 10000:10000.
  • Regular state files should be readable/writable by the runtime user, e.g. mode 0660.
  • Directories should be accessible by the runtime user, e.g. mode 0770.

Possible fixes:

  1. Run agent/tool write operations as UID/GID 10000:10000.
  2. Ensure memory-tool file replacement preserves existing ownership/mode.
  3. Add ownership/mode normalization after writes under $HERMES_HOME.
  4. Avoid creating persistent runtime files as root:root when the gateway runs as non-root.
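Possible fix 2 (preserve existing ownership/mode on replacement) could look roughly like the sketch below. `replace_preserving_metadata` is a hypothetical name and not part of Hermes. The metadata is copied onto the temp file before the swap so the target is never visible with the wrong permissions. The chown is best-effort because a non-root process generally cannot hand a file to another user.

```python
import os
import stat
import tempfile


def replace_preserving_metadata(tmp_path: str, target: str) -> None:
    """Swap tmp_path into target, carrying over the old file's mode and owner."""
    try:
        old = os.stat(target)
    except FileNotFoundError:
        old = None                      # first write: nothing to preserve
    if old is not None:
        os.chmod(tmp_path, stat.S_IMODE(old.st_mode))
        try:
            os.chown(tmp_path, old.st_uid, old.st_gid)
        except (OSError, AttributeError):
            pass                        # not permitted, or no os.chown on this OS
    os.replace(tmp_path, target)


# Demo: a 0660 file keeps its mode across an atomic rewrite.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "MEMORY.md")
with open(target, "w") as fh:
    fh.write("old\n")
os.chmod(target, 0o660)

fd, tmp = tempfile.mkstemp(dir=workdir)
with os.fdopen(fd, "w") as fh:
    fh.write("new\n")
replace_preserving_metadata(tmp, target)
mode_after = stat.S_IMODE(os.stat(target).st_mode)
```

This variant does nothing for files created for the first time, which is one reason a parent-directory-based rule may be preferable.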

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

The most likely fix is to ensure all writes under $HERMES_HOME / /opt/data are performed as the Hermes runtime UID/GID, or normalize ownership/mode after writes.

Guidance

  • Verify the current ownership and permissions of the files under /opt/data to confirm the issue.
  • Consider running agent/tool write operations as UID/GID 10000:10000 to match the gateway runtime user.
  • Add ownership/mode normalization after writes under $HERMES_HOME to ensure files are readable/writable by the runtime user.
  • Review the memory-tool file replacement logic to preserve existing ownership/mode.
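For the first guidance point, a small walker can list every entry under the state directory that is not owned by the expected UID. This is an illustrative sketch; `find_foreign_files` is a hypothetical name, and the demo runs on a throwaway tree rather than /opt/data.

```python
import os
import tempfile


def find_foreign_files(root: str, expected_uid: int) -> list[str]:
    """Return paths under `root` whose owner UID differs from `expected_uid`."""
    drifted = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: inspect a symlink itself rather than its target
                if os.lstat(path).st_uid != expected_uid:
                    drifted.append(path)
            except OSError:
                continue  # entry vanished or is inaccessible; skip it
    return sorted(drifted)


# Demo on a throwaway tree shaped like the reported layout.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "memories"))
with open(os.path.join(demo_root, "memories", "MEMORY.md"), "w") as fh:
    fh.write("note\n")

own_uid = os.lstat(demo_root).st_uid
clean = find_foreign_files(demo_root, own_uid)        # nothing has drifted
flagged = find_foreign_files(demo_root, own_uid + 1)  # everything is "foreign"
```

In the reported deployment the equivalent call would be `find_foreign_files("/opt/data", 10000)`.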

Example

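# Example of normalizing ownership and mode after writes.
# Run on the Docker host, using the host path that is mounted at /opt/data
# (here the reporter's /volume1/docker/hermes).
sudo chown 10000:10000 /volume1/docker/hermes/memories/MEMORY.md
sudo chmod 0660 /volume1/docker/hermes/memories/MEMORY.md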

Notes

Running agent/tool write operations as a non-root user may affect any operation that currently relies on root, so that change needs testing. Ownership/mode normalization should stay confined to $HERMES_HOME and should not grant more access than the runtime user and group need (e.g. 0660 for files, 0770 for directories).

Recommendation

Normalize ownership/mode after writes under $HERMES_HOME. This is more straightforward and less invasive than changing agent/tool write operations to run as a non-root user. PR #17221 takes this approach for ownership: when running as root, it chowns the atomic-replace target to match its parent directory.
