hermes - 💡(How to fix) Fix [Bug]: s6-log lock collision in multi-container setup with shared /opt/data volume

hermes 2026-05-29 08:03:12


Bug Description

When running multiple Hermes containers from the same image sharing a single /opt/data bind mount (e.g., a gateway container + dashboard container + custom profile container), the 02-reconcile-profiles cont-init script in every container registers s6 service slots for all profiles -- not just the one the container actually runs. This causes s6-log instances in the "wrong" containers to crash-loop with fatal: unable to lock errors.

Environment

- Hermes Agent: v0.14.0 (Docker image nousresearch/hermes-agent:latest)
- Docker Compose multi-container setup with shared /opt/data bind mount
- 3 containers: gateway (default profile), gateway (custom profile), dashboard

Steps to Reproduce

1. Set up a docker-compose.yml with two or more Hermes containers sharing the same /opt/data volume:
   - hermes-agent running gateway run (default profile)
   - hermes-custom running hermes -p custom gateway run
   - dashboard running dashboard --host 0.0.0.0 --insecure
2. Start the stack: docker compose up -d
3. Check logs on the dashboard or agent container:

  docker logs hermes-dashboard 2>&1 | grep s6-log

Expected Behavior

Only the container actually running a given profile's gateway should register and run the corresponding s6 service slot (and its log sub-service). Other containers should not attempt to run s6-log against the same log directory.

Actual Behavior

Every container's 02-reconcile-profiles walks all profiles under $HERMES_HOME/profiles/ and creates s6 service directories for all of them under /run/service/gateway-<name>/. The gateway service itself gets a down marker file (so it doesn't actually start in the wrong container), but the log sub-service (/run/service/gateway-<name>/log/) has no down file and always starts.

Each log sub-service runs:

  exec s6-setuidgid hermes s6-log 1 n10 s1000000 T "$log_dir"

s6-log tries to exclusively lock $HERMES_HOME/logs/gateways/<name>/lock. The first container to grab the lock wins; all others get:

s6-log: fatal: unable to lock /opt/data/logs/gateways/default/lock: Resource busy
s6-log: fatal: unable to lock /opt/data/logs/gateways/community/lock: Resource busy


Since s6-supervise restarts the log service on every crash, these errors repeat indefinitely in every container that doesn't own the gateway.
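
The collision itself can be reproduced outside s6. The following is a minimal sketch, assuming the lock behaves like an exclusive, non-blocking advisory lock on a shared file (the exact locking primitive s6-log uses is not stated in this report). The helper names try_lock and demo_lock_collision are hypothetical, and the two open() calls stand in for two containers opening the same lock file on the shared volume.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and try to take an exclusive, non-blocking lock on it.

    Returns the open file descriptor on success, or None when another
    holder already owns the lock.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock; s6-log treats this case as fatal.
        os.close(fd)
        return None
    return fd


def demo_lock_collision():
    """Simulate two loggers competing for the same log directory lock."""
    with tempfile.TemporaryDirectory() as log_dir:
        lock_path = os.path.join(log_dir, "lock")
        first = try_lock(lock_path)   # first logger wins
        second = try_lock(lock_path)  # second logger is refused
        result = (first is not None, second is not None)
        if first is not None:
            os.close(first)
        return result
```

Calling demo_lock_collision() on Linux returns (True, False): the first opener gets the lock and the second is refused, which mirrors the "first container wins" behavior described above.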

Affected Component

Configuration (config.yaml, .env, hermes setup)

Messaging Platform (if gateway-related)

No response

Debug Report

Report       https://paste.rs/OKDSr
  agent.log    https://paste.rs/5Kj7F
  gateway.log  https://paste.rs/RTWDF

Operating System

Ubuntu

Python Version

3.13.5

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

In hermes_cli/container_boot.py, reconcile_profile_gateways() unconditionally registers a gateway-default slot for the root profile AND walks all named profiles under $HERMES_HOME/profiles/. There is no mechanism to filter which profiles a given container should manage. The function has no awareness of which profile the container was started with (via hermes -p <name>).

The _register_service() helper creates the down marker on the gateway service directory, but the log/ sub-directory (which gets its own s6 service with its own run script) is never given a down marker.
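
To confirm this state inside a running container, a small diagnostic can list the services that are marked down while their log sub-service is not. This is a sketch only: find_unguarded_loggers is a hypothetical helper, not part of Hermes, and it assumes the layout described above (a down file in the service directory and a log/ sub-directory next to it).

```python
from pathlib import Path


def find_unguarded_loggers(scan_dir):
    """Return names of services that have a down marker but whose
    log/ sub-service has none, so the logger would still be started."""
    unguarded = []
    for service in sorted(Path(scan_dir).iterdir()):
        log_dir = service / "log"
        if not log_dir.is_dir():
            continue  # not a service with a log sub-service
        service_is_down = (service / "down").exists()
        logger_is_down = (log_dir / "down").exists()
        if service_is_down and not logger_is_down:
            unguarded.append(service.name)
    return unguarded
```

Running find_unguarded_loggers("/run/service") in the dashboard container should, if the analysis above holds, list every gateway-<name> slot whose logger is crash-looping.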

Proposed Fix (optional)

Option A: Scope reconciliation to the active profile

Add an environment variable (e.g., HERMES_PROFILE=default) that 02-reconcile-profiles reads. Only register the s6 slot for the profile matching $HERMES_PROFILE. The dashboard container, which runs no gateway at all, could set HERMES_PROFILE=_none or a new env var like HERMES_SKIP_RECONCILE=1 to skip reconciliation entirely.
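
A minimal sketch of the filtering Option A describes, assuming the proposed HERMES_PROFILE and HERMES_SKIP_RECONCILE variables (neither exists in Hermes today). select_profiles is a hypothetical helper, and treating an unset HERMES_PROFILE as "keep the current behavior" is an assumption made here for backward compatibility, not something the report specifies.

```python
import os


def select_profiles(all_profiles, environ=None):
    """Return the profiles this container should register s6 slots for."""
    env = os.environ if environ is None else environ

    # Proposed opt-out for containers that run no gateway (e.g. dashboard).
    if env.get("HERMES_SKIP_RECONCILE") == "1":
        return []

    active = env.get("HERMES_PROFILE")
    if active is None:
        # Assumption: with no variable set, behave as today (all profiles).
        return list(all_profiles)
    if active == "_none":
        return []

    # Only the profile this container was started with.
    return [name for name in all_profiles if name == active]
```

With this shape, the hermes-custom container would set HERMES_PROFILE=custom and register only gateway-custom, while the dashboard container would set HERMES_SKIP_RECONCILE=1 and register nothing.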

Option B: Add down markers to log sub-services

After _register_service() creates the service directory with a down marker on the main service, also create a down marker in the log/ sub-directory. This prevents the log sub-service from starting when the gateway itself is intentionally down. This is a smaller change but doesn't prevent the unnecessary service slot creation.
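
Option B could look roughly like the following. This is a sketch under stated assumptions: register_down_service is a hypothetical stand-in for the relevant part of _register_service(), whose real signature is not shown in this report, and it assumes that a down file in log/ keeps the logger from starting in the same way it does for the main service.

```python
from pathlib import Path


def register_down_service(service_dir):
    """Create a service directory whose main service and log
    sub-service are both marked down, and return the marker paths."""
    service_dir = Path(service_dir)
    log_dir = service_dir / "log"
    log_dir.mkdir(parents=True, exist_ok=True)

    markers = [service_dir / "down", log_dir / "down"]
    for marker in markers:
        # An empty file is enough; only its presence matters.
        marker.touch()
    return markers
```

Whichever code path later starts the gateway in its owning container would also need to bring the logger up, since the logger would no longer start on its own.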

Option C: Both A and B

Option A prevents unnecessary slot creation; Option B is a safety net for any case where a gateway slot is created but shouldn't auto-start its logger.

Workaround

No workaround is needed. The errors are noisy but do not affect functionality: the gateways and the dashboard continue to work correctly. The s6-log crash loops consume negligible resources, so there is no production impact beyond log spam.

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR
