hermes: `--replace` cross-kills sibling gateways when multiple HERMES_PROFILE values share one HERMES_HOME

Summary

When multiple gateway instances run on the same host with different HERMES_PROFILE values but a shared HERMES_HOME (or unset HERMES_HOME, where the default ~/.hermes is shared by all instances of the same user), hermes gateway run --replace SIGKILLs sibling gateways instead of replacing only the matching one.

Root cause: the gateway pidfile resolver at gateway/status.py:45-47 (v2026.5.16) returns get_hermes_home() / "gateway.pid", so the pidfile is scoped to HERMES_HOME, not HERMES_PROFILE. Two gateways running under the same HERMES_HOME but different HERMES_PROFILE values therefore share {HERMES_HOME}/gateway.pid. The newer instance's --replace reads that pidfile, finds the sibling's PID, and SIGKILLs it.

The design intent is documented at gateway/run.py:17766-17769:

Prevent two gateways from running under the same HERMES_HOME. The PID file is scoped to HERMES_HOME, so future multi-profile setups (each profile using a distinct HERMES_HOME) will naturally allow concurrent instances without tripping this guard.

That contract is correct, but it is not enforced or surfaced to the operator. A user who sets HERMES_PROFILE=foo in one systemd unit and HERMES_PROFILE=bar in another, without also setting per-instance HERMES_HOME, gets a silent cross-kill the moment either unit restarts with --replace.
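To make the collision concrete, here is a minimal sketch of a pidfile resolver that depends only on HERMES_HOME. The function names mirror those cited in this report, but the bodies are assumptions written for illustration, not the actual Hermes source; in particular, the environment is passed in as a plain dict so the example has no side effects.

```python
from pathlib import Path


def get_hermes_home(env):
    # Assumed behavior: use HERMES_HOME if set, else the shared default ~/.hermes.
    return Path(env.get("HERMES_HOME") or Path.home() / ".hermes")


def get_pid_file_path(env):
    # HERMES_PROFILE is never consulted here, so it cannot change the path.
    return get_hermes_home(env) / "gateway.pid"


# Two hypothetical gateway environments: same home, different profiles.
alpha = {"HERMES_HOME": "/srv/hermes", "HERMES_PROFILE": "alpha"}
beta = {"HERMES_HOME": "/srv/hermes", "HERMES_PROFILE": "beta"}

# Both profiles resolve to the same pidfile, which is what --replace acts on.
same_path = get_pid_file_path(alpha) == get_pid_file_path(beta)
print(same_path)
```

Under these assumptions the comparison holds for any pair of profiles, because the profile name is not an input to the path at all.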

Steps to reproduce

  1. Install Hermes v2026.5.16. Create two profiles, alpha and beta, under ~/.hermes/profiles/.
  2. Start gateway A as HERMES_PROFILE=alpha hermes gateway run --replace (PID 1000).
  3. Start gateway B as HERMES_PROFILE=beta hermes gateway run --replace (PID 2000).
  4. Gateway B reads ~/.hermes/gateway.pid (contains 1000), SIGKILLs PID 1000, writes 2000.
  5. Gateway A is now dead. Telegram (or other transport) on gateway A starts returning 409 Conflict until the systemd unit restarts the process — at which point gateway A reads ~/.hermes/gateway.pid (contains 2000), SIGKILLs gateway B, and the cycle repeats.
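The ping-pong in the steps above can be modeled without sending any real signals. The following is a toy simulation, not Hermes code: the Host class, its fields, and run_replace are hypothetical names, and the PIDs simply echo the 1000/2000 values used in the steps.

```python
class Host:
    """Toy model of one host: a shared pidfile and a table of live PIDs."""

    def __init__(self):
        self.pidfile = None   # stands in for the contents of {HERMES_HOME}/gateway.pid
        self.alive = {}       # pid -> profile name
        self.next_pid = 1000

    def run_replace(self, profile):
        # Model of `hermes gateway run --replace`: kill whatever PID the
        # pidfile names (regardless of profile), then record our own PID.
        killed = None
        if self.pidfile in self.alive:
            killed = self.alive.pop(self.pidfile)
        pid = self.next_pid
        self.next_pid += 1000
        self.alive[pid] = profile
        self.pidfile = pid
        return killed  # profile of the gateway that was killed, if any


host = Host()
events = [
    host.run_replace("alpha"),  # first start, nothing to kill
    host.run_replace("beta"),   # kills alpha
    host.run_replace("alpha"),  # systemd restarts alpha, which kills beta
]
print(events)
```

Each start after the first kills the other profile's gateway, so with restart-on-failure units the two instances keep evicting each other.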

Expected behavior

Either of:

  • (a) Scope the pidfile to the active profile when HERMES_PROFILE is set — e.g., {HERMES_HOME}/profiles/{HERMES_PROFILE}/gateway.pid when HERMES_PROFILE is set, else fall back to the current {HERMES_HOME}/gateway.pid.
  • (b) Fail loudly at startup when HERMES_PROFILE is set but HERMES_HOME matches another running gateway's HERMES_HOME, with a clear error: "another gateway is running under this HERMES_HOME with a different HERMES_PROFILE; set a distinct HERMES_HOME for each profile."

Option (a) matches what most operators intuitively expect from HERMES_PROFILE. Option (b) is the smaller, more conservative change that preserves the current "one HERMES_HOME, one gateway" contract.
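As a rough sketch of what the two options could look like, assuming nothing about the real implementation: profile_scoped_pid_path, ProfileConflictError, and check_profile_conflict are hypothetical names. Option (b) additionally assumes the pidfile would be extended to record the owning profile as JSON, which the current format may not do, and it omits the check that the recorded PID is still alive.

```python
import json
from pathlib import Path


def profile_scoped_pid_path(home, profile=None):
    # Option (a): one pidfile per profile, else the current shared path.
    home = Path(home)
    if profile:
        return home / "profiles" / profile / "gateway.pid"
    return home / "gateway.pid"


class ProfileConflictError(RuntimeError):
    """Raised when the pidfile belongs to a gateway with another profile."""


def check_profile_conflict(pidfile_text, my_profile):
    # Option (b): assumes a JSON record such as {"pid": 1000, "profile": "alpha"}.
    record = json.loads(pidfile_text)
    if record.get("profile") != my_profile:
        raise ProfileConflictError(
            "another gateway is running under this HERMES_HOME with a "
            "different HERMES_PROFILE; set a distinct HERMES_HOME for each profile."
        )
    # Same profile: this PID is the one --replace may legitimately replace.
    return record["pid"]
```

With option (a), alpha and beta get different pidfile paths under one HERMES_HOME; with option (b), a mismatched profile stops startup before any signal is sent.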

Actual behavior

The pidfile is at {HERMES_HOME}/gateway.pid regardless of HERMES_PROFILE. --replace reads it and kills whatever PID is recorded there, with no awareness of profile boundaries. There is no warning and no error, just a SIGKILL.

Workaround currently running in production

Set a distinct HERMES_HOME per gateway instance:

# /etc/systemd/system/hermes-gateway-alpha.service
[Service]
Environment=HERMES_HOME=/var/lib/hermes-alpha
Environment=HERMES_PROFILE=alpha

# /etc/systemd/system/hermes-gateway-beta.service
[Service]
Environment=HERMES_HOME=/var/lib/hermes-beta
Environment=HERMES_PROFILE=beta

Then symlink the shared profile sources into each HERMES_HOME/profiles/ so plugin/skill source stays single-sourced:

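# Ensure the per-instance profiles/ directories exist before linking.
mkdir -p /var/lib/hermes-alpha/profiles /var/lib/hermes-beta/profiles
ln -s /var/lib/hermes-shared/profiles/alpha /var/lib/hermes-alpha/profiles/alpha
ln -s /var/lib/hermes-shared/profiles/beta  /var/lib/hermes-beta/profiles/beta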

Result: each gateway gets its own pidfile path (/var/lib/hermes-alpha/gateway.pid, /var/lib/hermes-beta/gateway.pid), --replace is effectively profile-isolated, and six gateways have run stably on one host throughout the verification window.
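Before rolling out a workaround like this across several units, it can help to check that no two units still resolve to the same pidfile. The helper below is a hypothetical operator-side sketch, not part of Hermes: it takes the environment of each unit as a dict, and it treats an unset HERMES_HOME as the literal placeholder ~/.hermes, which assumes all units run as the same user.

```python
from collections import defaultdict
from pathlib import Path


def find_pidfile_collisions(units):
    """Return {pidfile path: [unit names]} for paths shared by 2+ units.

    `units` maps a unit name to its environment dict.
    """
    by_path = defaultdict(list)
    for name, env in units.items():
        # Unexpanded "~" is only a placeholder for the per-user default.
        home = env.get("HERMES_HOME", "~/.hermes")
        by_path[str(Path(home) / "gateway.pid")].append(name)
    return {path: names for path, names in by_path.items() if len(names) > 1}


units = {
    "hermes-gateway-alpha": {"HERMES_HOME": "/var/lib/hermes-alpha", "HERMES_PROFILE": "alpha"},
    "hermes-gateway-beta": {"HERMES_HOME": "/var/lib/hermes-beta", "HERMES_PROFILE": "beta"},
}
print(find_pidfile_collisions(units))
```

An empty result means every unit has its own pidfile; any entry in the result names the units that would cross-kill each other on --replace.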

Why this is worth a fix upstream

Multi-profile-on-one-host is a growing pattern. The existing fallback warning at hermes_constants.py:67-92 already shows the maintainers anticipated profile/HERMES_HOME drift — but it only fires when HERMES_HOME is unset and a non-default profile is active. It does not fire when two gateways share the same HERMES_HOME with different HERMES_PROFILE values, which is the exact scenario that bites.

The symptom is also misleading: the surviving gateway looks healthy, and the dead gateway's transport returns 409 Conflict (in the Telegram case), which initially looks like a token/auth issue rather than a pidfile collision.

Environment

  • Hermes Agent v0.14.0 (v2026.5.16, commit a91a57fa5a13d516c38b07a141a9ce8a3daabeb0)
  • Python 3.11.15
  • Linux x86_64, systemd-managed gateway units
  • Six gateway instances on one host, each with a distinct HERMES_PROFILE and (after workaround) a distinct HERMES_HOME

Code references

  • gateway/status.py:45-47: get_pid_file_path() returns get_hermes_home() / "gateway.pid"
  • gateway/run.py:17766-17769: comment documenting the "one HERMES_HOME, one gateway" contract
  • gateway/run.py:17778-17872: the --replace branch that reads the pidfile and kills the existing PID
  • hermes_constants.py:43-105: get_hermes_home() resolver and the existing one-shot fallback warning

PR offer

Happy to send a PR for either option. Lean toward (b) as the conservative fix — it matches the documented design contract and surfaces the misconfiguration loudly instead of silently re-interpreting HERMES_PROFILE. Open to feedback on which direction the maintainers prefer.
