hermes - 💡(How to fix) Fix [Bug]: Two profile gateway services enter SIGTERM flap loop — `--replace` + PID file cross-profile mis-routing

When a user installs Hermes with multiple profiles and enables a gateway systemd service for more than one of them (e.g. the default hermes-gateway.service for the default profile, here configured with the deepseek provider, plus hermes-gateway-coder.service for a custom coder profile), the two services enter an unrecoverable SIGTERM restart loop within seconds of both being active.

Symptoms observed:

  • Both NRestarts counters climb continuously (97+ restarts in 15 minutes during diagnosis).
  • Gateway processes connect to their respective messaging platforms (Lark / Feishu websocket established), then are SIGTERM'd within 1–5 seconds.
  • One service's PID file ends up containing the other service's PID, causing the second service to log ❌ Gateway already running (PID xxxxx) and exit 1.
  • Messages sent to either bot get no reply — the gateway is killed before it can dispatch them through to the agent loop.
  • hermes profile list reports Gateway: stopped for a profile whose service is actually active in systemd (because the wrong PID file is being checked).

The root cause appears to be a combination of two design issues:

  1. The systemd unit template hardcodes --replace in ExecStart (hermes_cli/gateway.py:2184), which is intended for same-service takeover during gateway restart, but in practice also kills sibling profile gateways via a shared termination pathway.
  2. Somewhere in the gateway startup path, the default profile's gateway writes its PID file into a non-default profile's directory (specifically the directory of whatever profile is set in ~/.hermes/active_profile), violating the HERMES_HOME-based isolation promised by get_hermes_home() in hermes_constants.py:14-68.

The combination means both services, once enabled, kill each other on every restart cycle, and the recovery logic in --replace cannot escape the loop because the PID file mis-routing makes each side incorrectly believe the other is the same service it's supposed to take over.

A working workaround using systemd drop-in overrides is provided at the end of this issue. The fix on the Hermes side is straightforward (described in "Suggested Fixes").

Reproduction Steps

Tested on Ubuntu 22.04 in WSL2, Hermes 0.14.0 (commit f36c89cd5798da0f313192555739975e57ffdef5). It should reproduce on any Linux system that uses systemd.

Prerequisites

hermes profile create coder           # or any non-default name
# Configure both profiles with different LLM providers and DIFFERENT messaging
# platform credentials (different Feishu app, different Telegram bot, etc.)
hermes config set model.provider deepseek         # for default profile
hermes --profile coder config set model.provider openrouter  # for coder profile

Reproduce

# 1. Install gateway service for default profile
hermes gateway install
# Creates: ~/.config/systemd/user/hermes-gateway.service
#   ExecStart=... -m hermes_cli.main gateway run --replace

# 2. Install gateway service for coder profile (assuming user has done this
#    via some custom mechanism — Hermes does not currently expose a
#    `hermes --profile X gateway install` command, but two services CAN
#    end up coexisting on a system through manual install or upgrades)
# Creates: ~/.config/systemd/user/hermes-gateway-coder.service
#   ExecStart=... --profile coder gateway run --replace

# 3. Set sticky default profile (this is the trigger condition)
hermes profile use coder                # writes ~/.hermes/active_profile

# 4. Enable + start both services
systemctl --user enable --now hermes-gateway hermes-gateway-coder

# 5. Watch the flap loop
watch -n 2 'for svc in hermes-gateway hermes-gateway-coder; do
  echo "$svc: $(systemctl --user is-active $svc), NRestarts=$(systemctl --user show $svc -p NRestarts --value)"
done'

Within ~60 seconds, NRestarts on both should be >= 5 and climbing.

Evidence of the PID file misrouting

# default service env shows correct HERMES_HOME:
$ tr '\0' '\n' < /proc/$(systemctl --user show hermes-gateway -p MainPID --value)/environ | grep HERMES_HOME
HERMES_HOME=/home/$USER/.hermes

# But its PID ends up in coder's PID file (note the path):
$ cat /home/$USER/.hermes/profiles/coder/gateway.pid
{"pid": 20555, "kind": "hermes-gateway", "argv": [...], "start_time": 350410}

$ systemctl --user show hermes-gateway -p MainPID --value
20555    # ← this is default service's PID, not coder's

$ ls -la /home/$USER/.hermes/gateway.pid
# File does not exist or is empty — default's PID was written elsewhere
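
The manual check above can be scripted. The following is a minimal sketch, not part of Hermes: parse_pid_file, systemd_main_pid, and is_misrouted are hypothetical helper names, and the only assumption about the PID file is the JSON shape with a "pid" field shown in the output above.

```python
import json
import subprocess
from typing import Optional


def parse_pid_file(text: str) -> Optional[int]:
    """Return the "pid" field of a JSON gateway.pid payload, or None if unusable."""
    try:
        return int(json.loads(text)["pid"])
    except (ValueError, KeyError, TypeError):
        return None


def systemd_main_pid(unit: str) -> int:
    """Ask the user systemd instance for a unit's MainPID (0 when not running)."""
    result = subprocess.run(
        ["systemctl", "--user", "show", unit, "-p", "MainPID", "--value"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip() or 0)


def is_misrouted(pid_in_file: Optional[int], own_main_pid: int) -> bool:
    """True when the PID file names a process other than the unit's own main PID."""
    return pid_in_file is not None and own_main_pid != 0 and pid_in_file != own_main_pid
```

Feeding the contents of ~/.hermes/profiles/coder/gateway.pid and the MainPID of hermes-gateway-coder into is_misrouted flags the situation shown above, where the file holds the default service's PID.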

Evidence from journalctl

$ journalctl --user -u hermes-gateway-coder --since "5 minutes ago" --no-pager | grep -E "Gateway|SIGTERM" | head
... [Lark] connected to wss://msg-frontier.feishu.cn/...
... Shutdown context: signal=SIGTERM under_systemd=yes parent_pid=330
... ❌ Gateway already running (PID 20555).
...    Use 'hermes gateway restart' to replace it,
...    or 'hermes gateway stop' to kill it first.
...    Or use 'hermes gateway run --replace' to auto-replace.
... hermes-gateway-coder.service: Main process exited, code=exited, status=1/FAILURE
... hermes-gateway-coder.service: Scheduled restart job, restart counter is at 12.

Root Cause Analysis

Issue 1: --replace is hardcoded in the systemd unit template

hermes_cli/gateway.py:2184:

ExecStart={python_path} -m hermes_cli.main{f" {profile_arg}" if profile_arg else ""} gateway run --replace

The --replace flag exists for a legitimate use case: when a user runs hermes gateway restart, the new process should kill the old one cleanly. But in a systemd-managed setup, systemd already handles the restart lifecycle (Type=simple, Restart=always). There the flag is redundant and harmful: it makes the gateway proactively call terminate_pid(existing_pid, force=False) (gateway/run.py:16741) on whatever PID it finds in the PID file.

When the PID file points to a sibling profile's gateway (see Issue 2), --replace becomes a cross-profile SIGTERM emitter.
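
One way to make a takeover safer is to confirm that the process named in the PID file belongs to the same profile before signalling it. The sketch below is an illustration only, not Hermes code: parse_environ, hermes_home_of, and safe_to_replace are hypothetical names, and it assumes Linux /proc plus a distinct HERMES_HOME per profile gateway.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Decode a NUL-separated /proc/<pid>/environ blob into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def hermes_home_of(pid: int) -> Optional[str]:
    """Return the HERMES_HOME a process was started with, or None if unreadable."""
    try:
        raw = Path(f"/proc/{pid}/environ").read_bytes()
    except OSError:
        return None
    return parse_environ(raw).get("HERMES_HOME")


def safe_to_replace(existing_pid: int, my_home: str) -> bool:
    """Allow a takeover only when the existing process shares our HERMES_HOME."""
    their_home = hermes_home_of(existing_pid)
    if their_home is None:
        return False  # unknown owner: refuse rather than risk killing a sibling
    return os.path.realpath(their_home) == os.path.realpath(my_home)
```

A guard like this would be consulted before the terminate call; when it returns False, the gateway would treat the PID file as foreign instead of sending SIGTERM.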

The codebase already partly acknowledges this. hermes_cli/gateway.py:1612-1614:

# When both a legacy unit and the current ``hermes-gateway.service`` are
# active, they fight over the same bot token — the PR #5646 signal-recovery
# change turns this into a 30-second SIGTERM flap loop.

That comment covers only the legacy unit vs new unit scenario. The same root cause applies to two profile units (hermes-gateway.service + hermes-gateway-coder.service) coexisting, which is not currently detected.

Issue 2: PID file ends up in wrong profile directory

gateway/status.py:44-47:

def _get_pid_path() -> Path:
    home = get_hermes_home()
    return home / "gateway.pid"

And hermes_constants.py:14-68::get_hermes_home() correctly honors the HERMES_HOME env var first:

def get_hermes_home() -> Path:
    val = os.environ.get("HERMES_HOME", "").strip()
    if val:
        return Path(val)
    # ... fallback to ~/.hermes if unset

The systemd unit sets Environment="HERMES_HOME=/home/USER/.hermes" explicitly (hermes_cli/gateway.py:2191), so by inspection this should work correctly.

However, empirical evidence shows that the default profile's PID is written into ~/.hermes/profiles/<active_profile>/gateway.pid when ~/.hermes/active_profile contains a non-default profile name. That leaves three candidate explanations: the actual write site reads active_profile directly instead of resolving the path through get_hermes_home(); a code path uses a stale or cached HERMES_HOME; or something in the --replace startup sequence runs before HERMES_HOME is fully respected.

I was unable to fully isolate the offending code path before applying the workaround, but the empirical evidence is reproducible. Candidate places to investigate:

  • gateway/run.py startup sequence between get_running_pid() call (line 16722) and the PID file write
  • Any import-time side effect that reads active_profile before HERMES_HOME is propagated
  • The PID file write inside gateway/status.py::acquire_gateway_runtime_lock (referenced by line 16715)
  • The _get_takeover_marker_path() and related "marker" pathways introduced by PR #5646

The function get_active_profile_name() in hermes_cli/profiles.py:947-971 is correctly HERMES_HOME-derived, so that's not the culprit. But get_active_profile() at line 907 reads ~/.hermes/active_profile directly — if any startup path goes through this function before HERMES_HOME is propagated, that would explain the symptom.
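
To make the hypothesis concrete, the sketch below contrasts the two resolution strategies. It is a model of the suspected behavior, not code taken from Hermes: pid_path_via_env mirrors the get_hermes_home() excerpt above, while pid_path_via_active_profile is a hypothetical resolver of the kind that would produce the observed symptom.

```python
from pathlib import Path
from typing import Mapping


def pid_path_via_env(env: Mapping[str, str], default_root: Path) -> Path:
    """Resolve gateway.pid with HERMES_HOME taking precedence, as documented."""
    val = env.get("HERMES_HOME", "").strip()
    home = Path(val) if val else default_root
    return home / "gateway.pid"


def pid_path_via_active_profile(root: Path, active_profile: str) -> Path:
    """HYPOTHETICAL buggy resolver: ignores HERMES_HOME and trusts active_profile."""
    if active_profile and active_profile != "default":
        return root / "profiles" / active_profile / "gateway.pid"
    return root / "gateway.pid"


root = Path("/home/user/.hermes")
expected = pid_path_via_env({"HERMES_HOME": str(root)}, root)
observed = pid_path_via_active_profile(root, "coder")
print(expected)  # /home/user/.hermes/gateway.pid
print(observed)  # /home/user/.hermes/profiles/coder/gateway.pid
```

If any startup path behaves like the second function, the default service writes into the coder directory even though its environment is correct, which matches the evidence above.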

Issue 3: refresh_systemd_unit_if_needed() rewrites user edits

hermes_cli/gateway.py:2278-2311::refresh_systemd_unit_if_needed() is called on every gateway run startup (also from gateway/run.py:3182 and elsewhere). It compares the current unit file content to generate_systemd_unit() output and rewrites the file if they differ.

This means a user who patches their .service file to drop --replace will have their edit silently reverted on the next gateway startup. The only stable way for a user to override the ExecStart is via a systemd drop-in override (<service>.service.d/*.conf), which refresh_systemd_unit_if_needed() does not touch.

This is fine as a design choice for keeping the unit file up to date, but it's a footgun: there's no mechanism for legitimate user overrides short of drop-ins, and the auto-revert behavior is not documented at the unit-file level.
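
The compare-and-rewrite behavior described above can be modelled in a few lines. This is a simplified sketch under the assumption that the refresh compares whole-file text. It is not the actual refresh_systemd_unit_if_needed() implementation, and refresh_unit_if_needed is a hypothetical name.

```python
from pathlib import Path


def refresh_unit_if_needed(unit_path: Path, generated: str) -> bool:
    """Rewrite the unit file when it differs from the generated text.

    Returns True when the file was (re)written. Only unit_path itself is
    touched, so files under <unit>.service.d/ are left alone.
    """
    current = unit_path.read_text() if unit_path.exists() else None
    if current == generated:
        return False
    unit_path.write_text(generated)  # any manual edit to the unit file is lost here
    return True
```

Because the comparison is against freshly generated content, any hand edit counts as a difference and is overwritten, whereas a drop-in file is never part of the comparison.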

Workaround (Confirmed working)

For users who hit this in production and need a fix before the upstream patch lands:

Step 1: Create drop-in override directories

mkdir -p ~/.config/systemd/user/hermes-gateway.service.d/
mkdir -p ~/.config/systemd/user/hermes-gateway-coder.service.d/  # adjust for your profile names

Step 2: Override ExecStart via drop-in for each service

# default service:
cat > ~/.config/systemd/user/hermes-gateway.service.d/no-replace.conf << 'EOF'
[Service]
ExecStart=
ExecStart=/home/USER/.hermes/hermes-agent/venv/bin/python -m hermes_cli.main --profile default gateway run
EOF

# coder service:
cat > ~/.config/systemd/user/hermes-gateway-coder.service.d/no-replace.conf << 'EOF'
[Service]
ExecStart=
ExecStart=/home/USER/.hermes/hermes-agent/venv/bin/python -m hermes_cli.main --profile coder gateway run
EOF

The drop-in does two things:

  1. Removes --replace (no more cross-service SIGTERM).
  2. Adds explicit --profile <name> even for the default profile (defensive — avoids any code path that might fall back to active_profile).

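The leading empty ExecStart= is required because systemd treats ExecStart as a list-type directive: a drop-in assignment is appended to the one in the main unit rather than replacing it, and a service that is not Type=oneshot may have only one ExecStart command. The empty assignment clears the inherited value before the new one is added.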
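
For more than two profiles, the drop-in text can be generated rather than typed by hand. A minimal sketch, assuming the same command line as the overrides above; render_dropin is a hypothetical helper, and python_path must be the interpreter path of your own install.

```python
def render_dropin(python_path: str, profile: str) -> str:
    """Build a drop-in that resets ExecStart and starts the gateway without --replace."""
    return (
        "[Service]\n"
        # An empty assignment clears the ExecStart inherited from the main unit.
        "ExecStart=\n"
        f"ExecStart={python_path} -m hermes_cli.main --profile {profile} gateway run\n"
    )


print(render_dropin("/home/USER/.hermes/hermes-agent/venv/bin/python", "coder"))
```

The output is meant to be written to the matching <service>.service.d/no-replace.conf file, followed by systemctl --user daemon-reload.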

Step 3: Reset and restart

systemctl --user stop hermes-gateway hermes-gateway-coder
sleep 3
# Clean stale PID/lock files from the flap loop period
rm -f ~/.hermes/gateway.lock ~/.hermes/gateway.pid
rm -f ~/.hermes/profiles/*/gateway.lock ~/.hermes/profiles/*/gateway.pid
systemctl --user reset-failed hermes-gateway hermes-gateway-coder
systemctl --user daemon-reload
systemctl --user start hermes-gateway
sleep 8        # stagger start to avoid PID file race
systemctl --user start hermes-gateway-coder

Verification

After ~30 seconds:

$ for svc in hermes-gateway hermes-gateway-coder; do
    echo "$svc: NRestarts=$(systemctl --user show $svc -p NRestarts --value)"
  done
hermes-gateway: NRestarts=0           # ← stable
hermes-gateway-coder: NRestarts=0     # ← stable

$ cat ~/.hermes/gateway.pid
{"pid": 21702, ...}                   # ← matches `systemctl show hermes-gateway -p MainPID`

$ cat ~/.hermes/profiles/coder/gateway.pid
{"pid": 21741, ...}                   # ← matches `systemctl show hermes-gateway-coder -p MainPID`

Both PID files now point to their correct service's main PID. The flap loop is gone.
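
The same verification can be expressed as a check over two samples taken some seconds apart. This is a sketch with hypothetical helper names (parse_show, is_stable). It assumes the KEY=VALUE output that systemctl show prints when --value is omitted, for example from systemctl --user show hermes-gateway -p MainPID -p NRestarts.

```python
from typing import Dict


def parse_show(output: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines as printed by `systemctl show -p ...` without --value."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_stable(first: Dict[str, str], second: Dict[str, str]) -> bool:
    """Stable: one live main PID in both samples and an unchanged NRestarts counter."""
    pid = first.get("MainPID")
    if pid in (None, "", "0") or pid != second.get("MainPID"):
        return False
    restarts = first.get("NRestarts")
    return restarts is not None and restarts == second.get("NRestarts")
```

During the flap loop, either MainPID changes or NRestarts grows between samples, so is_stable returns False for both services.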

Suggested Fixes (in suggested priority order)

Fix 1: Remove --replace from the systemd unit template (HIGH PRIORITY)

hermes_cli/gateway.py:2184 and :2237 (the equivalent in the system-service branch).

Rationale: when systemd is managing lifecycle, --replace is at best redundant (systemd already kills the old process via KillMode=mixed before restart). At worst, it's a cross-process SIGTERM emitter that breaks multi-profile setups. The --replace flag should be retained on the CLI for users running gateway via hermes gateway restart manually, but not baked into systemd units.

Suggested patch sketch:

# hermes_cli/gateway.py around line 2184
ExecStart={python_path} -m hermes_cli.main{f" {profile_arg}" if profile_arg else ""} gateway run
#                                                                                          ^^^ no --replace

Existing systems will pick up the new template via refresh_systemd_unit_if_needed() on the next gateway startup, completing the migration automatically.

Fix 2: Make the template add explicit --profile <name> even for the default profile

Currently profile_arg is empty for default. This means the default profile's gateway invocation has no explicit profile signal, leaving it to inherit whatever's in ~/.hermes/active_profile. Adding --profile default defensively eliminates any path that could mis-identify the profile.

# hermes_cli/gateway.py around line 2162 / 2184
profile_arg = _profile_arg(hermes_home) or "--profile default"
ExecStart={python_path} -m hermes_cli.main {profile_arg} gateway run

Fix 3: Investigate and fix the PID file misrouting (CRITICAL but harder)

The empirical observation is that the default-profile gateway, started with HERMES_HOME=~/.hermes in its env, writes its PID into ~/.hermes/profiles/coder/gateway.pid instead of ~/.hermes/gateway.pid.

Steps to investigate:

  1. Add a logger.debug at the start of gateway/status.py::_get_pid_path() printing os.environ.get('HERMES_HOME') and the resolved get_hermes_home() return value, then reproduce.
  2. Audit every callsite in the gateway startup path that calls get_active_profile() (the raw file-reading variant in hermes_cli/profiles.py:907) and check whether it could affect PID path resolution.
  3. Check whether _get_takeover_marker_path() (introduced in PR #5646) writes/reads from a HERMES_HOME-relative path, and whether the marker write itself could trigger a stale path elsewhere.

I have not isolated the root cause, but my hypothesis is that a function on the startup path reads ~/.hermes/active_profile and uses it to compute a path, bypassing the per-service HERMES_HOME setting. A hardcoded fallback to Path.home() / ".hermes" in some helper (perhaps in profiles.py) may be combining with active_profile to override the explicit setting.

Fix 4: Detect multi-profile service coexistence and warn

Extend has_conflicting_systemd_units() in hermes_cli/gateway.py:1573 (or add a new function) to detect when more than one gateway unit (hermes-gateway.service plus any hermes-gateway-*.service) exists and is enabled. When hermes gateway install, hermes setup, or even hermes gateway run detects this, it should print a warning:

⚠️ Multiple gateway services detected:
   - hermes-gateway.service (profile: default, enabled)
   - hermes-gateway-coder.service (profile: coder, enabled)

   Until Hermes fully supports parallel multi-profile gateways, this
   configuration may cause a SIGTERM flap loop. See:
   https://github.com/NousResearch/hermes-agent/issues/<this-issue-number>
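
A detection helper along these lines might be enough for the warning. This is a sketch with hypothetical names (find_gateway_units, coexistence_warning), not an extension of the real has_conflicting_systemd_units(). The enabled-state check is injected as a callable so the logic can be tested without systemd.

```python
from pathlib import Path
from typing import Callable, List, Optional


def find_gateway_units(unit_dir: Path) -> List[str]:
    """List hermes-gateway.service and hermes-gateway-<profile>.service unit files."""
    names = {p.name for p in unit_dir.glob("hermes-gateway*.service") if p.is_file()}
    return sorted(names)


def coexistence_warning(
    units: List[str], is_enabled: Callable[[str], bool]
) -> Optional[str]:
    """Return a warning message when two or more gateway units are enabled."""
    enabled = [u for u in units if is_enabled(u)]
    if len(enabled) < 2:
        return None
    lines = ["Multiple gateway services detected:"]
    lines += [f"   - {u} (enabled)" for u in enabled]
    return "\n".join(lines)
```

In practice is_enabled could wrap systemctl --user is-enabled <unit>, which exits with status 0 for an enabled unit, and unit_dir would be ~/.config/systemd/user.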

Fix 5: Document drop-in overrides as the supported customization path

Since refresh_systemd_unit_if_needed() will overwrite any direct edit to the .service file, the documentation should explicitly tell users to use drop-in overrides for customization (proxy variables, custom user, etc.). Currently users discover this the hard way.

Suggested addition to the user-guide docs (e.g., website/docs/user-guide/messaging/gateway-service.md):

### Customizing the gateway service

Direct edits to `~/.config/systemd/user/hermes-gateway.service` will be
**automatically reverted** by Hermes on the next gateway startup. To make
persistent customizations (proxy variables, custom timeouts, etc.), use a
systemd drop-in override:

`~/.config/systemd/user/hermes-gateway.service.d/custom.conf`:

\`\`\`ini
[Service]
Environment="HTTPS_PROXY=http://proxy:8080"
Environment="HTTP_PROXY=http://proxy:8080"
\`\`\`

After editing, run:

\`\`\`bash
systemctl --user daemon-reload
systemctl --user restart hermes-gateway
\`\`\`

Drop-in overrides are not touched by Hermes's auto-refresh logic.

Related Issues / Code References

Existing related issues (this issue is a new symptom of the same underlying class of bug):

  • #23457 (open) — Dashboard reports gateway STOPPED when active profile differs from container-boot HERMES_HOME. Same root cause family — a process resolves HERMES_HOME from one source (env var or active_profile) while another process resolves it from the other source, and they end up reading/writing to different directories. #23457 is the read-side symptom; this issue is the write-side symptom on the gateway itself.
  • #18594 (closed) — get_hermes_home() silently falls back to ~/.hermes in profile mode and causes cross-profile data corruption. The fix added a warning log but did not eliminate all code paths that bypass get_hermes_home().
  • #22502 (closed) — Profile switching fully broken in Gateway/WebUI/Telegram - HERMES_HOME guard blocks active_profile. Earlier round of the same theme.
  • #4707 (open) — cron under profile-scoped launchd gateway falls back to default ~/.hermes instead of profile HERMES_HOME. macOS-flavored sibling of this bug.
  • #14203 (closed) — Gateway fails to start with --replace when previous instance PID is already dead (stale gateway.pid). Same --replace + PID file class of issues; the fix there did not consider cross-profile PID files.
  • #22035 (closed) — gateway restart --system always reports failure (60s timeout × 2) — wrapper reads runtime status from root's HERMES_HOME. Another HERMES_HOME-mismatch symptom.

Code references:

  • hermes_constants.py:14-68::get_hermes_home() definition
  • hermes_cli/profiles.py:907-971::get_active_profile() (file-reading) and get_active_profile_name() (HERMES_HOME-derived)
  • hermes_cli/gateway.py:1573::has_conflicting_systemd_units() (currently only detects user/system scope collision)
  • hermes_cli/gateway.py:1612-1614 — Code comment acknowledging "legacy unit vs new unit" flap loop scenario; same root cause as this issue
  • hermes_cli/gateway.py:2184, 2237 — Hardcoded --replace in unit template
  • hermes_cli/gateway.py:2278-2311::refresh_systemd_unit_if_needed() auto-overwrite logic
  • gateway/run.py:16695-16800::start_gateway() and --replace takeover logic
  • gateway/status.py:44-47::_get_pid_path() definition
  • PR #5646 — signal-recovery change that intensified the SIGTERM flap loop (per the source comment)

Environment

  • OS: Ubuntu 22.04 LTS on WSL2 (Windows 11)
  • systemd: user-mode service (no system-wide install)
  • Hermes version: 0.14.0
  • Hermes commit: f36c89cd5798da0f313192555739975e57ffdef5
  • Python: 3.11 (bundled venv)
  • Profiles in use: default (deepseek-v4-flash via deepseek provider) and coder (claude-opus-4.7 via openrouter)
  • Messaging platforms: Feishu (Lark) — two different Feishu apps, one bound to each profile
  • Sticky active profile: coder (set via hermes profile use coder)

Reproducibility

100% reproducible by following the steps above. The flap loop typically starts within 30 seconds of both services being enabled, and NRestarts reaches ~10 within 2 minutes.

The workaround has been confirmed stable for 15+ minutes of continuous operation across both bots, with both NRestarts=0 and bidirectional messaging working as expected.
