hermes - How to fix: Generated Linux gateway service can flap forever because ExecStart uses gateway run --replace under Type=simple

hermes 2026-05-17 05:16:33


On WSL Ubuntu, the generated user unit for the Friday profile repeatedly restarted roughly every five minutes even though the gateway itself was healthy and connected to Slack. Each restart was treated as a planned --replace takeover of the already-running gateway, so the gateway kept disconnecting and reconnecting cleanly instead of remaining as one stable systemd-managed process.

This looks like a service contract bug rather than a Slack, token, or profile-config problem.

Root Cause

The generated Linux systemd gateway unit for profile-specific Hermes services can flap forever because its steady-state ExecStart uses gateway run --replace under Type=simple.

Fix / Workaround

Local Workaround That Stabilized Friday

This workaround is useful operationally, but it should not be the final upstream answer: the standalone script currently appears not to propagate a start_gateway() failure as a nonzero exit code, so under Restart=on-failure a failed start that exits 0 would not be restarted.


Hermes Gateway Service Loop Bug Report Draft

Title

Generated Linux systemd gateway unit for profile-specific Hermes services can flap forever because steady-state ExecStart uses gateway run --replace under Type=simple

Summary

This looks like a service contract bug rather than a Slack, token, or profile-config problem.

Environment

Hermes profile: friday
Linux host: WSL Ubuntu
Service installed via: friday gateway install --force then friday gateway start
Generated unit path: ~/.config/systemd/user/hermes-gateway-friday.service
Generated steady-state ExecStart:

ExecStart=/home/brandon/.hermes/hermes-agent/venv/bin/python -m hermes_cli.main --profile friday gateway run --replace
Type=simple
Restart=always
RestartSec=60
RestartMaxDelaySec=300
RestartSteps=5
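
These restart settings may also explain the roughly five-minute cadence. With RestartSteps= set, systemd grows the auto-restart delay from RestartSec= toward RestartMaxDelaySec=, so a unit that keeps restarting would settle at 300 seconds between attempts. The sketch below estimates that schedule. It assumes geometric growth between the two bounds, which is my reading of systemd's behavior and not something this report confirms; the function name restart_delays is hypothetical.

```python
from typing import List


def restart_delays(restart_sec: float, max_delay_sec: float,
                   steps: int, count: int) -> List[float]:
    """Estimate the delay before each of the first `count` auto-restarts.

    Assumption: the delay grows geometrically from restart_sec to
    max_delay_sec over `steps` restarts, then stays at max_delay_sec.
    """
    delays: List[float] = []
    for n in range(1, count + 1):  # n = 1 is the first auto-restart
        no_growth = steps == 0 or restart_sec <= 0 or restart_sec >= max_delay_sec
        if n <= 1 or no_growth:
            delay = restart_sec
        elif n > steps:
            delay = max_delay_sec
        else:
            ratio = max_delay_sec / restart_sec
            delay = restart_sec * ratio ** ((n - 1) / steps)
        delays.append(delay)
    return delays


# Values taken from the generated unit above.
schedule = restart_delays(restart_sec=60, max_delay_sec=300, steps=5, count=8)
```

Under this assumption the first delay is 60 seconds and every delay after the fifth restart is 300 seconds, which would match a loop that repeats about every five minutes once it has been flapping for a while.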

Observed Behavior

systemctl --user status hermes-gateway-friday.service showed the service in activating (auto-restart) rather than active (running).
The systemd Main PID exited with status=0/SUCCESS.
friday gateway status reported that the user gateway service was stopped or restart-pending while also showing a live gateway PID for the profile.
gateway.log showed the gateway connecting to Slack successfully, running normally, then later receiving:

Received SIGTERM as a planned --replace takeover - exiting cleanly

Immediately after that clean exit, a new PID started, connected to Slack again, and the same cycle repeated.
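
One way to confirm the loop from the log alone is to count the takeover messages. The sketch below is a rough check, assuming the message text quoted above appears verbatim on one line per takeover; the function name and the log path in the usage comment are hypothetical.

```python
from typing import Iterable

# Substring of the log message quoted in this report.
TAKEOVER_MARKER = "Received SIGTERM as a planned --replace takeover"


def count_planned_takeovers(lines: Iterable[str]) -> int:
    """Count log lines that record a planned --replace takeover."""
    return sum(1 for line in lines if TAKEOVER_MARKER in line)


# Usage, with a hypothetical log location:
#     with open("/path/to/gateway.log", encoding="utf-8") as log:
#         print(count_planned_takeovers(log))
```

A count that keeps rising while nobody is restarting the gateway by hand points at the service definition rather than at Slack connectivity.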

Expected Behavior

The generated systemd unit should produce one stable, foreground, service-managed main process.

systemctl --user status should remain active (running).
The gateway should stay connected until a real stop, restart, or failure occurs.
A healthy service should not repeatedly perform planned takeovers of itself.

Actual Behavior

The generated systemd unit repeatedly launched a fresh gateway run --replace process.

Each new process then treated the existing live gateway as something to replace, sent a planned takeover signal, and restarted the session even though the prior gateway was healthy.

Likely Root Cause

The generated Linux service definition appears to be using the wrong runtime mode for steady-state service ownership.

Facts from the incident:

the generated unit used gateway run --replace
the generated unit used Type=simple
the systemd-tracked main process did not remain the long-lived gateway process
the long-lived gateway process continued running long enough to be seen by the next launch and then got cleanly replaced

That means one of two closely related things is happening:

gateway run --replace is not safe as the normal steady-state ExecStart for a service because it is takeover-oriented by design, or
the current command path is spawning or handing off to a child process in a way that violates Type=simple expectations for the service manager.

Either way, the steady-state Linux service command should not be the replace-oriented startup path.
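
The risky combination can be checked mechanically. The sketch below is a simplified lint, not a full unit-file parser: it ignores section headers, keeps only the last assignment of each key, and flags a --replace ExecStart, especially when paired with Restart=always. The function names and the wording of the findings are hypothetical.

```python
from typing import Dict, List

# Settings copied from the generated unit in this report.
GENERATED_UNIT = """\
ExecStart=/home/brandon/.hermes/hermes-agent/venv/bin/python -m hermes_cli.main --profile friday gateway run --replace
Type=simple
Restart=always
RestartSec=60
RestartMaxDelaySec=300
RestartSteps=5
"""


def parse_unit_settings(unit_text: str) -> Dict[str, str]:
    """Collect Key=Value pairs from unit text; the last assignment wins.

    Simplification: section headers are skipped and repeated keys are not
    accumulated, which is enough for this check.
    """
    settings: Dict[str, str] = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;[" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def find_flap_risks(unit_text: str) -> List[str]:
    """Return human-readable findings for the pattern described above."""
    settings = parse_unit_settings(unit_text)
    uses_replace = "--replace" in settings.get("ExecStart", "").split()
    risks: List[str] = []
    if uses_replace:
        risks.append("ExecStart uses the takeover-oriented --replace flag")
    if uses_replace and settings.get("Restart") == "always":
        risks.append("Restart=always relaunches --replace even after a clean exit")
    return risks


findings = find_flap_risks(GENERATED_UNIT)
```

Run against the generated unit, the check reports both findings; a unit whose ExecStart has no --replace argument produces none.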

Local Workaround That Stabilized Friday

The following user-unit override stopped the restart loop locally:

[Service]
ExecStart=
ExecStart=/home/brandon/.hermes/hermes-agent/venv/bin/python /home/brandon/.hermes/hermes-agent/scripts/hermes-gateway run
Restart=on-failure
RestartSec=30

After reloading systemd and restarting the service:

hermes-gateway-friday.service became active (running)
systemd tracked one stable main PID
the gateway log showed normal Slack connection and steady runtime
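
For anyone applying the same override to another profile, the drop-in can be rendered programmatically. This is a sketch under two assumptions that go beyond the report: that units are named hermes-gateway-<profile>.service (generalized from the single friday example) and that the override lives in the standard systemd drop-in location, <unit>.d/override.conf. The function names are hypothetical.

```python
from pathlib import Path


def override_path(profile: str, home: Path) -> Path:
    """Drop-in location for a per-profile user unit.

    Assumption: units are named hermes-gateway-<profile>.service.
    """
    unit = f"hermes-gateway-{profile}.service"
    return home / ".config" / "systemd" / "user" / f"{unit}.d" / "override.conf"


def render_override(python_bin: str, gateway_script: str) -> str:
    """Render the [Service] override shown above for the given paths."""
    lines = [
        "[Service]",
        "ExecStart=",  # an empty assignment clears the inherited ExecStart
        f"ExecStart={python_bin} {gateway_script} run",
        "Restart=on-failure",
        "RestartSec=30",
    ]
    return "\n".join(lines) + "\n"


override_text = render_override(
    "/home/brandon/.hermes/hermes-agent/venv/bin/python",
    "/home/brandon/.hermes/hermes-agent/scripts/hermes-gateway",
)
```

After the file is written, systemctl --user daemon-reload followed by systemctl --user restart hermes-gateway-friday.service corresponds to the reload and restart steps described above.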

Proposed Upstream Direction

Add a dedicated service-safe gateway run mode for Linux systemd use.
Generate Linux units against that service-safe mode, not gateway run --replace.
Reserve replace/takeover behavior for explicit restart operations and manual-process replacement flows.

Notes

This bug is easy to misdiagnose because the gateway looks healthy from the platform side. Slack auth, Socket Mode, and channel directory setup all succeeded repeatedly. The failure was in service ownership and lifecycle semantics, not platform connectivity.
