hermes - ✅ (Solved) Fix: Exec-type quick commands are blocked when gateway is draining [2 pull requests, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#28663 (fetched 2026-05-20 04:02:41)
Comments: 1
Participants: 2
Timeline: 8
Reactions: 0
Timeline (top): cross-referenced ×3, labeled ×3, commented ×1, referenced ×1

Root Cause

In gateway/run.py, the _draining check runs before quick command parsing:

if self._draining:
    return "⏳ Gateway is draining..."

# User-defined quick commands
if command:
    if qcmd.get("type") == "exec":
        # execute shell command

Because of this order, when _draining is true the guard returns before quick command parsing is reached, so the exec handler never runs. The reporter observed the message being handled as a regular conversation turn, which then fails because the LLM is unreachable.

Fix Action

Fix

Move quick command parsing before the _draining check in gateway/run.py:

-        if self._draining:
-            return f"⏳ Gateway is {self._status_action_gerund()}..."
-
         # User-defined quick commands (bypass agent loop, no LLM call)
+        # MUST be checked BEFORE _draining so ops commands (cleanup, restart)
+        # still execute even when gateway is shutting down or draining.
         if command:
             ...
+
+        if self._draining:
+            return f"⏳ Gateway is {self._status_action_gerund()}..."
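The effect of the reordering can be illustrated with a small, self-contained sketch. This is not the code in gateway/run.py: ToyGateway, its quick_commands mapping, the "run" callable and the returned strings are all hypothetical stand-ins, chosen only to show that exec-type commands are dispatched before the draining guard while everything else is still blocked.

```python
from typing import Dict, Optional


class ToyGateway:
    """Hypothetical stand-in for the gateway handler, not the real GatewayRunner."""

    def __init__(self, quick_commands: Dict[str, dict]) -> None:
        self.quick_commands = quick_commands
        self.draining = False

    def handle_message(self, text: str) -> str:
        # Extract "/name" from the message, if present.
        parts = text[1:].split() if text.startswith("/") else []
        command: Optional[str] = parts[0] if parts else None

        # 1. Exec-type quick commands are dispatched first (no agent loop needed).
        qcmd = self.quick_commands.get(command) if command else None
        if qcmd and qcmd.get("type") == "exec":
            # "run" is an assumed callable standing in for the shell subprocess.
            return qcmd["run"]()

        # 2. Everything else is still subject to the draining guard.
        if self.draining:
            return "Gateway is draining..."

        # 3. Normal dispatch (alias rewriting, agent loop) would happen here.
        return f"agent: {text}"


gw = ToyGateway({
    "cleanup": {"type": "exec", "run": lambda: "cleaned"},
    "hi": {"type": "alias", "target": "/help"},
})
gw.draining = True
print(gw.handle_message("/cleanup"))  # exec command still runs
print(gw.handle_message("/hi"))       # alias is blocked by the guard
print(gw.handle_message("hello"))     # regular message is blocked
```

In this sketch only the exec branch sits above the guard, which mirrors the intent stated in the diff comment: ops commands keep working while the gateway is draining, and nothing else changes.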

PR fix notes

PR #28734: fix(gateway): allow exec-type quick commands to bypass draining guard

Description (problem / solution / changelog)

Summary

Exec-type quick commands (quick_commands.<name>.type: exec) now bypass the _draining guard in GatewayRunner._handle_message(), ensuring they remain available when the gateway is shutting down or restarting.

These commands run pure shell subprocesses via asyncio.create_subprocess_shell and do not depend on the agent loop or LLM backend — there is no reason to block them during draining. In practice, this means ops commands (health checks, service status, etc.) are accessible when the system is most unhealthy and you need them the most.

Changes

  • gateway/run.py: Added exec-type quick command dispatch before the _draining guard. The original dispatch block after the guard is kept as a defensive fallback with a comment. Alias-type quick commands and regular slash commands remain blocked during draining as they may require the agent loop.
  • tests/gateway/test_exec_quick_command_draining.py: 6 new tests covering exec commands while draining, timeout handling, error reporting, empty command, non-draining regression, and unknown command blocking.
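The PR text says exec commands run through asyncio.create_subprocess_shell and that the new tests cover timeouts, error reporting and empty commands. A minimal sketch of such a runner might look like the following. The function name run_exec_command, the default timeout and the returned message strings are assumptions for illustration and are not taken from the hermes-agent source.

```python
import asyncio


async def run_exec_command(command: str, timeout: float = 30.0) -> str:
    """Run a shell command and return its output or a short error message.

    Hypothetical helper: name, timeout default and messages are assumptions.
    """
    if not command.strip():
        return "(empty command)"

    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    try:
        output, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Stop the shell process and reap it so it does not linger.
        proc.kill()
        await proc.wait()
        return f"Command timed out after {timeout:g}s"

    text = output.decode(errors="replace").strip()
    if proc.returncode != 0:
        return f"Command failed (exit {proc.returncode}): {text}"
    return text or "(no output)"


if __name__ == "__main__":
    print(asyncio.run(run_exec_command("echo hello")))
```

Nothing in this helper touches the agent loop or an LLM backend, which is the property the PR relies on when it lets exec commands bypass the draining guard.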

Testing

  • 6/6 new tests pass
  • 16/16 existing test_restart_drain.py tests pass (no regression)
  • 18/18 existing test_quick_commands.py tests pass (no regression)

Complementary with PR #25804

PR #25804 fixes exec quick commands being blocked when the agent is mid-turn (_handle_active_session_busy_message path). This PR fixes the same class of issue but for the draining trigger condition in _handle_message. Both fixes are complementary — one handles mid-turn, the other handles draining.

Closes #28663

Changed files

  • gateway/run.py (modified, +38/-5)
  • tests/gateway/test_exec_quick_command_draining.py (added, +139/-0)

PR #17: fix(gateway): allow exec-type quick commands during drain state (#28663)

Description (problem / solution / changelog)

🟡 Merge order: 9 / 12 — gateway flow reorder, moderate risk

Closes #28663 (P2)

Problem

Exec-type quick commands were blocked by the _draining guard before reaching the exec handler. Ops commands like /restart-sglang were unusable during drain state.

Fix

Move quick command parsing before the _draining check. Exec-type bypasses drain; alias-type still falls through.

Risk assessment

| Factor | Rating |
| --- | --- |
| Lines changed | +6/-3 |
| New code | 0 (code moved, not added) |
| Side effects | ⚠️ Changes dispatch order in gateway; alias-type quick commands still blocked by drain (intended) |
| Revert complexity | Easy: move the _draining check back |

Testing notes

  • Set gateway to draining state, send exec-type quick command → should execute
  • Set gateway to draining state, send regular message → should still be blocked

Files changed

  • gateway/run.py (+6/-3)


Bug: Exec-type quick commands are blocked when gateway is draining

Problem

When the gateway enters a draining state (e.g., after SIGTERM, or after repeated LLM backend failures), user-defined quick commands with type: exec are blocked by the _draining guard before reaching the exec handler.

These commands run independent shell scripts via asyncio.create_subprocess_shell and do not depend on the agent loop or LLM backend — there's no reason to block them during draining. In practice, this means that when the system is most unhealthy and you need ops commands the most, those are the ones that get blocked.

(Example: custom quick commands like /cleanup and /restart-sglang defined in config.yaml with type: exec — intended for infrastructure recovery — were unusable during a real sglang crash incident.)
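To make the lookup step concrete, here is a rough sketch of how a message such as /restart-sglang could be matched against quick command definitions. The dictionary mirrors the quick_commands.<name>.type: exec shape mentioned in this issue, but the "command" key, the placeholder shell strings and the function name find_exec_quick_command are assumptions and not the project's actual schema or API.

```python
from typing import Optional, Tuple

# Hypothetical parsed form of config.yaml. Only the "type: exec" field is
# confirmed by the issue; the "command" key and its values are placeholders.
CONFIG = {
    "quick_commands": {
        "cleanup": {"type": "exec", "command": "echo cleaning"},
        "restart-sglang": {"type": "exec", "command": "echo restarting"},
    }
}


def find_exec_quick_command(text: str, config: dict) -> Optional[Tuple[str, str]]:
    """Return (name, shell command) if text invokes an exec-type quick command."""
    if not text.startswith("/"):
        return None  # not a slash command at all
    parts = text[1:].split(maxsplit=1)
    if not parts:
        return None  # a bare "/" with no name
    name = parts[0]
    entry = config.get("quick_commands", {}).get(name)
    if not entry or entry.get("type") != "exec":
        return None  # unknown command, or a non-exec type such as alias
    return name, entry.get("command", "")


print(find_exec_quick_command("/restart-sglang", CONFIG))
print(find_exec_quick_command("hello", CONFIG))
```

A lookup like this depends only on the message text and the loaded configuration, so it can run whether or not the LLM backend is reachable.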

Root Cause

In gateway/run.py, the _draining check runs before quick command parsing:

if self._draining:
    return "⏳ Gateway is draining..."

# User-defined quick commands
if command:
    if qcmd.get("type") == "exec":
        # execute shell command

Because of this order, when _draining is true the guard returns before quick command parsing is reached, so the exec handler never runs. The reporter observed the message being handled as a regular conversation turn, which then fails because the LLM is unreachable.

Reproduction

  1. Let sglang crash or become unreachable (e.g., GPU OOM)
  2. Gateway enters draining/degraded state
  3. Send any /your-exec-command defined in config.yaml with type: exec
  4. Message is treated as a conversation turn instead of executing the shell command

Expected Behavior

Exec-type quick commands should be parsed and executed regardless of gateway draining state, since they operate on external processes and don't depend on any gateway internals.

Fix

Move quick command parsing before the _draining check in gateway/run.py:

-        if self._draining:
-            return f"⏳ Gateway is {self._status_action_gerund()}..."
-
         # User-defined quick commands (bypass agent loop, no LLM call)
+        # MUST be checked BEFORE _draining so ops commands (cleanup, restart)
+        # still execute even when gateway is shutting down or draining.
         if command:
             ...
+
+        if self._draining:
+            return f"⏳ Gateway is {self._status_action_gerund()}..."

Impact Assessment

  • exec type: Gains a privileged channel and executes regardless of draining state. Safe because exec commands run independent shell scripts that don't depend on gateway state.
  • alias type: Unchanged — still falls through to _draining check since alias rewrites event.text and continues normal dispatch.
  • Plugin commands / skill commands / regular messages: Unchanged — still blocked by _draining.
