hermes: Fix approval.py: gateway-mode auto-deny leaks DANGEROUS COMMAND warning text into the agent's visible output and contaminates downstream consumers


Summary

When Hermes runs in gateway mode (no interactive approver registered) and the agent issues a command that matches one of the DANGEROUS_PATTERNS in tools/approval.py (e.g. python -c "...", curl ... | bash), the approval check auto-denies after the timeout. However, the DANGEROUS COMMAND warning text then becomes the agent's visible response, as if it were the model's answer. Downstream consumers (Paperclip and other gateway clients) cannot distinguish a real agent response from a leaked security warning.

Reproduction

  1. Run hermes in gateway mode (no interactive responder)
  2. Have an agent issue a command like python -c 'print(1)' via the terminal tool
  3. Observe: tools/approval.py flags it as "script execution via -e/-c flag" and queues it for approval; no responder answers, so the check auto-denies after approvals.timeout (default 60s).
  4. Bug: the agent's run completes with the warning text as its reply. The verbatim text "⚠️ DANGEROUS COMMAND: script execution via -e/-c flag — Safer: tirith run ..." appears as the agent's stdout / final response.
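The failure shape in steps 3 and 4 can be modeled with a small, self-contained sketch. The pattern list, function names, and the wait_for_approval / execute callables are hypothetical stand-ins, not the actual contents of tools/approval.py; the sketch only illustrates that a denial returned as an ordinary string is indistinguishable from command output or a model reply.

```python
import re
from typing import Callable, Optional

# Hypothetical stand-ins for entries in tools/approval.py's DANGEROUS_PATTERNS;
# the real regexes and descriptions may differ.
DANGEROUS_PATTERNS = [
    (re.compile(r"\b(?:python3?|node|perl|ruby)\s+-[ec]\b"), "script execution via -e/-c flag"),
    (re.compile(r"\bcurl\b[^|]*\|\s*(?:ba)?sh\b"), "pipe remote content to shell"),
]


def match_dangerous(command: str) -> Optional[str]:
    """Return the description of the first matching pattern, or None."""
    for pattern, description in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return description
    return None


def run_terminal_tool_leaky(
    command: str,
    wait_for_approval: Callable[[float], Optional[bool]],
    execute: Callable[[str], str],
    timeout: float = 60.0,
) -> str:
    """Buggy shape (hypothetical): a denial is returned as plain text."""
    description = match_dangerous(command)
    if description is not None:
        # None models "no responder answered before the timeout".
        decision = wait_for_approval(timeout)
        if decision is not True:
            # The warning string flows back exactly like stdout would, so the
            # agent loop may surface it as the final response.
            return f"⚠️ DANGEROUS COMMAND: {description}"
    return execute(command)
```

With a responder that never answers (the gateway case), run_terminal_tool_leaky("python -c 'print(1)'", lambda t: None, lambda c: "1\n") returns the warning string instead of "1\n". A caller that only receives a str cannot tell the two apart.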

Observed in a Paperclip integration (paperclipai/paperclip) where Paperclip-driven Hermes agents repeatedly posted the DANGEROUS COMMAND text as comments on issues; it looked as if the agent had answered "DANGEROUS COMMAND..." to "Tell me a joke". The downstream Paperclip system had no way to detect that this was a security gate rather than an actual agent reply.
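Until the tool failure is typed, a gateway client such as Paperclip could add a defensive check before posting a reply. This is a hypothetical string-matching stopgap keyed on the warning prefix quoted in step 4. It is brittle (any wording change in approval.py breaks it) and does not replace a structured error code.

```python
import re

# Hypothetical heuristic keyed on the prefix quoted in the reproduction steps:
# an optional warning-sign emoji (U+26A0, optionally followed by U+FE0F),
# then "DANGEROUS COMMAND:" at the start of the response.
_LEAKED_WARNING = re.compile(r"^\s*(?:\u26a0\ufe0f?\s*)?DANGEROUS COMMAND:")


def looks_like_leaked_approval_warning(response_text: str) -> bool:
    """True if a final agent response looks like a leaked approval warning."""
    return _LEAKED_WARNING.match(response_text) is not None


def should_post_as_comment(response_text: str) -> bool:
    """Hold back suspected leaks so an operator can review them instead."""
    return not looks_like_leaked_approval_warning(response_text)
```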

Expected behavior

Either:

  • (a) Auto-deny silently and return a structured error code (e.g. approval.denied.dangerous_pattern) on the tool call. The agent sees that the tool failed and can react. The gateway client sees a typed failure, not a stdin-style text leak.
  • (b) In gateway mode, auto-approve specific scoped patterns (loopback URLs, agent-owned workspace paths) and deny only truly dangerous ones (rm -rf, network exfiltration to non-loopback hosts), similar to YOLO mode but more selective.
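Option (a) can be sketched as a tool wrapper that returns a typed result instead of a string. The ToolResult dataclass, its JSON shape, and the classify / wait_for_approval / execute callables are assumptions for illustration; only the error code approval.denied.dangerous_pattern comes from this issue.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Optional

DENIED_DANGEROUS = "approval.denied.dangerous_pattern"  # code proposed in the issue


@dataclass
class ToolResult:
    """Hypothetical typed tool result: success output OR a machine-readable error."""
    ok: bool
    output: str = ""
    error_code: Optional[str] = None
    error_detail: dict = field(default_factory=dict)

    def to_json(self) -> str:
        if self.ok:
            return json.dumps({"ok": True, "output": self.output})
        return json.dumps({"ok": False, "error": {"code": self.error_code, **self.error_detail}})


def run_terminal_tool(
    command: str,
    classify: Callable[[str], Optional[str]],
    wait_for_approval: Callable[[float], Optional[bool]],
    execute: Callable[[str], str],
    timeout: float = 60.0,
) -> ToolResult:
    description = classify(command)  # pattern description, or None if not dangerous
    if description is not None:
        decision = wait_for_approval(timeout)  # True / False / None (timed out)
        if decision is not True:
            # Deny with a typed failure; no warning text goes into `output`,
            # so it cannot be mistaken for the model's answer.
            return ToolResult(
                ok=False,
                error_code=DENIED_DANGEROUS,
                error_detail={
                    "pattern": description,
                    "reason": "timeout" if decision is None else "denied",
                },
            )
    return ToolResult(ok=True, output=execute(command))
```

In this sketch the agent loop would feed to_json() back to the model as the tool result, and the gateway could forward error.code to clients, so a consumer like Paperclip keys on a code rather than on warning text.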

(a) is the minimum-viable fix. (b) is the better long-term shape.
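Option (b) could take the shape of a gateway-only policy applied to commands that have already matched a dangerous pattern. The sketch below is illustrative, not a security boundary: gateway_decision, the rm -rf regex, and the token-based path check are all assumptions, and a real policy would need to handle shell obfuscation, other network tools, and symlinks far more carefully.

```python
import ipaddress
import re
import shlex
from pathlib import Path
from urllib.parse import urlparse

# Simplified: catches "rm -rf" / "rm -fr" style flags only (not "rm -r -f").
RM_RF = re.compile(r"\brm\s+(?:-\w*r\w*f\w*|-\w*f\w*r\w*)\b")


def is_loopback_url(url: str) -> bool:
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # a DNS name other than localhost
        return False


def is_within(path: Path, workspace: Path) -> bool:
    base = workspace.resolve()
    target = (base / path).resolve()  # an absolute `path` replaces `base` when joined
    return target == base or base in target.parents


def gateway_decision(command: str, workspace: Path) -> str:
    """Return "approve" or "deny" for an already-flagged command (hypothetical policy)."""
    if RM_RF.search(command):
        return "deny"
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: fail closed
        return "deny"
    for token in tokens:
        if token.startswith(("http://", "https://")):
            if not is_loopback_url(token):
                return "deny"  # network access to a non-loopback host
        elif "/" in token and not is_within(Path(token), workspace):
            return "deny"  # path outside the agent-owned workspace
    return "approve"
```

Under this sketch, python -c 'print(1)' and curl to a 127.0.0.1 URL are approved, while rm -rf, non-loopback URLs, and paths outside the workspace are denied.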

Impact

This affects every Hermes gateway-mode consumer. Any Paperclip-driven agent run that triggers approval.py corrupts the issue thread with what looks like a model response. Operators routinely see issues marked done with "DANGEROUS COMMAND..." as the closure comment, silently, with no warning and no error code surfaced to the client.

The workaround currently in production for SparkEros is HERMES_YOLO_MODE=1 in ~/.hermes/.env, which bypasses all approvals. This is not a real fix: it disables the security model entirely.

Severity

High. This affects security telemetry (warnings get swallowed into business output) and operator trust (the agent appears to be answering with security-warning text). The combination, in which the security control fires AND its alert text becomes the agent's apparent answer, is the worst of both worlds.

Related

  • _is_gateway_approval_context() in tools/approval.py correctly detects gateway mode, but the code path taken when no responder is registered loses the structured failure signal
  • Filed in context of paperclipai/paperclip integration diagnostic 2026-05-20

Submitted with assistance from Claude Opus 4.7 via Claude Code.
