hermes - Feature: User-approval gate for outbound communication tools (send_message, email, etc.)

StepCodex · 2026-05-29T04:09:14Z






Feature Description

Add a hard enforcement mechanism (not prompt-based) that requires explicit user approval before any outbound communication tool can send messages to external contacts.

Currently, safety rules for outbound communication ("don't send API keys, tokens, emails to third parties without permission") are enforced only via system prompt text. As recent experience shows, this is fragile — text-based rules can be overlooked or forgotten by the LLM during complex tasks.

Motivation

Agents can have powerful outbound tools: send_message (Telegram, Discord, Slack, etc.), email sending, HTTP POST to external services. A misstep here is irreversible — once an API key or private message is sent, it's gone.

Prompt-level rules are necessary but not sufficient. We need a code-level gate that blocks outbound messages unless the user has explicitly approved.

Proposed Solution

Add an approval_required flag to outbound communication tools. When set, the tool refuses to execute unless it receives a cryptographically valid approval token that only the user (via the gateway) can produce.

Possible implementation sketch:

1. New tool parameter: `user_approval_token: str | None` on `send_message`, email tools, etc.
2. Gateway generates approval tokens: When the user confirms an outbound action (e.g. via a `/approve`-like mechanism or inline button), the gateway issues a short-lived signed token.
3. Tool enforcement: The tool handler checks `user_approval_token` before executing. If it is missing or invalid, the handler returns a hard error (not a warning, not a prompt) and the send simply fails.
4. Configurable per-instance: `outbound.require_approval: true` in config.yaml.

Alternative: simpler stopgap

Even a simpler version — the tool returns an error message by default saying "This action requires user approval. Ask the user to confirm." — and only proceeds after the LLM calls clarify() and gets back a confirmation — would be an improvement over pure prompt text.
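
A rough sketch of this stopgap, under stated assumptions: the runtime keeps a small in-memory set of confirmed sends, and a hook named `record_user_confirmation` (hypothetical) runs when `clarify()` returns the user's answer. The actual `clarify()` interface is not described in this issue, so it is not modeled here, and the `send_message` signature and return shape are assumed for illustration.

```python
import hashlib

APPROVAL_ERROR = "This action requires user approval. Ask the user to confirm."

# Hypothetical state owned by the agent runtime, not writable by the model.
_confirmed: set[str] = set()


def _fingerprint(recipient: str, body: str) -> str:
    # Length prefix keeps (recipient, body) pairs from colliding.
    raw = f"{len(recipient)}:{recipient}{body}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def record_user_confirmation(recipient: str, body: str, user_reply: str) -> bool:
    """Assumed runtime hook: called with the user's answer to clarify()."""
    approved = user_reply.strip().lower() in {"yes", "y", "approve"}
    if approved:
        _confirmed.add(_fingerprint(recipient, body))
    return approved


def send_message(recipient: str, body: str) -> dict:
    key = _fingerprint(recipient, body)
    if key not in _confirmed:
        # Default-deny: the tool refuses until this exact send is confirmed.
        return {"ok": False, "error": APPROVAL_ERROR}
    _confirmed.discard(key)  # one confirmation covers one send
    return {"ok": True, "detail": f"sent to {recipient}"}  # placeholder send
```

Unlike the signed-token design, this variant only holds if the confirmation hook is driven by the user's real reply rather than by text the model produces.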

Alternatives Considered

- Just writing stronger prompts: Already tried. It doesn't hold, because prompts can be missed.
- Environment variable gate: Could work but is too blunt (all-or-nothing, can't be per-message).

Prior Art

- `hermes config set approvals.mode` already exists for shell commands; this is a similar concept applied to outbound messaging.
- Claude Code has `permission_mode` for file writes and network calls.

Impact

- Safety: Prevents irreversible outbound leaks even when the LLM makes mistakes.
- UX: Adds one confirmation step for outbound messages. Acceptable trade-off given the risk.
- Implementation: Touches `tools/send_message.py`, email tools, and the gateway approval flow.

Related

- Security config: `security.redact_secrets` (already exists)
- Command approvals: `approvals.mode` (already exists)
