hermes: [Security] Default secret-pattern redaction on outbound agent messages / dispatch logs

Summary

In multi-agent / approval-gate flows (see #31392), one agent's result is shown to the user via an approval card, posted to a chat platform, and persisted in logs, history, and DBs. If the agent accidentally embeds a secret in its text (an API key in a code snippet, a token in a debug dump, a password in a config example, a .env line it was asked to summarize), that secret leaks to every place the message is rendered or stored.

Proposal: a small, low-false-positive regex-redaction pass over outbound agent-to-human / agent-to-storage text by default, with a way to opt out per-call when the content is known to be safe.

Why this is worth doing as a default

  • Agents are getting better at content but not at knowing what not to say: an LLM that's been asked to "summarize this config" will happily include the literal OPENAI_API_KEY=sk-… line
  • Once a secret is in chat history / approval log / dispatch DB, it's effectively permanent and hard to scrub
  • Common secret patterns are extremely cheap to detect (regex per pattern, microseconds per message)
  • False-positive rate on the well-known prefixes is essentially zero

Patterns worth covering (initial set)

| Pattern | Example prefix | Source |
| --- | --- | --- |
| Anthropic API key | sk-ant-… | provider docs |
| OpenAI API key | sk-… (40+ chars, base58-ish) | provider docs |
| GitHub PAT | ghp_…, gho_…, ghu_…, ghs_…, ghr_… | GitHub docs |
| Slack bot token | xoxb-…, xoxa-…, xoxp-… | Slack docs |
| AWS access key | AKIA[0-9A-Z]{16} | AWS docs |
| Google API key | AIza[0-9A-Za-z\-_]{35} | Google docs |
| Stripe key | sk_live_…, pk_live_… | Stripe docs |
| Generic <NAME>_API_KEY=<value> | env-style assignments | generic |

Replace each match with <REDACTED-{label}> so it's visually obvious what got scrubbed (and so a downstream agent reading the message can recognize the placeholder rather than acting on garbage).
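
As a rough illustration of the pattern table and the <REDACTED-{label}> replacement, here is a minimal sketch of what the helper could look like. The labels, minimum lengths, and character classes are assumptions made for this example (the provider docs are the authority on real token formats), and the ordering puts the more specific sk-ant- prefix ahead of the broader sk- one.

```python
import re

# Illustrative patterns only: lengths and character classes are assumptions
# for this sketch, not authoritative provider formats.
# Order matters: the specific "sk-ant-" prefix is tried before the broad "sk-".
SECRET_PATTERNS = [
    ("ANTHROPIC_KEY", re.compile(r"\bsk-ant-[A-Za-z0-9_\-]{20,}")),
    ("OPENAI_KEY", re.compile(r"\bsk-[A-Za-z0-9_\-]{40,}")),
    ("GITHUB_TOKEN", re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}")),
    ("SLACK_TOKEN", re.compile(r"\bxox[bap]-[A-Za-z0-9\-]{10,}")),
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("GOOGLE_API_KEY", re.compile(r"\bAIza[0-9A-Za-z\-_]{35}")),
    ("STRIPE_KEY", re.compile(r"\b[sp]k_live_[A-Za-z0-9]{16,}")),
]

# Generic env-style assignment: keep the variable name, scrub the value.
# The negative lookahead skips values already replaced by a prefix pattern.
ENV_ASSIGNMENT = re.compile(r"\b([A-Z][A-Z0-9_]*_API_KEY)=(?!<REDACTED-)\S+")


def redact_secrets(text: str) -> str:
    """Return `text` with every recognized secret replaced by a labeled placeholder."""
    for label, pattern in SECRET_PATTERNS:
        text = pattern.sub(f"<REDACTED-{label}>", text)
    return ENV_ASSIGNMENT.sub(
        lambda m: f"{m.group(1)}=<REDACTED-ENV_VALUE>", text
    )
```

Keeping the variable name in the env-style case is a design choice of this sketch: a reader (or downstream agent) can still see which setting was mentioned without seeing its value.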

Where to apply

By default, on:

  • Outbound platform messages (Feishu / Telegram / Discord / Slack / WhatsApp / CLI / web UI rendering)
  • Dispatch / approval messages stored in the local task DB (separate proposal: #31392)
  • Any logs written under .hermes/logs/

A per-call opt-out flag covers the rare case where the message genuinely contains a secret-shaped string that isn't a secret (key rotation messages, security docs, scrubbing instructions, etc.).
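
For the log side, one possible shape is a standard-library logging.Filter attached to the handler, with the opt-out expressed per call. This is only a sketch: the skip_redaction flag, the build_logger helper, and the logger name are hypothetical names invented here, and redact_secrets is a one-pattern stand-in for the fuller helper proposed in this issue. How hermes actually wires its handlers is not covered.

```python
import logging
import re

# One-pattern stand-in for the fuller helper proposed in this issue.
_AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def redact_secrets(text: str) -> str:
    return _AWS_KEY.sub("<REDACTED-AWS_ACCESS_KEY>", text)


class RedactingFilter(logging.Filter):
    """Scrub secrets from each record unless the caller opts out."""

    def filter(self, record: logging.LogRecord) -> bool:
        # `skip_redaction` is a hypothetical per-call flag passed via `extra=`.
        if not getattr(record, "skip_redaction", False):
            # Render %-style args first so secrets passed as arguments
            # are scrubbed along with the format string.
            record.msg = redact_secrets(record.getMessage())
            record.args = ()
        return True  # never drop the record, only rewrite it


def build_logger(name: str = "hermes.demo") -> logging.Logger:
    """Hypothetical helper: a logger whose handler redacts by default."""
    handler = logging.StreamHandler()
    handler.addFilter(RedactingFilter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    fake_key = "AKIA" + "A" * 16  # synthetic, not a live credential
    log.warning("debug dump: key=%s", fake_key)  # emitted redacted
    log.warning("docs example: %s", fake_key, extra={"skip_redaction": True})  # emitted as-is
```

The filter sits on the handler rather than the logger because logger-level filters are not applied to records that propagate up from child loggers, whereas a handler-level filter sees everything that handler emits.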

What this is NOT

  • Not a replacement for real secret-scanning at commit / CI time (e.g., git-secrets / gitleaks)
  • Not a perfect filter — pattern-based redaction always has gaps (custom token formats, unusual encodings)
  • Not encryption — once a secret is in agent input, the threat model is "don't let it propagate further", not "make it unreachable"

Workaround currently used downstream

I added a simple regex-pass for approval messages in my own dispatch-relay code (~30 LoC, no measurable latency); haven't generalized it across all outbound text yet, but the experience has been "zero false positives in three weeks of production, multiple real catches of debug-dump-with-token in agent outputs."

Proposed scope

Phase 1: a redact_secrets(text: str) -> str helper with the patterns above and ~10 lines of test fixtures (positive + negative). Apply on outbound platform messages and dispatch DB writes.
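
The positive + negative fixtures for Phase 1 might look roughly like the following. To keep the snippet runnable on its own it carries a cut-down, two-pattern stand-in for redact_secrets; the pattern details and the fixture strings are assumptions for illustration, and every token-shaped value is synthetic.

```python
import re

# Cut-down stand-in so the fixtures run on their own; the real helper
# would carry the full pattern list.
_PATTERNS = [
    ("GITHUB_TOKEN", re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}")),
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
]


def redact_secrets(text: str) -> str:
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"<REDACTED-{label}>", text)
    return text


# Positive cases: (input, expected output). Values are synthetic.
POSITIVE = [
    ("push failed for ghp_" + "a" * 36,
     "push failed for <REDACTED-GITHUB_TOKEN>"),
    ("aws_access_key_id = AKIAABCDEFGHIJKLMNOP",
     "aws_access_key_id = <REDACTED-AWS_ACCESS_KEY>"),
]

# Negative cases: secret-adjacent text that must pass through untouched.
NEGATIVE = [
    "ghp_ is the prefix GitHub uses for personal access tokens",
    "AKIA is followed by 16 characters",
    "see src/ghp_helpers.py for details",
]


def run_fixtures(redact=redact_secrets) -> None:
    for raw, expected in POSITIVE:
        assert redact(raw) == expected, raw
    for raw in NEGATIVE:
        assert redact(raw) == raw, raw


run_fixtures()
```

The negative list is where the low-false-positive claim gets exercised: bare prefixes and short identifiers that merely start like a token should survive unchanged.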

Phase 2 (separate): broader log scrubbing, optional pluggable patterns, an opt-out kwarg.
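
One way the Phase 2 ideas (pluggable patterns plus an opt-out kwarg) could fit together is a small registry object. This is a sketch under assumptions: the Redactor class, its register method, the redact keyword, and the acme_ token format are all hypothetical names made up for this example, not an existing hermes API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Redactor:
    """Hypothetical registry mapping a label to a compiled pattern."""

    patterns: dict = field(default_factory=dict)

    def register(self, label: str, regex: str) -> None:
        # Later registrations with the same label replace earlier ones.
        self.patterns[label] = re.compile(regex)

    def redact(self, text: str, *, redact: bool = True) -> str:
        # `redact=False` is the per-call opt-out for known-safe content.
        if not redact:
            return text
        for label, pattern in self.patterns.items():
            text = pattern.sub(f"<REDACTED-{label}>", text)
        return text


redactor = Redactor()
redactor.register("AWS_ACCESS_KEY", r"\bAKIA[0-9A-Z]{16}\b")
# A made-up in-house token format, to show a deployment adding its own pattern.
redactor.register("INTERNAL_TOKEN", r"\bacme_[a-f0-9]{32}\b")
```

Making the opt-out keyword-only keeps call sites explicit: redactor.redact(msg, redact=False) is easy to grep for when auditing where redaction was bypassed.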

Questions

  1. In scope? Or better as an optional skill / external middleware?
  2. Default-on or opt-in? I'd argue default-on with documented opt-out (the false-positive rate on these patterns is genuinely negligible)
  3. Pattern source — hard-coded list (current local form) or pull from a maintained registry like detect-secrets?

Happy to PR. Filing as a security-hardening RFC for triage first.

Related: #31385 (bridge), #31392 (dispatch relay with approval messages), #31417 (StreamReader), and sibling Windows / env-strip bugs I'm filing alongside this.
