hermes: [Security] Default secret-pattern redaction on outbound agent messages / dispatch logs

StepCodex · 2026-05-24T10:48:53Z


In multi-agent / approval-gate flows (see #31392) where one agent's result is shown to the user via an approval card, posted to a chat platform, and persisted in logs / history / DBs, an agent that accidentally embeds a secret in its text — an API key in a code snippet, a token in a debug dump, a password in a config example, a .env line it was asked to summarize — leaks that secret to all the places the message is rendered or stored.

Proposal: a small, low-false-positive regex-redaction pass over outbound agent-to-human / agent-to-storage text by default, with a way to opt out per-call when the content is known to be safe.



Why this is worth doing as a default

- Agents are getting better at content but not at knowing what not to say: an LLM that has been asked to "summarize this config" will happily include the literal OPENAI_API_KEY=sk-… line.
- Once a secret is in chat history / approval log / dispatch DB, it's effectively permanent and hard to scrub.
- Common secret patterns are cheap to detect (one regex per pattern, microseconds per message).
- The false-positive rate on the well-known prefixes is essentially zero.

Patterns worth covering (initial set)

| Pattern | Example prefix | Source |
|---|---|---|
| Anthropic API key | `sk-ant-…` | provider docs |
| OpenAI API key | `sk-…` (40+ chars, base58-ish) | provider docs |
| GitHub token (PAT, OAuth, app, refresh) | `ghp_…`, `gho_…`, `ghu_…`, `ghs_…`, `ghr_…` | GitHub docs |
| Slack token (bot, workspace, user) | `xoxb-…`, `xoxa-…`, `xoxp-…` | Slack docs |
| AWS access key ID | `AKIA[0-9A-Z]{16}` | AWS docs |
| Google API key | `AIza[0-9A-Za-z\-_]{35}` | Google docs |
| Stripe key | `sk_live_…`, `pk_live_…` | Stripe docs |
| Generic `<NAME>_API_KEY=<value>` | env-style assignments | generic |

Replace match with <REDACTED-{label}> so it's visually obvious what got scrubbed (and so a downstream agent reading the message can recognize the placeholder rather than acting on garbage).
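As a rough illustration of the table and placeholder scheme above, here is a minimal sketch of what such a helper could look like. The label names, the minimum lengths after each prefix, and the choice to keep the variable name in env-style assignments are all assumptions made for this example; they are not taken from provider documentation or from the existing downstream code.

```python
import re

# Hypothetical pattern list. Order matters: the more specific "sk-ant-" prefix
# must be tried before the broader "sk-" prefix. Minimum lengths are assumed.
_PATTERNS = [
    ("ANTHROPIC_KEY", re.compile(r"\bsk-ant-[A-Za-z0-9_\-]{20,}")),
    ("OPENAI_KEY", re.compile(r"\bsk-[A-Za-z0-9_\-]{40,}")),
    ("GITHUB_TOKEN", re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}")),
    ("SLACK_TOKEN", re.compile(r"\bxox[bap]-[A-Za-z0-9\-]{10,}")),
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}")),
    ("GOOGLE_API_KEY", re.compile(r"\bAIza[0-9A-Za-z\-_]{35}")),
    ("STRIPE_KEY", re.compile(r"\b[sp]k_live_[A-Za-z0-9]{16,}")),
]

# Generic env-style assignment such as FOO_API_KEY=value.
_ENV_ASSIGN = re.compile(r"\b([A-Z][A-Z0-9_]*_API_KEY)=(\S+)")


def _redact_env(match: re.Match) -> str:
    """Keep the variable name, replace the value unless it is already a placeholder."""
    name, value = match.group(1), match.group(2)
    if value.startswith("<REDACTED-"):
        return match.group(0)
    return f"{name}=<REDACTED-ENV_VALUE>"


def redact_secrets(text: str) -> str:
    """Replace secret-shaped substrings with <REDACTED-{label}> placeholders."""
    for label, pattern in _PATTERNS:
        text = pattern.sub(f"<REDACTED-{label}>", text)
    # Run the generic rule last so specific labels win when both would match.
    return _ENV_ASSIGN.sub(_redact_env, text)
```

The leading `\b` keeps a pattern such as `sk-` from firing inside ordinary words like `task-…`, which is one cheap way to hold the false-positive rate down.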

Where to apply

By default, on:

- Outbound platform messages (Feishu / Telegram / Discord / Slack / WhatsApp / CLI / web UI rendering)
- Dispatch / approval messages stored in the local task DB (separate proposal: #31392)
- Any logs written under .hermes/logs/

Opt-out (per-call flag) for the rare case where the message genuinely contains a secret-shaped string that isn't a secret (key rotation messages, security docs, scrubbing instructions, etc.).
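One possible shape for the default-on behaviour with a per-call opt-out, plus log scrubbing, is sketched below. `send_outbound`, its `redact` keyword, and `RedactingFilter` are hypothetical names invented for this example and do not correspond to existing hermes APIs; the one-pattern `redact_secrets` is only a stand-in so the block runs on its own. `logging.Filter` is the standard-library hook used for the log side.

```python
import logging
import re
from typing import Callable

# Stand-in redactor with a single pattern; a real one would cover the full list.
_AWS = re.compile(r"\bAKIA[0-9A-Z]{16}")


def redact_secrets(text: str) -> str:
    return _AWS.sub("<REDACTED-AWS_ACCESS_KEY>", text)


def send_outbound(text: str, deliver: Callable[[str], None], *, redact: bool = True) -> None:
    """Hypothetical send path: redacts by default, callers may opt out per call."""
    deliver(redact_secrets(text) if redact else text)


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before a handler writes it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so secrets passed as %-style args are also covered.
        record.msg = redact_secrets(record.getMessage())
        record.args = ()  # the message is already formatted
        return True
```

Attaching the filter to the handler that writes under `.hermes/logs/` (via `handler.addFilter(RedactingFilter())`) would scrub every record that handler emits, while a caller posting key-rotation instructions could pass `redact=False` for that one message.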

What this is NOT

- Not a replacement for real secret-scanning at commit / CI time (e.g., git-secrets / gitleaks)
- Not a perfect filter: pattern-based redaction always has gaps (custom token formats, unusual encodings)
- Not encryption: once a secret is in agent input, the threat model is "don't let it propagate further", not "make it unreachable"

Workaround currently used downstream

I added a simple regex pass for approval messages in my own dispatch-relay code (~30 LoC, no measurable latency). I haven't generalized it across all outbound text yet, but the experience so far has been zero false positives in three weeks of production and multiple real catches of debug dumps containing tokens in agent outputs.

Proposed scope

Phase 1: a redact_secrets(text: str) -> str helper with the patterns above and ~10 lines of test fixtures (positive + negative). Apply on outbound platform messages and dispatch DB writes.
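The positive and negative fixtures mentioned for Phase 1 could be kept as plain data and checked against whatever implementation is chosen. The sketch below assumes the placeholder labels `AWS_ACCESS_KEY` and `GITHUB_TOKEN`, which are illustrative only; the secrets are synthetic, and `failing_fixtures` is a hypothetical helper name.

```python
from typing import Callable, List, Tuple

# (input, expected output) pairs. All secrets here are synthetic.
FIXTURES: List[Tuple[str, str]] = [
    # Positive cases: a secret-shaped string must be replaced.
    ("id AKIA" + "A" * 16, "id <REDACTED-AWS_ACCESS_KEY>"),
    ("token ghp_" + "b" * 36 + " ok", "token <REDACTED-GITHUB_TOKEN> ok"),
    # Negative cases: lookalike text must pass through unchanged.
    ("the task-list is done", "the task-list is done"),
    ("AKIA is the AWS key-ID prefix", "AKIA is the AWS key-ID prefix"),
    ("ghp_ is a GitHub prefix", "ghp_ is a GitHub prefix"),
]


def failing_fixtures(redact: Callable[[str], str]) -> List[str]:
    """Return the inputs for which redact() does not produce the expected output."""
    return [src for src, want in FIXTURES if redact(src) != want]
```

Passing the redactor in as a callable keeps the fixtures independent of where the pattern list ends up living (hard-coded or pulled from a registry); a test would simply assert that `failing_fixtures(redact_secrets)` is empty.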

Phase 2 (separate): broader log scrubbing, optional pluggable patterns, an opt-out kwarg.

Questions

1. In scope? Or better as an optional skill / external middleware?
2. Default-on or opt-in? I'd argue default-on with a documented opt-out (the false-positive rate on these patterns is genuinely negligible).
3. Pattern source: a hard-coded list (current local form), or pull from a maintained registry like detect-secrets?

Happy to PR. Filing as a security-hardening RFC for triage first.

Related: #31385 (bridge), #31392 (dispatch relay with approval messages), #31417 (StreamReader), and sibling Windows / env-strip bugs I'm filing alongside this.
