hermes - ✅(Solved) Fix Harden large tool-result persistence and redact stored inline previews [1 pull requests, 1 participants]

hermes, 2026-05-10 13:18:42


GitHub stats

NousResearch/hermes-agent#23200 (fetched 2026-05-11 03:30:35)

Author: djonesx

Participants: djonesx

Timeline (top): labeled ×3, cross-referenced ×1

Hermes already has a useful large-tool-result persistence path that spills oversized tool outputs to a sandbox file and replaces the in-context content with a preview and file reference.

I would like to propose hardening that boundary so large tool results are handled consistently before they are written into session logs, transcripts, SQLite/session state, and user-visible previews.

This is mainly a reliability/privacy-hardening improvement rather than a security advisory: the goal is to reduce context/session bloat, avoid huge transcript rows, and prevent sensitive-looking values from being unnecessarily copied into inline previews.

Fix Action

Fix / Workaround

I can provide a PR if maintainers agree with the direction. I have a local patch that applies this pattern to tool execution result handling, session logging/state persistence, preview redaction, and sandbox writes, but I would split it into small, reviewable commits for upstreaming.

PR fix notes

PR #23509: fix(tools): redact secrets in persisted tool-result preview

Repository: NousResearch/hermes-agent
Author: konsisumer
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/23509

Description (problem / solution / changelog)

Apply secret redaction to the bounded preview that lands in durable session/transcript state when a large tool result is spilled to the sandbox.

What changed and why

tools/tool_result_storage.py: a new _redact_preview() helper calls agent.redact.redact_sensitive_text(preview, force=True) and falls back to the raw preview on any exception. It is wired into maybe_persist_tool_result so that both the <persisted-output> block and the inline-truncation fallback (used when there is no env or the sandbox write fails) receive the scrubbed preview. The full content written to the sandbox file is untouched, so the model can still recover the original via read_file.
force=True is used because this is a privacy boundary, and redaction should apply regardless of the operator's global security.redact_secrets setting. This matches how send_message_tool.py and code_execution_tool.py already use the same helper for output that crosses a trust boundary.
tests/tools/test_tool_result_storage.py: 4 new tests in a TestPreviewRedaction class check that a canary sk-… token in the preview is masked, that the full stdin payload written to the sandbox stays byte-exact, that a failure in the redaction helper falls back to the raw preview instead of breaking tool execution, and that the no-env truncation path also redacts.
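The redact-with-fallback wrapper described above can be sketched as follows. This is an illustration, not the PR's code: redact_preview_sketch and fake_redact_sensitive_text are hypothetical names, and the stand-in redactor only masks sk-style tokens, keeping the first 6 and last 4 characters to mirror the shape of the manual smoke-test output quoted below. The real agent.redact.redact_sensitive_text may use different patterns and a different mask.

```python
import re
from typing import Callable

# Hypothetical stand-in for agent.redact.redact_sensitive_text: masks sk-style
# tokens only. The real helper's patterns and masking format may differ.
_SK_TOKEN = re.compile(r"\bsk-[A-Za-z0-9_-]{16,}")


def fake_redact_sensitive_text(text: str, force: bool = False) -> str:
    # Keep the first 6 and last 4 characters of each matched token.
    return _SK_TOKEN.sub(lambda m: f"{m.group(0)[:6]}...{m.group(0)[-4:]}", text)


def redact_preview_sketch(
    preview: str,
    redactor: Callable[..., str] = fake_redact_sensitive_text,
) -> str:
    """Scrub a bounded preview without ever letting redaction break tool execution."""
    try:
        # force=True: apply redaction regardless of any global opt-out setting.
        return redactor(preview, force=True)
    except Exception:
        # Fall back to the raw preview rather than failing the tool call.
        return preview
```

The fallback is a deliberate trade-off: tool execution stays alive, at the cost of storing an unredacted preview in the rare case where the redactor itself fails.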

Scope is intentionally narrow (item 3 of the issue). Items 1, 2, 4, 5, and 6 listed in the issue were already implemented in the file: stdin-based sandbox writes via env.execute(..., stdin_data=...), an aggregate enforce_turn_budget that skips already-persisted blocks via the PERSISTED_OUTPUT_TAG check, and existing test coverage for those mechanics. Redacting tool-role content in SQLite append_message before insert would also affect FTS5 indexing and message replay, so that belongs in a separate PR if maintainers want it.
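Why skipping already-persisted blocks matters can be shown with a small, self-contained sketch of an aggregate budget pass. The function name, the min_chars cutoff, and the literal value of PERSISTED_OUTPUT_TAG are assumptions for illustration only; the point is that a result that already contains the tag is never spilled again, so running the pass twice is a no-op.

```python
from typing import Callable, List

PERSISTED_OUTPUT_TAG = "<persisted-output>"  # assumed literal; the real constant may differ


def enforce_turn_budget_sketch(
    results: List[str],
    budget_chars: int,
    spill: Callable[[str], str],
    min_chars: int = 100,
) -> List[str]:
    """Spill the largest eligible results until the turn fits within budget_chars."""
    out = list(results)
    total = sum(len(r) for r in out)
    # Visit results largest-first, which tends to need the fewest spills.
    for i in sorted(range(len(out)), key=lambda j: len(out[j]), reverse=True):
        if total <= budget_chars:
            break
        if PERSISTED_OUTPUT_TAG in out[i]:
            continue  # already a replacement block; spilling it again would persist the stub
        if len(out[i]) < min_chars:
            continue  # too small for a replacement block to save anything
        replacement = spill(out[i])
        total += len(replacement) - len(out[i])
        out[i] = replacement
    return out
```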

How to test

pytest tests/tools/test_tool_result_storage.py -q --timeout=60 — 53 tests pass (49 existing + 4 new).
Manual smoke test: python -c "from tools.tool_result_storage import _redact_preview; print(_redact_preview('sk-CANARYabcdefghijklmnopqrstuv inside'))" prints sk-CAN...stuv inside.

What platforms tested on

macOS on darwin-arm64 (local)

Fixes #23200

Changed files

tests/tools/test_tool_result_storage.py (modified, +79/-0)
tools/tool_result_storage.py (modified, +23/-2)

GitHub issue submission body

Problem

Large tool outputs can still cause operational issues if raw content reaches persistence/logging paths before being replaced with a persisted-output reference.

Observed failure modes include:

- oversized tool outputs bloating session logs or SQLite-backed state;
- repeated context compression being triggered earlier than necessary;
- browser or transcript views becoming slow or noisy after broad file/search/shell results;
- inline previews retaining sensitive-looking snippets even when the full result is spilled to disk;
- shell heredoc persistence being fragile when tool output contains unusual delimiters or shell-sensitive content.

A synthetic example is enough to reproduce the shape of the issue; no real credentials or private data are needed:

Example synthetic tool output:

CANARY_TOOL_RESULT_123456
line 1 ...
line 2 ...
<repeat until larger than the configured result threshold>
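A synthetic canary payload of this shape can also be generated programmatically. make_canary_output is a hypothetical helper, and the threshold is a caller-supplied placeholder, since the configured result threshold is not stated here.

```python
def make_canary_output(threshold_chars: int, canary: str = "CANARY_TOOL_RESULT_123456") -> str:
    """Return synthetic tool output strictly longer than threshold_chars (no real data)."""
    lines = [canary]
    size = len(canary) + 1  # +1 for the newline that follows each line
    n = 1
    while size <= threshold_chars:
        line = f"line {n} ..."
        lines.append(line)
        size += len(line) + 1
        n += 1
    return "\n".join(lines) + "\n"
```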

Expected behaviour:

- the full output is written to the sandbox result file;
- the model/session/transcript stores only a bounded replacement block;
- inline preview text is redacted/sanitised using the existing redaction helper where available;
- the same safe representation is used consistently across tool execution, session logging, and state persistence.
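A minimal sketch of one storage helper that meets these expectations, under stated assumptions: it writes to a local directory instead of a Hermes sandbox environment, the threshold and preview sizes are placeholders, and the field layout inside the <persisted-output> block is invented for illustration. store_tool_result_sketch is a hypothetical name, not the project's API.

```python
import re
from pathlib import Path
from typing import Callable

THRESHOLD_CHARS = 10_000  # assumed inline limit; the configured threshold may differ
PREVIEW_CHARS = 200       # assumed preview length


def _safe_name(s: str) -> str:
    # Keep file names safe even if a tool name or call ID contains odd characters.
    return re.sub(r"[^A-Za-z0-9_.-]", "_", s)


def store_tool_result_sketch(
    content: str,
    tool_name: str,
    tool_call_id: str,
    sandbox_dir: Path,
    redact: Callable[[str], str] = lambda s: s,
) -> str:
    """Return content unchanged if small; otherwise spill it and return a bounded block."""
    if len(content) <= THRESHOLD_CHARS:
        return content
    path = sandbox_dir / f"{_safe_name(tool_name)}-{_safe_name(tool_call_id)}.txt"
    data = content.encode("utf-8")
    path.write_bytes(data)  # the full output is preserved byte-for-byte on disk
    preview = redact(content[:PREVIEW_CHARS])  # only the inline preview is redacted
    return (
        "<persisted-output>\n"
        f"path: {path}\n"
        f"bytes: {len(data)}, chars: {len(content)}\n"
        f"preview:\n{preview}\n"
        "</persisted-output>"
    )
```

In this design the returned string would be the single value handed to the model context, session log, transcript, and state store, which is what keeps those representations consistent.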

Proposed changes

1. Add/confirm one central helper for safe tool-result storage.

   It should accept the raw tool result plus tool name, tool call ID, and active environment, then return either:
   - the original content, if small and safe to keep inline; or
   - a bounded persisted-output replacement with path, byte/character count, and a small preview.

2. Apply that helper before writing tool messages to durable/session persistence paths.

   In particular, tool-role messages should be normalised before being encoded into session state or transcript/session logs.

3. Redact inline previews before storing them.

   If agent.redact.redact_sensitive_text is available, use it on the preview text before building the persisted-output message. If redaction fails, fall back gracefully rather than breaking tool execution.

4. Prefer env.execute(..., stdin_data=content) for sandbox writes.

   This avoids embedding arbitrary tool output into shell heredocs. Keep the heredoc method as a compatibility fallback for older/custom environments that do not support stdin_data.

5. Avoid duplicate persistence passes.

   If a tool result has already been replaced with a persisted-output block, later aggregate budget enforcement should recognise it and avoid spilling the replacement again.

6. Add tests using synthetic canary output.

   Suggested tests:
   - oversized tool result is replaced with a persisted-output block;
   - full output is written to the sandbox path;
   - session/log/state persistence receives the bounded replacement, not the full raw output;
   - preview redaction is applied to obvious synthetic secret-looking strings;
   - stdin_data write path is used where supported and heredoc fallback still works.
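The stdin-first sandbox write with a heredoc fallback described above can be sketched as below. LocalEnv and LegacyEnv are hypothetical stand-ins for Hermes environments, assumed to expose execute(command, stdin_data=...) and execute(command) respectively. Support is detected with inspect.signature rather than by catching TypeError, so a genuine error raised inside execute() is not misread as "stdin_data unsupported".

```python
import inspect
import secrets
import shlex
import subprocess
from typing import Optional


class LocalEnv:
    """Hypothetical environment whose execute() accepts stdin_data."""

    def execute(self, command: str, stdin_data: Optional[str] = None) -> int:
        return subprocess.run(["sh", "-c", command], input=stdin_data, text=True).returncode


class LegacyEnv:
    """Hypothetical older/custom environment without stdin_data support."""

    def execute(self, command: str) -> int:
        return subprocess.run(["sh", "-c", command]).returncode


def supports_stdin(env) -> bool:
    return "stdin_data" in inspect.signature(env.execute).parameters


def write_to_sandbox(env, path: str, content: str) -> int:
    target = shlex.quote(path)  # quote the path so spaces or metacharacters in it are safe
    if supports_stdin(env):
        # Preferred: content arrives on stdin and is never parsed by the shell.
        return env.execute(f"cat > {target}", stdin_data=content)
    # Fallback: quoted heredoc (no expansion) with a random delimiter absent from content.
    delim = "HERMES_EOF_" + secrets.token_hex(8)
    while delim in content:
        delim = "HERMES_EOF_" + secrets.token_hex(8)
    # A heredoc body always ends with a newline, so content lacking one gains one here.
    body = content if content.endswith("\n") else content + "\n"
    return env.execute(f"cat > {target} <<'{delim}'\n{body}{delim}\n")
```

The trailing-newline adjustment in the fallback is one example of why the issue treats heredoc persistence as fragile: only the stdin path is byte-exact for every input.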

Why this helps

This makes Hermes more robust for real operational use where tools can return large file contents, search results, command output, API responses, or structured JSON. It reduces session bloat and keeps persisted history more manageable without removing the model’s ability to inspect the full output when needed.

Related issues

This is related to #14948, but narrower: #14948 discusses progressive compression of older tool results before LLM calls; this issue focuses on the persistence/logging boundary for large current tool results and ensuring the durable representation is bounded and preview-redacted.

Notes
