hermes - 💡(How to fix) Fix No harness-level defense against prompt injection in tool outputs [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#18981Fetched 2026-05-03 04:53:07
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×4
RAW_BUFFERClick to expand / collapse

Problem

When Hermes Agent calls tools that return external data — browser_navigate, terminal(curl ...), read_file on third-party repos, MCP tool outputs, search results — there is no harness-level mechanism to validate or guard that data before it enters the model's context. The model sees raw tool output interleaved with its conversation and system prompt, with nothing preventing injected instructions in that output from being treated as commands.

The BridgeWard skill teaches the model a skeptical-reading discipline, but it is a behavioral defense only: the model must remember to load the skill before processing external data, which fails in practice (especially under load, distraction, or when speed is prioritized).

OWASP LLM01 (2025) identifies this as a fundamental architectural gap: prompt injection defenses that rely on model compliance are not defenses at all.

Existing infrastructure that could close this gap

The transform_tool_result plugin hook (model_tools.py line 753) already fires on every tool result before it is appended to the conversation context. It allows a plugin to replace the result string entirely. The gateway/builtin_hooks/ directory exists and is currently empty.

Desired outcome

A way to mark certain tools (or all tools returning network/external data) so that their output is automatically annotated or wrapped with a security preamble before the model sees it — without requiring the model to "remember" to load a skill first. This would provide defense in depth: behavioral (skill) + architectural (harness injection).

I'm not prescribing the implementation — just surfacing that the gap exists and that the transform_tool_result hook + builtin_hooks/ directory look like natural fit points.

extent analysis

TL;DR

Implement a plugin using the transform_tool_result hook to automatically annotate or wrap tool output with a security preamble before it enters the model's context.

Guidance

  • Identify tools that return external data and determine the appropriate security preamble to apply.
  • Develop a plugin to utilize the transform_tool_result hook, replacing the result string with the annotated or wrapped output.
  • Consider implementing a default annotation for all tools returning network/external data to provide a baseline level of defense.
  • Test the plugin to ensure it correctly annotates tool output without interfering with model functionality.

Example

# Example plugin using transform_tool_result hook
def transform_tool_result(result, tool_name):
    # Apply security preamble to tool output
    secure_preamble = "[EXTERNAL DATA] "
    return secure_preamble + result

Notes

The implementation details of the plugin and security preamble will depend on the specific requirements and constraints of the system. This solution assumes that the transform_tool_result hook is sufficient to address the identified gap.

Recommendation

Apply workaround by implementing a plugin using the transform_tool_result hook, as this provides a targeted solution to the identified architectural gap without requiring changes to the model or existing skills.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix No harness-level defense against prompt injection in tool outputs [1 participants]