hermes - ✅ (Solved) Fix [Bug] Gateway auto-appends MEDIA tags from arbitrary tool results [1 pull request, 1 participant]

GitHub stats

NousResearch/hermes-agent#16720 (fetched 2026-04-28 06:51:17)

  • Comments: 0
  • Participants: 1
  • Timeline: 6
  • Reactions: 0
  • Timeline (top): labeled ×3, cross-referenced ×1, referenced ×1, renamed ×1

The gateway media auto-append logic can treat literal MEDIA: examples from ordinary tool output as real attachments.

This is platform-neutral and not specific to any one gateway adapter. Any messaging gateway can hit it if a normal tool result contains documentation, logs, search output, or skill text with a literal media-tag example.

Error Message

No full traceback is produced in the minimal reproduction; the observed failure mode is incorrect media extraction / attempted delivery of a documentation example as an attachment.

Root Cause

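When the final response contains no media tag, the gateway scans all current-turn tool/function output for MEDIA: strings and appends any matches to the reply. Without a trust boundary on that scan, ordinary documentation or tool output can cause the gateway to attempt sending nonexistent or unintended files. The producing tool should be the trust boundary for automatic media delivery.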

Fix Action

Fixed

PR fix notes

PR #16721: fix(gateway): restrict auto-appended media to producer tools

Description (problem / solution / changelog)

Summary

This PR prevents the gateway from auto-appending MEDIA: tags found in arbitrary tool results.

Previously, when the final response did not contain a media tag, the gateway scanned all current-turn tool/function output for MEDIA: strings and appended any matches to the final gateway reply. That is too broad: ordinary tools can return documentation, skill content, logs, search results, or user-authored text containing literal media examples.

For example, a docs/skill tool can legitimately return:

MEDIA:/absolute/path/to/file.png

That string is only documentation, but the old broad scanner treated it as a real attachment and could try to send a nonexistent file on any gateway platform.

The fix keeps the behavior intentionally narrow: automatic append only considers tool outputs from known media-producing tools (text_to_speech and the legacy text_to_speech_tool name). Future media-producing tools can opt in deliberately by adding their tool name to this list or by introducing a registry-level capability flag.

Fixes #16720

Changes

  • Adds a small gateway helper for collecting auto-appended media tags.
  • Maps tool results back to the assistant tool call that produced them.
  • Stops scanning arbitrary tool/function output for media tags.
  • Keeps existing auto-append behavior for real TTS media outputs.
  • Adds regression coverage for documentation/skill examples and real TTS media artifacts.

Test plan

  • Added regression coverage that skill_view output containing a MEDIA: example is not auto-appended.
  • Added positive coverage that text_to_speech media output is still auto-appended, including the voice directive.
  • Tested on macOS Darwin 25.3.0 arm64 with Hermes v0.11.0 local checkout, using:

venv/bin/python -m pytest tests/gateway/test_media_extraction.py -q

Result:

6 passed in 2.24s

Follow-up note

This PR intentionally avoids changing tool registration APIs. If maintainers prefer a more extensible design, the allowlist can be replaced in a follow-up with a registry-level capability flag such as auto_append_media=True on media-producing tools.
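
A minimal sketch of what such a registry-level flag could look like. Everything here is hypothetical: `ToolSpec`, `REGISTRY`, and `may_auto_append_media` are illustration names, not part of the project's actual tool registration API; only the `auto_append_media` flag name and the two tool names come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical registry entry; not the project's real registration API."""

    name: str
    # Tools must opt in explicitly; the default keeps auto-append off.
    auto_append_media: bool = False


# Hypothetical registry keyed by tool name.
REGISTRY = {
    spec.name: spec
    for spec in (
        ToolSpec("text_to_speech", auto_append_media=True),
        ToolSpec("skill_view"),
    )
}


def may_auto_append_media(tool_name):
    """Return True only for registered tools that opted in to auto-append."""
    spec = REGISTRY.get(tool_name)
    return bool(spec and spec.auto_append_media)


print(may_auto_append_media("text_to_speech"))
print(may_auto_append_media("skill_view"))
```

Compared with a hard-coded set of names, a flag of this kind would keep the opt-in next to each tool's definition, at the cost of touching the registration API that this PR deliberately leaves alone.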

Changed files

  • gateway/run.py (modified, +58/-13)
  • tests/gateway/test_media_extraction.py (modified, +57/-0)


Summary

The gateway media auto-append logic can treat literal MEDIA: examples from ordinary tool output as real attachments.

This is platform-neutral and not specific to any one gateway adapter. Any messaging gateway can hit it if a normal tool result contains documentation, logs, search output, or skill text with a literal media-tag example.

Problem

When the final response does not contain a media tag, the gateway currently scans all tool/function output for MEDIA: strings and appends matching paths to the final gateway reply.

That is too broad. A documentation or skill tool may legitimately return an example like:

MEDIA:/absolute/path/to/file.png

That string is documentation, not a real artifact. But the broad scanner can treat it as an attachment and try to send a nonexistent file.
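
To make the failure concrete, here is a rough sketch of a broad scanner of the kind described above. It is a stand-in, not the gateway's actual code: `broad_scan` and the `MEDIA_TAG` pattern are assumptions made for illustration, and the real tag grammar may differ.

```python
import re

# Assumed tag shape: "MEDIA:" followed by a run of non-whitespace characters.
MEDIA_TAG = re.compile(r"MEDIA:\S+")


def broad_scan(messages):
    """Hypothetical stand-in for the old behavior: scan every tool result."""
    found = []
    for message in messages:
        # No check on which tool produced the output: this is the problem.
        if message.get("role") in ("tool", "function"):
            found.extend(MEDIA_TAG.findall(message.get("content") or ""))
    return found


messages = [
    {
        "role": "assistant",
        "tool_calls": [{"id": "call_skill", "function": {"name": "skill_view"}}],
    },
    {
        "role": "tool",
        "tool_call_id": "call_skill",
        "content": "Example usage: MEDIA:/absolute/path/to/file.png",
    },
]

# The documentation example is picked up as if it were a real attachment.
print(broad_scan(messages))
```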

Minimal reproduction

Construct a conversation where the assistant calls a documentation-like tool such as skill_view, and the tool result contains a media example:

messages = [
    {
        "role": "assistant",
        "tool_calls": [
            {"id": "call_skill", "function": {"name": "skill_view"}}
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_skill",
        "content": "Example usage: MEDIA:/absolute/path/to/file.png",
    },
]

Old broad behavior:

['MEDIA:/absolute/path/to/file.png']

Expected behavior:

[]

A positive case should still work for real media-producing tools:

messages = [
    {
        "role": "assistant",
        "tool_calls": [
            {"id": "call_tts", "function": {"name": "text_to_speech"}}
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_tts",
        "content": "MEDIA:/tmp/voice.mp3 [[audio_as_voice]]",
    },
]

Expected behavior:

['MEDIA:/tmp/voice.mp3']

Expected behavior

Auto-append should only consider outputs from tools that intentionally produce media artifacts, such as text_to_speech.

Documentation tools, skill tools, search tools, logs, and arbitrary tool outputs should not trigger attachment sending just because they contain a literal MEDIA: example.

Suggested fix

Use an allowlist of media-producing tool names for auto-append extraction instead of regex-scanning all tool/function output.

For example:

_AUTO_APPEND_MEDIA_TOOL_NAMES = {"text_to_speech", "text_to_speech_tool"}

Then only scan tool results whose tool_call_id maps back to an assistant tool call with an allowlisted tool name.
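
The two steps above (map each tool result back to its tool call, then scan only allowlisted producers) could be sketched roughly as follows. The function name `collect_auto_append_media` and the `MEDIA_TAG` pattern are assumptions for illustration; the actual helper in gateway/run.py may be structured differently.

```python
import re

_AUTO_APPEND_MEDIA_TOOL_NAMES = {"text_to_speech", "text_to_speech_tool"}

# Assumed tag shape: "MEDIA:" followed by a run of non-whitespace characters.
MEDIA_TAG = re.compile(r"MEDIA:\S+")


def collect_auto_append_media(messages):
    """Hypothetical collector: scan only results from allowlisted tools."""
    tool_name_by_call_id = {}
    tags = []
    for message in messages:
        role = message.get("role")
        if role == "assistant":
            # Remember which tool each call id belongs to.
            for call in message.get("tool_calls") or []:
                call_id = call.get("id")
                if call_id is not None:
                    name = (call.get("function") or {}).get("name")
                    tool_name_by_call_id[call_id] = name
        elif role == "tool":
            # Results with an unknown or non-allowlisted producer are skipped.
            name = tool_name_by_call_id.get(message.get("tool_call_id"))
            if name in _AUTO_APPEND_MEDIA_TOOL_NAMES:
                tags.extend(MEDIA_TAG.findall(message.get("content") or ""))
    return tags


skill_messages = [
    {"role": "assistant",
     "tool_calls": [{"id": "call_skill", "function": {"name": "skill_view"}}]},
    {"role": "tool", "tool_call_id": "call_skill",
     "content": "Example usage: MEDIA:/absolute/path/to/file.png"},
]
tts_messages = [
    {"role": "assistant",
     "tool_calls": [{"id": "call_tts", "function": {"name": "text_to_speech"}}]},
    {"role": "tool", "tool_call_id": "call_tts",
     "content": "MEDIA:/tmp/voice.mp3 [[audio_as_voice]]"},
]

print(collect_auto_append_media(skill_messages))
print(collect_auto_append_media(tts_messages))
```

Tool results whose tool_call_id cannot be resolved are ignored in this sketch, which errs on the side of not sending an attachment.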

Why this matters

Without this boundary, ordinary documentation or tool output can cause the gateway to attempt sending nonexistent or unintended files. The producing tool should be the trust boundary for automatic media delivery.

Environment

  • OS: macOS Darwin 25.3.0 arm64
  • Hermes: v0.11.0 (2026.4.23), local checkout at commit 95c33147
  • Runtime shown by hermes --version: Python 3.11.15
  • System python3 --version: Python 3.9.6
  • Repository remote: https://github.com/NousResearch/hermes-agent.git

No full traceback is produced in the minimal reproduction; the observed failure mode is incorrect media extraction / attempted delivery of a documentation example as an attachment.

Notes

I have a local fix prepared that:

  • adds an allowlisted collector for auto-appended media tags;
  • ignores MEDIA: examples returned by skill_view and other non-media tools;
  • preserves auto-append behavior for text_to_speech outputs;
  • adds regression tests for both the negative docs/skill case and the positive TTS case.

Extent analysis

TL;DR

Implement an allowlist of media-producing tool names to restrict auto-append extraction and prevent incorrect media delivery.

Guidance

  • Identify the tools that intentionally produce media artifacts and add them to the allowlist.
  • Modify the auto-append logic to only scan tool results from allowlisted tools.
  • Verify the fix by testing with both positive and negative cases, such as text_to_speech and skill_view tools.
  • Review the existing code to ensure that the allowlist is properly integrated and that the auto-append logic is updated to respect the allowlist.

Example

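import re

_AUTO_APPEND_MEDIA_TOOL_NAMES = {"text_to_speech", "text_to_speech_tool"}

def media_tags_for(tool_name, tool_content):
    # Hypothetical helper: only allowlisted producer tools are scanned.
    if tool_name not in _AUTO_APPEND_MEDIA_TOOL_NAMES:
        return []
    # Assumed tag shape: "MEDIA:" followed by non-whitespace characters.
    return re.findall(r"MEDIA:\S+", tool_content)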

Notes

The suggested fix relies on the assumption that the tool names can be reliably used to determine whether a tool produces media artifacts. Additional validation may be necessary to ensure that the allowlist is comprehensive and accurate.

Recommendation

Apply the suggested fix by implementing the allowlist and updating the auto-append logic, as it provides a clear and targeted solution to the issue.
