hermes - ✅(Solved) Fix Matrix image-only messages use filename as text and can mislead vision handling [1 pull request, 1 comment, 1 participant]

GitHub stats: NousResearch/hermes-agent#13482 · fetched 2026-04-22 08:06:10 · 1 comment · 1 participant · 5 timeline events (labeled ×4, commented ×1) · 0 reactions

On the latest main, inbound Matrix m.image events still treat content.body (usually just the uploaded filename like 30.png) as the user text payload.

This becomes problematic when Hermes later prepends auto-vision analysis to the same message text: the model receives something semantically equivalent to:

[vision description of the image]

30.png

For image-only Matrix messages, that trailing raw filename can mislead the model into treating the turn as a file/path lookup instead of an image understanding request.

Root Cause

The media download / cache path itself is working correctly. The image is cached locally and passed into the vision enrichment pipeline. The issue is the semantic packaging of the inbound message.

In practice this can produce behavior like:

  • user sends an image in Matrix DM
  • Hermes caches it successfully
  • Hermes then tries to search_files / reason about 30.png as a literal filename
  • instead of responding to the image content

PR fix notes

PR #14063: fix(matrix): normalize image-only filenames

Description (problem / solution / changelog)

Summary

  • stop forwarding Matrix image-only filenames as user text
  • preserve real image captions for downstream vision flows
  • add regression coverage for both image-only and captioned image events

Problem

Fixes #13482.

Matrix m.image events often put the uploaded filename in content.body when no caption is provided. Hermes was forwarding that raw filename as MessageEvent.text, so later vision enrichment could receive inputs like a vision summary plus 30.png, which steers the model toward filename/path reasoning instead of image understanding.

Changes

  • add a narrow m.image filename heuristic in gateway/platforms/matrix.py
  • clear MessageEvent.text only when the body looks like a transport filename
  • keep real caption text unchanged
  • add regression tests for image-only and captioned image messages
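The PR description does not include the heuristic itself. Below is one hypothetical way a "looks like a transport filename" check could be written. The function names, the regex, and the extension list are assumptions for illustration, not the code merged in PR #14063.

```python
import re

# Hypothetical heuristic: a single token (no whitespace or path separators)
# ending in a common image extension is treated as an upload filename.
_FILENAME_RE = re.compile(
    r"^[^\s/\\]+\.(?:png|jpe?g|gif|webp|bmp|heic|tiff?)$",
    re.IGNORECASE,
)


def looks_like_transport_filename(body: str) -> bool:
    """Return True if an m.image body looks like a bare upload filename."""
    return bool(_FILENAME_RE.match(body.strip()))


def image_event_text(body: str) -> str:
    """Text to forward as MessageEvent.text for an m.image event (sketch).

    Filename-like bodies are cleared; anything else is kept as a caption.
    """
    return "" if looks_like_transport_filename(body) else body
```

Keeping the pattern narrow favors preserving real captions. The trade-off is that a filename containing spaces (for example `screenshot 1.png`) is still forwarded as text, while a caption consisting of a single filename-like token would be dropped.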

Verification

  • pytest -o addopts= tests/gateway/test_matrix.py -k 'image_only_filename_body_is_not_forwarded_as_text or image_caption_text_is_preserved'
  • manual repro after the fix:
    • 30.png => '' (empty text, filename not forwarded)
    • Please describe this chart => 'Please describe this chart'

Changed files

  • gateway/platforms/matrix.py (modified, +41/-0)
  • tests/gateway/test_matrix.py (modified, +76/-0)

Original issue report


Current code path on latest main

  1. gateway/platforms/matrix.py:

    • _handle_media_message() reads body = source_content.get("body", "") or ""
    • for m.image, that body is typically just the upload filename
    • after context resolution it constructs:
    msg_event = MessageEvent(
        text=body,
        message_type=msg_type,
        source=source,
        raw_message=source_content,
        message_id=event_id,
        media_urls=media_urls,
        media_types=media_types,
    )
  2. gateway/run.py:

    • if event.media_urls contains images, Hermes calls _enrich_message_with_vision(user_text, image_paths)
    • that function prepends the vision description to the existing user_text

So for image-only Matrix messages, the filename survives as the message text and is mixed into the final LLM input.
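To make the mixing concrete, here is a simplified stand-in for the flow described above. `enrich_with_vision` is a hypothetical simplification of `_enrich_message_with_vision(user_text, image_paths)`: it takes an already-computed description instead of image paths, and the exact separator Hermes uses is an assumption.

```python
def enrich_with_vision(user_text: str, description: str) -> str:
    """Prepend a vision description to the user's text (simplified sketch).

    Hypothetical stand-in for _enrich_message_with_vision(); the real function
    takes image paths and runs the vision model itself.
    """
    if not user_text:
        return description
    return f"{description}\n\n{user_text}"


description = "vision description of the image"

# Before the fix: for an image-only event, body (the filename) is the text.
before = enrich_with_vision("30.png", description)
# After the fix: image-only events carry empty text.
after = enrich_with_vision("", description)

print(repr(before))  # 'vision description of the image\n\n30.png'
print(repr(after))   # 'vision description of the image'
```

Only the "before" string ends with the bare filename that can steer the model toward a file lookup.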

Expected behavior

For Matrix image-only messages:

  • if there is no real user caption, MessageEvent.text should probably be empty
  • or be wrapped in an explicit semantic marker instead of a bare filename
  • caption vs filename should be distinguished

This would align better with how image-only messages are typically represented in other channels, and avoids confusing the model while preserving the cached local image path for downstream vision tools.
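The Matrix client-server spec (v1.10, media captions) gives a direct way to tell a caption from a filename: when `filename` is present and differs from `body`, `body` is a caption; otherwise `body` is the filename of the original upload. A minimal sketch of that rule follows. The function name is hypothetical, and it assumes clients follow the v1.10 semantics, so events from older clients that omit `filename` are treated as having no caption.

```python
from typing import Any, Mapping, Optional


def matrix_image_caption(content: Mapping[str, Any]) -> Optional[str]:
    """Return the user caption of an m.image event, or None if there is none.

    Follows the Matrix v1.10 media-caption rule: body is a caption only when
    `filename` is present and differs from `body`; otherwise body is just the
    filename of the original upload.
    """
    body = content.get("body") or ""
    filename = content.get("filename")
    if filename and body and body != filename:
        return body
    return None
```

A spec-based check like this could be combined with a filename heuristic: compare against `filename` when the field is present, and fall back to a heuristic for clients that omit it.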

Repro

  1. Run Hermes with Matrix enabled and vision enabled.
  2. Send an image-only message (no caption) in a Matrix DM.
  3. Observe that the inbound body is the filename (for example 30.png).
  4. Hermes may interpret the turn as being about the literal filename rather than the image contents.

Suggested fix directions

Possible fixes upstream:

  1. In Matrix _handle_media_message(), treat m.image filename/body separately from user caption text.
  2. For image-only events, set MessageEvent.text to empty string instead of raw filename.
  3. If keeping some textual hint is desirable, wrap it semantically, e.g. [User sent an image: 30.png], rather than passing a naked filename.
  4. More generally, preserve the cached local media path in media_urls, but avoid leaking transport-level filename metadata into the user-intent text unless it is actually a caption.
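Directions 2 and 3 differ only in what replaces the bare filename. The sketch below supports both through a hypothetical `wrap_filename` switch and assumes the caller has already decided whether `body` is a real caption; the marker text copies the example from direction 3.

```python
from typing import Optional


def image_message_text(caption: Optional[str], filename: str,
                       wrap_filename: bool = False) -> str:
    """Build MessageEvent.text for an m.image event (hypothetical helper).

    - A real caption is forwarded unchanged.
    - Without a caption, return "" (direction 2) or, if wrap_filename is set,
      a semantic marker instead of a naked filename (direction 3).
    """
    if caption:
        return caption
    if wrap_filename and filename:
        return f"[User sent an image: {filename}]"
    return ""
```

In either mode, `media_urls` would still carry the cached local path (direction 4), so the vision pipeline keeps access to the image itself.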

Notes

I confirmed this is still present on latest origin/main in local inspection, not just in an older downstream integration.

Extent analysis

TL;DR

Set MessageEvent.text to an empty string for image-only Matrix messages to prevent filename misinterpretation.

Guidance

  • In gateway/platforms/matrix.py, modify _handle_media_message() to check if the message is an image-only event and set body to an empty string if there's no user caption.
  • Consider wrapping the filename in a semantic marker (e.g., [User sent an image: 30.png]) if preserving some textual hint is desirable.
  • Verify that the fix works by sending an image-only message in a Matrix DM and checking that Hermes interprets the turn as being about the image contents, not the filename.
  • Review the gateway/run.py code to ensure that the vision description is correctly prepended to the user text, without including the filename.

Example

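filename = source_content.get("filename")
if msg_type == "m.image" and (not filename or body == filename):
    body = ""  # body is just the upload filename, not a user caption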

Notes

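This fix assumes that body can be reliably classified as either a real user caption or a transport filename. If that classification is wrong (for example, a client sends a real caption without a separate filename field), the caption could be dropped, so additional testing may be necessary to ensure this change does not introduce other issues.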

Recommendation

Apply the suggested fix to set MessageEvent.text to an empty string for image-only Matrix messages, as this should prevent the filename from being misinterpreted by the model.

