
hermes - ✅ (Solved) Fix: Matrix image-only messages use filename as text and can mislead vision handling [1 pull request, 1 comment, 1 participant]

johnlanni · 2026-04-21T11:31:23Z




NousResearch/hermes-agent#13482•Fetched 2026-04-22 08:06:10


On the latest main, Hermes still treats the content.body of inbound Matrix m.image events (usually just the uploaded filename, like 30.png) as the user text payload.

This becomes problematic when Hermes later prepends auto-vision analysis to the same message text: the model receives something semantically equivalent to:

[vision description of the image]

30.png

For image-only Matrix messages, that trailing raw filename can mislead the model into treating the turn as a file/path lookup instead of an image understanding request.

Root Cause

The media download / cache path itself is working correctly. The image is cached locally and passed into the vision enrichment pipeline. The issue is the semantic packaging of the inbound message.

In practice this can produce behavior like:

- user sends an image in a Matrix DM
- Hermes caches it successfully
- Hermes then tries to search_files / reason about 30.png as a literal filename, instead of responding to the image content

PR fix notes

PR #14063: fix(matrix): normalize image-only filenames

Repository: NousResearch/hermes-agent
Author: LeonSGP43
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14063

Description (problem / solution / changelog)

Summary

- stop forwarding Matrix image-only filenames as user text
- preserve real image captions for downstream vision flows
- add regression coverage for both image-only and captioned image events

Problem

Fixes #13482.

Matrix m.image events often put the uploaded filename in content.body when no caption is provided. Hermes was forwarding that raw filename as MessageEvent.text, so later vision enrichment could receive inputs like a vision summary plus 30.png, which steers the model toward filename/path reasoning instead of image understanding.

Changes

- add a narrow m.image filename heuristic in gateway/platforms/matrix.py
- clear MessageEvent.text only when the body looks like a transport filename
- keep real caption text unchanged
- add regression tests for image-only and captioned image messages
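The PR's actual heuristic is not shown on this page, so the following is only a minimal sketch of what a "narrow filename heuristic" could look like. The function names and the extension list are assumptions made for illustration, not code from `gateway/platforms/matrix.py`.

```python
import os

# Hypothetical allow-list of extensions; the list used by the PR is not shown in the source.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".heic"}


def looks_like_transport_filename(body: str) -> bool:
    """Return True when body resembles a bare upload filename such as '30.png'."""
    candidate = body.strip()
    # Deliberately narrow: anything containing whitespace is treated as a caption.
    if not candidate or any(ch.isspace() for ch in candidate):
        return False
    # Reject strings that look like a path rather than a single file name.
    if "/" in candidate or "\\" in candidate:
        return False
    stem, ext = os.path.splitext(candidate)
    return bool(stem) and ext.lower() in IMAGE_EXTENSIONS


def normalize_image_text(body: str) -> str:
    """Clear filename-like bodies; return real captions unchanged."""
    return "" if looks_like_transport_filename(body) else body
```

Keeping the check narrow means a filename that contains spaces would still be forwarded as text; that trade-off favors never dropping a real caption.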

Verification

- pytest -o addopts= tests/gateway/test_matrix.py -k 'image_only_filename_body_is_not_forwarded_as_text or image_caption_text_is_preserved'
- manual repro after the fix:
  - 30.png => '' (empty text)
  - Please describe this chart => 'Please describe this chart'
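The two test names in the pytest filter suggest roughly the following shape. This is a hedged sketch only: `FakeMessageEvent` and `build_image_event` are hypothetical stand-ins invented here, and the real tests in `tests/gateway/test_matrix.py` are not shown on this page.

```python
import os
from dataclasses import dataclass, field


@dataclass
class FakeMessageEvent:
    """Stand-in for Hermes' MessageEvent; only the fields used in this sketch."""
    text: str
    media_urls: list = field(default_factory=list)


def build_image_event(content: dict, media_urls: list) -> FakeMessageEvent:
    """Hypothetical adapter step: drop filename-like bodies, keep captions."""
    body = content.get("body", "") or ""
    _, ext = os.path.splitext(body)
    is_filename = bool(ext) and not any(ch.isspace() for ch in body)
    return FakeMessageEvent(text="" if is_filename else body, media_urls=media_urls)


def test_image_only_filename_body_is_not_forwarded_as_text():
    event = build_image_event({"msgtype": "m.image", "body": "30.png"}, ["/cache/30.png"])
    assert event.text == ""
    # The cached image path must survive even though the text is cleared.
    assert event.media_urls == ["/cache/30.png"]


def test_image_caption_text_is_preserved():
    event = build_image_event(
        {"msgtype": "m.image", "body": "Please describe this chart"}, ["/cache/30.png"]
    )
    assert event.text == "Please describe this chart"
```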

Changed files

gateway/platforms/matrix.py (modified, +41/-0)
tests/gateway/test_matrix.py (modified, +76/-0)



Current code path on latest `main`

gateway/platforms/matrix.py:

- _handle_media_message() reads body = source_content.get("body", "") or ""
- for m.image, that body is typically just the upload filename
- after context resolution it constructs:

msg_event = MessageEvent(
    text=body,
    message_type=msg_type,
    source=source,
    raw_message=source_content,
    message_id=event_id,
    media_urls=media_urls,
    media_types=media_types,
)

gateway/run.py:
- if event.media_urls contains images, Hermes calls _enrich_message_with_vision(user_text, image_paths)
- that function prepends the vision description to the existing user_text

So for image-only Matrix messages, the filename survives as the message text and is mixed into the final LLM input.
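To make the effect concrete, here is a small sketch of the prepend step. `enrich_with_vision` is a simplified stand-in written for this illustration; it takes an already-computed description instead of image paths, and how the real `_enrich_message_with_vision` handles empty text is an assumption.

```python
def enrich_with_vision(user_text: str, vision_description: str) -> str:
    """Stand-in for the enrichment step: prepend the description to the user text."""
    if user_text:
        return f"{vision_description}\n\n{user_text}"
    # Assumption: with no user text, only the description is sent to the model.
    return vision_description


description = "[vision description of the image]"

# Body forwarded as-is: the raw filename trails the description.
print(repr(enrich_with_vision("30.png", description)))
# Body cleared for an image-only message: only the description remains.
print(repr(enrich_with_vision("", description)))
```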

Expected behavior

For Matrix image-only messages:

- if there is no real user caption, MessageEvent.text should probably be empty
- or be wrapped in an explicit semantic marker instead of a bare filename
- caption vs filename should be distinguished

This would align better with how image-only messages are typically represented in other channels, and avoids confusing the model while preserving the cached local image path for downstream vision tools.
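One possible way to distinguish caption from filename is to use the optional `filename` field that newer Matrix clients send alongside `body`. The sketch below assumes that convention (a `filename` that differs from `body` means `body` is a caption); `ImageText` and `split_caption_and_filename` are hypothetical names, and this is not necessarily the approach taken in the PR.

```python
from typing import NamedTuple, Optional


class ImageText(NamedTuple):
    caption: str             # user-intent text, empty when there is none
    filename: Optional[str]  # transport metadata, kept out of the prompt text


def split_caption_and_filename(content: dict) -> ImageText:
    """Separate caption from filename for an m.image content dict."""
    body = content.get("body", "") or ""
    filename = content.get("filename")
    if filename and filename != body:
        # A distinct filename is present, so body is treated as the caption.
        return ImageText(caption=body, filename=filename)
    # No separate filename (or identical values): treat body as the filename.
    return ImageText(caption="", filename=body or None)
```

A client that puts a caption in `body` without sending `filename` would lose that caption under this rule, so it may need to be combined with a filename-shape check.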

Repro

1. Run Hermes with Matrix enabled and vision enabled.
2. Send an image-only message (no caption) in a Matrix DM.
3. Observe that the inbound body is the filename (for example 30.png).
4. Hermes may interpret the turn as being about the literal filename rather than the image contents.

Suggested fix directions

Possible fixes upstream:

1. In Matrix _handle_media_message(), treat m.image filename/body separately from user caption text.
2. For image-only events, set MessageEvent.text to empty string instead of raw filename.
3. If keeping some textual hint is desirable, wrap it semantically, e.g. [User sent an image: 30.png], rather than passing a naked filename.
4. More generally, preserve the cached local media path in media_urls, but avoid leaking transport-level filename metadata into the user-intent text unless it is actually a caption.
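The second and third directions can be sketched together as one small helper. `image_event_text` and its `keep_hint` flag are hypothetical names for illustration; the marker string follows the example given in the list above.

```python
def image_event_text(caption: str, filename: str = "", keep_hint: bool = False) -> str:
    """Choose the text to forward for an image message."""
    if caption:
        # A real caption always passes through unchanged.
        return caption
    if keep_hint and filename:
        # Wrap the filename so it reads as metadata, not as user intent.
        return f"[User sent an image: {filename}]"
    # Default for image-only events: no text at all.
    return ""
```

For example, `image_event_text("", "30.png")` yields an empty string, while passing `keep_hint=True` yields `[User sent an image: 30.png]`.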

Notes

I confirmed this is still present on latest origin/main in local inspection, not just in an older downstream integration.

extent analysis

TL;DR

Set MessageEvent.text to an empty string for image-only Matrix messages to prevent filename misinterpretation.

Guidance

- In gateway/platforms/matrix.py, modify _handle_media_message() to check if the message is an image-only event and set body to an empty string if there's no user caption.
- Consider wrapping the filename in a semantic marker (e.g., [User sent an image: 30.png]) if preserving some textual hint is desirable.
- Verify that the fix works by sending an image-only message in a Matrix DM and checking that Hermes interprets the turn as being about the image contents, not the filename.
- Review the gateway/run.py code to ensure that the vision description is correctly prepended to the user text, without including the filename.

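Example

import os
# Sketch: clear the body only when it looks like a bare upload filename (e.g. "30.png"), not a caption.
if msg_type == "m.image" and not any(c.isspace() for c in body) and os.path.splitext(body)[1]:
    body = ""

Notes

This sketch assumes that a body with no whitespace and a file extension is an upload filename rather than a caption, and that msg_type holds the raw Matrix msgtype string. Checking only for a missing body would change nothing, because the filename itself arrives in body. Additional testing may be necessary to ensure that this change does not introduce other issues.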

Recommendation

Apply the suggested fix to set MessageEvent.text to an empty string for image-only Matrix messages, as this should prevent the filename from being misinterpreted by the model.
