hermes - Fix bug: queued photo messages lose media during vision enrichment

hermes · 2026-04-17 10:06:19

GitHub stats

NousResearch/hermes-agent#11538 • Fetched 2026-04-18 06:00:26

Author

justmaker

Participants

justmaker

Bug

When a user sends a photo while the agent is already processing a previous message, the queued photo loses its media context entirely. The image is never sent through vision enrichment, and the agent receives an empty or misleading text message instead.

Root Cause

Three separate bugs compound into this:

1. Discord adapter placeholder on captionless photos: _handle_message() sets text to a placeholder before vision enrichment runs. The placeholder persists even after the vision pipeline prepends an image description, which confuses the LLM.
2. _dequeue_pending_text() discards media: the dequeue helper returns only event.text and throws away media_urls. Queued photos are processed as empty text messages with no vision analysis.
3. Interrupt path loses media: when a message with media triggers an interrupt, only event.text is stored in _pending_messages, so media_urls is lost entirely.

Steps to Reproduce

1. Send a text message that triggers a long agent response.
2. While the agent is processing, send a photo (with or without a caption).
3. Wait for the agent to finish and process the queued message.
4. Observe that the agent responds as if no image was sent.

Suggested Fix

1. Add _dequeue_pending_event(), which returns the full MessageEvent (not just its text).
2. Add _enrich_pending_event(), which runs vision/STT on dequeued events.
3. In the Discord adapter, skip the placeholder text when media_urls is present.
4. In the interrupt path, store the full event in the adapter's pending queue to preserve media.

<details> <summary>Reference diff (from justmaker fork, will need porting to current main)</summary>

</details>

extent analysis

TL;DR

To fix the issue, update the Discord adapter and message processing pipeline to preserve media context by returning the full MessageEvent and running vision/STT on dequeued events.

Guidance

Implement _dequeue_pending_event() to return the full MessageEvent, including media_urls, instead of just event.text.
Create _enrich_pending_event() to run vision and STT on dequeued events, ensuring media context is not lost.
Modify the Discord adapter to skip setting a placeholder text when media_urls is present, preventing confusion in the LLM.
Update the interrupt path to store the full event in the adapter's pending queue, preserving media context.

Example

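// Proposed change: return the full MessageEvent instead of only its text.
// `pendingMessages` is assumed to be an array used as a FIFO queue.
function _dequeue_pending_event(pendingMessages) {
  // shift() yields undefined on an empty queue; normalize that to null
  return pendingMessages.shift() ?? null;
}

// Proposed change: enrich a dequeued event before the agent sees it.
function _enrich_pending_event(event) {
  // An empty array is truthy in JavaScript, so check the length explicitly
  if (event && Array.isArray(event.media_urls) && event.media_urls.length > 0) {
    // Vision enrichment pipeline runs here (STT would be handled similarly)
  }
  return event;
}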
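
The example above covers only the dequeue and enrich helpers. The two remaining guidance items (skipping the placeholder and preserving media on the interrupt path) might look like the Python sketch below. Everything here is an assumption made for illustration: the `PLACEHOLDER` value, the `initial_text` and `store_interrupt` helper names, and the idea that pending messages are kept in a dict keyed by session are not taken from the issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical value; the issue does not quote the real placeholder text.
PLACEHOLDER = "(no caption)"


@dataclass
class MessageEvent:
    """Hypothetical stand-in for the gateway's event type."""
    text: str = ""
    media_urls: List[str] = field(default_factory=list)


def initial_text(caption: str, media_urls: List[str]) -> str:
    """Choose the text an adapter attaches to an incoming message."""
    if caption:
        return caption
    # Leave the text empty when media is present so the vision description
    # is not followed by a stale placeholder.
    return "" if media_urls else PLACEHOLDER


def store_interrupt(
    pending: Dict[str, MessageEvent], session_key: str, event: MessageEvent
) -> None:
    """Keep the full event for the interrupted session, not only event.text."""
    pending[session_key] = event


# Usage: a captionless photo arrives while the agent is busy.
urls = ["https://example.invalid/photo.jpg"]
pending_messages: Dict[str, MessageEvent] = {}
store_interrupt(pending_messages, "session-1", MessageEvent(initial_text("", urls), urls))
```

Storing the event object rather than a string is what lets a later enrichment step see `media_urls` at all.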

Notes

The reference diff comes from the justmaker fork and needs to be ported to the current main branch. Port each change carefully so that media context survives every step of the message processing pipeline.

Recommendation

Implement _dequeue_pending_event() and _enrich_pending_event(), and update the Discord adapter and the interrupt path to preserve media context. Together these changes address the three identified root causes.
