hermes - 💡(How to fix) Fix bug: queued photo messages lose media during vision enrichment [1 participants]

NousResearch/hermes-agent#11538 · Fetched 2026-04-18 06:00:26
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Bug

When a user sends a photo while the agent is already processing a previous message, the queued photo loses its media context entirely. The image is never sent through vision enrichment, and the agent receives an empty or misleading text message instead.

Root Cause

Three separate bugs compound into this:

  1. Discord adapter placeholder on captionless photos: _handle_message() sets text to a placeholder before vision enrichment runs. This placeholder persists even after the vision pipeline prepends an image description, confusing the LLM.

  2. _dequeue_pending_text() discards media: The dequeue helper only returns event.text, throwing away media_urls. Queued photos are processed as empty text messages with zero vision analysis.

  3. Interrupt path loses media: When a message with media triggers an interrupt, only event.text is stored in _pending_messages, losing media_urls entirely.
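
To make root cause 2 concrete, here is a minimal Python sketch of how a text-only dequeue drops the photo. The `MessageEvent` dataclass, the `deque`-based queue, and both helper functions are hypothetical stand-ins written for illustration; only the field names `text` and `media_urls` come from the issue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class MessageEvent:
    # Hypothetical stand-in for the real event type; only the two
    # fields named in the issue are modeled here.
    text: str = ""
    media_urls: List[str] = field(default_factory=list)


def dequeue_pending_text(pending: Deque[MessageEvent]) -> Optional[str]:
    """Text-only dequeue: mirrors the behavior described in root cause 2."""
    return pending.popleft().text if pending else None


def dequeue_pending_event(pending: Deque[MessageEvent]) -> Optional[MessageEvent]:
    """Full-event dequeue: media_urls survives the hand-off."""
    return pending.popleft() if pending else None


# A captionless photo that was queued while the agent was busy.
photo = MessageEvent(text="", media_urls=["https://example.invalid/photo.jpg"])

lost = dequeue_pending_text(deque([photo]))   # only the (empty) text comes back
kept = dequeue_pending_event(deque([photo]))  # the whole event comes back

print(repr(lost))
print(kept.media_urls)
```

With the text-only helper the caller has nothing left to pass to vision enrichment, which matches the "responds as if no image was sent" symptom.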

Steps to Reproduce

  1. Send a text message that triggers a long agent response
  2. While the agent is processing, send a photo (with or without caption)
  3. Wait for the agent to finish and process the queued message
  4. The agent responds as if no image was sent

Suggested Fix

  • Add _dequeue_pending_event() that returns the full MessageEvent (not just text)
  • Add _enrich_pending_event() that runs vision/STT on dequeued events
  • In Discord adapter, skip the placeholder text when media_urls is present
  • In interrupt path, store the full event in adapter's pending queue to preserve media
<details> <summary>Reference diff (from justmaker fork, will need porting to current main)</summary>
</details>
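
The Discord adapter change from the list above can be sketched as follows. This is an illustration under stated assumptions, not the adapter's actual code: `PLACEHOLDER`, `initial_text()`, and `prepend_description()` are hypothetical names, the real placeholder string is not shown in the issue, and the "description first, then user text" shape of enrichment is inferred from root cause 1.

```python
from typing import Sequence

# Hypothetical value; the issue does not show the real placeholder string.
PLACEHOLDER = "(photo)"


def initial_text(caption: str, media_urls: Sequence[str]) -> str:
    """Choose the text the adapter hands to the pipeline before enrichment."""
    if caption:
        return caption
    if media_urls:
        # Media is attached: leave the text empty so the vision description
        # is not followed by a stale placeholder.
        return ""
    return PLACEHOLDER


def prepend_description(description: str, text: str) -> str:
    """Assumed shape of enrichment: image description first, then user text."""
    return f"{description}\n\n{text}" if text else description


urls = ["https://example.invalid/photo.jpg"]

# Without the fix, the placeholder trails the description and reads like user input.
before_fix = prepend_description("A cat on a sofa.", PLACEHOLDER)
# With the fix, the LLM sees only the description for a captionless photo.
after_fix = prepend_description("A cat on a sofa.", initial_text("", urls))
```

Captions are still passed through unchanged, so only the captionless-photo case behaves differently.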

extent analysis

TL;DR

To fix the issue, update the Discord adapter and message processing pipeline to preserve media context by returning the full MessageEvent and running vision/STT on dequeued events.

Guidance

  • Implement _dequeue_pending_event() to return the full MessageEvent, including media_urls, instead of just event.text.
  • Create _enrich_pending_event() to run vision and STT on dequeued events, ensuring media context is not lost.
  • Modify the Discord adapter to skip setting a placeholder text when media_urls is present, preventing confusion in the LLM.
  • Update the interrupt path to store the full event in the adapter's pending queue, preserving media context.

Example

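# Sketch only, written in Python to match the project's method names.
# Assumes self._pending_messages is a collections.deque of MessageEvent objects.

def _dequeue_pending_event(self):
    # Return the full event, including media_urls, instead of only event.text.
    # Returns None when nothing is queued.
    return self._pending_messages.popleft() if self._pending_messages else None

def _enrich_pending_event(self, event):
    # Run vision and STT on the dequeued event before it reaches the agent.
    if event is not None and event.media_urls:
        ...  # vision enrichment pipeline goes here
    return event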
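
The interrupt-path item in the guidance is not covered by the example above, so here is a separate, self-contained Python sketch of it together with a stubbed enrichment step. Everything except the names `_pending_messages`, `text`, and `media_urls` is an assumption made for illustration: `PendingQueue`, `store_on_interrupt()`, and the injected `describe_image` callable are hypothetical, and the real vision pipeline is replaced by a lambda.

```python
from collections import deque
from dataclasses import dataclass, replace
from typing import Callable, Deque, Optional, Tuple


@dataclass(frozen=True)
class MessageEvent:
    # Hypothetical stand-in for the real event type.
    text: str = ""
    media_urls: Tuple[str, ...] = ()


class PendingQueue:
    """Hypothetical queue that keeps whole events rather than bare text."""

    def __init__(self) -> None:
        self._pending_messages: Deque[MessageEvent] = deque()

    def store_on_interrupt(self, event: MessageEvent) -> None:
        # Store the full event so media_urls is still there at dequeue time.
        self._pending_messages.append(event)

    def dequeue_pending_event(self) -> Optional[MessageEvent]:
        return self._pending_messages.popleft() if self._pending_messages else None


def enrich_pending_event(
    event: MessageEvent, describe_image: Callable[[str], str]
) -> MessageEvent:
    """Prepend one description per media URL; events without media pass through."""
    if not event.media_urls:
        return event
    parts = [describe_image(url) for url in event.media_urls]
    if event.text:
        parts.append(event.text)
    return replace(event, text="\n\n".join(parts))


queue = PendingQueue()
queue.store_on_interrupt(
    MessageEvent(text="what is this?", media_urls=("https://example.invalid/a.jpg",))
)
event = queue.dequeue_pending_event()
enriched = enrich_pending_event(event, lambda url: f"[image: {url}]")
print(enriched.text)
```

Passing the describer in as a callable keeps the sketch testable without a vision backend; the real pipeline would take its place.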

Notes

The reference diff comes from the justmaker fork and needs to be ported to the current main branch. For the fix to hold, media_urls has to survive every hand-off named in the root causes (the Discord adapter, the interrupt path, and the dequeue helper); dropping it at any one of them reproduces the bug.

Recommendation

Apply the suggested fixes: implement _dequeue_pending_event() and _enrich_pending_event(), skip the placeholder in the Discord adapter when media_urls is present, and store the full event on the interrupt path. Each change maps directly to one of the identified root causes.
