hermes - 💡(How to fix) Fix [Feature]: Native vision routing hides local image paths from agent [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#17878Fetched 2026-05-01 05:55:21
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×4

Root Cause

Root Cause build_native_content_parts() only creates: json [{"type": "text", "text": "..."}, {"type": "image_url", "image_url": {"url": "data:..."}}]

Fix Action

Fix / Workaround

Patch diff diff --git a/agent/image_routing.py b/agent/image_routing.py --- a/agent/image_routing.py +++ b/agent/image_routing.py @@ -209,6 +209,20 @@ def build_native_content_parts( if text: parts.append({"type": "text", "text": text})

RAW_BUFFERClick to expand / collapse

Problem or Use Case

When a user sends an image via QQ Bot and the active model supports native vision, Hermes routes the image through agent/image_routing.py's build_native_content_parts(). This function converts locally cached images to base64 data: URLs and attaches them as image_url content parts in the multimodal user turn.

The Problem
The local file paths are never exposed to the agent. The LLM can "see" the image via native vision, but the agent has no way to reference the file on disk.

Impact
- Any workflow requiring downstream CLI tools that need a local file path breaks (e.g., dreamina multimodal2video --image=<path>, ffmpeg -i <path>, custom image processors).
- The agent receives the image visually but cannot construct correct CLI commands, often falling back to incorrect text-only modes or failing silently.
- Affects all platforms (QQ Bot, Telegram, Discord, etc.) when using vision-capable models.

Root Cause
build_native_content_parts() only creates:
json
[{"type": "text", "text": "..."}, {"type": "image_url", "image_url": {"url": "data:..."}}]

Local paths are consumed during base64 conversion and discarded. The agent's context lacks filesystem references.

Steps to Reproduce 1. Configure a platform (e.g., QQ Bot) with a model that supports native vision. 2. Send an image to the bot. 3. Ask the agent to use the image as input to a CLI tool that requires a file path (e.g., dreamina multimodal2video --image=<path>). 4. Observe that the agent cannot locate the file on disk, despite visually recognizing the image.

Proposed Solution

Inject a lightweight text annotation containing the local file paths into the multimodal content parts before base64 attachment. This preserves native vision while giving the agent disk access: python In agent/image_routing.py, build_native_content_parts() path_refs = [str(p) for p in image_paths if Path(p).exists()] if path_refs: path_note = f"\n\n[Image saved locally, paths: {', '.join(path_refs)}]" if text: for p in parts: if p.get("type") == "text": p["text"] += path_note break else: parts.insert(0, {"type": "text", "text": path_note.strip()})

Patch
diff
diff --git a/agent/image_routing.py b/agent/image_routing.py
--- a/agent/image_routing.py
+++ b/agent/image_routing.py
@@ -209,6 +209,20 @@ def build_native_content_parts(
     if text:
         parts.append({"type": "text", "text": text})

+    # Always inject local paths into text so the agent knows where files are on disk
+    # (native vision routing otherwise hides paths from the agent).
+    path_refs = [str(p) for p in image_paths if Path(p).exists()]
+    if path_refs:
+        path_note = f"\n\n[Image saved locally, paths: {', '.join(path_refs)}]"
+        if text:
+            for p in parts:
+                if p.get("type") == "text":
+                    p["text"] += path_note
+                    break
+        else:
+            parts.insert(0, {"type": "text", "text": path_note.strip()})
+
     for raw_path in image_paths:
         p = Path(raw_path)
         if not p.exists() or not p.is_file():

Alternatives Considered

No response

Feature Type

New tool

Scope

None

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

extent analysis

TL;DR

Injecting local file paths into the multimodal content parts as text annotations can resolve the issue of the agent being unable to reference images on disk.

Guidance

  • Modify the build_native_content_parts function in agent/image_routing.py to include local file paths in the text content parts.
  • Use the proposed patch as a starting point, which injects a text annotation containing the local file paths before base64 attachment.
  • Verify that the agent can correctly construct CLI commands using the injected file paths.
  • Test the modified function with different scenarios, such as sending images via QQ Bot and using the image as input to CLI tools.

Example

path_refs = [str(p) for p in image_paths if Path(p).exists()]
if path_refs:
    path_note = f"\n\n[Image saved locally, paths: {', '.join(path_refs)}]"
    if text:
        for p in parts:
            if p.get("type") == "text":
                p["text"] += path_note
                break
    else:
        parts.insert(0, {"type": "text", "text": path_note.strip()})

Notes

The proposed solution assumes that the image_paths variable contains the local file paths of the images. The patch provided is a starting point and may need to be modified to fit the specific requirements of the project.

Recommendation

Apply the proposed workaround by injecting local file paths into the multimodal content parts as text annotations, as it provides a straightforward solution to the issue and allows the agent to reference images on disk.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Feature]: Native vision routing hides local image paths from agent [1 participants]