hermes: Image routing failure locks entire message queue, blocking subsequent text messages

hermes · 2026-05-07 11:11:31



Bug Description

When a user sends an image and the vision API call fails (e.g. model doesn't actually support images despite metadata claiming it does), the gateway's retry loop blocks the entire message processing pipeline. All subsequent messages — including plain text — are queued indefinitely and never processed.

Root Cause Chain (3 layers)

1. Inaccurate vision capability detection (`agent/image_routing.py`)

In auto mode, decide_image_input_mode() relies on _lookup_supports_vision(), which queries models.dev metadata. When the metadata incorrectly reports supports_vision=True for a model that does not actually support images (e.g. Xiaomi mimo-v2.5-pro), the system chooses native mode and attaches images as inline base64 to the main API call.

# image_routing.py line 126-128
supports = _lookup_supports_vision(provider, model)
if supports is True:
    return "native"  # ← wrong choice when metadata is inaccurate

2. No fallback from native to text mode

When the native API call fails with HTTP 404: No endpoints found that support image input, there is no mechanism to strip the images and retry in text mode (via vision_analyze). The failure propagates up to the agent loop.
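The missing fallback could be sketched roughly as follows. This is not Hermes code: `ApiError`, `is_image_error`, `strip_image_parts`, `call_with_image_fallback`, and the `describe` callback (standing in for vision_analyze) are hypothetical names, and the message layout assumes OpenAI-style content parts (`{"type": "image_url", ...}`), which the report does not confirm.

```python
from typing import Any, Callable

Message = dict[str, Any]


class ApiError(Exception):
    """Hypothetical stand-in for whatever error the API client raises."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def is_image_error(err: ApiError) -> bool:
    """Heuristic match on the error text quoted in this report."""
    return "image input" in err.message.lower()


def strip_image_parts(
    messages: list[Message], describe: Callable[[Message], str]
) -> list[Message]:
    """Return a copy with each image part replaced by a text description."""
    stripped = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            stripped.append(msg)  # plain-string content carries no images
            continue
        parts = [
            {"type": "text", "text": f"[image: {describe(part)}]"}
            if part.get("type") == "image_url"
            else part
            for part in content
        ]
        stripped.append({**msg, "content": parts})
    return stripped


def call_with_image_fallback(
    call_api: Callable[[list[Message]], Any],
    messages: list[Message],
    describe: Callable[[Message], str],
) -> Any:
    """Try native mode once; on an image-related failure, retry as text-only."""
    try:
        return call_api(messages)
    except ApiError as err:
        if not is_image_error(err):
            raise  # unrelated failures keep their normal handling
        return call_api(strip_image_parts(messages, describe))
```

The fallback runs at most once per call, so a model that also rejects the text-only request still surfaces its error instead of looping.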

3. Agent retry loop blocks message queue (`gateway/run.py`)

The main agent's API retry mechanism (3 attempts with exponential backoff) retries the same failing call — images included — without any degradation. During retries, the gateway marks the session as "busy" and queues all incoming messages (including subsequent text-only messages) with "Queued for the next turn" responses. These queued messages never get processed because the retry loop keeps failing.
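A minimal sketch of a retry loop that cannot strand the queue is shown below. It is a simplified, synchronous model rather than the actual `gateway/run.py` logic: `Session`, `run_turn`, and `handle` are hypothetical names, and the real gateway is presumably asynchronous. The point it illustrates is that the busy flag is cleared in a `finally` block and queued messages are drained even when every attempt fails.

```python
import time
from collections import deque
from typing import Callable


class Session:
    """Hypothetical stand-in for the gateway's per-session state."""

    def __init__(self) -> None:
        self.busy = False
        self.queue: deque[str] = deque()


def run_turn(
    session: Session,
    message: str,
    call_api: Callable[[str], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Process one message with bounded retries; always release the session."""
    session.busy = True
    try:
        for attempt in range(1, max_attempts + 1):
            try:
                return call_api(message)
            except Exception as err:
                if attempt == max_attempts:
                    # Give up on this turn instead of retrying indefinitely.
                    return f"failed after {max_attempts} attempts: {err}"
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        return "no attempts were made"  # only reachable if max_attempts < 1
    finally:
        session.busy = False  # runs on success and on failure alike


def handle(
    session: Session,
    message: str,
    call_api: Callable[[str], str],
    **retry_options: float,
) -> list[str]:
    """Queue the message if a turn is running; otherwise run it and drain the queue."""
    if session.busy:
        session.queue.append(message)
        return []
    replies = [run_turn(session, message, call_api, **retry_options)]
    while session.queue:
        queued = session.queue.popleft()
        replies.append(run_turn(session, queued, call_api, **retry_options))
    return replies
```

With this shape, a failed image turn produces one error reply, and a text message that arrived during the retries is processed immediately afterwards as its own turn.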

Reproduction

1. Configure a model that doesn't support images (e.g. mimo-v2.5-pro on the xiaomi provider).
2. Set image_input_mode: auto (the default).
3. Ensure models.dev metadata reports supports_vision: true for the model.
4. Send an image via Discord or Telegram.
5. The gateway attaches the image as native base64; the API returns 404, and all 3 retries fail.
6. Send any text message. It is answered with "Queued for the next turn" and never processed.

Observed Behavior

⏳ Retrying in 2.3s (attempt 1/3)...
⏳ Retrying in 4.7s (attempt 2/3)...
⚠️ Max retries (3) exhausted — trying fallback...
❌ API failed after 3 retries — HTTP 404: No endpoints found that support image input

[user sends text message]
⏳ Queued for the next turn (iteration 1/90). I'll respond once the current task finishes.
⏳ Retrying in 2.6s (attempt 1/3)...
❌ API failed after 3 retries — HTTP 404: No endpoints found that support image input

Suggested Fixes

1. Verify vision capability at runtime: When native mode fails with an image-related error (404, "image input not supported"), automatically strip image parts from the message and retry as text-only, falling back to vision_analyze for image description.
2. Add explicit capability override: Allow config.yaml to force supports_vision: false for specific models, bypassing unreliable metadata.
3. Timeout/degrade on retry exhaustion: When all retries fail, the agent should exit the conversation loop gracefully instead of continuing to retry, allowing queued messages to be processed.
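The explicit capability override could look something like the sketch below. Everything here is an assumption made for illustration: the `overrides` mapping (imagined as loaded from a hypothetical config.yaml section keyed by `provider/model`), the function names, and the choice to treat an unknown capability as text mode. The `lookup` parameter stands in for the existing metadata query.

```python
from typing import Callable, Optional

Lookup = Callable[[str, str], Optional[bool]]


def supports_vision_with_override(
    provider: str,
    model: str,
    overrides: dict[str, bool],
    lookup: Lookup,
) -> Optional[bool]:
    """Prefer an explicit user override; otherwise fall back to metadata."""
    key = f"{provider}/{model}"
    if key in overrides:
        return overrides[key]
    return lookup(provider, model)


def decide_mode_with_override(
    configured_mode: str,
    provider: str,
    model: str,
    overrides: dict[str, bool],
    lookup: Lookup,
) -> str:
    """Return "native" or "text"; only auto mode consults capabilities."""
    if configured_mode in ("native", "text"):
        return configured_mode
    supports = supports_vision_with_override(provider, model, overrides, lookup)
    # Assumption: anything other than a definite True is routed to text mode.
    return "native" if supports is True else "text"
```

For the model in this report, an entry such as `{"xiaomi/mimo-v2.5-pro": False}` would route images to text mode even while the metadata still claims vision support.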

Environment

Hermes Agent: v0.12.0 (local checkout with modifications)
Main model: mimo-v2.5-pro (xiaomi provider, does NOT support images)
image_input_mode: auto
auxiliary.vision.provider: auto (was unconfigured at time of incident)
Platform: Discord

Workaround

Force text mode in config.yaml:

agent:
  image_input_mode: text

This routes all images through vision_analyze (using the auxiliary vision model) instead of attaching them natively. It prevents the lockup but requires a working auxiliary vision configuration.
