hermes: Image routing failure locks entire message queue, subsequent text messages blocked

Bug Description

When a user sends an image and the vision API call fails (e.g. the model does not actually support images despite metadata claiming it does), the gateway's retry loop blocks the entire message-processing pipeline. All subsequent messages, including plain text, are queued indefinitely and never processed.

Root Cause Chain (3 layers)

1. Inaccurate vision capability detection (agent/image_routing.py)

decide_image_input_mode() in auto mode relies on _lookup_supports_vision(), which queries models.dev metadata. When the metadata incorrectly reports supports_vision=True for a model that doesn't actually support images (e.g. Xiaomi mimo-v2.5-pro), the system chooses native mode and attaches images inline as base64 in the main API call.

# image_routing.py line 126-128
supports = _lookup_supports_vision(provider, model)
if supports is True:
    return "native"  # ← wrong choice when metadata is inaccurate

2. No fallback from native to text mode

When the native API call fails with HTTP 404: No endpoints found that support image input, there is no mechanism to strip the images and retry in text mode (via vision_analyze). The failure propagates up to the agent loop.
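The missing fallback could look roughly like the sketch below. It is not Hermes code: ApiError, is_image_rejection, strip_image_parts and call_with_image_fallback are hypothetical names, and it assumes OpenAI-style messages whose content is a list of parts with image parts tagged "type": "image_url". The sketch only replaces the images with a placeholder; a real fix would hand them to vision_analyze at that point.

```python
from typing import Any, Callable

# Substrings that suggest the provider rejected image input (assumed list).
IMAGE_ERROR_MARKERS = ("image input", "image_url", "does not support image")


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status and response body."""

    def __init__(self, status: int, body: str) -> None:
        super().__init__(f"HTTP {status}: {body}")
        self.status = status
        self.body = body


def is_image_rejection(status: int, body: str) -> bool:
    """Heuristic check: does this error look like 'images not supported'?"""
    text = body.lower()
    return status in (400, 404, 415) and any(m in text for m in IMAGE_ERROR_MARKERS)


def strip_image_parts(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages with image parts replaced by a text note."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            parts = [p for p in content if p.get("type") != "image_url"]
            dropped = len(content) - len(parts)
            if dropped:
                note = f"[{dropped} image(s) omitted: model rejected image input]"
                parts.append({"type": "text", "text": note})
            cleaned.append({**msg, "content": parts})
        else:
            cleaned.append(msg)
    return cleaned


def call_with_image_fallback(
    call_model: Callable[[list[dict[str, Any]]], Any],
    messages: list[dict[str, Any]],
) -> Any:
    """Try the native call once; on an image rejection, retry as text-only."""
    try:
        return call_model(messages)
    except ApiError as err:
        if not is_image_rejection(err.status, err.body):
            raise  # unrelated failure: leave it to the normal retry logic
        return call_model(strip_image_parts(messages))
```

The check runs before the generic retry loop, so a deterministic "no image support" error triggers one degraded retry instead of three identical failing ones.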

3. Agent retry loop blocks message queue (gateway/run.py)

The main agent's API retry mechanism (3 attempts with exponential backoff) retries the same failing call, images included, without any degradation. During retries, the gateway marks the session as "busy" and queues all incoming messages (including subsequent text-only messages) with "Queued for the next turn" responses. These queued messages never get processed because the retry loop keeps failing.
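One way to keep a failed turn from starving the queue is to clear the busy flag in a finally block and drain pending messages afterwards. The sketch below is a simplified synchronous model, not the actual gateway/run.py logic: Session, run_turn and handle_incoming are hypothetical names, and backoff sleeps are left out for brevity.

```python
from collections import deque
from typing import Callable


class Session:
    """Hypothetical per-chat state: a busy flag plus a queue of waiting messages."""

    def __init__(self) -> None:
        self.busy = False
        self.pending: deque[str] = deque()


def run_turn(session: Session, message: str,
             call_model: Callable[[str], str], max_retries: int = 3) -> str:
    """Process one message with bounded retries; always release the busy flag."""
    session.busy = True
    try:
        last_error: Exception | None = None
        for _attempt in range(max_retries):
            try:
                return call_model(message)
            except Exception as err:  # sketch only: real code would be narrower
                last_error = err
        # Retries exhausted: report the failure and end the turn instead of looping.
        return f"failed after {max_retries} retries: {last_error}"
    finally:
        session.busy = False  # runs on success, on failure, and on exceptions


def handle_incoming(session: Session, message: str,
                    call_model: Callable[[str], str]) -> list[str]:
    """Queue the message if a turn is in flight; otherwise run it and drain the queue."""
    if session.busy:
        session.pending.append(message)
        return []
    replies = [run_turn(session, message, call_model)]
    while session.pending:
        replies.append(run_turn(session, session.pending.popleft(), call_model))
    return replies
```

The failed image turn produces a single error reply and ends, so a text message queued behind it is processed on its own rather than inheriting the failing payload.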

Reproduction

  1. Configure a model that doesn't support images (e.g. mimo-v2.5-pro on the xiaomi provider).
  2. Leave image_input_mode set to auto (the default).
  3. Ensure models.dev metadata reports supports_vision: true for the model.
  4. Send an image via Discord/Telegram.
  5. The gateway attaches the image as native base64 → the API returns 404 → it retries 3x → all attempts fail.
  6. Send any text message → "Queued for the next turn" → it is never processed.

Observed Behavior

⏳ Retrying in 2.3s (attempt 1/3)...
⏳ Retrying in 4.7s (attempt 2/3)...
⚠️ Max retries (3) exhausted — trying fallback...
❌ API failed after 3 retries — HTTP 404: No endpoints found that support image input

[user sends text message]
⏳ Queued for the next turn (iteration 1/90). I'll respond once the current task finishes.
⏳ Retrying in 2.6s (attempt 1/3)...
❌ API failed after 3 retries — HTTP 404: No endpoints found that support image input

Suggested Fixes

  1. Verify vision capability at runtime: When native mode fails with an image-related error (404, "image input not supported"), automatically strip image parts from the message and retry as text-only, falling back to vision_analyze for image description.

  2. Add explicit capability override: Allow config.yaml to force supports_vision: false for specific models, bypassing unreliable metadata.

  3. Timeout/degrade on retry exhaustion: When all retries fail, the agent should exit the conversation loop gracefully instead of continuing to retry, allowing queued messages to be processed.
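Suggested fix 2 can be sketched as a lookup that consults user overrides before trusting metadata. Everything here is illustrative: decide_mode_with_override, the overrides mapping and its "provider/model" key format are hypothetical. It also assumes that anything other than an explicit True falls back to text mode, which the issue does not confirm.

```python
from typing import Callable, Mapping, Optional


def decide_mode_with_override(
    provider: str,
    model: str,
    overrides: Mapping[str, bool],
    lookup_metadata: Callable[[str, str], Optional[bool]],
) -> str:
    """Pick "native" or "text", letting a user override beat remote metadata."""
    key = f"{provider}/{model}"  # hypothetical key format for the override map
    if key in overrides:
        supports = overrides[key]  # explicit user setting wins
    else:
        supports = lookup_metadata(provider, model)  # may be True, False or None
    # Assumption: only an explicit True selects native image attachment.
    return "native" if supports is True else "text"
```

With an override of {"xiaomi/mimo-v2.5-pro": False}, the function returns "text" even when the metadata lookup claims vision support.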

Environment

  • Hermes Agent: v0.12.0 (local checkout with modifications)
  • Main model: mimo-v2.5-pro (xiaomi provider, does NOT support images)
  • image_input_mode: auto
  • auxiliary.vision.provider: auto (was unconfigured at time of incident)
  • Platform: Discord

Workaround

Force text mode in config.yaml:

agent:
  image_input_mode: text

This makes all images go through vision_analyze (using the auxiliary vision model) instead of native attachment. It prevents the lockup but requires a working auxiliary vision configuration.
