hermes - 💡(How to fix) Fix Feature Request: Automatic vision fallback for non-vision primary models

hermes2026-05-25 16:43:23

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Fix Action

Fix / Workaround

Current workaround

/new (lose context)
Switch primary model to Gemini/GPT-4o
Send image again
Switch back to DeepSeek

Code Example

unknown variant `image_url`, expected `text`

---

User sends image + text
       |
       v
+----------------------+
| Vision Pre-processor |
| 1. Detect image_url  |
| 2. Check if primary  |
|    model has vision  |
| 3. If NOT: send to   |
|    auxiliary.vision  |
|    -> get descriptn  |
| 4. Replace image_url |
|    with text block   |
+----------------------+
       |
       v
  DeepSeek receives
  text-only message OK

---

auxiliary:
  vision:
    provider: google
    model: gemini-2.5-flash
    auto_fallback: true   # NEW

RAW_BUFFERClick to expand / collapse

Problem

When using a primary model that does not support vision (e.g. DeepSeek), sending an image causes the entire request to fail with:

unknown variant `image_url`, expected `text`

The failure happens at the API level — DeepSeek rejects image_url before the agent sees a response. The agent cannot recover.

Current workaround

/new (lose context)
Switch primary model to Gemini/GPT-4o
Send image again
Switch back to DeepSeek

This is disruptive — especially on Telegram where voice+image messages are common.

Proposed Solution

Pre-processing layer in the agent loop that intercepts image attachments before they reach the primary model:

User sends image + text
       |
       v
+----------------------+
| Vision Pre-processor |
| 1. Detect image_url  |
| 2. Check if primary  |
|    model has vision  |
| 3. If NOT: send to   |
|    auxiliary.vision  |
|    -> get descriptn  |
| 4. Replace image_url |
|    with text block   |
+----------------------+
       |
       v
  DeepSeek receives
  text-only message OK

Config addition

auxiliary:
  vision:
    provider: google
    model: gemini-2.5-flash
    auto_fallback: true   # NEW

Benefits

Seamless UX — no /new, no model switching, no context loss
Best-of-both-worlds — cheap text models for reasoning + specialized vision models for images
Works mid-session — images just work
No API changes — purely client-side preprocessing

Edge Cases

Multiple images per message — process sequentially or parallel
Mixed content — preserve text alongside descriptions
Vision model failure — graceful degradation with a note
Cost tracking — auxiliary calls should appear in usage stats
Streaming — vision must complete before primary model stream begins

Submitted via Hermes Agent on behalf of Vitaliy Li

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix Feature Request: Automatic vision fallback for non-vision primary models

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fix / Workaround

Current workaround

Code Example

Problem

Current workaround

Proposed Solution

Config addition

Benefits

Edge Cases

Still need to ship something?

TRENDING