hermes - 💡(How to fix) Fix [Bug]: OpenAI API gateway silently drops image content — Open WebUI users can't upload photos [1 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

Additional Logs / Traceback (optional)

Root Cause

Root Cause Analysis (optional)

Fix Action

Fixed

Code Example

⚠️ Server rejected image content — switching to text-only mode for this session. Stripped images from history and retrying.

---

Hermes version: latest (as of 2026-05-16)
Open WebUI version: latest
Main provider: DeepSeek / deepseek-v4-pro
Vision provider: OpenRouter (Qwen-VL)
Deployment: Docker container
RAW_BUFFERClick to expand / collapse

Bug Description

When sending images to Hermes via the /v1/chat/completions endpoint (OpenAI-compatible, used by Open WebUI), the images are silently stripped and the conversation proceeds text-only. The log shows:

⚠️ Server rejected image content — switching to text-only mode for this session. Stripped images from history and retrying.

The image never reaches the vision pipeline (vision_analyze via OpenRouter), even though auxiliary.vision.provider=openrouter is configured correctly.

Steps to Reproduce

  1. Configure Hermes with terminal.provider=deepseek and auxiliary.vision.provider=openrouter
  2. Open Open WebUI pointing at Hermes' /v1/chat/completions
  3. Upload a photo in a message
  4. Observe: response says "no photo seen"; log shows the "Server rejected image content" message
  5. Upload the same photo via Hermes TUI gateway → works correctly

Expected Behavior

When auxiliary.vision.provider is configured (e.g., openrouter), the OpenAI API gateway should also route images through the vision pipeline — either by:

  • Calling the same _decide_image_input_mode() logic before passing to the agent, or
  • Converting image_url content to text descriptions via vision_analyze before passing to _run_agent()

This would make image uploads work from Open WebUI (or any OpenAI-compatible client) without requiring the main provider to natively support vision.

Actual Behavior

Open WebUI response not seeing any uploaded images.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

⚠️ Server rejected image content — switching to text-only mode for this session. Stripped images from history and retrying.

Operating System

Unraid 7.3.0

Python Version

3.13.5

Hermes Version

0.14.0

Additional Logs / Traceback (optional)

Hermes version: latest (as of 2026-05-16)
Open WebUI version: latest
Main provider: DeepSeek / deepseek-v4-pro
Vision provider: OpenRouter (Qwen-VL)
Deployment: Docker container

Root Cause Analysis (optional)

Hermes has two separate gateways with different code paths for image handling:

GatewayFileImage Handling
TUI Gatewaygateway/run.pyCalls _decide_image_input_mode() which reads auxiliary.vision.provider and routes through vision_analyze
OpenAI API Gatewaygateway/platforms/api_server.pyNever calls decide_image_input_mode() — passes raw image_url directly to the agent

Code trace:

  1. api_server.py → _handle_chat_completions (~line 1050): calls _normalize_multimodal_content() which validates image_url format but passes it through unchanged
  2. Passes the raw multimodal list (containing image_url parts) to _run_agent()
  3. Agent sends the image_url to the main provider (e.g., DeepSeek) which rejects it (DeepSeek doesn't support image_url)
  4. run_agent.py (~line 13876): catches the rejection, sets _vision_supported = False, strips all images, and retries text-only
  5. Image analysis opportunity is lost

Meanwhile, the same image uploaded via the TUI gateway works correctly because gateway/run.py calls _decide_image_input_mode() which reads auxiliary.vision.provider and invokes vision_analyze (OpenRouter/Qwen-VL) to describe the image as text before sending to the agent.

Proposed Fix (optional)

In gateway/platforms/api_server.py, _handle_chat_completions() should detect multimodal content (list with image_url parts) and call the same decide_image_input_mode() logic from gateway/run.py before passing to _run_agent(). If the decision is "text" mode, the image should be described via vision_analyze and replaced with text.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Bug]: OpenAI API gateway silently drops image content — Open WebUI users can't upload photos [1 pull requests]