hermes - 💡(How to fix) Fix Gateway /v1/chat/completions passthrough does not apply image_routing.py — OpenAI-compat clients bypass auxiliary.vision routing

hermes · 2026-05-16 17:01:13

The OpenAI-compatible /v1/chat/completions passthrough handler in gateway/run.py (around lines 14109–14145) does NOT call agent/image_routing.py::decide_image_input_mode(). OpenAI-compat clients (OpenWebUI, generic OpenAI SDK callers, etc.) that send image_url content parts get forwarded directly to the configured main model, bypassing auxiliary.vision routing.

Root Cause

The passthrough handler never consults the image router, so image_url parts reach the main model untouched. The same images route correctly when sent via Hermes' messaging integrations (Telegram / Discord / Slack / CLI), because the agent's user-message intake path does call decide_image_input_mode() and respects _explicit_aux_vision_override().

Fix / Workaround

Add an image_routing.decide_image_input_mode() dispatch at the top of the /v1/chat/completions handler in gateway/run.py (around the spot where api_messages is currently built as text-only). When mode == "text":

1. Walk messages[].content for image_url parts.
2. For each part, call vision_analyze against the aux endpoint.
3. Replace the part with a text part describing the image.
4. Forward the rewritten request to the main model.
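
The steps above can be sketched as a pure function. This is a minimal sketch, not the Hermes implementation: replace_images_with_text and the describe callback are hypothetical names, and describe stands in for a call to vision_analyze against the aux endpoint, whose real signature is not shown in this issue.

```python
from typing import Any, Callable

Message = dict[str, Any]


def replace_images_with_text(
    messages: list[Message],
    describe: Callable[[str], str],
) -> list[Message]:
    """Return a copy of `messages` with every image_url part replaced by text.

    `describe` is a hypothetical stand-in for the aux vision call: it takes
    the image URL (or data: URI) and returns a description.
    """
    rewritten: list[Message] = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            # Plain-string content carries no image parts; keep it as-is.
            rewritten.append(dict(message))
            continue
        parts = []
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image_url":
                image = part.get("image_url")
                # OpenAI-style parts nest the URL as {"image_url": {"url": ...}}.
                url = image.get("url", "") if isinstance(image, dict) else str(image or "")
                parts.append({"type": "text", "text": f"[Image description: {describe(url)}]"})
            else:
                parts.append(part)
        rewritten.append({**message, "content": parts})
    return rewritten
```

Injecting describe as a callable keeps the rewrite testable without a running sidecar. In the gateway it would presumably wrap whatever the agent's intake path already calls.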

Summary

Impact

Users running a vision-incapable main model + a configured auxiliary.vision sidecar experience crashes / gibberish on image uploads when sending via OpenWebUI or any other OpenAI-compat client.

So the behavior is inconsistent depending on which entrypoint the user happens to use — a footgun for anyone (like us) running a text-only quant as their primary model + a small vision sidecar (Moondream2 / Qwen2.5-VL / etc.) as the auxiliary.vision backend.

Reproduction

1. Configure auxiliary.vision in config.yaml:

auxiliary:
  vision:
    provider: custom
    model: moondream2
    base_url: http://127.0.0.1:8002/v1
    api_key: not-needed

2. Configure a main model that lacks working vision capability (e.g. a text-only Qwen quant whose visual tower weights are missing from the safetensors).
3. Send an image via Telegram → image_routing.decide_image_input_mode() returns "text" (because of the explicit aux override), vision_analyze runs against the sidecar, the description is prepended to the user's text, and the main model never sees pixels. ✅
4. Send the same image via the OpenAI-compat passthrough:

curl http://127.0.0.1:8642/v1/chat/completions \
  -H "Authorization: Bearer <API_SERVER_KEY>" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "...",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }],
    "max_tokens": 300
  }'

→ The passthrough forwards the request to the main model with the raw image_url parts intact. The main model returns gibberish or an HTTP 500. ❌
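
For a scripted reproduction, the curl request above can be built in Python with only the standard library. This is a sketch under assumptions: build_image_request and post_chat_completion are hypothetical helper names, the URL and bearer-token header are the ones from the curl command, and the model name is left for the caller to supply, as it is elided in the original.

```python
import base64
import json
import urllib.request


def build_image_request(image_bytes: bytes, model: str,
                        prompt: str = "Describe this image.") -> dict:
    """Build the same chat-completions body as the curl command above."""
    data_uri = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
        "max_tokens": 300,
    }


def post_chat_completion(body: dict, api_key: str,
                         url: str = "http://127.0.0.1:8642/v1/chat/completions") -> dict:
    """POST the body to the gateway and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

urlopen raises urllib.error.HTTPError on a 5xx response, so the 500 case shows up as an exception rather than a return value.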

Expected behavior

The passthrough handler at gateway/run.py should call decide_image_input_mode(provider, model, cfg) near the top of the handler. When mode == "text", it should preprocess image parts through vision_analyze against the configured aux endpoint (same code path as the agent loop) before forwarding to the main model.

This makes the OpenAI-compat surface symmetric with the agent loop and gives users one consistent contract: "configure auxiliary.vision and the gateway will route images to it from any entrypoint."
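
One way to picture that contract is a gate in front of the forwarding step. The sketch below does not reproduce Hermes' decide_image_input_mode(): pick_image_mode is a hypothetical, simplified stand-in that models only the explicit auxiliary.vision override described in this issue, and the "native" return value is an assumed label for the pass-through case.

```python
from collections.abc import Mapping
from typing import Any


def pick_image_mode(cfg: Mapping[str, Any]) -> str:
    """Return "text" when auxiliary.vision is explicitly configured.

    Hypothetical stand-in: the real router also takes provider and model
    into account; "native" is an assumed label for "forward images as-is".
    """
    auxiliary = cfg.get("auxiliary")
    vision = auxiliary.get("vision") if isinstance(auxiliary, Mapping) else None
    if isinstance(vision, Mapping) and vision.get("model") and vision.get("base_url"):
        return "text"
    return "native"


def has_image_parts(messages: list[dict[str, Any]]) -> bool:
    """True if any message carries an OpenAI-style image_url content part."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            if any(isinstance(p, dict) and p.get("type") == "image_url" for p in content):
                return True
    return False


def needs_aux_preprocessing(cfg: Mapping[str, Any],
                            messages: list[dict[str, Any]]) -> bool:
    """Preprocess only when the mode is "text" and there is an image to describe."""
    return pick_image_mode(cfg) == "text" and has_image_parts(messages)
```

Checking for image parts before calling the sidecar keeps text-only requests on the existing fast path, so the change would only affect requests that currently fail.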

Environment

Hermes commit: 1fc962ff08220c48775346772835f84e0235900a (v2026.5.16 + 3 local patches re-cherry-picked; the issue exists in upstream main as of 2026-05-16, independent of our patches)
Main model: vLLM-served NVFP4 Qwen3.6 variant where the visual tower weights are nested at model.language_model.visual.* and the vLLM loader doesn't pick them up → main model has no working vision pathway
Aux vision: Moondream2 wrapped in a tiny FastAPI OpenAI-compat server, running on :8002
Hardware: NVIDIA DGX Spark (GB10), Ubuntu 24.04

Suggested fix surface

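When decide_image_input_mode() returns "text", the handler should:

1. Walk messages[].content for image_url parts.
2. For each part, call vision_analyze against the aux endpoint.
3. Replace the part with a text part describing the image.
4. Forward the rewritten request to the main model.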

This mirrors what already happens in the agent's message intake path, so the implementation can be cribbed/lifted from there.

Happy to send a PR if there's interest — let me know if the maintainers prefer this be done by the team.
