hermes: Gateway /v1/chat/completions passthrough does not apply image_routing.py; OpenAI-compat clients bypass auxiliary.vision routing



Summary

The OpenAI-compatible /v1/chat/completions passthrough handler in gateway/run.py (around lines 14109–14145) does NOT call agent/image_routing.py::decide_image_input_mode(). Requests from OpenAI-compat clients (OpenWebUI, generic OpenAI SDK callers, etc.) that contain image_url content parts are forwarded directly to the configured main model, bypassing auxiliary.vision routing.

Impact

Users who run a vision-incapable main model alongside a configured auxiliary.vision sidecar get errors (500s) or gibberish on image uploads sent through OpenWebUI or any other OpenAI-compat client.

The same images route correctly when sent via Hermes' messaging integrations (Telegram / Discord / Slack / CLI), because the agent's user-message intake path DOES call decide_image_input_mode() and respects _explicit_aux_vision_override().

The behavior is therefore inconsistent across entrypoints, which is a footgun for anyone (like us) running a text-only quant as the primary model and a small vision sidecar (Moondream2, Qwen2.5-VL, etc.) as the auxiliary.vision backend.

Reproduction

  1. Configure auxiliary.vision in config.yaml:
    auxiliary:
      vision:
        provider: custom
        model: moondream2
        base_url: http://127.0.0.1:8002/v1
        api_key: not-needed
  2. Configure a main model that lacks working vision capability (e.g. a text-only Qwen quant whose visual tower weights are missing from the safetensors).
  3. Send an image via Telegram → image_routing.decide_image_input_mode() returns "text" (because of the explicit aux override), vision_analyze runs against the sidecar, the description is prepended to the user's text, and the main model never sees pixels. ✅
  4. Send the same image via the OpenAI-compat passthrough:
    curl http://127.0.0.1:8642/v1/chat/completions \
      -H "Authorization: Bearer <API_SERVER_KEY>" \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "...",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
          ]
        }],
        "max_tokens": 300
      }'
    → The passthrough forwards the request to the main model with raw image_url parts. Main model returns gibberish or 500s. ❌
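
The curl body in step 4 elides the base64 payload. As a convenience, here is a minimal Python sketch that builds the same request body from raw image bytes. The helper name build_image_request and its parameters are hypothetical and not part of Hermes; the model name is left for the caller to supply.

```python
import base64
import json


def build_image_request(image_bytes: bytes, model: str,
                        prompt: str = "Describe this image.",
                        mime: str = "image/png",
                        max_tokens: int = 300) -> dict:
    """Build an OpenAI-style chat body with one text part and one image part."""
    # Encode the raw bytes as a data URL, the form used in the curl example.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    # Placeholder bytes; replace with the contents of a real PNG file.
    body = build_image_request(b"placeholder", model="my-main-model")
    print(json.dumps(body, indent=2))
```

The printed JSON can be saved to a file and sent with curl using -d @body.json in place of the inline body.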

Expected behavior

The passthrough handler in gateway/run.py should call decide_image_input_mode(provider, model, cfg) near the top of the handler. When mode == "text", it should preprocess image parts through vision_analyze against the configured aux endpoint (same code path as the agent loop) before forwarding to the main model.

This makes the OpenAI-compat surface symmetric with the agent loop and gives users one consistent contract: "configure auxiliary.vision and the gateway will route images to it from any entrypoint."
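
The expected dispatch could look roughly like the sketch below. It is not the Hermes implementation: decide_mode and preprocess are injected callables standing in for decide_image_input_mode(provider, model, cfg) and the vision_analyze preprocessing, and the helper names contains_image_parts and prepare_for_main_model are hypothetical.

```python
from typing import Any, Callable

Messages = list[dict[str, Any]]


def contains_image_parts(messages: Messages) -> bool:
    """Return True if any message carries an image_url content part."""
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain string content has no parts
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image_url":
                return True
    return False


def prepare_for_main_model(messages: Messages,
                           decide_mode: Callable[[], str],
                           preprocess: Callable[[Messages], Messages]) -> Messages:
    """Decide, once per request, whether images must become text first."""
    if not contains_image_parts(messages):
        return messages  # nothing to route
    if decide_mode() == "text":
        # Assumption: "text" means the main model must not see pixels.
        return preprocess(messages)
    return messages  # any other mode: forward the request unchanged
```

Injecting the two callables keeps the sketch independent of Hermes internals; in the gateway they would presumably be bound to the existing routing and vision functions.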

Environment

  • Hermes commit: 1fc962ff08220c48775346772835f84e0235900a (v2026.5.16 + 3 local patches re-cherry-picked; the issue exists in upstream main as of 2026-05-16, independent of our patches)
  • Main model: vLLM-served NVFP4 Qwen3.6 variant where the visual tower weights are nested at model.language_model.visual.* and the vLLM loader doesn't pick them up → main model has no working vision pathway
  • Aux vision: Moondream2 wrapped in a tiny FastAPI OpenAI-compat server, running on :8002
  • Hardware: NVIDIA DGX Spark (GB10), Ubuntu 24.04
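
For reference, a request to a sidecar like the one above might be built as follows. This is a sketch using only the Python standard library, not Hermes' vision_analyze. It assumes the sidecar accepts the standard OpenAI chat-completions shape with image_url parts at <base_url>/chat/completions and returns the usual choices[0].message.content field; both function names are hypothetical.

```python
import json
import urllib.request


def build_aux_vision_request(base_url: str, model: str, image_url: str,
                             api_key: str = "not-needed",
                             prompt: str = "Describe this image.") -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST for the aux vision server."""
    body = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def describe_image(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the description text from the response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumption: the sidecar mirrors the OpenAI response layout.
    return payload["choices"][0]["message"]["content"]
```

With the configuration from this report, the request would be built with base_url "http://127.0.0.1:8002/v1" and model "moondream2".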

Suggested fix surface

Add an image_routing.decide_image_input_mode() dispatch at the top of the /v1/chat/completions handler in gateway/run.py (around the spot where api_messages is currently built as text-only). When mode == "text":

  • Walk messages[].content for image_url parts
  • For each: call vision_analyze against the aux endpoint
  • Replace the part with a text part describing the image
  • Then forward to the main model

This mirrors what already happens in the agent's message intake path, so the implementation can be cribbed/lifted from there.
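
The four steps above can be sketched as a single pass over the message list. This is an illustration under stated assumptions rather than code lifted from the agent path: describe_image is a stand-in callable for the vision_analyze call, and both the function name replace_images_with_text and the "[Image description: ...]" wording are hypothetical.

```python
from typing import Any, Callable

Messages = list[dict[str, Any]]


def replace_images_with_text(messages: Messages,
                             describe_image: Callable[[str], str]) -> Messages:
    """Return a copy of messages with every image_url part replaced by text."""
    result: Messages = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            result.append(msg)  # string content carries no image parts
            continue
        new_parts = []
        for part in content:
            if isinstance(part, dict) and part.get("type") == "image_url":
                image = part.get("image_url")
                # Accept the standard {"url": ...} object, or a bare string.
                url = image.get("url", "") if isinstance(image, dict) else str(image or "")
                description = describe_image(url)
                new_parts.append({"type": "text",
                                  "text": f"[Image description: {description}]"})
            else:
                new_parts.append(part)
        # Build a new message dict so the caller's input is left unmodified.
        result.append({**msg, "content": new_parts})
    return result
```

The output contains only text parts, so it can be forwarded to a main model that has no vision pathway.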

Happy to send a PR if there's interest — let me know if the maintainers prefer this be done by the team.
