hermes: Fix auxiliary vision: explicit base_url routes through generic "custom" branch, leaking main model name + OPENAI_API_KEY to the configured backend (Gemini)



Auxiliary vision routing sends the main session model + OPENAI_API_KEY to a configured non-OpenAI backend (Gemini)

Component: auxiliary task routing / vision_analyze
Version: Hermes Agent v0.15.1 (2026.5.29), Python 3.11
Severity: High. A correctly-configured auxiliary.vision Gemini backend silently fails; errors masquerade as upstream 404/500s.


Related issues

  • #33389 (open) — "auxiliary.vision.provider: gemini … not honored — falls through to main provider." Its root-cause analysis (gemini missing from _VISION_AUTO_PROVIDER_ORDER / _resolve_strict_vision_backend) is stale for v0.15.1: gemini is now present in both (auxiliary_client.py:3955-3961 and :4017). The fall-through still happens, but for a different reason — described below.
  • #35454 (open) — Gemini routed through native adapter even when OpenAI-compatible endpoint configured.
  • #31179 (closed) — vision_analyze/browser_vision route images to main model. The Bug-1 leak here is the same symptom via the explicit-base_url path, which #31179's fix did not cover.

Summary

When auxiliary.vision is pointed at a non-OpenAI provider via an explicit base_url (e.g. Google Gemini's https://generativelanguage.googleapis.com/v1beta), the request is routed through the generic "custom" OpenAI-compatible branch of resolve_provider_client(). That branch back-fills two values from OpenAI/main-session defaults that are invalid for the target provider:

  1. Model name leak — if auxiliary.vision.model is empty, the main session model (e.g. gpt-5.5) is sent to Gemini → 404 models/gpt-5.5 is not found for API version v1main.
  2. API-key leak — if auxiliary.vision.api_key is empty, OPENAI_API_KEY (sk-proj-…) is sent as the Bearer token to Gemini → 500 INTERNAL (Gemini can't parse a foreign key, and returns 500 rather than a clean 401, so it looks like a transient outage).

Net effect: a user who configures a dedicated Gemini vision backend with a paid AI-Studio key gets cryptic failures that look like Gemini-side problems but are actually local mis-routing.


Environment / config to reproduce

~/.hermes/config.yaml:

model:
  provider: openai-codex
  default: gpt-5.5
auxiliary:
  vision:
    provider: gemini
    model: gemini-2.5-flash          # Bug 1 triggers when this is empty
    base_url: https://generativelanguage.googleapis.com/v1beta
    api_key: ''                      # Bug 2 triggers when this is empty
agent:
  image_input_mode: auto

Env: OPENAI_API_KEY=sk-proj-… present (any unrelated OpenAI key), GOOGLE_API_KEY/GEMINI_API_KEY present and valid.

Send any image. With image_input_mode: auto and an explicit auxiliary.vision.provider, the request is routed to text mode (agent/image_routing.py: decide_image_input_mode), which calls vision_analyze → call_llm(task="vision", …).


Bug 1 — main session model name leaks into the aux call

Trigger: auxiliary.vision.model empty/unset.

Path:

  • _resolve_task_provider_model("vision", …) → resolved_model = model or cfg_model (agent/auxiliary_client.py:4627). Empty cfg_model → resolved_model = None.
  • resolve_provider_client() then hits its universal model fallback (agent/auxiliary_client.py:3345-3346):
    if not model:
        model = _get_aux_model_for_provider(provider) or _read_main_model() or model
    For Gemini there is no registered aux default, so the next fallback, _read_main_model(), injects gpt-5.5. The "custom" branch carries a second copy of this default (auxiliary_client.py:3514-3515): model or main_runtime.model or "gpt-4o-mini".

Actual: POST …/v1beta/chat/completions {"model":"gpt-5.5", …} → 404 - models/gpt-5.5 is not found for API version v1main.

Expected: the configured auxiliary.vision.model is used; if genuinely unset, fall back to the target provider's default (or raise a clear config error) — never the main session model, which belongs to a different provider.

Root cause: the "use my main model for side tasks too" fallback (intended for same-provider side tasks like title generation) is applied even when the call targets a different provider's explicit base_url.
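
The fallback chain quoted above can be reproduced in isolation. The following is a minimal sketch, not the Hermes code: the two helper functions are hypothetical stand-ins that only mimic the behaviour described in this report (no registered aux default for Gemini, gpt-5.5 as the main session model), and resolve_model is an illustrative name.

```python
from typing import Optional

# Hypothetical stand-ins for the helpers named in this report; the real
# implementations live in agent/auxiliary_client.py and are not shown here.
_AUX_DEFAULTS: dict[str, str] = {}  # assumption: no aux default registered for Gemini


def _get_aux_model_for_provider(provider: str) -> Optional[str]:
    return _AUX_DEFAULTS.get(provider)


def _read_main_model() -> Optional[str]:
    return "gpt-5.5"  # main session model from the config in this report


def resolve_model(provider: str, model: Optional[str]) -> Optional[str]:
    # Same shape as the quoted fallback: the first truthy value wins,
    # regardless of which provider the request is about to be sent to.
    if not model:
        model = _get_aux_model_for_provider(provider) or _read_main_model() or model
    return model
```

With these stubs, resolve_model("gemini", None) yields the main session model, while an explicitly configured model passes through unchanged. Nothing in the chain checks whether the main model belongs to the target provider.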


Bug 2 — OPENAI_API_KEY is sent to a non-OpenAI base_url

Trigger: auxiliary.vision.api_key empty/unset (relying on env), with a non-OpenAI base_url.

Path: the explicit-base_url vision path resolves through the custom-endpoint branch (agent/auxiliary_client.py:3499-3507):

if provider == "custom":
    if explicit_base_url:
        custom_key = (
            (explicit_api_key or "").strip()
            or os.getenv("OPENAI_API_KEY", "").strip()
            or "no-key-required"
        )

With explicit_api_key=None, this yields OPENAI_API_KEY. (Confirmed by capturing the outbound request: Authorization: Bearer sk-proj-… to generativelanguage.googleapis.com.)

Actual: Gemini returns 500 - {"error":{"code":500,"status":"INTERNAL"}}.

Expected: key resolution should be provider-aware. For a generativelanguage.googleapis.com base_url (or provider: gemini), prefer GEMINI_API_KEY / GOOGLE_API_KEY before OPENAI_API_KEY. A wrong/foreign key should surface as a clear auth error, not a 500.
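
One way the expected behaviour could look is sketched below. This is an illustration under stated assumptions, not a patch against auxiliary_client.py: resolve_custom_key and the host table are hypothetical names, and raising when a known host has no matching key is a design choice of this sketch rather than something the report prescribes.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hypothetical host-to-env-var table; the Gemini entry uses the host and
# variable names mentioned in this report.
_HOST_KEY_ENV_VARS: dict[str, tuple[str, ...]] = {
    "generativelanguage.googleapis.com": ("GEMINI_API_KEY", "GOOGLE_API_KEY"),
}


def resolve_custom_key(
    base_url: str,
    explicit_api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick an API key for an explicit base_url without crossing providers."""
    env = os.environ if env is None else env
    explicit = (explicit_api_key or "").strip()
    if explicit:
        return explicit

    host = (urlparse(base_url).hostname or "").lower()
    known = _HOST_KEY_ENV_VARS.get(host)
    if known is not None:
        for name in known:
            value = env.get(name, "").strip()
            if value:
                return value
        # Fail locally instead of sending a foreign key upstream.
        raise ValueError(f"no API key found for {host}; set one of: {', '.join(known)}")

    # Unknown host: keep the generic OpenAI-compatible behaviour.
    return env.get("OPENAI_API_KEY", "").strip() or "no-key-required"
```

The env parameter exists only so the function can be exercised without touching the real environment. For a Gemini base_url it never returns OPENAI_API_KEY, so a missing Gemini key becomes a local configuration error rather than an upstream 500.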


Why both happen: non-OpenAI base_url → generic "custom" branch

resolve_vision_provider_client() (auxiliary_client.py:4082) treats any explicit base_url as a custom endpoint, and resolve_provider_client funnels it into the provider == "custom" branch (logged as Auxiliary vision: using custom (…)). That branch is OpenAI-shaped end-to-end: it defaults the model to OpenAI/main values and the key to OPENAI_API_KEY. When the endpoint is actually Gemini, both defaults are wrong.


Suggested fixes

  1. Don't inject the main session model across providers. In resolve_provider_client (:3345-3346 and the custom branch :3514-3515), gate the _read_main_model() / main_runtime.model fallback so it only applies when the resolved endpoint belongs to the same provider as the main session. For a cross-provider aux base_url, use the target provider's default model or raise "auxiliary.<task>.model is required for this backend".

  2. Provider-aware key resolution for known hosts. When base_url host matches a known provider (e.g. generativelanguage.googleapis.com → Gemini), resolve the key from that provider's env vars (GEMINI_API_KEY/GOOGLE_API_KEY) before OPENAI_API_KEY, instead of unconditionally treating an explicit base_url as an OpenAI-compatible "custom" endpoint.

  3. Surface foreign-key failures clearly. A 500 from a downstream provider after a key/model substitution should be annotated ("sent main-model/OPENAI_API_KEY to <provider>") so it isn't mistaken for an upstream outage.
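
Fixes 1 and 3 could be sketched roughly as follows. All names here (resolve_aux_model, AuxConfigError, annotate_upstream_error, the defaults table) are hypothetical, and the per-provider default is an assumption borrowed from the config in this report; the real change would need to fit the existing resolve_provider_client signature.

```python
from typing import Optional

# Hypothetical per-provider defaults; the model name is the one used in this report.
_PROVIDER_DEFAULT_MODELS: dict[str, str] = {"gemini": "gemini-2.5-flash"}


class AuxConfigError(ValueError):
    """Raised when an auxiliary task has no usable model configured."""


def resolve_aux_model(
    task: str,
    target_provider: str,
    main_provider: str,
    main_model: Optional[str],
    configured_model: Optional[str] = None,
) -> str:
    if configured_model:
        return configured_model
    default = _PROVIDER_DEFAULT_MODELS.get(target_provider)
    if default:
        return default
    # Reuse the main model only when the aux call stays on the same provider.
    if target_provider == main_provider and main_model:
        return main_model
    raise AuxConfigError(f"auxiliary.{task}.model is required for this backend")


def annotate_upstream_error(status: int, provider: str, substituted: list[str]) -> str:
    """Build an error message that names any locally substituted values."""
    message = f"{provider} returned HTTP {status}"
    if substituted:
        message += f" (sent {'/'.join(substituted)} to {provider}; check auxiliary config)"
    return message
```

The ordering is deliberate: an explicit setting wins, then the target provider's own default, and the main session model is considered only for same-provider side tasks such as title generation.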


Related, lower priority

  • vision_analyze rejects video (tools/vision_tools.py): returns "Only real image files are supported for vision analysis." Gemini 2.5 Flash supports video, but there is no server-side frame-extraction (or Files-API upload) path, so video attachments can't be analyzed without an out-of-band workaround. Consider extracting a frame (or uploading via the provider's media API) before the vision call.

Workaround (until fixed)

Pin both fields explicitly in auxiliary.vision so neither fallback fires:

auxiliary:
  vision:
    provider: gemini
    model: gemini-2.5-flash
    base_url: https://generativelanguage.googleapis.com/v1beta
    api_key: AIza…            # inline AI-Studio key; do NOT leave empty

Verified: with both pinned, the outbound request carries {"model":"gemini-2.5-flash"} + Authorization: Bearer AIza… and returns 200 with a correct image description.
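
Until a fix lands, a small pre-flight check can catch an unpinned field before any request is sent. The helper below is a hypothetical sketch that operates on an already-parsed config mapping; loading ~/.hermes/config.yaml into a dict is left to whatever YAML parser is available and is not shown.

```python
from typing import Any, Mapping


def unpinned_aux_fields(config: Mapping[str, Any], task: str = "vision") -> list[str]:
    """Return the auxiliary fields that are empty while an explicit base_url is set."""
    section = (config.get("auxiliary") or {}).get(task) or {}
    if not str(section.get("base_url") or "").strip():
        return []  # no explicit base_url, so the custom branch is not involved
    return [
        f"auxiliary.{task}.{field}"
        for field in ("model", "api_key")
        if not str(section.get(field) or "").strip()
    ]
```

An empty result means both fallbacks described in this report are bypassed for that task; any returned name identifies a field that would be back-filled from the main session or from OPENAI_API_KEY.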
