hermes - 💡(How to fix) Fix [Bug]: Azure Foundry vision with api_mode: responses can route through chat/custom path and fail with 401

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

Azure Foundry auxiliary vision can fail with a misleading 401 error after configuring a Responses-capable Azure Foundry model using the user-facing setting: openai.AuthenticationError: Error code: 401 - {'error': {'code': '401', 'message': 'Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.'}} openai.AuthenticationError: Error code: 401 - {'error': {'code': '401', 'message': 'Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.'}} The resulting 401 error points users toward key rotation or endpoint changes, but the actual issue is Hermes routing through the wrong client/mode path. openai.AuthenticationError: Error code: 401 - {'error': {'code': '401', 'message': 'Access denied due to invalid subscription key or wrong API endpoint...'}} 7. Azure Foundry returns a misleading 401 error:

Additional Logs / Traceback (optional)

Root Cause

Root cause hypothesis

Fix Action

Fix / Workaround

A local patch was applied to agent/auxiliary_client.py and regression coverage was added to:

This also makes local fixes fragile because they require patching Hermes core files, which can be overwritten on update. It would be much better fixed upstream so future Hermes updates preserve the behavior.

Code Example

N/A

---
RAW_BUFFERClick to expand / collapse

Bug Description

Summary

Azure Foundry auxiliary vision can fail with a misleading 401 error after configuring a Responses-capable Azure Foundry model using the user-facing setting:

api_mode: responses

Observed failure:

openai.AuthenticationError: Error code: 401 - {'error': {'code': '401', 'message': 'Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.'}}

This appears to be a Hermes auxiliary/vision routing issue rather than a bad Azure key or endpoint.

In my environment, direct Azure Foundry /responses calls using the configured endpoint/key worked, but Hermes vision_analyze routed through the wrong client path and failed.

Environment

- Hermes Agent repo path: /home/dan/.hermes/hermes-agent
- Hermes runtime Python: /home/dan/.hermes/hermes-agent/venv/
- Provider: Azure Foundry / OpenAI-compatible endpoint
- Affected tool: vision_analyze
- Relevant code paths:
  - tools/vision_tools.py
  - agent/auxiliary_client.py
  - async_call_llm(task="vision", ...)
  - resolve_vision_provider_client(...)
  - _try_azure_foundry(...)

Symptom

vision_analyze fails with repeated 401 errors from OpenAI/Azure:

python
File "/home/dan/.hermes/hermes-agent/tools/vision_tools.py", line 961, in vision_analyze_tool
    response = await async_call_llm(**call_kwargs)

File "/home/dan/.hermes/hermes-agent/agent/auxiliary_client.py", line 5555, in async_call_llm
    await client.chat.completions.create(**kwargs), task)

openai.AuthenticationError: Error code: 401 - {'error': {'code': '401', 'message': 'Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.'}}


The important clue is that the failing stack shows:

python
client.chat.completions.create(...)


For Azure Foundry GPT-5.x / Responses-capable deployments configured with api_mode: responses, this should be routed through the Responses adapter, not plain chat completions.

Expected behavior

When Azure Foundry is configured with:

yaml
provider: azure-foundry
api_mode: responses
base_url: https://<resource>.services.ai.azure.com/openai/v1
default: <responses-capable-model-or-deployment>


Hermes auxiliary vision should:

1. Preserve the first-class azure-foundry provider identity.
2. Use the Azure Foundry credential and mode resolver.
3. Wrap the client with the Responses/Codex auxiliary adapter.
4. Route GPT-5.x / Responses-capable vision calls through /responses.
5. Avoid treating config-derived Azure base_url as an explicit generic/custom endpoint override.

Actual behavior

The vision path can resolve the configured Azure Foundry provider/model/base_url, then pass the resolved base_url back into resolve_vision_provider_client(...) as if it were an explicit override.

That appears to push the call down the generic/custom endpoint path, bypassing Azure Foundry-specific handling.

Separately, _try_azure_foundry(...) wrapped the client in CodexAuxiliaryClient for:

python
api_mode == "codex_responses"


but not for the user-facing setting:

python
api_mode == "responses"


So api_mode: responses could return a plain OpenAI client and later call:

python
client.chat.completions.create(...)


instead of translating the request to /responses.

Root cause hypothesis

There seem to be two core routing bugs:

1. api_mode: responses is not handled the same as codex_responses

In _try_azure_foundry(...), Azure Foundry clients should be wrapped in CodexAuxiliaryClient when runtime API mode is either:

python
{"codex_responses", "responses"}


not only when it is exactly:

python
"codex_responses"


The user-facing config spelling responses should map to the same Responses adapter behavior.

2. Vision passes config-derived Azure base_url as an explicit override

In call_llm(...) and async_call_llm(...), the task == "vision" branch passes:

python
base_url=resolved_base_url or base_url


into resolve_vision_provider_client(...).

For Azure Foundry, resolved_base_url is a first-class provider configuration value, not a user-supplied generic/custom override. Passing it back as an explicit override can make the resolver treat the request as custom and bypass Azure Foundry-specific credential/mode logic.

A safer pattern is:

python
vision_base_url = resolved_base_url if resolved_provider == "custom" else base_url


then pass:

python
base_url=vision_base_url


This keeps Azure Foundry as Azure Foundry instead of accidentally downgrading it into a generic OpenAI-compatible endpoint.

Additional compatibility issue

Azure Foundry multimodal Responses streaming appears unreliable for input_image payloads in some GPT-5.x deployments.

In testing, non-streaming /responses calls with input_image worked, while streaming could produce "I can't view the image" style answers.

For bounded auxiliary vision calls, Hermes may want to prefer non-streaming Responses calls when the Responses input contains an input_image block.

Suggested helper:

python
def _responses_input_contains_image(input_messages: Any) -> bool:
    if not isinstance(input_messages, list):
        return False
    for msg in input_messages:
        if not isinstance(msg, dict):
            continue
        content = msg.get("content")
        if not isinstance(content, list):
            continue
        for part in content:
            if isinstance(part, dict) and part.get("type") == "input_image":
                return True
    return False


Then in the Responses/Codex adapter:

python
if _responses_input_contains_image(resp_kwargs.get("input")):
    final = self._client.responses.create(**resp_kwargs)
else:
    # existing streaming path


Local validation

A local patch was applied to agent/auxiliary_client.py and regression coverage was added to:

tests/agent/test_auxiliary_client_azure_foundry.py

Focused test run passed:

bash
pytest tests/agent/test_auxiliary_client_azure_foundry.py \
       tests/agent/test_auxiliary_client.py::TestBuildCallKwargsMaxTokens \
       tests/agent/test_auxiliary_transport_autodetect.py -q


Result:

text
40 passed


Runtime validation also succeeded by executing an actual:

python
async_call_llm(task="vision", ...)


against Azure Foundry with a generated PNG data URI and receiving a real vision response.

Suggested fix

In _try_azure_foundry(...)

Change the wrapping condition from:

python
if runtime_api_mode == "codex_responses":
    return CodexAuxiliaryClient(client, final_model), final_model


to:

python
if runtime_api_mode in {"codex_responses", "responses"}:
    return CodexAuxiliaryClient(client, final_model), final_model


In call_llm(...) and async_call_llm(...) vision branches

Avoid feeding config-derived Azure resolved_base_url back into the explicit override path.

Use something like:

python
vision_base_url = resolved_base_url if resolved_provider == "custom" else base_url

effective_provider, client, final_model = resolve_vision_provider_client(
    provider=resolved_provider if resolved_provider != "auto" else provider,
    model=resolved_model or model,
    base_url=vision_base_url,
    api_key=resolved_api_key or api_key,
    async_mode=False,  # or True in async_call_llm
)


Optional multimodal Responses hardening

For Responses input containing input_image, prefer non-streaming:

python
final = self._client.responses.create(**resp_kwargs)


instead of the streaming path.

Why this matters

Without this fix, Azure Foundry vision can break after moving auxiliary/vision models to GPT-5.x or other Responses-capable deployments, even when the Azure key and endpoint are valid.

The resulting 401 error points users toward key rotation or endpoint changes, but the actual issue is Hermes routing through the wrong client/mode path.

This also makes local fixes fragile because they require patching Hermes core files, which can be overwritten on update. It would be much better fixed upstream so future Hermes updates preserve the behavior.

Steps to Reproduce

Steps to reproduce:


1. Configure Hermes auxiliary vision to use Azure Foundry with a Responses-capable model/deployment.

Example shape:

auxiliary:
  vision:
    provider: azure-foundry
    api_mode: responses
    base_url: https://<azure-foundry-resource>.services.ai.azure.com/openai/v1
    model: <responses-capable-model-or-deployment>

2. Ensure the Azure Foundry key and endpoint are valid.

Direct non-Hermes /responses calls to the same Azure Foundry endpoint/key should succeed.

3. Restart Hermes / Hermes gateway so the config is active.

4. Trigger the vision_analyze tool with any valid image, for example a small PNG, screenshot, or image URL.

5. Check Hermes logs.

The failure appears in the vision path:

tools/vision_tools.py -> async_call_llm(task="vision") -> client.chat.completions.create(...)

Example stack:

File ".../tools/vision_tools.py", line 961, in vision_analyze_tool
    response = await async_call_llm(**call_kwargs)

File ".../agent/auxiliary_client.py", line 5555, in async_call_llm
    await client.chat.completions.create(**kwargs), task)

openai.AuthenticationError: Error code: 401 - {'error': {'code': '401', 'message': 'Access denied due to invalid subscription key or wrong API endpoint...'}}

Expected Behavior

When Azure Foundry is configured with provider: azure-foundry and api_mode: responses, Hermes should preserve the Azure Foundry provider identity and route auxiliary vision through the Responses adapter.

Specifically, Hermes should:

1. Treat api_mode: responses as a Responses API mode, equivalent to the internal/legacy codex_responses path where appropriate.

2. Wrap the Azure Foundry client in the Responses/Codex auxiliary adapter instead of returning a plain chat-completions client.

3. Route vision requests for Responses-capable Azure Foundry deployments through /responses, not /chat/completions.

4. Use Azure Foundry-specific credential and endpoint resolution rather than falling back to generic/custom OpenAI-compatible handling.

5. Avoid treating a config-derived Azure Foundry base_url as an explicit custom endpoint override.

6. Successfully return a vision analysis result when the same Azure Foundry endpoint/key/model works with direct /responses calls.

Actual Behavior

Hermes can route Azure Foundry auxiliary vision through the wrong client path.

Observed behaviour:

1. vision_analyze calls async_call_llm(task="vision").

2. The resolved Azure Foundry base_url can be passed back into resolve_vision_provider_client(...) as if it were an explicit custom endpoint override.

3. That can bypass Azure Foundry-specific provider handling and make the call behave like a generic/custom OpenAI-compatible endpoint.

4. Separately, _try_azure_foundry(...) wraps the client in the Responses adapter for api_mode: codex_responses, but not for the user-facing setting api_mode: responses.

5. As a result, Hermes can return a plain OpenAI chat-completions client.

6. The request then goes through:

   client.chat.completions.create(...)

   instead of the Responses adapter.

7. Azure Foundry returns a misleading 401 error:

   "Access denied due to invalid subscription key or wrong API endpoint"

8. The key and endpoint may actually be valid; direct /responses calls using the same Azure Foundry resource can succeed.

This makes the issue look like a bad Azure key or wrong regional endpoint, but the root cause appears to be Hermes routing/mode resolution.

Affected Component

Tools (terminal, file ops, web, code execution, etc.)

Messaging Platform (if gateway-related)

Telegram

Debug Report

N/A

Operating System

Ubuntu 26.04

Python Version

No response

Hermes Version

0.15.1

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING