hermes - 💡(How to fix) Fix [Bug]: Azure Foundry ignores explicit model.api_mode and routes chat-completions deployments to Responses API

Bug Description

Summary

The Azure Foundry runtime can route deployments through the Responses API even when config.yaml explicitly sets the chat-completions API mode:

model:
  provider: azure-foundry
  api_mode: chat_completions
  default: gpt-5.5

This breaks Azure Foundry deployments that support /chat/completions but not /responses.

Environment

Hermes Agent: v0.13.0 (2026.5.7)
Commit: 524cbabd8
OpenAI SDK: 2.31.0
Provider: azure-foundry

Repro

Config:

model:
  provider: azure-foundry
  api_mode: chat_completions
  default: gpt-5.5

model_aliases:
  grok-4-20-reasoning:
    model: grok-4-20-reasoning
    provider: azure-foundry
  kimi-k2.6:
    model: Kimi-K2.6
    provider: azure-foundry

Run:

hermes chat -Q --provider azure-foundry -m grok-4-20-reasoning -q 'Reply exactly: OK'
hermes chat -Q --provider azure-foundry -m Kimi-K2.6 -q 'Reply exactly: OK'

Observed errors:

This model is not supported by Responses API.

Direct OpenAI SDK probe against the same Azure Foundry endpoint shows:

client.chat.completions.create(model="grok-4-20-reasoning", ...) succeeds
client.responses.create(model="grok-4-20-reasoning", ...) fails
client.chat.completions.create(model="Kimi-K2.6", ...) succeeds
client.responses.create(model="Kimi-K2.6", ...) fails

Expected

If model.api_mode: chat_completions is explicitly configured, Hermes should use chat completions for Azure Foundry unless the user explicitly chooses another API mode.

Suspected Cause

_resolve_azure_foundry_runtime() infers the API mode from the model family, and that inferred value can override an explicitly configured model.api_mode.

Suggested Fix

Only run Azure Foundry model-family API mode inference when model.api_mode is absent/unset. Explicit config should win.

Pseudo patch:

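# Infer the API mode only when the user has not set model.api_mode explicitly.
if not model_cfg.get("api_mode"):
    effective_model = str(target_model or model_cfg.get("default") or "").strip()
    if effective_model and cfg_api_mode != "anthropic_messages":
        inferred = azure_foundry_model_api_mode(effective_model)
        if inferred:
            # Inference only fills the gap; it never overrides explicit config.
            cfg_api_mode = inferred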
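
The precedence rule in the pseudo patch can also be sketched as a standalone function, which makes it easy to cover with a regression test. This is a minimal illustration, not Hermes's actual code: resolve_api_mode and toy_infer are hypothetical names, the "responses" mode string is a placeholder, and the gpt-5 prefix rule only stands in for whatever azure_foundry_model_api_mode() really does.

```python
from typing import Callable, Optional


def resolve_api_mode(
    model_cfg: dict,
    target_model: Optional[str],
    infer: Callable[[str], Optional[str]],
    fallback: str = "chat_completions",
) -> str:
    """Pick an API mode, letting an explicit model.api_mode always win."""
    explicit = str(model_cfg.get("api_mode") or "").strip()
    if explicit:
        # Explicit configuration wins; skip model-family inference entirely.
        return explicit

    # No explicit mode: infer from the requested model, else the default model.
    effective_model = str(target_model or model_cfg.get("default") or "").strip()
    if effective_model:
        inferred = infer(effective_model)
        if inferred:
            return inferred
    return fallback


def toy_infer(model: str) -> Optional[str]:
    # Hypothetical stand-in for azure_foundry_model_api_mode(); the real
    # heuristics are not shown in this report.
    return "responses" if model.lower().startswith("gpt-5") else None


explicit_cfg = {"provider": "azure-foundry", "api_mode": "chat_completions", "default": "gpt-5.5"}
unset_cfg = {"provider": "azure-foundry", "default": "gpt-5.5"}

# Explicit api_mode is honored regardless of which model is requested.
print(resolve_api_mode(explicit_cfg, "grok-4-20-reasoning", toy_infer))
# With api_mode unset, inference from the default model is allowed to run.
print(resolve_api_mode(unset_cfg, None, toy_infer))
```

Passing the inference function in as a parameter keeps the precedence logic testable without depending on the real model-family table.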

Steps to Reproduce

Configure Hermes with Azure Foundry and an explicit chat-completions API mode:

model:
  provider: azure-foundry
  api_mode: chat_completions
  default: gpt-5.5

model_aliases:
  grok-4-20-reasoning:
    model: grok-4-20-reasoning
    provider: azure-foundry
  kimi-k2.6:
    model: Kimi-K2.6
    provider: azure-foundry

Ensure AZURE_FOUNDRY_BASE_URL and AZURE_FOUNDRY_API_KEY point to an Azure Foundry OpenAI-compatible endpoint.
Verify the deployments support Chat Completions directly:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["AZURE_FOUNDRY_API_KEY"],
    base_url=os.environ["AZURE_FOUNDRY_BASE_URL"].rstrip("/"),
)

for model in ["grok-4-20-reasoning", "Kimi-K2.6"]:
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Reply exactly: OK"}],
        max_completion_tokens=16,
    )
    print(model, r.choices[0].message.content)

Run the same deployments through Hermes:

hermes chat -Q --provider azure-foundry -m grok-4-20-reasoning -q 'Reply exactly: OK'
hermes chat -Q --provider azure-foundry -m Kimi-K2.6 -q 'Reply exactly: OK'

Observe Hermes failing with:

This model is not supported by Responses API.

Expected Behavior

When model.api_mode: chat_completions is explicitly configured for provider: azure-foundry, Hermes should use the Chat Completions transport for Azure Foundry calls.

The model-family inference that upgrades Azure Foundry models to the Responses API should not override an explicit user-provided model.api_mode.

Expected results:

hermes chat -Q --provider azure-foundry -m grok-4-20-reasoning -q 'Reply exactly: OK'
# OK

hermes chat -Q --provider azure-foundry -m Kimi-K2.6 -q 'Reply exactly: OK'
# OK

This should match the direct OpenAI SDK behavior against the same Azure Foundry endpoint, where client.chat.completions.create(...) succeeds for both deployments.

Actual Behavior

Hermes routes these Azure Foundry deployments through the Responses API despite model.api_mode: chat_completions.

Observed errors:

Error code: 400 - {'error': {'message': 'This model is not supported by Responses API.', 'type': 'invalid_request_error'}}

Direct SDK calls to /chat/completions succeed for the same deployments, while direct SDK calls to /responses fail. That suggests Hermes is selecting the wrong transport, not that the deployments are broken.
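
A side-by-side probe of both endpoints could look roughly like the sketch below. It assumes the same AZURE_FOUNDRY_API_KEY and AZURE_FOUNDRY_BASE_URL environment variables used in the reproduction steps; probe_endpoints and main are hypothetical helper names, and the "ok"/"error" result format is only an illustration.

```python
from typing import Any, Dict, List


def probe_endpoints(client: Any, models: List[str]) -> Dict[str, Dict[str, str]]:
    """Call both endpoints per deployment and record 'ok' or the error text."""
    results: Dict[str, Dict[str, str]] = {}
    for model in models:
        outcome: Dict[str, str] = {}
        try:
            # Chat Completions transport (/chat/completions).
            client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": "Reply exactly: OK"}],
                max_completion_tokens=16,
            )
            outcome["chat_completions"] = "ok"
        except Exception as exc:
            outcome["chat_completions"] = f"error: {exc}"
        try:
            # Responses transport (/responses).
            client.responses.create(model=model, input="Reply exactly: OK")
            outcome["responses"] = "ok"
        except Exception as exc:
            outcome["responses"] = f"error: {exc}"
        results[model] = outcome
    return results


def main() -> None:
    # Imported here so the helper above can be exercised with a stub client.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["AZURE_FOUNDRY_API_KEY"],
        base_url=os.environ["AZURE_FOUNDRY_BASE_URL"].rstrip("/"),
    )
    models = ["grok-4-20-reasoning", "Kimi-K2.6"]
    for model, outcome in probe_endpoints(client, models).items():
        print(model, outcome)
```

Calling main() against the affected endpoint should show, per deployment, which transport the deployment accepts, independent of how Hermes routes the request.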

Affected Component

Setup / Installation

Messaging Platform (if gateway-related)

No response

Debug Report

Report       https://paste.rs/Bsy8k
  agent.log    https://dpaste.com/FXZAMKVSG
  gateway.log  https://paste.rs/WjKVM

Operating System

macOS Sonoma

Python Version

No response

Hermes Version

No response

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR
