hermes - feat: General per-task fallback chains for ALL auxiliary models (vision, compression, STT, summarization, etc.)

hermes · 2026-05-17 06:02:26

Extend the auxiliary model fallback chain mechanism — currently only wired for auxiliary.vision — to every auxiliary task: vision, compression, summarization, STT, session search, web extraction, and any future tasks.

This addresses users behind restrictive networks (Great Firewall of China, VPN-only access, corporate proxies) who need per-model fallback chains for all auxiliary operations, not just vision.

Root Cause

When Gemini returns 503 (UNAVAILABLE) or a quota-exceeded error on a compression call, the call fails with no fallback, because the fallback chain only exists for task == "vision".

Problem

Current State

PR #25878 introduced a vision-specific fallback chain (fallback_provider, fallback_model, local_fallback_provider and local_fallback_model under auxiliary.vision). The detection and fallback logic is hardcoded inside call_llm() and async_call_llm() behind if task == "vision": branches:

# agent/auxiliary_client.py, call_llm()
if task == "vision":
    effective_provider, client, final_model = resolve_vision_provider_client(...)
    if client is None and resolved_provider != "auto" and not resolved_base_url:
        # vision-only fallback logic
        ...

This means:

compression tasks never get a fallback: if the primary provider is blocked by the GFW or rate-limited, compression silently fails
stt (speech-to-text) tasks have no fallback chain at all
summarization tasks fall through the same gap
Any future auxiliary task must re-implement fallback from scratch

Example Use Case

A China-based user configures:

auxiliary:
  vision:
    provider: gemini
    model: gemini-2.5-flash
    fallback_provider: Xiaomi-TP
    fallback_model: mimo-v2-omni
    local_fallback_provider: ollama-launch
    local_fallback_model: llama3.2-vision:11b
  compression:
    provider: openrouter
    model: google/gemini-3-flash-preview
    fallback_provider: Xiaomi-TP
    fallback_model: mimo-v2.5-pro
    local_fallback_provider: ollama-launch
    local_fallback_model: llama3.3:70b
  stt:
    provider: deepgram
    fallback_provider: whisper-local
    fallback_model: large-v3

When Gemini returns 503 (UNAVAILABLE) or a quota-exceeded error on a compression call, the call fails with no fallback, because the fallback chain only exists for task == "vision".

Proposed Solution

1. Extend the config schema generically

Every task under auxiliary: should accept the same fallback keys:

auxiliary:
  <task_name>:          # vision, compression, summarization, stt, session_search, web_extract, ...
    provider: ...
    model: ...
    base_url: ...
    api_key: ...
    api_mode: ...
    fallback_provider: ...
    fallback_model: ...
    fallback_base_url: ...
    local_fallback_provider: ...
    local_fallback_model: ...
    local_fallback_base_url: ...
    fallback_triggers:   # optional: which errors trigger fallback (default: connection_error, rate_limit, server_error, quota_exceeded)
      - connection_error
      - rate_limit       # 429
      - server_error     # 5xx
      - quota_exceeded   # 402/403 quota
      - auth_error       # 401
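To make the schema concrete, here is a minimal sketch of how one task's config dict could be flattened into an ordered list of tiers (primary, then fallback, then local fallback), skipping any tier whose provider is unset. The Tier dataclass and build_fallback_chain() are hypothetical names for illustration, not existing Hermes functions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Tier:
    """One step of a hypothetical fallback chain."""
    name: str                      # "primary", "fallback" or "local_fallback"
    provider: str
    model: Optional[str] = None
    base_url: Optional[str] = None


def build_fallback_chain(task_config: dict[str, Any]) -> list[Tier]:
    """Turn an auxiliary.<task> config dict into an ordered list of tiers.

    Keys follow the proposed schema: provider/model/base_url for the primary,
    fallback_* for the remote fallback, local_fallback_* for the local one.
    Tiers without a provider are skipped.
    """
    prefixes = [
        ("primary", ""),
        ("fallback", "fallback_"),
        ("local_fallback", "local_fallback_"),
    ]
    chain: list[Tier] = []
    for name, prefix in prefixes:
        provider = task_config.get(f"{prefix}provider")
        if not provider:
            continue  # this tier is not configured for the task
        chain.append(Tier(
            name=name,
            provider=provider,
            model=task_config.get(f"{prefix}model"),
            base_url=task_config.get(f"{prefix}base_url"),
        ))
    return chain
```

Applied to the stt block from the example use case, this yields a primary deepgram tier followed by a whisper-local/large-v3 tier; no local tier is created because local_fallback_provider is unset.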

2. Generic fallback resolution function

Instead of special-casing if task == "vision": in call_llm(), introduce a single _resolve_auxiliary_client_with_fallback() function:

def _resolve_auxiliary_client_with_fallback(task, provider, model, ...):
    """Resolve an auxiliary client, walking the fallback chain on failure."""
    # 1. Read task config + fallback config
    # 2. Try primary provider
    # 3. On failure -> check fallback_triggers
    # 4. Try fallback_provider
    # 5. On failure -> try local_fallback_provider
    # 6. Raise if all exhausted

Both call_llm() and async_call_llm() call this instead of the current per-task branch.
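A minimal, synchronous sketch of the chain walk in steps 1 to 6, assuming each tier is tried through a caller-supplied attempt callable and each failure is mapped to a trigger name by a caller-supplied classify callable. resolve_with_fallback, AllProvidersFailed and both callables are hypothetical; a real version would also have to build clients and mirror the logic for async_call_llm().

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class AllProvidersFailed(RuntimeError):
    """Raised when every tier in the chain has failed (step 6)."""

    def __init__(self, errors: list[tuple[str, Exception]]):
        self.errors = errors
        summary = "; ".join(f"{provider}: {exc!r}" for provider, exc in errors)
        super().__init__(f"all auxiliary providers failed: {summary}")


def resolve_with_fallback(
    providers: Iterable[str],
    attempt: Callable[[str], T],
    classify: Callable[[Exception], Optional[str]],
    triggers: set[str],
) -> T:
    """Try each provider in order, falling through only on configured triggers."""
    errors: list[tuple[str, Exception]] = []
    for provider in providers:
        try:
            return attempt(provider)  # steps 2, 4, 5: try this tier
        except Exception as exc:
            trigger = classify(exc)
            if trigger not in triggers:
                raise  # step 3: this error type must not cause a fallback
            errors.append((provider, exc))  # remember why this tier failed
    raise AllProvidersFailed(errors)
```

Re-raising errors that are not in the configured triggers keeps, for example, a malformed-request error from being masked by quietly switching to a different model.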

3. Configurable fallback triggers

Different tasks have different reliability needs. Define a common set of trigger conditions:

Trigger	Detection	Example
connection_error	DNS/timeout/refused	GFW blocks OpenRouter
rate_limit	HTTP 429	Gemini free tier exhausted
server_error	HTTP 5xx	Upstream 503 overload
quota_exceeded	"quota exceeded" text	Gemini daily quota hit
auth_error	HTTP 401/403	Expired API key
client_creation_failure	client is None	Unconfigured provider

The _is_server_error() function already proposed in #25822 would be a prerequisite.
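The trigger table maps fairly directly onto a classifier. The sketch below assumes the caller can extract an HTTP status code (or None) and an error message from the failure; classify_failure is a hypothetical helper. The proposal lists 403 under both quota_exceeded (config comment) and auth_error (table), so this sketch makes a design choice: quota text is checked first, 402 counts as quota, and a bare 403 counts as auth_error.

```python
import socket
from typing import Optional


def classify_failure(
    exc: Optional[BaseException] = None,
    status: Optional[int] = None,
    message: str = "",
    client_created: bool = True,
) -> Optional[str]:
    """Map a failed auxiliary call to one of the proposed trigger names."""
    if not client_created:
        return "client_creation_failure"  # client is None: provider unconfigured
    if isinstance(exc, (ConnectionError, TimeoutError, socket.gaierror)):
        return "connection_error"  # refused, timed out, or DNS lookup failed
    # Quota text wins over the status code, so a 403 or 429 carrying a
    # quota message is classified as quota_exceeded.
    if "quota exceeded" in message.lower() or status == 402:
        return "quota_exceeded"
    if status == 429:
        return "rate_limit"
    if status is not None and 500 <= status <= 599:
        return "server_error"
    if status in (401, 403):
        return "auth_error"
    return None  # not a fallback-worthy failure
```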

4. Cache awareness

The fallback chain should also interact with the client cache. When a fallback is triggered, the unhealthy provider should be marked so subsequent auxiliary calls don't retry the dead provider first — similar to the existing _mark_provider_unhealthy() pattern used for payment errors.
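A minimal sketch of the cache-awareness idea, assuming a process-local, time-based cooldown: a provider marked unhealthy is moved to the end of the chain until its cooldown expires. ProviderHealth and its methods are hypothetical and are not the existing _mark_provider_unhealthy() API; the clock is injectable so the behaviour can be tested.

```python
import time
from typing import Callable, Iterable


class ProviderHealth:
    """Process-local record of providers that recently failed."""

    def __init__(self, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._unhealthy_until: dict[str, float] = {}

    def mark_unhealthy(self, provider: str) -> None:
        """Start (or restart) the cooldown for a provider that just failed."""
        self._unhealthy_until[provider] = self._clock() + self._cooldown_s

    def is_healthy(self, provider: str) -> bool:
        until = self._unhealthy_until.get(provider)
        if until is None:
            return True
        if self._clock() >= until:
            del self._unhealthy_until[provider]  # cooldown over: forget it
            return True
        return False

    def order(self, providers: Iterable[str]) -> list[str]:
        """Healthy providers first (original order kept), cooling-down ones last."""
        providers = list(providers)
        healthy = [p for p in providers if self.is_healthy(p)]
        cooling = [p for p in providers if p not in healthy]
        return healthy + cooling
```

Demoting rather than dropping a cooling-down provider means it is still tried as a last resort if every other tier also fails.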

Prior Art

PR #25878 — Vision-only fallback chain (current implementation)
Issue #25594 — Vision capability detection broken for custom providers
Issue #25602 — Dashboard UI for auxiliary model fallback chains
Issue #25822 — Vision fallback fails to trigger on Gemini 503 server errors
_is_payment_error() / _try_payment_fallback() — existing generic fallback for payment errors (already task-agnostic, but only triggered when is_auto=True)

Implementation Sketch

# New generic fallback config reader
def _get_task_fallback_config(task: str) -> dict:
    """Read fallback_{provider,model,base_url} and local_fallback_{provider,model,base_url}."""
    task_config = _get_auxiliary_task_config(task)
    return {
        "fallback_provider": task_config.get("fallback_provider"),
        "fallback_model": task_config.get("fallback_model"),
        "fallback_base_url": task_config.get("fallback_base_url"),
        "local_fallback_provider": task_config.get("local_fallback_provider"),
        "local_fallback_model": task_config.get("local_fallback_model"),
        "local_fallback_base_url": task_config.get("local_fallback_base_url"),
        "fallback_triggers": task_config.get("fallback_triggers",
            ["connection_error", "rate_limit", "server_error", "quota_exceeded"]),
    }
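One gap in the sketch above is that a typo in fallback_triggers (for example rate-limit) would silently disable that trigger. A hypothetical normalize_triggers() helper could validate the list against the trigger names from the table and apply the same default as _get_task_fallback_config(), including when the key is present but null.

```python
from typing import Iterable, Optional

# Trigger names taken from the table in "Configurable fallback triggers".
KNOWN_TRIGGERS = frozenset({
    "connection_error", "rate_limit", "server_error",
    "quota_exceeded", "auth_error", "client_creation_failure",
})

# Same default list as _get_task_fallback_config().
DEFAULT_TRIGGERS = ("connection_error", "rate_limit", "server_error", "quota_exceeded")


def normalize_triggers(raw: Optional[Iterable[str]]) -> frozenset[str]:
    """Validate a task's fallback_triggers value, applying the default when unset."""
    if raw is None:
        return frozenset(DEFAULT_TRIGGERS)
    if isinstance(raw, str):
        raw = [raw]  # tolerate a single trigger written as a YAML scalar
    triggers = frozenset(str(t).strip().lower() for t in raw)
    unknown = triggers - KNOWN_TRIGGERS
    if unknown:
        raise ValueError(f"unknown fallback_triggers: {sorted(unknown)}")
    return triggers
```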

Acceptance Criteria

auxiliary.compression.fallback_provider is actually read and used when compression fails
auxiliary.stt.fallback_provider works when the primary STT provider is unreachable
auxiliary.summarization.local_fallback_provider works for local-only offline operation
The existing auxiliary.vision fallback chain continues to work (backward compatibility)
fallback_triggers allows per-task customization of which errors cause fallback
Server errors (503 from Gemini) trigger fallback for all tasks, not just vision (#25822)
All existing tests in tests/agent/test_auxiliary_client.py still pass

This is a foundational piece for making Hermes Agent usable in GFW-restricted environments, VPN-only setups, and any network where different providers have varying availability. The China-based and enterprise user communities are particularly affected.
