hermes - feat: General per-task fallback chains for ALL auxiliary models (vision, compression, STT, summarization, etc.)

Summary

Extend the auxiliary model fallback chain mechanism — currently only wired for auxiliary.vision — to every auxiliary task: vision, compression, summarization, STT, session search, web extraction, and any future tasks.

This addresses users behind restrictive networks (Great Firewall of China, VPN-only access, corporate proxies) who need per-model fallback chains for all auxiliary operations, not just vision.

Problem

Current State

PR #25878 introduced a vision-specific fallback chain (fallback_provider, fallback_model, local_fallback_provider, and local_fallback_model under auxiliary.vision). The detection and fallback logic is hardcoded inside call_llm() and async_call_llm() behind if task == "vision": branches:

# agent/auxiliary_client.py, call_llm()
if task == "vision":
    effective_provider, client, final_model = resolve_vision_provider_client(...)
    if client is None and resolved_provider != "auto" and not resolved_base_url:
        # vision-only fallback logic
        ...

This means:

  • compression tasks never get a fallback — if the primary provider is blocked by the GFW or rate-limited, compression silently fails
  • stt (speech-to-text) tasks have no fallback chain at all
  • summarization falls through the same gap
  • Any future auxiliary task must re-implement fallback from scratch

Example Use Case

A China-based user configures:

auxiliary:
  vision:
    provider: gemini
    model: gemini-2.5-flash
    fallback_provider: Xiaomi-TP
    fallback_model: mimo-v2-omni
    local_fallback_provider: ollama-launch
    local_fallback_model: llama3.2-vision:11b
  compression:
    provider: openrouter
    model: google/gemini-3-flash-preview
    fallback_provider: Xiaomi-TP
    fallback_model: mimo-v2.5-pro
    local_fallback_provider: ollama-launch
    local_fallback_model: llama3.3:70b
  stt:
    provider: deepgram
    fallback_provider: whisper-local
    fallback_model: large-v3

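When the Gemini model behind this compression config (reached through OpenRouter) returns 503 (UNAVAILABLE) or a quota-exceeded error, the call fails outright. The fallback_provider and local_fallback_provider keys under compression are ignored, because the fallback chain exists only for task == "vision".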

Proposed Solution

1. Extend the config schema generically

Every task under auxiliary: should accept the same fallback keys:

auxiliary:
  <task_name>:          # vision, compression, summarization, stt, session_search, web_extract, ...
    provider: ...
    model: ...
    base_url: ...
    api_key: ...
    api_mode: ...
    fallback_provider: ...
    fallback_model: ...
    fallback_base_url: ...
    local_fallback_provider: ...
    local_fallback_model: ...
    local_fallback_base_url: ...
    fallback_triggers:   # optional: which errors trigger fallback (default: the first four below, see Implementation Sketch)
      - connection_error
      - rate_limit       # 429
      - server_error     # 5xx
      - quota_exceeded   # 402/403 quota
      - auth_error       # 401
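
Because every task shares the same key names, the three tiers can be read with one loop rather than per-task code. The following is a minimal sketch of that idea, assuming the task config arrives as a plain dict; ChainEntry and build_fallback_chain are hypothetical names used only for illustration and do not exist in the codebase.

```python
from typing import NamedTuple, Optional


class ChainEntry(NamedTuple):
    """One step in a task's fallback chain (hypothetical helper type)."""
    tier: str                  # "primary", "fallback", or "local_fallback"
    provider: str
    model: Optional[str]
    base_url: Optional[str]


def build_fallback_chain(task_config: dict) -> list:
    """Turn the flat config keys into an ordered list of providers to try."""
    chain = []
    tiers = (
        ("primary", ""),
        ("fallback", "fallback_"),
        ("local_fallback", "local_fallback_"),
    )
    for tier, prefix in tiers:
        provider = task_config.get(f"{prefix}provider")
        if not provider:
            continue  # this tier is not configured for the task, skip it
        chain.append(ChainEntry(
            tier=tier,
            provider=provider,
            model=task_config.get(f"{prefix}model"),
            base_url=task_config.get(f"{prefix}base_url"),
        ))
    return chain


# The stt block from the example use case: no local fallback is configured.
stt_config = {
    "provider": "deepgram",
    "fallback_provider": "whisper-local",
    "fallback_model": "large-v3",
}
chain = build_fallback_chain(stt_config)
```

A task that omits a tier simply gets a shorter chain, so existing configs without any fallback keys would keep their current single-provider behavior.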

2. Generic fallback resolution function

Instead of if task == "vision": special-casing in call_llm(), introduce a single _resolve_auxiliary_client_with_fallback() function:

def _resolve_auxiliary_client_with_fallback(task, provider, model, ...):
    """Resolve an auxiliary client, walking the fallback chain on failure."""
    # 1. Read task config + fallback config
    # 2. Try primary provider
    # 3. On failure -> check fallback_triggers
    # 4. Try fallback_provider
    # 5. On failure -> try local_fallback_provider
    # 6. Raise if all exhausted

Both call_llm() and async_call_llm() call this instead of the current per-task branch.
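
The six commented steps could be fleshed out roughly as follows. This is only a sketch under stated assumptions: the chain is passed in as (provider, model) pairs, and try_provider and should_fall_back are hypothetical callables standing in for the real client resolution and the fallback_triggers check. It also simplifies by treating a failure during client resolution and a failure during the actual request the same way.

```python
from typing import Callable, Iterable, Optional, Tuple


class AuxiliaryFallbackExhausted(RuntimeError):
    """Raised when every provider in the chain failed (hypothetical name)."""


def resolve_with_fallback(
    chain: Iterable[Tuple[str, Optional[str]]],
    try_provider: Callable[[str, Optional[str]], object],
    should_fall_back: Callable[[Exception], bool],
):
    """Try each (provider, model) in order; return the first client that resolves."""
    errors = []
    for provider, model in chain:
        try:
            client = try_provider(provider, model)
        except Exception as exc:
            if not should_fall_back(exc):
                raise  # error is not covered by fallback_triggers: surface it
            errors.append(f"{provider}: {exc}")
            continue
        if client is not None:
            # Same ordering as the existing vision resolver's return value
            return provider, client, model
        errors.append(f"{provider}: client could not be created")
    raise AuxiliaryFallbackExhausted("; ".join(errors))


# Usage with a fake resolver that simulates a blocked primary provider.
def fake_try_provider(provider, model):
    if provider == "openrouter":
        raise ConnectionError("blocked")
    return f"client:{provider}"


result = resolve_with_fallback(
    [("openrouter", "google/gemini-3-flash-preview"),
     ("Xiaomi-TP", "mimo-v2.5-pro")],
    fake_try_provider,
    lambda exc: isinstance(exc, ConnectionError),
)
```

Passing the resolver and the trigger check as arguments keeps the chain walk independent of any one task, which is the property the proposal is after.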

3. Configurable fallback triggers

Different tasks have different reliability needs. Define a common set of trigger conditions:

| Trigger | Detection | Example |
| --- | --- | --- |
| connection_error | DNS/timeout/refused | GFW blocks OpenRouter |
| rate_limit | HTTP 429 | Gemini free tier exhausted |
| server_error | HTTP 5xx | Upstream 503 overload |
| quota_exceeded | "quota exceeded" text | Gemini daily quota hit |
| auth_error | HTTP 401/403 | Expired API key |
| client_creation_failure | client is None | Unconfigured provider |

The _is_server_error() function already proposed in #25822 would be a prerequisite.
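
One way the trigger table might translate into code is a single classifier that maps a failure to a trigger name, which is then compared against the task's fallback_triggers list. The sketch below is an assumption-laden illustration: classify_failure and should_fall_back are hypothetical names, the precedence (quota text checked before status codes) is a guess, and real HTTP client libraries raise their own exception types that would need mapping onto these built-in ones.

```python
import socket
from typing import Optional


def classify_failure(status_code: Optional[int] = None,
                     message: str = "",
                     exc: Optional[BaseException] = None,
                     client_missing: bool = False) -> Optional[str]:
    """Map a failure to a trigger name from the table, or None if nothing matches."""
    if client_missing:
        return "client_creation_failure"
    # Assumed precedence: check the quota text first so a quota message is not
    # misread as a plain rate limit or auth error.
    if "quota exceeded" in message.lower():
        return "quota_exceeded"
    if status_code == 429:
        return "rate_limit"
    if status_code in (401, 403):
        return "auth_error"
    if status_code is not None and 500 <= status_code <= 599:
        return "server_error"
    # Built-in exception types only; DNS failures surface as socket.gaierror.
    if isinstance(exc, (ConnectionError, TimeoutError, socket.gaierror)):
        return "connection_error"
    return None


def should_fall_back(trigger: Optional[str], fallback_triggers: list) -> bool:
    """Fall back only when the failure maps to a trigger the task has enabled."""
    return trigger is not None and trigger in fallback_triggers


default_triggers = ["connection_error", "rate_limit", "server_error", "quota_exceeded"]
gemini_503 = classify_failure(status_code=503, message="UNAVAILABLE")
```

With the default trigger list from the implementation sketch, a 503 would fall back while an expired API key (401) would still surface to the user.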

4. Cache awareness

The fallback chain should also interact with the client cache. When a fallback is triggered, the unhealthy provider should be marked so subsequent auxiliary calls don't retry the dead provider first — similar to the existing _mark_provider_unhealthy() pattern used for payment errors.
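
As a rough illustration of that interaction, the sketch below keeps a small in-memory record of providers that recently failed and moves them to the end of the chain. It is not the project's client cache or its _mark_provider_unhealthy() helper, whose signature is not shown here; ProviderHealth, the cooldown period, and the injectable clock are all assumptions made for the example.

```python
import time


class ProviderHealth:
    """Remember providers that recently failed (hypothetical helper)."""

    def __init__(self, cooldown_seconds: float = 300.0, clock=time.monotonic):
        self._cooldown = cooldown_seconds
        self._clock = clock              # injectable so tests can control time
        self._unhealthy_until = {}       # provider name -> time it may be retried

    def mark_unhealthy(self, provider: str) -> None:
        self._unhealthy_until[provider] = self._clock() + self._cooldown

    def is_healthy(self, provider: str) -> bool:
        deadline = self._unhealthy_until.get(provider)
        if deadline is None:
            return True
        if self._clock() >= deadline:
            del self._unhealthy_until[provider]  # cooldown over, allow a retry
            return True
        return False

    def order_chain(self, providers: list) -> list:
        """Healthy providers first; unhealthy ones stay at the end as a last resort."""
        healthy = [p for p in providers if self.is_healthy(p)]
        unhealthy = [p for p in providers if p not in healthy]
        return healthy + unhealthy


# Usage with a fake clock so the cooldown can be stepped through by hand.
now = [0.0]
health = ProviderHealth(cooldown_seconds=60, clock=lambda: now[0])
providers = ["openrouter", "Xiaomi-TP", "ollama-launch"]
health.mark_unhealthy("openrouter")
during_cooldown = health.order_chain(providers)
now[0] = 61.0
after_cooldown = health.order_chain(providers)
```

Demoting rather than dropping the unhealthy provider means a chain with a single configured provider still gets tried, and the cooldown lets the primary recover without a restart.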

Prior Art

  • PR #25878 — Vision-only fallback chain (current implementation)
  • Issue #25594 — Vision capability detection broken for custom providers
  • Issue #25602 — Dashboard UI for auxiliary model fallback chains
  • Issue #25822 — Vision fallback fails to trigger on Gemini 503 server errors
  • _is_payment_error() / _try_payment_fallback() — existing generic fallback for payment errors (already task-agnostic, but only triggered when is_auto=True)

Implementation Sketch

# New generic fallback config reader
def _get_task_fallback_config(task: str) -> dict:
    """Read fallback_{provider,model,base_url} and local_fallback_{provider,model,base_url}."""
    task_config = _get_auxiliary_task_config(task)
    return {
        "fallback_provider": task_config.get("fallback_provider"),
        "fallback_model": task_config.get("fallback_model"),
        "fallback_base_url": task_config.get("fallback_base_url"),
        "local_fallback_provider": task_config.get("local_fallback_provider"),
        "local_fallback_model": task_config.get("local_fallback_model"),
        "local_fallback_base_url": task_config.get("local_fallback_base_url"),
        "fallback_triggers": task_config.get("fallback_triggers",
            ["connection_error", "rate_limit", "server_error", "quota_exceeded"]),
    }

Acceptance Criteria

  1. auxiliary.compression.fallback_provider is actually read and used when compression fails
  2. auxiliary.stt.fallback_provider works when the primary STT provider is unreachable
  3. auxiliary.summarization.local_fallback_provider works for local-only offline operation
  4. The existing auxiliary.vision fallback chain continues to work (backward compatibility)
  5. fallback_triggers allows per-task customization of which errors cause fallback
  6. Server errors (503 from Gemini) trigger fallback for all tasks, not just vision (#25822)
  7. All existing tests in tests/agent/test_auxiliary_client.py still pass

Related

This is a foundational piece for making Hermes Agent usable in GFW-restricted environments, VPN-only setups, and any network where different providers have varying availability. The China-based and enterprise user communities are particularly affected.
