hermes - 💡(How to fix) feat: fallback models for auxiliary/web_extract [1 participant]

GitHub stats
NousResearch/hermes-agent#17067 · Fetched 2026-04-29 06:37:26
Comments: 0 · Participants: 1 · Timeline: 4 (labeled ×4) · Reactions: 0

Error Message

The auxiliary config section supports only a single model per task (vision, web_extract, compression, etc.). If that model fails (rate limit, timeout, API error), there's no automatic fallback — the task returns an error or falls back to raw content (first 5K chars) without summarization.

Fix Action

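Fix / Workaround

The issue gives no fix or workaround yet; it lists these trade-offs for an implementation to weigh:
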
  • Adding complexity to the auxiliary dispatch path
  • Must define what counts as a 'failure' worth falling back on (timeout, rate limit, any non-2xx)
  • Fallback should not retry the fetch layer (Firecrawl/etc.) — only the LLM summarization phase
  • Could chain multiple fallbacks (primary → secondary → tertiary) if needed

Code Example

auxiliary:
  web_extract:
    provider: custom
    model: gemini-3.1-flash-lite-preview
    base_url: ...
    api_key: ...
    timeout: 120
    fallback:
      provider: custom
      model: gemma-4-26b-a4b-it
      base_url: ...
      api_key: ...
      timeout: 120

Feature Request: Fallback Models for Auxiliary Tasks

Repo: NousResearch/hermes-agent

Title

feat: fallback models for auxiliary/web_extract

Body

Problem

The auxiliary config section supports only a single model per task (vision, web_extract, compression, etc.). If that model fails (rate limit, timeout, API error), there's no automatic fallback — the task returns an error or falls back to raw content (first 5K chars) without summarization.

The main conversation loop already has fallback_providers: [] for model-level fallback, but auxiliary tasks lack this entirely.

Proposed Solution

Add an optional fallback or backup field to each auxiliary task config, with the same structure as the primary config:

auxiliary:
  web_extract:
    provider: custom
    model: gemini-3.1-flash-lite-preview
    base_url: ...
    api_key: ...
    timeout: 120
    fallback:
      provider: custom
      model: gemma-4-26b-a4b-it
      base_url: ...
      api_key: ...
      timeout: 120

On failure (non-2xx, timeout, empty response), the system would automatically retry summarization with the fallback model before giving up.
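
The retry step described above could look roughly like the following Python sketch. It is not hermes-agent's actual code: `summarize_with_fallback`, `SummarizeError`, and the `call_model` callable are hypothetical names, and the config dicts simply mirror the proposed YAML.

```python
from typing import Callable, Optional


class SummarizeError(Exception):
    """Hypothetical error raised when a summarization attempt fails."""


def summarize_with_fallback(
    content: str,
    primary: dict,
    fallback: Optional[dict],
    call_model: Callable[[dict, str], str],
) -> str:
    """Summarize already-fetched content, trying primary and then fallback.

    `call_model(cfg, content)` is an assumed helper that performs one LLM
    request and raises SummarizeError or TimeoutError on non-2xx or timeout.
    """
    last_error: Optional[Exception] = None
    for cfg in (primary, fallback):
        if cfg is None:  # no fallback configured for this task
            continue
        try:
            summary = call_model(cfg, content)
        except (SummarizeError, TimeoutError) as exc:
            last_error = exc
            continue
        if summary and summary.strip():
            return summary
        # An empty response also counts as a failure.
        last_error = SummarizeError(f"empty response from {cfg.get('model')}")
    raise SummarizeError("all auxiliary models failed") from last_error
```

Because the function receives `content` that was already fetched, only the LLM call is repeated; the caller can still catch `SummarizeError` and return the raw content as it does today.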

Use Case

Users who pair a fast model (Gemini Flash Lite: ~19 s, 500 RPD) with a more resilient backup (Gemma 4: ~33 s, 1.5K RPD) can get both speed and reliability without manual intervention. RPD here presumably refers to the provider's requests-per-day quota.

Trade-offs Considered

  • Adding complexity to the auxiliary dispatch path
  • Must define what counts as a 'failure' worth falling back on (timeout, rate limit, any non-2xx)
  • Fallback should not retry the fetch layer (Firecrawl/etc.) — only the LLM summarization phase
  • Could chain multiple fallbacks (primary → secondary → tertiary) if needed
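
One way to support the chained case is to let each `fallback` block carry its own nested `fallback` and flatten that into an ordered list. The issue only proposes a single `fallback` field, so the nesting convention and the `flatten_chain` name below are assumptions made for illustration.

```python
def flatten_chain(task_cfg: dict) -> list:
    """Return [primary, secondary, ...] by following nested `fallback` keys."""
    chain = []
    node = task_cfg
    while isinstance(node, dict):
        # Copy each level without its nested fallback so every entry is flat.
        chain.append({k: v for k, v in node.items() if k != "fallback"})
        node = node.get("fallback")
    return chain
```

A dispatcher could then iterate over the returned list in order, which keeps the single-fallback config from the proposal working unchanged.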

extent analysis

TL;DR

Add an optional fallback field to each auxiliary task config to enable automatic model fallback on failure.

Guidance

  • Define the conditions for fallback, such as non-2xx status codes, timeouts, or empty responses, to determine when to switch to the backup model.
  • Implement a retry mechanism for the LLM summarization phase using the fallback model, without retrying the fetch layer.
  • Consider the trade-offs of adding complexity to the auxiliary dispatch path and the potential need to chain multiple fallbacks.
  • Evaluate the performance impact of using a slower but more resilient backup model, such as Gemma 4, as a fallback.
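
The first guidance point, deciding when to switch models, can be captured in a single predicate. This is a sketch under the failure conditions named in the issue (non-2xx, timeout, empty response); `should_fall_back` and its parameters are hypothetical, not part of hermes-agent.

```python
from typing import Optional


def should_fall_back(
    status_code: Optional[int], timed_out: bool, body: Optional[str]
) -> bool:
    """Return True when a summarization attempt should move to the fallback."""
    if timed_out:
        return True
    # No status at all (e.g. connection error) or any non-2xx, including 429.
    if status_code is None or not 200 <= status_code < 300:
        return True
    # A 2xx response with an empty or whitespace-only body is still a failure.
    return not (body or "").strip()
```

Keeping the rule in one function makes it easy to narrow later, for example to fall back only on rate limits and timeouts rather than on every non-2xx status.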

Notes

The proposed solution assumes that the fallback block has the same structure as the primary config, so the config loader may need additional validation and error handling for it.
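
A validation pass for that shared structure might look like the sketch below. The required keys (`provider`, `model`), the positive-`timeout` rule, and the `validate_task_config` name are assumptions chosen for illustration; the real loader may enforce different rules.

```python
REQUIRED_KEYS = ("provider", "model")  # assumed minimum for any model block


def validate_task_config(cfg: dict, path: str = "web_extract") -> list:
    """Return a list of error strings for a task config and its fallback."""
    errors = []
    for key in REQUIRED_KEYS:
        if not cfg.get(key):
            errors.append(f"{path}: missing '{key}'")
    timeout = cfg.get("timeout")
    if timeout is not None:
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            errors.append(f"{path}: 'timeout' must be a positive number")
    fallback = cfg.get("fallback")
    if fallback is not None:
        if isinstance(fallback, dict):
            # The fallback block is checked with the same rules as the primary.
            errors.extend(validate_task_config(fallback, f"{path}.fallback"))
        else:
            errors.append(f"{path}.fallback: expected a mapping")
    return errors
```

Returning a list rather than raising on the first problem lets the loader report every misconfigured field at startup.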

Recommendation

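Implement the optional fallback field in the auxiliary task config. It is a clear and targeted solution to model failure that does not require significant changes to the existing architecture. Because this is still a feature request, adding the field to a config today presumably has no effect until the dispatch path supports it.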
