hermes - ✅(Solved) Fix Multi-level fallback chain aborts on 401 from first fallback instead of cascading to second fallback [2 pull requests, 2 comments, 3 participants]

mpb250111-a11y · 2026-04-29T14:29:02Z

# PR #17518: fix(agent): cascade nested fallback_model chains

- Repository: NousResearch/hermes-agent
- Author: LeonSGP43
- State: open | merged: False
- Link: https://github.com/NousResearch/hermes-agent/pull/17518

## Description (problem / solution / changelog)

## Summary

Fixes #17485.

Legacy configs can express multi-level provider failover by nesting `fallback_model` blocks. Hermes only kept the first dict in `_fallback_chain`, so after fallback 1 failed with a non-retryable/auth error, the retry loop had no second provider to activate and aborted despite the nested fallback config.

This normalizes fallback config into an ordered chain before agent startup:

- keeps the canonical list-form `fallback_providers` / list `fallback_model` behavior
- preserves single-dict `fallback_model` behavior
- flattens nested legacy `fallback_model` entries so fallback 1 can cascade to fallback 2

## Verification

- `scripts/run_tests.sh tests/run_agent/test_provider_fallback.py` (22 passed)
- `git diff --check`

## Scope

This only changes fallback-chain normalization and focused fallback tests. It does not alter provider-specific auth refresh, retry classification, or the canonical fallback command/config storage format.

## Changed files

- `run_agent.py` (modified, +34/-9)
- `tests/run_agent/test_provider_fallback.py` (modified, +62/-0)

---

# PR #17535: fix(run_agent): normalize nested fallback chains to enable cascading provider failover

- Repository: NousResearch/hermes-agent
- Author: afurm
- State: open | merged: False
- Link: https://github.com/NousResearch/hermes-agent/pull/17535

## Description (problem / solution / changelog)

## What does this PR do?

This PR fixes provider fallback handling in `run_agent.py` by normalizing legacy nested `fallback_model` blocks into a single ordered runtime chain.

Previously, nested fallback definitions could be skipped during failover, so fallback resolution did not consistently cascade past the first fallback provider. This change ensures fallback resolution is deterministic and traverses the full chain, including the nested case reported in #17485.

## Related Issue

Fixes #17485

## Type of Change

- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] ✨ New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] ✅ Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)

## Changes Made

- [run_agent.py](/Users/afurm/Development/hermes-agent/run_agent.py): Added fallback-chain normalization logic to flatten legacy nested `fallback_model` payloads into the runtime fallback chain order, enabling proper cascade behavior.
- [tests/run_agent/test_provider_fallback.py](/Users/afurm/Development/hermes-agent/tests/run_agent/test_provider_fallback.py): Added regression tests for:
  - nested fallback chain normalization
  - cascading fallback when an earlier fallback target raises a 401-like error
- [tests/run_agent/test_fallback_model.py](/Users/afurm/Development/hermes-agent/tests/run_agent/test_fallback_model.py): Kept/verified coverage aligned with the updated normalization behavior and edge cases.

## How to Test

1. Run fallback provider tests:
   - `scripts/run_tests.sh tests/run_agent/test_provider_fallback.py -q`
   - `scripts/run_tests.sh tests/run_agent/test_fallback_model.py -q`
2. Confirm both suites pass:
   - `test_provider_fallback.py`: 21 passed
   - `test_fallback_model.py`: 28 passed
3. Add a provider configuration with a nested fallback chain and simulate a first fallback returning a 401-like error; verify the agent continues cascading through the normalized fallback order and does not stop prematurely.

## Checklist

### Code

- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [ ] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [ ] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [ ] I've tested on my platform:

### Documentation & Housekeeping

- [ ] I've updated relevant documentation (README, `docs/`, docstrings), or N/A
- [ ] I've updated `cli-config.yaml.example` if I added/changed config keys, or N/A
- [ ] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows, or N/A
- [ ] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousRese

TL;DR

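The abort is most likely caused by how Hermes builds its fallback chain rather than by a misconfiguration. Per PR #17518, only the first dict of a nested `fallback_model` block was kept in `_fallback_chain`, so after the first fallback (Groq) failed with an invalid-key 401 there was no second provider (Mistral) left to try.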

Guidance

Review the `fallback_model` block and check whether the second fallback (Mistral) is nested inside the first (Groq); according to PR #17518, nested entries beyond the first were dropped from the runtime chain.
Verify in the logs that the "trying fallback..." message after the Groq 401 is followed by an actual attempt against Mistral.
Check the release notes for v0.11.0 for known issues related to fallback model cascading.
Inspect how the agent handles a 401 from Groq to confirm whether it aborts because the error is treated as fatal or because no further fallback remains in the chain.
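
To make the expected behavior concrete, here is a minimal sketch of a loop that cascades past an auth failure instead of aborting. It is not the Hermes retry loop: `AuthError`, `call_with_fallbacks`, `fake_call`, and the `provider` key are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List


class AuthError(Exception):
    """Hypothetical stand-in for a provider's 401 / invalid API key failure."""


def call_with_fallbacks(
    chain: List[Dict[str, Any]],
    call: Callable[[Dict[str, Any]], str],
) -> str:
    """Try each entry in order; an auth failure moves on to the next entry."""
    errors: List[str] = []
    for entry in chain:
        try:
            return call(entry)
        except AuthError as exc:
            # Record the failure and keep cascading instead of aborting.
            errors.append(f"{entry['provider']}: {exc}")
    raise RuntimeError("all fallbacks failed: " + "; ".join(errors))


def fake_call(entry: Dict[str, Any]) -> str:
    """Simulated provider call: Groq rejects the key, everything else answers."""
    if entry["provider"] == "groq":
        raise AuthError("401 invalid API key")
    return f"answer from {entry['provider']}"


result = call_with_fallbacks(
    [{"provider": "groq"}, {"provider": "mistral"}],
    fake_call,
)
print(result)  # answer from mistral
```

The loop only cascades if the second entry is actually present in the chain, which is the part the PRs report as missing for nested configs.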

Example

No code snippet is provided due to lack of specific configuration details.
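
In the absence of the reporter's configuration, the flattening that both PRs describe can still be illustrated with a rough sketch. This is not the code from `run_agent.py`: `normalize_fallback_chain` is a hypothetical helper, and the `provider` and `model` keys and the model names are assumptions made for the example.

```python
from typing import Any, Dict, List


def normalize_fallback_chain(config: Any) -> List[Dict[str, Any]]:
    """Flatten list-form or nested-dict fallback config into one ordered list."""
    chain: List[Dict[str, Any]] = []

    def visit(entry: Any) -> None:
        if isinstance(entry, list):
            # List form: keep the order given.
            for item in entry:
                visit(item)
        elif isinstance(entry, dict):
            # Keep this level's settings, then descend into the nested block.
            nested = entry.get("fallback_model")
            flat = {k: v for k, v in entry.items() if k != "fallback_model"}
            if flat:
                chain.append(flat)
            visit(nested)
        # None or any other type contributes nothing.

    visit(config)
    return chain


# Hypothetical legacy config: fallback 2 is nested inside fallback 1.
legacy = {
    "provider": "groq",
    "model": "model-a",
    "fallback_model": {"provider": "mistral", "model": "model-b"},
}

chain = normalize_fallback_chain(legacy)
print([entry["provider"] for entry in chain])  # ['groq', 'mistral']
```

A single dict without nesting yields a one-entry chain and a list passes through in order, which matches the three cases listed in the PR #17518 summary.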

Notes

The issue appears specific to nested `fallback_model` configurations and how they interact with the handling of non-retryable errors such as a 401. Without the reporter's exact configuration or error messages, a more precise diagnosis is not possible here.

Recommendation

Apply workaround: both PRs above are listed as open, so until one is merged, adjust the `fallback_model` configuration so the chain does not rely on nesting, for example by trying the list form that PR #17518 describes as canonical.

Fix Action

Fixed
