hermes: Per-agent fallback tier overrides for fleet right-sizing

NousResearch/hermes-agent#25321 (fetched 2026-05-14 03:47:18)
Problem

fallback_providers in ~/.hermes/config.yaml is a single global chain. In a multi-agent fleet (27+ cortextOS agents), every agent falls back to the same models regardless of task complexity. A spam-classification agent gets opus; an architecture-doc agent gets capped at sonnet. Token spend is wrong-sized in both directions.

Proposal

Add a tier field to per-agent config (or, in hermes-only setups, a CLI flag / env var). Hermes reads the tier and selects from a tiered fallback chain.

Suggested tier mapping (configurable; these are the defaults):

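| Tier  | Primary           | Fallback 1 | Fallback 2        | Use case                                      |
|-------|-------------------|------------|-------------------|-----------------------------------------------|
| cheap | claude-haiku-4-5  | gpt-5-nano | claude-sonnet-4-5 | Classify, extract, format, dedupe, lint       |
| mid   | claude-sonnet-4-5 | gpt-5      | claude-opus-4-7   | Summarize, plan, write code, daily ops        |
| hard  | claude-opus-4-7   | gpt-5      | claude-sonnet-4-5 | Architecture, hard debugging, security review |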

Config shape

# ~/.hermes/config.yaml
fallback_tiers:
  cheap:
    - { provider: anthropic, model: claude-haiku-4-5, key_env: ANTHROPIC_API_KEY }
    - { provider: openai,    model: gpt-5-nano,        key_env: OPENAI_API_KEY }
    - { provider: anthropic, model: claude-sonnet-4-5, key_env: ANTHROPIC_API_KEY }
  mid:
    - { provider: anthropic, model: claude-sonnet-4-5, key_env: ANTHROPIC_API_KEY }
    - { provider: openai,    model: gpt-5,              key_env: OPENAI_API_KEY }
    - { provider: anthropic, model: claude-opus-4-7,   key_env: ANTHROPIC_API_KEY }
  hard:
    - { provider: anthropic, model: claude-opus-4-7,   key_env: ANTHROPIC_API_KEY }
    - { provider: openai,    model: gpt-5,              key_env: OPENAI_API_KEY }
    - { provider: anthropic, model: claude-sonnet-4-5, key_env: ANTHROPIC_API_KEY }

Per-agent override (cortextOS pattern):

// ~/cortextos/orgs/gradata/agents/classifier/config.json
{ "runtime": "hermes", "tier": "cheap" }

Or per-invocation:

hermes chat --tier cheap
HERMES_TIER=hard hermes chat
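
The proposal names three ways to set a tier (the per-agent config.json, the --tier flag, and the HERMES_TIER env var) but does not say which wins when more than one is present. The following is a minimal sketch of one possible resolution order, assuming the flag beats the env var, which beats the agent config, with mid as the default. `resolve_tier` and its parameters are hypothetical names for illustration, not existing Hermes APIs.

```python
import os

VALID_TIERS = ("cheap", "mid", "hard")
DEFAULT_TIER = "mid"  # default tier named in the proposal


def resolve_tier(cli_tier=None, agent_config=None, env=None):
    """Pick the tier for one invocation.

    Assumed precedence (NOT specified in the proposal):
    --tier flag > HERMES_TIER env var > "tier" in agent config.json > default.
    """
    env = os.environ if env is None else env
    agent_config = agent_config or {}
    candidates = (cli_tier, env.get("HERMES_TIER"), agent_config.get("tier"))
    for candidate in candidates:
        if candidate:
            if candidate not in VALID_TIERS:
                raise ValueError(f"unknown tier: {candidate!r}")
            return candidate
    return DEFAULT_TIER
```

Under this assumed ordering, a one-off `hermes chat --tier cheap` would override whatever the agent's config.json says, which keeps per-invocation experiments from requiring a config edit.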

Acceptance criteria

  • Agents with tier: cheap use the cheap chain; opus never fires for them.
  • Agents with tier: hard start on opus directly (skipping the sonnet rung).
  • The tier defaults to mid when unspecified, which preserves current behavior for single-user setups.
  • If fallback_tiers is not configured, Hermes falls back to the global fallback_providers chain (backward compatibility).
  • The "🔄 Fallback chain (N providers): ..." banner shows the tier name when applicable.
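
The first four criteria can be sketched as a chain-selection function plus a few checks. This is a minimal sketch, assuming the parsed config is a plain dict shaped like the YAML under "Config shape". `select_chain` is a hypothetical helper, not an existing Hermes function; the key_env fields are omitted for brevity, and the one-entry legacy chain is invented for the example.

```python
DEFAULT_TIER = "mid"  # default tier named in the acceptance criteria


def select_chain(config, tier=None):
    """Return (tier_name, chain) for an agent.

    When fallback_tiers is absent, return the global fallback_providers
    chain with tier_name None (backward compatibility).
    """
    tiers = config.get("fallback_tiers")
    if not tiers:
        return None, list(config.get("fallback_providers", []))
    name = tier or DEFAULT_TIER
    if name not in tiers:
        raise KeyError(f"tier {name!r} is not defined in fallback_tiers")
    return name, list(tiers[name])


# Same chains as the proposed config, minus key_env.
CONFIG = {
    "fallback_tiers": {
        "cheap": [
            {"provider": "anthropic", "model": "claude-haiku-4-5"},
            {"provider": "openai", "model": "gpt-5-nano"},
            {"provider": "anthropic", "model": "claude-sonnet-4-5"},
        ],
        "mid": [
            {"provider": "anthropic", "model": "claude-sonnet-4-5"},
            {"provider": "openai", "model": "gpt-5"},
            {"provider": "anthropic", "model": "claude-opus-4-7"},
        ],
        "hard": [
            {"provider": "anthropic", "model": "claude-opus-4-7"},
            {"provider": "openai", "model": "gpt-5"},
            {"provider": "anthropic", "model": "claude-sonnet-4-5"},
        ],
    }
}

# Hypothetical legacy config that only has the global chain.
LEGACY = {"fallback_providers": [{"provider": "anthropic", "model": "claude-sonnet-4-5"}]}

cheap_name, cheap_chain = select_chain(CONFIG, "cheap")
hard_name, hard_chain = select_chain(CONFIG, "hard")
default_name, default_chain = select_chain(CONFIG)
legacy_name, legacy_chain = select_chain(LEGACY, "cheap")

assert all("opus" not in entry["model"] for entry in cheap_chain)
assert hard_chain[0]["model"] == "claude-opus-4-7"
assert default_name == "mid"
assert legacy_name is None and legacy_chain == LEGACY["fallback_providers"]
```

Returning the tier name alongside the chain gives the banner code something to print, and a None name signals that the global chain is in use.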

Why this matters

Today, when ollama-cloud returns a 429, all 27 agents try sonnet next. If they collectively saturate sonnet as well, they all promote to gpt-5 at the same time. Tier-aware routing spreads the load across model bands instead of stampeding a single rung.
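
To make the stampede concrete, here is a toy count of where 27 agents land on each fallback rung. The global chain order (sonnet, then gpt-5) follows the paragraph above and the tier chains follow the proposed config. The 15/9/3 split of agents across tiers is an invented illustration, not a measurement from any real fleet.

```python
from collections import Counter

GLOBAL_CHAIN = ["claude-sonnet-4-5", "gpt-5"]  # order described in the issue
TIER_CHAINS = {
    "cheap": ["claude-haiku-4-5", "gpt-5-nano", "claude-sonnet-4-5"],
    "mid": ["claude-sonnet-4-5", "gpt-5", "claude-opus-4-7"],
    "hard": ["claude-opus-4-7", "gpt-5", "claude-sonnet-4-5"],
}
# Hypothetical split of the 27 agents across tiers.
FLEET = {"cheap": 15, "mid": 9, "hard": 3}


def load_at_rung(rung, tiered):
    """Count agents per model if every agent is on the given fallback rung."""
    load = Counter()
    for tier, agents in FLEET.items():
        chain = TIER_CHAINS[tier] if tiered else GLOBAL_CHAIN
        load[chain[rung]] += agents
    return load


def peak_load(rung, tiered):
    """Largest number of agents hitting any single model on this rung."""
    return max(load_at_rung(rung, tiered).values())


for rung in (0, 1):
    print(rung, "global:", dict(load_at_rung(rung, tiered=False)))
    print(rung, "tiered:", dict(load_at_rung(rung, tiered=True)))
```

With this assumed split, the busiest model on the first rung sees 15 agents under tiered routing instead of all 27 under the global chain.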

Workaround until shipped

Users can manually set a per-agent model: override in each config.json, but that bypasses fallback entirely (the override has no chain behind it). Native tier-aware support is cleaner.
