hermes - 💡(How to fix) Fix [Feature]: Capability-Based Multi-Model Routing — unified config for task-type, domain-tier, and cost-aware provider selection

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Fix Action

Fix / Workaround

Today there is no native way to configure this. Workarounds exist (delegate_task, multiple profiles, custom spawning) but none are transparent or automatic.

Code Example

# Layer 1: Process Models (cognitive roles)
process_models:
  cortex: anthropic/claude-4.6-sonnet
  coder_worker: openai/gpt-5.5
  search_worker: openai/gpt-5.5-search
  writer_worker: anthropic/claude-4.6-opus

# Layer 2: Capability Tiers (domain-specific)
capabilities:
  speed: google/gemini-3-flash
  tool_calling: google/gemini-3.5-flash
  data_viz: google/gemini-3.1-pro
  finance_legal: qwen/qwen-3.7-max
  long_context: mistralai/mistral-small-4
  cheap_fallback: qwen/qwen-3.5-35b-a3b

# Layer 3: Provider Routing (cost/throughput optimization)
provider_routing:
  sort: price
  fallback_providers:
    - provider: openrouter
      model: google/gemini-3-flash
RAW_BUFFERClick to expand / collapse

Problem or Use Case

Hermes currently uses a single model per agent session. For workflows spanning different capability domains (coding, creative writing, web search, data visualization, finance/legal analysis), this forces a compromise:

  • A strong model for everything is expensive and slow for simple tasks
  • A cheap model for everything underperforms on complex work
  • Manually switching models or spawning subagents is friction

Concrete use case: I want one agent session where:

  • Complex reasoning / planning → Claude Sonnet 4.6 / GPT 5.5
  • Coding & tools → GPT 5.5 (strong function calling)
  • Web search → GPT 5.5 Search (native web grounding)
  • Creative writing → Claude Opus 4.6 (best prose)
  • Quick / speed tasks → Gemini 3 Flash (low latency)
  • Data viz / math → Gemini 3.1 Pro
  • Finance / legal → Qwen 3.7 Max (domain strength)
  • Long context / logs → Mistral Small 4 (high recall)
  • Cheap fallback → Qwen 3.5 (low-cost generic chat)

Today there is no native way to configure this. Workarounds exist (delegate_task, multiple profiles, custom spawning) but none are transparent or automatic.

Proposed Solution

Architecture: 3-Layer Model Routing

Introduce three new config blocks that work together:

# Layer 1: Process Models (cognitive roles)
process_models:
  cortex: anthropic/claude-4.6-sonnet
  coder_worker: openai/gpt-5.5
  search_worker: openai/gpt-5.5-search
  writer_worker: anthropic/claude-4.6-opus

# Layer 2: Capability Tiers (domain-specific)
capabilities:
  speed: google/gemini-3-flash
  tool_calling: google/gemini-3.5-flash
  data_viz: google/gemini-3.1-pro
  finance_legal: qwen/qwen-3.7-max
  long_context: mistralai/mistral-small-4
  cheap_fallback: qwen/qwen-3.5-35b-a3b

# Layer 3: Provider Routing (cost/throughput optimization)
provider_routing:
  sort: price
  fallback_providers:
    - provider: openrouter
      model: google/gemini-3-flash

Routing Logic

The router lives in a new agent/model_router.py module:

  1. Task classification — the agent identifies the nature of each user turn (coding, writing, search, viz, finance, or generic)
  2. Model selection priority chain:
    • process_models.<role> (most specific — cognitive role)
    • capabilities.<domain> (domain tier — used when no specific role matches)
    • model.default (fallback — current behavior)
  3. Provider optimization — when a model is available through multiple providers, the provider_routing.sort strategy picks the best (cheapest, fastest)
  4. Backend health awareness — integrates with the backend health registry proposed in #32690: skip unhealthy backends, respect cooldowns

Key Design Goals

  • Backward compatiblemodel.default works as today; new sections are additive
  • User-defined categories — not hardcoded; users define what roles and tiers matter
  • Stateless routing — each model swap is a fresh pass; no complex cross-model state
  • Transparent — the agent handles routing; the user doesn't manually switch
  • Health-aware — builds directly on #32690's backend health registry

Routing Trigger Mechanisms

Option A — Agent self-routes (preferred): The system prompt includes the routing config so the agent recognizes task types and signals the router. The framework handles model swaps transparently.

Option B — Config-driven routing: The router pattern-matches incoming requests (e.g., if tool calls are terminal-heavy → route to coder_worker; if response needs SVG → route to data_viz) without agent involvement.

Both can coexist: Option A for ambiguous tasks, Option B for clear signal-based routing.

Alternatives Considered

ApproachWhy it falls short
delegate_task with model override (#4833, #26736)Requires the agent to manually decide to delegate; not automatic or transparent
Multiple Hermes profilesRequires manual switching; breaks conversation continuity; no cross-profile state sharing
Model-router skill (#29572)Skill-based approach is fragile — depends on agent instruction-following rather than framework-level routing; no provider optimization
Backend health routing only (#32690)Only addresses failover, not proactive capability-based routing
Chat vs tool-use routing (#26027)Too coarse — 2 buckets don't cover the variety of real workflows
Existing per-task routing (#4461)Closest, but doesn't include the process_models/capabilities hierarchy or provider cost optimization

Feature Type

Configuration option / Core agent improvement

Scope

Large (new module: agent/model_router.py, config schema changes, prompt builder integration, health-aware fallback integration)

Related Issues

  • #32690 — Backend health-aware fallback (foundation for health-aware routing)
  • #4461 — Multimodel Routing for Agent Profiles (closest prior proposal)
  • #26027 — Model routing for chat vs tool-use tasks (overlapping concept)
  • #4833 — Per-skill model routing + supervisor/execution model config (complementary)
  • #29572 — Model-router skill (alternative skill-based approach)

Contribution

I'd like to discuss the design here before implementing. Happy to contribute code once the approach is agreed upon.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING