hermes - 💡(How to fix) Fix [Feature]: Capability-Based Multi-Model Routing — unified config for task-type, domain-tier, and cost-aware provider selection

Code Example

# Layer 1: Process Models (cognitive roles)
process_models:
  cortex: anthropic/claude-4.6-sonnet
  coder_worker: openai/gpt-5.5
  search_worker: openai/gpt-5.5-search
  writer_worker: anthropic/claude-4.6-opus

# Layer 2: Capability Tiers (domain-specific)
capabilities:
  speed: google/gemini-3-flash
  tool_calling: google/gemini-3.5-flash
  data_viz: google/gemini-3.1-pro
  finance_legal: qwen/qwen-3.7-max
  long_context: mistralai/mistral-small-4
  cheap_fallback: qwen/qwen-3.5-35b-a3b

# Layer 3: Provider Routing (cost/throughput optimization)
provider_routing:
  sort: price
  fallback_providers:
    - provider: openrouter
      model: google/gemini-3-flash

Problem or Use Case

Hermes currently uses a single model per agent session. For workflows spanning different capability domains (coding, creative writing, web search, data visualization, finance/legal analysis), this forces a compromise:

A strong model for everything is expensive and slow for simple tasks
A cheap model for everything underperforms on complex work
Manually switching models or spawning subagents is friction

Concrete use case: I want one agent session where:

Complex reasoning / planning → Claude Sonnet 4.6 / GPT 5.5
Coding & tools → GPT 5.5 (strong function calling)
Web search → GPT 5.5 Search (native web grounding)
Creative writing → Claude Opus 4.6 (best prose)
Quick / speed tasks → Gemini 3 Flash (low latency)
Data viz / math → Gemini 3.1 Pro
Finance / legal → Qwen 3.7 Max (domain strength)
Long context / logs → Mistral Small 4 (high recall)
Cheap fallback → Qwen 3.5 (low-cost generic chat)

Today there is no native way to configure this. Workarounds exist (delegate_task, multiple profiles, custom spawning) but none are transparent or automatic.

Proposed Solution

Architecture: 3-Layer Model Routing

Introduce three new config blocks that work together:

# Layer 1: Process Models (cognitive roles)
process_models:
  cortex: anthropic/claude-4.6-sonnet
  coder_worker: openai/gpt-5.5
  search_worker: openai/gpt-5.5-search
  writer_worker: anthropic/claude-4.6-opus

# Layer 2: Capability Tiers (domain-specific)
capabilities:
  speed: google/gemini-3-flash
  tool_calling: google/gemini-3.5-flash
  data_viz: google/gemini-3.1-pro
  finance_legal: qwen/qwen-3.7-max
  long_context: mistralai/mistral-small-4
  cheap_fallback: qwen/qwen-3.5-35b-a3b

# Layer 3: Provider Routing (cost/throughput optimization)
provider_routing:
  sort: price
  fallback_providers:
    - provider: openrouter
      model: google/gemini-3-flash

Routing Logic

The router lives in a new agent/model_router.py module:

Task classification — the agent identifies the nature of each user turn (coding, writing, search, viz, finance, or generic)
Model selection priority chain:
- process_models.<role> (most specific — cognitive role)
- capabilities.<domain> (domain tier — used when no specific role matches)
- model.default (fallback — current behavior)
Provider optimization — when a model is available through multiple providers, the provider_routing.sort strategy picks the best (cheapest, fastest)
Backend health awareness — integrates with the backend health registry proposed in #32690: skip unhealthy backends, respect cooldowns

Key Design Goals

Backward compatible — model.default works as today; new sections are additive
User-defined categories — not hardcoded; users define what roles and tiers matter
Stateless routing — each model swap is a fresh pass; no complex cross-model state
Transparent — the agent handles routing; the user doesn't manually switch
Health-aware — builds directly on #32690's backend health registry

Routing Trigger Mechanisms

Option A — Agent self-routes (preferred): The system prompt includes the routing config so the agent recognizes task types and signals the router. The framework handles model swaps transparently.

Option B — Config-driven routing: The router pattern-matches incoming requests (e.g., if tool calls are terminal-heavy → route to coder_worker; if response needs SVG → route to data_viz) without agent involvement.

Both can coexist: Option A for ambiguous tasks, Option B for clear signal-based routing.

Alternatives Considered

Approach	Why it falls short
delegate_task with model override (#4833, #26736)	Requires the agent to manually decide to delegate; not automatic or transparent
Multiple Hermes profiles	Requires manual switching; breaks conversation continuity; no cross-profile state sharing
Model-router skill (#29572)	Skill-based approach is fragile — depends on agent instruction-following rather than framework-level routing; no provider optimization
Backend health routing only (#32690)	Only addresses failover, not proactive capability-based routing
Chat vs tool-use routing (#26027)	Too coarse — 2 buckets don't cover the variety of real workflows
Existing per-task routing (#4461)	Closest, but doesn't include the process_models/capabilities hierarchy or provider cost optimization

Feature Type

Configuration option / Core agent improvement

Scope

Large (new module: agent/model_router.py, config schema changes, prompt builder integration, health-aware fallback integration)

Related Issues

#32690 — Backend health-aware fallback (foundation for health-aware routing)
#4461 — Multimodel Routing for Agent Profiles (closest prior proposal)
#26027 — Model routing for chat vs tool-use tasks (overlapping concept)
#4833 — Per-skill model routing + supervisor/execution model config (complementary)
#29572 — Model-router skill (alternative skill-based approach)

Contribution

I'd like to discuss the design here before implementing. Happy to contribute code once the approach is agreed upon.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Feature]: Capability-Based Multi-Model Routing — unified config for task-type, domain-tier, and cost-aware provider selection

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fix / Workaround