hermes - 💡(How to fix) Fix Model Presets: per-turn expert-on-demand model escalation [3 comments, 2 participants]

apoapostolov · 2026-05-05T14:10:40Z



GitHub stats

NousResearch/hermes-agent#20249 • Fetched 2026-05-06 06:37:52


Author: apoapostolov

Participants: alt-glitch, apoapostolov

Timeline (top): labeled ×4, commented ×3, cross-referenced ×2



Problem

Right now, Hermes runs one model for the entire session. If you're on a cheap/fast model (Gemini Flash, DeepSeek V4 Flash), and a turn comes up that genuinely needs Opus-level reasoning — designing a complex system, debugging a multi-module race condition, or reasoning about a tricky crypto protocol — you have two bad options:

1. /model to a bigger model, take the hit on all subsequent turns (waste of tokens and money), then /model back manually.
2. Power through on the weak model, get a mediocre answer, then re-prompt with /retry.

Neither is good. What we need: the ability to temporarily escalate to an expert model for a single turn, then snap back to the default.

Proposed Solution: Model Presets

A new config section model_presets that defines named provider+model combinations, plus a lightweight mechanism to invoke them for one turn.

model_presets:
  expert:
    provider: openrouter
    model: anthropic/claude-opus-4-6
    reasoning_effort: high
  deepthink:
    provider: deepseek
    model: deepseek-v4-pro
    reasoning_effort: high
  cheap:
    provider: openrouter
    model: google/gemini-3-flash-preview
    reasoning_effort: none
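
A minimal sketch of how the parsed config section could be validated, assuming the YAML has already been loaded into a plain mapping. `ModelPreset`, `load_presets`, and the `ALLOWED_EFFORTS` value set are hypothetical names for illustration, not part of Hermes.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Assumed set of accepted values; the issue only shows "high" and "none".
ALLOWED_EFFORTS = {"none", "low", "medium", "high"}


@dataclass(frozen=True)
class ModelPreset:
    name: str
    provider: str
    model: str
    reasoning_effort: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None


def load_presets(config: Mapping[str, Any]) -> dict[str, ModelPreset]:
    """Build validated presets from the `model_presets` section of a config mapping."""
    presets: dict[str, ModelPreset] = {}
    for name, raw in (config.get("model_presets") or {}).items():
        missing = [key for key in ("provider", "model") if not raw.get(key)]
        if missing:
            raise ValueError(f"preset {name!r} is missing: {', '.join(missing)}")
        effort = raw.get("reasoning_effort")
        # A bare `none` in YAML is the string "none", not null.
        if effort is not None and effort not in ALLOWED_EFFORTS:
            raise ValueError(f"preset {name!r} has unknown reasoning_effort {effort!r}")
        presets[name] = ModelPreset(
            name=name,
            provider=raw["provider"],
            model=raw["model"],
            reasoning_effort=effort,
            base_url=raw.get("base_url"),
            api_key=raw.get("api_key"),
        )
    return presets
```

Failing early on a missing provider or model means a typo in the config surfaces at startup rather than in the middle of an escalated turn.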

Interaction modes

1. Explicit: /expert (or /preset deepthink)

Before the next turn, you type /expert and the agent switches to the expert preset for that single call. After the model responds, the runtime swaps back to the default. If no preset is specified, model_presets.expert is the default target. /preset <name> lets you pick any named preset.

2. Implicit: auto-detect complexity

The agent loop sniffs signals that suggest this turn needs more horsepower:

- User asks for a design decision, architecture review, or systems-level refactor
- The previous assistant response was long (suggests deep multi-step reasoning in progress)
- The user explicitly says "think carefully", "analyze deeply", "design", "propose architecture"
- The task touches multiple files across different packages (cross-context reasoning signal)

When confidence is high enough, the loop transparently routes the turn through the expert preset, then falls back to the default model. The user shouldn't need to know the mechanism: the agent just "gives it the good model" for the hard parts.
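
The signals above could be combined into a rough confidence score. The sketch below is one possible heuristic, not the proposed implementation: the weights, the 4000-character and 3-file thresholds, and the function names are all assumptions chosen for illustration.

```python
import re

# Phrases taken from the signal list; the regex itself is an assumption.
PHRASE_PATTERN = re.compile(
    r"\b(think carefully|analy[sz]e deeply|design|propose architecture|"
    r"architecture review|refactor)\b",
    re.IGNORECASE,
)


def escalation_score(user_message: str, prev_response_chars: int, files_touched: int) -> float:
    """Return a rough 0..1 confidence that this turn needs the expert preset."""
    score = 0.0
    if PHRASE_PATTERN.search(user_message):
        score += 0.5
    if prev_response_chars > 4000:  # assumed threshold for a "long" response
        score += 0.25
    if files_touched >= 3:  # assumed threshold for cross-package work
        score += 0.25
    return min(score, 1.0)


def should_escalate(user_message: str, prev_response_chars: int = 0,
                    files_touched: int = 0, threshold: float = 0.5) -> bool:
    """Decide whether to escalate; any classifier error means 'use the default model'."""
    try:
        return escalation_score(user_message, prev_response_chars, files_touched) >= threshold
    except Exception:
        return False
```

Swallowing classifier errors keeps the check non-blocking: a failure in the heuristic costs nothing more than a turn on the default model.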

Slash command argument

- /expert: one-shot escalate to model_presets.expert
- /expert deepseek-v4-pro: one-shot escalate with an inline model override
- /preset deepthink: one-shot escalate to model_presets.deepthink
- /preset list: show available presets with their models
- /preset: current status; shows the active preset, if any
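
The command forms listed above could be parsed along these lines. This is a standalone sketch: `PresetRequest` and `parse_preset_command` are hypothetical names, and the real integration would go through the `CommandDef` registry, whose signature is not shown in the issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PresetRequest:
    action: str                           # "escalate", "list", or "status"
    preset: Optional[str] = None          # named preset to use for the next turn
    model_override: Optional[str] = None  # inline model for `/expert <model>`


def parse_preset_command(line: str) -> Optional[PresetRequest]:
    """Parse `/expert` and `/preset` input; return None for any other text."""
    parts = line.strip().split()
    if not parts:
        return None
    command, args = parts[0], parts[1:]
    if command == "/expert":
        # An optional first argument is treated as an inline model override.
        return PresetRequest("escalate", preset="expert",
                             model_override=args[0] if args else None)
    if command == "/preset":
        if not args:
            return PresetRequest("status")
        if args[0] == "list":
            return PresetRequest("list")
        return PresetRequest("escalate", preset=args[0])
    return None
```

One consequence of this layout is that a preset literally named `list` cannot be selected with `/preset list`, so that name would need to be reserved.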

Implementation sketch

The current AIAgent._call_llm() at run_agent.py line ~8386 takes reasoning_config, max_tokens, etc. The runtime carries self.model, self.provider, self.base_url, and self.api_mode, and _switch_session_model() (line ~2290) already exists for swapping models.

The minimal change:

1. Config: Add model_presets dict to DEFAULT_CONFIG in hermes_cli/config.py. Each key = name, each value = {provider, model, reasoning_effort?, base_url?, api_key?}.
2. Slash command: Add CommandDef("expert", ...) and CommandDef("preset", ...) to hermes_cli/commands.py. Both set a one-shot flag on the AIAgent (self._one_shot_preset or similar).
3. Turn dispatch in run_conversation(): Before _call_llm(), check if _one_shot_preset is set. If so:
   - Snapshot the current runtime (self.model, self.provider, etc.)
   - Load the preset's model (via _switch_session_model() or a lighter-weight override that avoids rebuilding prompt cache if the provider is the same)
   - Call LLM
   - After response arrives, restore the snapshot
4. Auto-detect (optional Phase 2): A lightweight classifier. It could be as simple as a regex pattern match on the user's message plus a prior-context-length heuristic, or a cheap model call (Gemini Flash costs pennies) that judges "does this need the big model?". The classifier runs before the LLM call in the main loop, returns a confidence score, and if above threshold, triggers the preset. This should be tunable and non-blocking: if the classifier fails, the default model runs, no harm.
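
The snapshot/restore in the turn-dispatch step can be sketched with a context manager. `FakeAgent` below is a stand-in carrying only the runtime fields named in the issue; `one_shot_runtime` and `run_turn` are hypothetical helpers, and a real version would likely route through `_switch_session_model()` rather than setting attributes directly.

```python
from contextlib import contextmanager

RUNTIME_FIELDS = ("model", "provider", "base_url", "api_mode")


class FakeAgent:
    """Stand-in for AIAgent with only the runtime fields named in the issue."""

    def __init__(self):
        self.model = "google/gemini-3-flash-preview"
        self.provider = "openrouter"
        self.base_url = None
        self.api_mode = "chat"  # placeholder value
        self._one_shot_preset = None


@contextmanager
def one_shot_runtime(agent, preset: dict):
    """Apply `preset` to the agent's runtime, then restore the snapshot on exit."""
    snapshot = {field: getattr(agent, field) for field in RUNTIME_FIELDS}
    try:
        for field in RUNTIME_FIELDS:
            if field in preset:
                setattr(agent, field, preset[field])
        yield agent
    finally:
        # Runs even if the LLM call raises, so the default is always restored.
        for field, value in snapshot.items():
            setattr(agent, field, value)


def run_turn(agent, call_llm, presets: dict):
    """Consume the one-shot flag and route this turn through the preset, if any."""
    name, agent._one_shot_preset = agent._one_shot_preset, None
    if name is None or name not in presets:
        return call_llm(agent)
    with one_shot_runtime(agent, presets[name]):
        return call_llm(agent)
```

Clearing the flag before the call, rather than after, means a failed expert call does not leave the preset armed for the following turn.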

Edge cases

- Nesting: If /expert is invoked and the call itself makes tool calls, all tool calls in that turn also run under the expert model (because the entire turn is one LLM loop). Only the next user turn restores the default.
- Concurrent presets: last one wins.
- Provider mismatch: If the expert preset uses a different provider than the default (e.g. Flash via OpenRouter → Opus via Anthropic native), the API mode might differ. _switch_session_model() already handles this for the /model command; reuse that path.
- Fallback interaction: If the expert model itself fails, the existing fallback chain should kick in (for that turn), then restore to default on next turn.
- Subagents: /expert applies only to the parent agent's turn. Children via delegate_task should not inherit the one-shot preset unless explicitly configured, since they have their own delegation.model config.
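
The nesting and last-one-wins rules can be illustrated with a toy turn loop. `TurnState` is a hypothetical class written for this example; it models only which model each LLM call in a turn would use and leaves out providers, fallback, and subagents.

```python
from typing import Callable, Optional


class TurnState:
    """Toy model of the default model plus a pending one-shot preset."""

    def __init__(self, default_model: str):
        self.default_model = default_model
        self.pending: Optional[str] = None

    def request(self, model: str) -> None:
        # Concurrent presets: a later request overwrites an earlier one.
        self.pending = model

    def run_turn(self, step: Callable[[str], bool]) -> list[str]:
        """Run one user turn; `step(model)` returns True while tool calls remain."""
        model = self.pending or self.default_model
        self.pending = None  # consumed once, so the next turn uses the default
        used: list[str] = []
        while True:
            used.append(model)  # every LLM call in this turn uses the same model
            if not step(model):
                return used
```

For example, requesting two presets and then running a turn with two rounds of tool calls uses the second preset for all three LLM calls, and the following turn goes back to the default.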

What this is NOT

- Not a replacement for the fallback chain (which is reactive, for failures)
- Not a replacement for delegation.model (which is per-subagent, not per-turn)
- Not a full routing layer: it's a simple snapshot/restore around a single LLM call
- Not automatic model selection on every turn (that's a separate, much harder problem; this is the pragmatic first step)

Related code references

- run_agent.py:AIAgent.__init__() line ~1211: self.reasoning_config
- run_agent.py:AIAgent._switch_session_model() line ~2290: runtime model swap
- run_agent.py:AIAgent._try_activate_fallback() line ~7750: per-turn fallback (useful pattern for restore logic)
- run_agent.py:AIAgent._restore_primary_runtime() line ~7772: existing per-turn restoration
- hermes_cli/commands.py: slash command registry (add CommandDef entries)
- hermes_cli/config.py:DEFAULT_CONFIG line ~386: where new config keys go
- agent/models_dev.py: ModelInfo.reasoning flag already exists for metadata
- agent/lmstudio_reasoning.py: reasoning-effort resolution pattern to reuse
- cli-config.yaml.example: document the new section

Alternatives considered

- Two persistent profiles and switch between them: Profiles are for fully isolated Hermes instances (config, skills, sessions). Overkill for a single-turn model swap.
- delegate_task to the big model: That spawns a subagent with no conversation history. The point of /expert is to bring the full context to the powerful model.
- Just run Opus all the time: Expensive and slow for the 90% of turns that don't need it.
- Router middleware: Too heavy for v1. A simple snapshot/restore with a regex-based auto-detector is a 200-line change. Router can be built on top later.

extent analysis

TL;DR

Implement a model_presets config section and add slash commands to temporarily escalate to an expert model for a single turn.

Guidance

1. Add model_presets to config: Define named provider+model combinations in a new model_presets section in hermes_cli/config.py.
2. Implement slash commands: Add CommandDef entries for /expert and /preset in hermes_cli/commands.py to set a one-shot flag on the AIAgent.
3. Modify turn dispatch: Check for the one-shot flag in run_conversation() and snapshot the current runtime before loading the preset's model and calling the LLM.
4. Restore default model: After the response arrives, restore the snapshot to revert to the default model.

Example

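# hermes_cli/config.py
DEFAULT_CONFIG = {
    # ... existing keys ...
    'model_presets': {
        'expert': {
            'provider': 'openrouter',
            'model': 'anthropic/claude-opus-4-6',
            'reasoning_effort': 'high',
        },
    },
}

# hermes_cli/commands.py (remaining CommandDef arguments elided, as in the issue)
CommandDef('expert', ...)
CommandDef('preset', ...)

# run_agent.py (method on AIAgent; other parameters omitted)
def run_conversation(self):
    # ...
    preset_name = self._one_shot_preset
    self._one_shot_preset = None  # consume the flag so it applies to one turn only
    if preset_name:
        # Snapshot current runtime
        snapshot = (self.model, self.provider, self.base_url, self.api_mode)
        try:
            # Load the preset's model (e.g. via _switch_session_model()), then call the LLM
            ...
        finally:
            # Restore default model after response, even if the call raised
            self.model, self.provider, self.base_url, self.api_mode = snapshot
    # ...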

Notes

This implementation assumes that the AIAgent class has the necessary methods for switching models and restoring the runtime. The auto-detect feature for complexity can be added in a separate phase.

Recommendation

Apply the workaround by implementing the model_presets config section and slash commands to temporarily escalate to an expert model for a single turn. This approach provides a flexible and efficient solution for handling complex tasks without incurring the cost of
