hermes - [Feature]: Separate Runtime Persistence from Inference Model Selection

Problem or Use Case

Related discussions:

  • #6306: per-task model selection in delegate_task
  • #10995: per-task delegate_task model routing
  • #5012: provider/model override support
  • #5997: per-skill model switching
  • #9459: profile-oriented subagent orchestration

This proposal focuses on a broader architectural concern:

runtime persistence vs inference model selection

Hermes currently couples runtime ownership and inference model selection through profiles/configuration.

Example:

    profile:
      model: deepseek

This effectively creates the constraint:

One worker runtime == one model

As Hermes evolves toward:

  • persistent workers
  • orchestration systems
  • worktree isolation
  • delegated subagents
  • long-lived terminal sessions
  • heterogeneous task execution

this coupling becomes increasingly restrictive.

Example problem:

    coding-worker-deepseek
    coding-worker-qwen
    coding-worker-glm

All three runtimes may:

  • operate on the same repository
  • share the same worktree
  • preserve the same terminal state
  • perform the same role

but must exist as separate runtimes solely because model selection is statically bound to the runtime profile.

This leads to:

  • duplicated runtime state
  • duplicated queues
  • fragmented terminal sessions
  • duplicated orchestration logic
  • inability to optimize model selection per task

Architecturally, runtime state and inference selection have different lifecycles.

Long-lived runtime concerns:

  • workspace
  • terminal session
  • git/worktree state
  • queues
  • memory/context state

Short-lived task concerns:

  • model selection
  • provider selection
  • routing policy
  • latency/cost optimization

Current abstractions partially support model overrides and switching, but runtime ownership remains tightly coupled to model identity.

Proposed Solution

Introduce an explicit separation between:

Worker Runtime vs Inference Execution

Instead of:

    Profile
    └── Fixed Model

consider a runtime-oriented architecture:

    WorkerRuntime
    ├── Workspace
    ├── Queue
    ├── Terminal
    ├── Memory
    ├── ContextEngine
    └── ModelRouter
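The tree above could be modelled as a single long-lived object whose only model-aware member is the router. This is a rough sketch under stated assumptions, not Hermes code: `WorkerRuntime` is the name used in this proposal, while every field name, `default_router`, and the `/tmp/repo` path are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


def default_router(task_kind: str) -> str:
    """Hypothetical fallback policy: every task kind maps to one model."""
    return "deepseek"


@dataclass
class WorkerRuntime:
    """Long-lived state owned by one worker (all field names are illustrative)."""
    workspace: Path
    queue: List[str] = field(default_factory=list)
    terminal_history: List[str] = field(default_factory=list)
    memory: Dict[str, str] = field(default_factory=dict)
    context: List[str] = field(default_factory=list)
    # The router holds a policy, not a fixed model, so it can be swapped
    # without recreating any of the state above.
    model_router: Callable[[str], str] = default_router


runtime = WorkerRuntime(workspace=Path("/tmp/repo"))
runtime.terminal_history.append("git status")

# Changing the routing policy leaves workspace, queue, and history untouched.
runtime.model_router = (
    lambda task_kind: "qwen" if task_kind == "simple_refactor" else "deepseek"
)
```

The point of the shape is that model choice becomes a replaceable policy on the runtime rather than part of the runtime's identity.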

Task execution flow:

    model = worker.model_router.route(task)
    session = provider_factory.create(model)
    response = session.run(task_context)

This would allow a persistent runtime environment to dynamically select inference models per task without duplicating runtime state.
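The three-step flow can be exercised end to end with stand-ins. The following is a minimal sketch, not an existing Hermes API: `Task`, `ModelRouter`, `EchoSession`, `ProviderFactory`, `Worker`, and `execute` are hypothetical, and the session simply echoes its input instead of calling a provider.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str
    prompt: str


class ModelRouter:
    """Hypothetical router: one special case, otherwise a default model."""

    def __init__(self, default_model):
        self.default_model = default_model

    def route(self, task):
        return "deepseek" if task.kind == "hard_bug" else self.default_model


class EchoSession:
    """Stand-in for a provider session; performs no network call."""

    def __init__(self, model):
        self.model = model

    def run(self, task_context):
        return f"[{self.model}] {task_context}"


class ProviderFactory:
    def create(self, model):
        return EchoSession(model)


class Worker:
    """Holds the long-lived parts: the router and a shared terminal log."""

    def __init__(self, model_router):
        self.model_router = model_router
        self.terminal_log = []


def execute(worker, provider_factory, task):
    model = worker.model_router.route(task)      # short-lived: chosen per task
    session = provider_factory.create(model)     # short-lived: one session per task
    response = session.run(task.prompt)
    worker.terminal_log.append(response)         # long-lived: survives the task
    return response


worker = Worker(ModelRouter(default_model="qwen"))
factory = ProviderFactory()
first = execute(worker, factory, Task("hard_bug", "fix crash"))
second = execute(worker, factory, Task("simple_refactor", "rename variable"))
```

Both tasks run against the same `worker`, so its log accumulates across two different models.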

Example:

    simple_refactor:
      model: qwen
    hard_bug:
      model: deepseek
    multilingual_ui:
      model: glm

while preserving:

  • the same terminal session
  • the same git state
  • the same worktree
  • the same runtime memory/context
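As an illustration of the example above, a routing table can drive model choice while one runtime object keeps its state across tasks. This is a sketch under assumptions: `ROUTING_POLICY`, `PersistentRuntime`, and the shell commands are hypothetical, and the "terminal session" is reduced to a list of commands.

```python
# Hypothetical policy table mirroring the task kinds in the example.
ROUTING_POLICY = {
    "simple_refactor": "qwen",
    "hard_bug": "deepseek",
    "multilingual_ui": "glm",
}


class PersistentRuntime:
    """One runtime reused for every task, whatever model the task uses."""

    def __init__(self, worktree):
        self.worktree = worktree
        self.shell_history = []   # stands in for the terminal session
        self.models_used = []     # recorded only to make the routing visible

    def run(self, task_kind, command):
        model = ROUTING_POLICY[task_kind]   # per-task model choice
        self.shell_history.append(command)  # state shared across tasks
        self.models_used.append(model)
        return model


runtime = PersistentRuntime("/tmp/repo")
for kind, command in [
    ("simple_refactor", "git status"),
    ("hard_bug", "pytest -x"),
    ("multilingual_ui", "git diff"),
]:
    runtime.run(kind, command)
```

Three models are used, yet there is a single worktree and a single command history, which is the duplication the separate per-model workers cannot avoid.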

A minimal low-risk first step could simply support:

    run_task(model_override="deepseek")

or extending the existing delegate_task/provider override infrastructure already discussed in related issues.

This proposal is intended to remain fully backward compatible.

Existing profiles/configuration could still define default fallback models:

    profile:
      model: deepseek

used whenever no task-level override or routing policy exists.
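The fallback rule implies a precedence order: task-level override first, then routing policy, then the profile default. A minimal sketch of that resolution follows; `resolve_model` and its parameter names are hypothetical and only `model_override` echoes the wording of this proposal.

```python
from typing import Mapping, Optional


def resolve_model(
    profile_default: str,
    task_kind: Optional[str] = None,
    routing_policy: Optional[Mapping[str, str]] = None,
    model_override: Optional[str] = None,
) -> str:
    """Pick a model for one task without touching the profile itself."""
    if model_override:
        # 1. An explicit per-task override always wins.
        return model_override
    if routing_policy and task_kind in routing_policy:
        # 2. Otherwise consult the routing policy, if one is configured.
        return routing_policy[task_kind]
    # 3. Otherwise behave exactly as today: use the profile's model.
    return profile_default


policy = {"simple_refactor": "qwen", "hard_bug": "deepseek"}
```

With no override and no policy the function returns the profile's model, so existing configurations would behave as they do now.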

Alternatives Considered

  1. Duplicate worker profiles per model
     Example: coding-worker-gpt, coding-worker-glm
     This works today but causes runtime fragmentation and duplicated state.
  2. Session-level model switching only
     Hermes already supports runtime model switching and delegate_task overrides in various forms. However, this primarily answers "which model is active right now?" rather than "should runtime persistence and inference selection be separate abstractions?"
  3. Isolated subagents per model
     Subagent isolation is useful for sandboxing, but becomes expensive and operationally fragmented when the only difference between runtimes is inference model selection.
  4. Skill-level model overrides only
     Per-skill overrides are valuable and already proposed in #5997, but they do not fully address persistent runtime ownership and heterogeneous orchestration concerns.

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)
