hermes - [Feature]: Separate Runtime Persistence from Inference Model Selection

hermes, 2026-05-08 12:21:40


Problem or Use Case

Related discussions:

- #6306: per-task model selection in delegate_task
- #10995: per-task delegate_task model routing
- #5012: provider/model override support
- #5997: per-skill model switching
- #9459: profile-oriented subagent orchestration

This proposal focuses on a broader architectural concern:

runtime persistence vs inference model selection

Hermes currently couples runtime ownership and inference model selection through profiles/configuration.

Example:

profile:
  model: deepseek

This effectively creates the constraint:

One worker runtime == one model

As Hermes evolves toward:

- persistent workers
- orchestration systems
- worktree isolation
- delegated subagents
- long-lived terminal sessions
- heterogeneous task execution

this coupling becomes increasingly restrictive.

Example problem:

- coding-worker-deepseek
- coding-worker-qwen
- coding-worker-glm

All three runtimes may:

- operate on the same repository
- share the same worktree
- preserve the same terminal state
- perform the same role

but must exist as separate runtimes solely because model selection is statically bound to the runtime profile.

This leads to:

- duplicated runtime state
- duplicated queues
- fragmented terminal sessions
- duplicated orchestration logic
- inability to optimize model selection per task

Architecturally, runtime state and inference selection have different lifecycles.

Long-lived runtime concerns:

- workspace
- terminal session
- git/worktree state
- queues
- memory/context state

Short-lived task concerns:

- model selection
- provider selection
- routing policy
- latency/cost optimization

Current abstractions partially support model overrides and switching, but runtime ownership remains tightly coupled to model identity.
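The two lifecycles listed above can be made concrete as two separate data types. This is only a sketch: `RuntimeState`, `InferenceChoice`, and all of their fields are hypothetical names chosen for illustration, not existing Hermes types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RuntimeState:
    """Long-lived: created once per worker and reused across tasks (hypothetical)."""
    workspace: str
    worktree: str
    terminal_history: list[str] = field(default_factory=list)
    queue: list[str] = field(default_factory=list)
    memory: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class InferenceChoice:
    """Short-lived: decided per task and discarded afterwards (hypothetical)."""
    model: str
    provider: Optional[str] = None
    max_latency_s: Optional[float] = None
    max_cost_usd: Optional[float] = None


# One runtime state outlives several inference choices.
state = RuntimeState(workspace="/repo", worktree="/repo/.worktrees/feature")
first = InferenceChoice(model="qwen")
second = InferenceChoice(model="deepseek")
state.terminal_history.append("git status")
```

Keeping `InferenceChoice` immutable and separate from `RuntimeState` means a different model can be chosen for the next task without touching the workspace, queue, or terminal history.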

Proposed Solution

Introduce an explicit separation between:

Worker Runtime vs Inference Execution

Instead of:

Profile
└── Fixed Model

consider a runtime-oriented architecture:

WorkerRuntime
├── Workspace
├── Queue
├── Terminal
├── Memory
├── ContextEngine
└── ModelRouter

Task execution flow:

model = worker.model_router.route(task)

session = provider_factory.create(model)

response = session.run(task_context)

This would allow a persistent runtime environment to dynamically select inference models per task without duplicating runtime state.

Example:

simple_refactor:
  model: qwen

hard_bug:
  model: deepseek

multilingual_ui:
  model: glm

while preserving:

- the same terminal session
- the same git state
- the same worktree
- the same runtime memory/context
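Putting the execution flow and the routing example together, a minimal sketch might look like the following. Only `model_router.route`, the provider factory, and `session.run` come from the proposal; `Task`, `EchoSession`, the `rules` mapping, and `terminal_log` are assumptions made so the example runs without a real provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class Task:
    kind: str      # hypothetical task label, e.g. "hard_bug"
    context: str


class InferenceSession(Protocol):
    def run(self, task_context: str) -> str: ...


class EchoSession:
    """Stand-in for a real provider session; it only labels its output."""

    def __init__(self, model: str) -> None:
        self.model = model

    def run(self, task_context: str) -> str:
        return f"[{self.model}] {task_context}"


@dataclass
class ModelRouter:
    default_model: str
    rules: dict[str, str] = field(default_factory=dict)  # task kind -> model

    def route(self, task: Task) -> str:
        return self.rules.get(task.kind, self.default_model)


@dataclass
class WorkerRuntime:
    workspace: str
    model_router: ModelRouter
    provider_factory: Callable[[str], InferenceSession]
    terminal_log: list[str] = field(default_factory=list)  # persists across tasks

    def run_task(self, task: Task) -> str:
        model = self.model_router.route(task)      # per-task decision
        session = self.provider_factory(model)     # short-lived session
        response = session.run(task.context)
        self.terminal_log.append(response)         # long-lived state is shared
        return response


router = ModelRouter(
    default_model="deepseek",
    rules={"simple_refactor": "qwen", "hard_bug": "deepseek", "multilingual_ui": "glm"},
)
worker = WorkerRuntime(workspace="/repo", model_router=router, provider_factory=EchoSession)
results = [
    worker.run_task(Task("simple_refactor", "rename helper")),
    worker.run_task(Task("multilingual_ui", "translate labels")),
]
```

Both tasks run against the same `worker` object, so the log (standing in for terminal, git, and memory state) accumulates in one place even though each task used a different model.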

A minimal low-risk first step could simply support:

run_task(model_override="deepseek")

or extending the existing delegate_task/provider override infrastructure already discussed in related issues.

This proposal is intended to remain fully backward compatible.

Existing profiles/configuration could still define default fallback models:

profile:
  model: deepseek

used whenever no task-level override or routing policy exists.
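The fallback behaviour could be expressed as a small resolution function. The proposal only states that the profile model applies when no task-level override or routing policy exists; the ordering between an explicit override and a routing policy shown here (override first) is an assumption, and `resolve_model` is a hypothetical helper, not an existing Hermes function.

```python
from typing import Optional


def resolve_model(
    profile_default: str,
    routed_model: Optional[str] = None,
    model_override: Optional[str] = None,
) -> str:
    """Pick the model for a single task.

    Assumed precedence: explicit task-level override, then the routing
    policy's choice, then the profile default.
    """
    if model_override is not None:
        return model_override
    if routed_model is not None:
        return routed_model
    return profile_default


# Existing configuration keeps working: with no override and no routing
# decision, the profile model is used unchanged.
legacy = resolve_model("deepseek")
routed = resolve_model("deepseek", routed_model="qwen")
forced = resolve_model("deepseek", routed_model="qwen", model_override="glm")
```

Because the profile default is always the last fallback, profiles that never set an override or routing policy behave exactly as they do today.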

Alternatives Considered

1. Duplicate worker profiles per model. Example: coding-worker-gpt, coding-worker-glm. This works today but causes runtime fragmentation and duplicated state.
2. Session-level model switching only. Hermes already supports runtime model switching and delegate_task overrides in various forms. However, this primarily answers "which model is active right now?" rather than "should runtime persistence and inference selection be separate abstractions?"
3. Isolated subagents per model. Subagent isolation is useful for sandboxing, but it becomes expensive and operationally fragmented when the only difference between runtimes is inference model selection.
4. Skill-level model overrides only. Per-skill overrides are valuable and already proposed in #5997, but they do not fully address persistent runtime ownership and heterogeneous orchestration concerns.

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)

Contribution

I'd like to implement this myself and submit a PR

Debug Report (optional)
