hermes - [Feature]: Configurable model for background memory and skill reviews


Problem or Use Case

_spawn_background_review and the skill review path both hardcode model=self.model, inheriting the main model for all background reflection work. This creates a structurally different cost profile depending on which main model a user runs.

For users on premium models (Opus, GPT-5, etc.), the overhead is proportionally small. For users on cheap models (GLM-5.1, Kimi, etc.), background reviews effectively double token consumption on the main model's billing tier — since review work is as token-heavy as the visible conversation.

This proposal adds a configurable review model, analogous to the existing auxiliary.* pattern, so users can route background reflection to a cheaper model without changing their main model.

Proposed Solution

Observed while running Hermes v0.10.0 with GLM-5.1 as the main model and auxiliary tasks routed to glm-4.7-flashx:

Main conversation tokens per turn: ~17,100 (peak last_prompt_tokens in session metadata); ~8,200–8,300 also appeared in 429 logs with a shorter history.
Background memory review tokens per nudge: not logged separately; estimated to be of the same order of magnitude as one main-model call on the full conversation snapshot at nudge time (~8k–17k prompt tokens observed in this setup).
Nudge interval: 10
Resulting hidden main-model load: ~10% extra API calls relative to visible user turns (1 review per 10 turns); the token-weighted percentage was not measured because per-call accounting is unavailable.
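
The overhead arithmetic above can be written down as a small sketch. The function names are hypothetical and are not part of Hermes. The token-weighted variant needs per-call token counts, which this setup did not record, so it is shown only as a formula to fill in once such accounting exists.

```python
def review_call_overhead(nudge_interval: int) -> float:
    """Extra main-model API calls per visible user turn.

    Assumes exactly one background review is spawned every
    `nudge_interval` visible turns.
    """
    if nudge_interval < 1:
        raise ValueError("nudge_interval must be at least 1")
    return 1 / nudge_interval


def review_token_overhead(
    nudge_interval: int,
    review_prompt_tokens: float,
    avg_turn_prompt_tokens: float,
) -> float:
    """Review prompt tokens as a fraction of visible-turn prompt tokens.

    Both token arguments must come from per-call accounting; they are
    inputs here, not measured values.
    """
    if nudge_interval < 1 or avg_turn_prompt_tokens <= 0:
        raise ValueError("nudge_interval and avg_turn_prompt_tokens must be positive")
    return review_prompt_tokens / (nudge_interval * avg_turn_prompt_tokens)


# Nudge interval from the report above: one review per 10 turns.
print(f"{review_call_overhead(10):.0%}")  # prints "10%"
```

Because a review sees the full conversation snapshot while early turns in the window see a shorter history, the token-weighted figure can be higher than the call-weighted one.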

Relevant code path: run_agent.py around _spawn_background_review (~line 1680, referenced in #5129), and the skill review counterpart triggered via _should_review_skills.

Both instantiate a fresh AIAgent(model=self.model, ...). There is currently no config override for this.

Proposed solution

Add review model configuration mirroring the existing auxiliary.* schema:

memory:
  review:
    model: null        # null = fallback to self.model (current behavior)
    provider: null
    base_url: null
    api_key_env: null

skills:
  review:
    model: null
    provider: null
    base_url: null
    api_key_env: null

When unset, behavior is unchanged. When set, background reviews instantiate their AIAgent with the override values.
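
A minimal sketch of the fallback rule, assuming the config has already been loaded into a nested dict. `ModelSettings` and `resolve_review_settings` are hypothetical names used for illustration, not existing Hermes APIs, and the sketch assumes `api_key_env` names an environment variable that holds the key.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for the values an agent needs to reach a model."""
    model: str
    provider: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None


def resolve_review_settings(
    config: Mapping[str, Any],
    feature: str,  # "memory" or "skills"
    main: ModelSettings,
    env: Mapping[str, str] = os.environ,
) -> ModelSettings:
    """Return settings for a background review, falling back to `main` per field."""
    review = (config.get(feature) or {}).get("review") or {}
    key_var = review.get("api_key_env")
    # Assumption: api_key_env is the name of an environment variable.
    api_key = env.get(key_var) if key_var else None
    return ModelSettings(
        model=review.get("model") or main.model,
        provider=review.get("provider") or main.provider,
        base_url=review.get("base_url") or main.base_url,
        api_key=api_key or main.api_key,
    )


# "example-provider" and "main-key" are placeholder values.
main = ModelSettings(model="GLM-5.1", provider="example-provider", api_key="main-key")
config = {"memory": {"review": {"model": "glm-4.7-flashx", "provider": None,
                                "base_url": None, "api_key_env": None}}}

memory_review = resolve_review_settings(config, "memory", main)
skills_review = resolve_review_settings(config, "skills", main)  # section absent
```

Here `memory_review` uses the override model but inherits the main provider and key, while `skills_review` equals `main` because nothing is configured. Per-field fallback is one possible design; a stricter variant could require `provider` whenever `model` is overridden.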

Why this is a structural issue, not just a user cost concern

Hermes explicitly markets itself as runnable on any model, including cheap endpoints ("$5 VPS, any provider"). The README highlights z.ai/GLM, Kimi/Moonshot, MiMo, and custom endpoints as first-class options.

The current design implicitly assumes parity between main and review model cost tiers. Users choosing cheap models to make 24/7 agent operation affordable hit a hidden cost multiplier that the config surface doesn't expose.

The fix aligns background review with how Hermes already handles compression, vision, title generation, and web summarization — all of which are independently configurable via auxiliary.*.

Why keep review separate from `auxiliary.*`

Memory and skill review are content-level reflection, not side-tasks like title generation. Keeping them under their respective feature namespaces (memory.review, skills.review) makes the config self-documenting and preserves the semantic distinction.

Alternatives Considered

Is there a design reason I'm missing for why review must match main model quality? If so, the current behavior makes sense — but the docs could note this cost implication for cheap-model users.
If the proposal direction is acceptable, I'm happy to contribute a PR. Would you prefer a single PR covering both memory and skill review, or split?

Related

#5129 (duplicate memory manager instances in review agent, same code path)
#8506 (smart routing blocks nudge counter, adjacent issue)

Happy to iterate on design before writing code. Let me know if a different approach fits the codebase better.

Feature Type

New tool

Scope

None

Contribution

I'd like to implement this myself and submit a PR

Debug Report (optional)

extent analysis

TL;DR

Background memory and skill reviews currently always run on the main model, which adds hidden main-model cost that matters most to users on cheap models. Add a configurable review model so background reflection can be routed to a cheaper model without changing the main model.

Guidance

Introduce a new configuration option to specify a separate model for background reviews, similar to the existing auxiliary.* pattern.
Update the run_agent.py file around _spawn_background_review (~line 1680) and the skill review counterpart to use the configured review model.
Fall back to the main model whenever the review model is not configured, so that default behavior is unchanged.
Review the change against the existing auxiliary.* handling and the related code-path issue #5129 to avoid unintended side effects.
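
The call-site change described in this guidance could look roughly like the sketch below. `StubAgent` is a stand-in for Hermes' AIAgent, whose real constructor is not shown in this issue, and `spawn_review_agent` is a hypothetical simplification of _spawn_background_review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubAgent:
    """Stand-in for AIAgent; the real class takes more arguments."""
    model: str
    base_url: Optional[str] = None

    def spawn_review_agent(
        self,
        review_model: Optional[str] = None,
        review_base_url: Optional[str] = None,
    ) -> "StubAgent":
        # Current behavior corresponds to StubAgent(model=self.model, ...).
        # The sketch swaps in an override only when one is actually set.
        return StubAgent(
            model=review_model or self.model,
            base_url=review_base_url or self.base_url,
        )


main = StubAgent(model="GLM-5.1")
unchanged = main.spawn_review_agent()  # no override configured
cheaper = main.spawn_review_agent(review_model="glm-4.7-flashx")
```

With no override, `unchanged.model` is the main model, which keeps existing setups untouched. With an override, only the review agent switches models.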

Example

memory:
  review:
    model: glm-4.7-flashx
    provider: null
    base_url: null
    api_key_env: null

skills:
  review:
    model: glm-4.7-flashx
    provider: null
    base_url: null
    api_key_env: null

Notes

The proposal assumes the review model can be configured independently of the main model. Whether a cheaper model produces memory and skill reviews of acceptable quality is still an open question (see Alternatives Considered), so the default should remain the main model.

Recommendation

Adopt the proposed feature: add a configurable review model so background reflection can be routed to a cheaper model. It matches how auxiliary.* tasks are already configured and removes the hidden main-model cost for users on cheap models.
