hermes - 💡(How to fix) Fix [Feature]: Configurable model for background memory and skill reviews [1 participants]

GitHub stats
NousResearch/hermes-agent#13578 (fetched 2026-04-22 08:05:38)
Comments
0
Participants
1
Timeline
1
Reactions
0
Participants
Timeline (top)
labeled ×1


Problem or Use Case

_spawn_background_review and the skill review path both hardcode model=self.model, inheriting the main model for all background reflection work. This creates a structurally different cost profile depending on which main model a user runs.

For users on premium models (Opus, GPT-5, etc.), the overhead is proportionally small. For users on cheap models (GLM-5.1, Kimi, etc.), background reviews add hidden token consumption on the main model's billing tier, since each review is roughly as token-heavy as a main-model call on the full conversation.

This proposal adds a configurable review model, analogous to the existing auxiliary.* pattern, so users can route background reflection to a cheaper model without changing their main model.

Proposed Solution

Observed while running Hermes v0.10.0 with GLM-5.1 as the main model and auxiliary tasks routed to glm-4.7-flashx:

  • Main conversation tokens per turn: ~17,100 (peak last_prompt_tokens in session metadata); requests logged with 429 errors showed ~8,200–8,300 at a shorter history.
  • Background memory review tokens per nudge: not separately logged; estimated to be the same order of magnitude as one main-model call on the full conversation snapshot at nudge time (~8k–17k prompt tokens observed in this setup).
  • Nudge interval: 10
  • Resulting hidden main-model load: ~10% extra API calls relative to visible user turns (1 review per 10 turns); the token-weighted percentage was not measured, since that requires per-call accounting.
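
The call-count figure above is simple arithmetic; a token-weighted figure would need per-call logs that this setup did not have. As a rough sketch only, the snippet below assumes each review consumes about as many prompt tokens as one main-model turn (the estimate stated above, not a measurement) and that token counts stay constant across turns. The function name and parameters are illustrative and not part of Hermes.

```python
def review_overhead(turn_tokens: int, review_tokens: int, nudge_interval: int) -> float:
    """Estimate extra main-model prompt tokens from reviews, as a fraction
    of the tokens spent on visible turns.

    Assumes exactly one review every `nudge_interval` turns and a constant
    prompt size per turn; real sessions grow their history, so treat the
    result as an order-of-magnitude figure.
    """
    if nudge_interval <= 0:
        raise ValueError("nudge_interval must be positive")
    visible = turn_tokens * nudge_interval  # tokens for the visible turns
    hidden = review_tokens                  # tokens for the single review
    return hidden / visible


# Illustrative inputs taken from the observed range above (not per-call logs).
same_size = review_overhead(turn_tokens=17_100, review_tokens=17_100, nudge_interval=10)
print(f"{same_size:.0%}")
```

If a review snapshot turns out to be larger than a typical turn, the ratio rises accordingly, which is why per-call accounting would be needed for a firm number.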

Relevant code path: run_agent.py around _spawn_background_review (~line 1680, referenced in #5129), and the skill review counterpart triggered via _should_review_skills.

Both instantiate a fresh AIAgent(model=self.model, ...). There is currently no config override for this.

Proposed solution

Add review model configuration mirroring the existing auxiliary.* schema:

memory:
  review:
    model: null        # null = fallback to self.model (current behavior)
    provider: null
    base_url: null
    api_key_env: null

skills:
  review:
    model: null
    provider: null
    base_url: null
    api_key_env: null

When unset, behavior is unchanged. When set, background reviews instantiate their AIAgent with the override values.
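
The fallback rule could be sketched roughly as follows. This is a minimal illustration, not the Hermes implementation: `ReviewSettings` and `resolve_review_settings` are hypothetical names, the real `AIAgent` constructor arguments are not shown in this issue, and two details are assumptions made here: that `api_key_env` names an environment variable holding the key, and that unset `provider`/`base_url` fields inherit the main model's values.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ReviewSettings:
    """Hypothetical bundle of values a review agent would be built with."""
    model: str
    provider: Optional[str] = None
    base_url: Optional[str] = None
    api_key: Optional[str] = None


def resolve_review_settings(
    config: Mapping[str, Any],
    section: str,                       # "memory" or "skills"
    main: ReviewSettings,
    env: Optional[Mapping[str, str]] = None,
) -> ReviewSettings:
    """Return the override from `<section>.review`, or `main` when unset."""
    env = os.environ if env is None else env
    review = (config.get(section) or {}).get("review") or {}
    if not review.get("model"):
        return main  # null model: current behavior, reuse the main model
    key_env = review.get("api_key_env")
    return ReviewSettings(
        model=review["model"],
        # Assumption: unset fields inherit from the main model's settings.
        provider=review.get("provider") or main.provider,
        base_url=review.get("base_url") or main.base_url,
        # Assumption: api_key_env is the NAME of an environment variable.
        api_key=env.get(key_env) if key_env else main.api_key,
    )


main = ReviewSettings(model="GLM-5.1")
config = {"memory": {"review": {"model": "glm-4.7-flashx"}}}
print(resolve_review_settings(config, "memory", main).model)  # override applies
print(resolve_review_settings(config, "skills", main).model)  # falls back to main
```

Resolving both sections through one helper would keep memory.review and skills.review consistent while leaving them independently configurable.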

Why this is a structural issue, not just a user cost concern

Hermes explicitly markets itself as runnable on any model, including cheap endpoints ("$5 VPS, any provider"). The README highlights z.ai/GLM, Kimi/Moonshot, MiMo, and custom endpoints as first-class options.

The current design implicitly assumes parity between main and review model cost tiers. Users choosing cheap models to make 24/7 agent operation affordable hit a hidden cost multiplier that the config surface doesn't expose.

The fix aligns background review with how Hermes already handles compression, vision, title generation, and web summarization — all of which are independently configurable via auxiliary.*.

Why keep review separate from auxiliary.*

Memory and skill review are content-level reflection, not side-tasks like title generation. Keeping them under their respective feature namespaces (memory.review, skills.review) makes the config self-documenting and preserves the semantic distinction.

Alternatives Considered

  1. Is there a design reason I'm missing for why review must match main model quality? If so, the current behavior makes sense — but the docs could note this cost implication for cheap-model users.

  2. If the proposal direction is acceptable, I'm happy to contribute a PR. Would you prefer a single PR covering both memory and skill review, or split?

Related

  • #5129 (duplicate memory manager instances in review agent — same code path)
  • #8506 (smart routing blocks nudge counter — adjacent issue)

Happy to iterate on design before writing code. Let me know if a different approach fits the codebase better.

Feature Type

New tool

Scope

None

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

extent analysis

TL;DR

Background reviews currently run on the main model, adding hidden token consumption on its billing tier; this weighs most on users who picked a cheap main model. The proposal adds a configurable review model so background reflection can be routed to a cheaper model.

Guidance

  • Introduce memory.review and skills.review configuration options that specify a separate model for background reviews, similar to the existing auxiliary.* pattern.
  • Update run_agent.py around _spawn_background_review (~line 1680) and the skill review counterpart to use the configured review model instead of the hardcoded model=self.model.
  • Fall back to the main model when the review model is not configured (null), so existing behavior is unchanged.
  • Check the change against the related code-path issues (#5129, #8506) so it does not introduce unintended side effects.

Example

memory:
  review:
    model: glm-4.7-flashx
    provider: null
    base_url: null
    api_key_env: null

skills:
  review:
    model: glm-4.7-flashx
    provider: null
    base_url: null
    api_key_env: null

Notes

The proposal assumes the review model can be configured independently of the main model. Whether a cheaper review model degrades memory or skill review quality is an open question that the issue itself raises, so review output should be checked before relying on the override.

Recommendation

Adopt the proposed feature: add a configurable review model so background reflection can be routed to a cheaper model. It follows the existing auxiliary.* pattern and moves review load off the main model for users on cheap models.
