hermes - feat: model_profiles (per-model toolset and memory config)



Summary

When switching models via /model (or hermes config set), Hermes only stores model, provider, api_key, base_url, and api_mode in the session override. There is no mechanism to also adjust toolsets or memory injection based on which model is active.

This is a significant friction point for users who run local, constrained-context models alongside cloud models: on the smaller context window, tool schema overhead takes up a proportionally much larger share.

Problem

Tool schemas consume ~30,000–40,000 tokens of system prompt overhead. For a cloud model with a 200K+ context window this is trivial. For a local model capped at 128K (e.g. DeepSeek V4-Flash on an Exo cluster), it is ~23–31% of total context before any user turn, conversation history, or memory is injected. Add Honcho context (~10–15K), memory (~2K), persona (~1K), and the skills list (~2K), another ~15–20K combined, and you're burning as much as ~45% of context on infrastructure overhead for a model that has no use for the computer_use or vision toolsets.
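The arithmetic behind those percentages can be checked with a short script. This is only a back-of-the-envelope sketch: the token counts are the estimates quoted above, and treating "128K" as 128,000 tokens is an assumption (using 131,072 shifts the results by about a point).

```python
# Rough context-overhead estimate using the figures quoted in the issue.
CONTEXT_WINDOW = 128_000  # assumption: "128K" taken as 128,000 tokens

# (low, high) token estimates per component, as stated in the text
OVERHEAD = {
    "tool_schemas": (30_000, 40_000),
    "honcho_context": (10_000, 15_000),
    "memory": (2_000, 2_000),
    "persona": (1_000, 1_000),
    "skills_list": (2_000, 2_000),
}


def overhead_share(components, window):
    """Return the (low, high) fraction of the window used by the components."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low / window, high / window


tools_only = overhead_share({"tool_schemas": OVERHEAD["tool_schemas"]}, CONTEXT_WINDOW)
total = overhead_share(OVERHEAD, CONTEXT_WINDOW)

print(f"tool schemas alone: {tools_only[0]:.0%} to {tools_only[1]:.0%}")
print(f"all overhead:       {total[0]:.0%} to {total[1]:.0%}")
```

Under these assumptions the tool schemas alone come to roughly 23–31% of the window, and the combined overhead to roughly 35–47%, which is consistent with the ~45% figure near the upper end.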

The current workarounds are all lossy:

  • /toolsets disable computer_use vision ... after each /model switch — not sticky, must be repeated every session
  • Separate Hermes profile — requires a second bot token, second gateway, split config surface
  • Raise context_length to 256K — doubles headroom but doesn't reduce actual waste

Proposed Solution

Add a model_profiles list to config.yaml that maps model+provider combos to toolset and memory overrides. When the active model matches a profile, those overrides are applied at session startup (and on /model switch without requiring /reset).

Proposed schema

model_profiles:
  - name: local-flash           # optional friendly name
    match:
      model: "mlx-community/DeepSeek-V4-Flash"
      provider: "exo"           # optional; matches any provider if omitted
    disabled_toolsets:
      - computer_use
      - vision
      - image_gen
      - video
      - video_gen
    context_length: 262144       # optional override
    skip_memory: false           # optional; set to true to disable memory injection for this model
    extra_body:                  # optional; pass-through to provider
      temperature: 0.7

  - name: cloud-default
    match:
      provider: "anthropic"
    disabled_toolsets: []        # explicit no-op / reset to defaults

Match semantics

  • Match on model (exact string or glob), provider, or both
  • First matching profile wins
  • No match → current behavior (full toolset, no overrides)
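The match rules above could be sketched roughly as follows. This is not Hermes code: profile_matches, select_profile, and the PROFILES list are hypothetical names, and the glob handling uses Python's standard fnmatch module as one possible interpretation of "exact string or glob". Treating an empty match block as "no match" is also an assumption, since the proposal does not say.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the proposed model_profiles list.
PROFILES = [
    {
        "name": "local-flash",
        "match": {"model": "mlx-community/DeepSeek-V4-Flash", "provider": "exo"},
        "disabled_toolsets": ["computer_use", "vision", "image_gen", "video", "video_gen"],
    },
    {
        "name": "cloud-default",
        "match": {"provider": "anthropic"},
        "disabled_toolsets": [],
    },
]


def profile_matches(profile, model, provider):
    """True if every key present in the profile's match block matches."""
    match = profile.get("match") or {}
    if not match:
        return False  # assumption: an empty match block never matches
    if "model" in match and not fnmatchcase(model, match["model"]):
        return False  # model may be an exact string or a glob pattern
    if "provider" in match and match["provider"] != provider:
        return False  # provider is compared exactly; omitted means "any"
    return True


def select_profile(profiles, model, provider):
    """First matching profile wins; None means keep current behavior."""
    for profile in profiles:
        if profile_matches(profile, model, provider):
            return profile
    return None


active = select_profile(PROFILES, "mlx-community/DeepSeek-V4-Flash", "exo")
print(active["name"] if active else "no profile")
```

Because the first match wins, more specific profiles need to appear before broader ones in the list. Note that fnmatch treats "[" and "]" as pattern characters, so a model name containing brackets would need separate handling.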

Activation

  • Applied at session build time (system prompt construction)
  • Applied immediately on /model switch without requiring /reset
  • Should be surfaced in /usage or /status so users can confirm which profile is active: Model profile: local-flash (3 toolsets disabled)
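A status line in the format suggested above could be produced by a small helper like the one below. The function name and the wording for the "no profile" case are assumptions for illustration; how /usage or /status actually render their output is not covered by the issue.

```python
def profile_status_line(profile):
    """Format a one-line summary of the active model profile (hypothetical helper)."""
    if profile is None:
        # assumption: wording for the case where no profile matched
        return "Model profile: none (full toolset)"
    name = profile.get("name", "<unnamed>")  # name is optional in the proposed schema
    count = len(profile.get("disabled_toolsets", []))
    noun = "toolset" if count == 1 else "toolsets"
    return f"Model profile: {name} ({count} {noun} disabled)"


example = {"name": "local-flash", "disabled_toolsets": ["computer_use", "vision", "image_gen"]}
print(profile_status_line(example))
```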

Implementation Notes

  • Session override (/model switch) is handled in cli.py / run_agent.py around where enabled_toolsets/disabled_toolsets are resolved — the profile lookup should happen there
  • hermes_cli/config.py's DEFAULT_CONFIG would need a model_profiles: [] key
  • AIAgent.__init__ already accepts enabled_toolsets and disabled_toolsets, so the profile only needs to inject its values into those before tool discovery runs
  • This is not a custom_providers extension — it lives in a separate top-level config key so it applies regardless of which provider mechanism is used
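One way the injection step might look is sketched below. The helper name merge_disabled_toolsets is hypothetical, and the policy shown (union of the existing disabled list with the profile's list, so an empty list is a no-op) is an assumption; the issue does not say whether a profile should merge with or replace the session's current settings.

```python
def merge_disabled_toolsets(base_disabled, profile):
    """Combine the already-disabled toolsets with those from the matched profile.

    Order is preserved and duplicates are dropped. A profile of None, or one
    with an empty disabled_toolsets list, leaves the base list unchanged.
    """
    merged = list(base_disabled or [])
    if profile is not None:
        for name in profile.get("disabled_toolsets", []):
            if name not in merged:
                merged.append(name)
    return merged


# Hypothetical matched profile; in practice this would come from the lookup step.
profile = {"name": "local-flash", "disabled_toolsets": ["computer_use", "video"]}
disabled = merge_disabled_toolsets(["video"], profile)
print(disabled)
# The result would then be passed as disabled_toolsets when the agent is built,
# before tool discovery runs.
```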

Why Not a Separate Profile?

A second profile requires: separate bot token, separate gateway process, separate Telegram bot, separate memory store, and separate cron config. That's 5x the surface area for what is fundamentally a "use fewer tools when I'm on a smaller model" preference. model_profiles is the minimal, correct abstraction.

Real-World Trigger

This was surfaced while running DeepSeek V4-Flash (128K ctx) on a 6-node Apple Silicon Exo cluster. With full toolsets enabled, ~46% of context was consumed before the first user turn. Manually disabling heavy toolsets after each /model switch (tedious and not sticky) brought this to ~23%. A model_profiles config block would make this zero-friction.
