hermes: feat: model_profiles (per-model toolset and memory config)

Root Cause

When switching models via /model (or hermes config set), Hermes only stores model, provider, api_key, base_url, and api_mode in the session override. There is no mechanism to also adjust toolsets or memory injection based on which model is active.

This is a significant friction point for users running local/constrained-context models alongside cloud models, where the tool schema overhead is proportionally much larger.

Fix / Workaround

The current workarounds are all lossy:

/toolsets disable computer_use vision ... after each /model switch — not sticky, must be repeated every session
Separate Hermes profile — requires a second bot token, second gateway, split config surface
Raise context_length to 256K — doubles headroom but doesn't reduce actual waste

Code Example

model_profiles:
  - name: local-flash           # optional friendly name
    match:
      model: "mlx-community/DeepSeek-V4-Flash"
      provider: "exo"           # optional; matches any provider if omitted
    disabled_toolsets:
      - computer_use
      - vision
      - image_gen
      - video
      - video_gen
    context_length: 262144       # optional override
    skip_memory: false           # optional; set to true to disable memory injection for this model
    extra_body:                  # optional; pass-through to provider
      temperature: 0.7

  - name: cloud-default
    match:
      provider: "anthropic"
    disabled_toolsets: []        # explicit no-op / reset to defaults
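
Most keys in the schema above are marked optional, so whatever reads the config has to supply defaults. A minimal sketch of that normalization follows, assuming the YAML has already been parsed into plain dicts. `ModelProfile` and `normalize_profile` are hypothetical names for illustration, not existing Hermes code, and rejecting an empty `match` block is an assumption rather than something the proposal specifies.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ModelProfile:
    """Normalized view of one model_profiles entry (hypothetical type)."""
    name: Optional[str] = None
    match_model: Optional[str] = None     # None means "any model"
    match_provider: Optional[str] = None  # None means "any provider"
    disabled_toolsets: List[str] = field(default_factory=list)
    context_length: Optional[int] = None  # None means "no override"
    skip_memory: bool = False             # True disables memory injection
    extra_body: Dict[str, Any] = field(default_factory=dict)


def normalize_profile(raw: Dict[str, Any]) -> ModelProfile:
    """Fill in defaults for the optional keys of a raw profile mapping."""
    match = raw.get("match") or {}
    if match.get("model") is None and match.get("provider") is None:
        # Assumption: a profile that matches nothing specific is a config error.
        raise ValueError("profile needs match.model, match.provider, or both")
    return ModelProfile(
        name=raw.get("name"),
        match_model=match.get("model"),
        match_provider=match.get("provider"),
        disabled_toolsets=list(raw.get("disabled_toolsets") or []),
        context_length=raw.get("context_length"),
        skip_memory=bool(raw.get("skip_memory", False)),
        extra_body=dict(raw.get("extra_body") or {}),
    )


# The cloud-default entry only sets match.provider; everything else defaults.
cloud = normalize_profile({"name": "cloud-default", "match": {"provider": "anthropic"}})
print(cloud.match_model, cloud.disabled_toolsets, cloud.skip_memory)
```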

Problem

Tool schemas consume ~30,000–40,000 tokens of system prompt overhead. For a cloud model with a 200K+ context window this is trivial. For a local model capped at 128K (e.g. DeepSeek V4-Flash on an Exo cluster), it represents ~23–31% of total context before any user turn, conversation history, or memory is injected. Add Honcho context (~10–15K), memory (~2K), persona (~1K), and the skills list (~2K), and roughly 35–47% of context goes to infrastructure overhead for a model that didn't ask for the computer_use or vision toolsets.
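
The percentages can be checked with a few lines of arithmetic. This sketch uses only the token estimates quoted in the paragraph above and assumes "128K" means 128,000 tokens (using 131,072 shifts each figure by less than one percentage point).

```python
CONTEXT_WINDOW = 128_000  # assumption: "128K" read as 128,000 tokens

# (low, high) token estimates taken from the paragraph above
OVERHEAD = {
    "tool_schemas": (30_000, 40_000),
    "honcho_context": (10_000, 15_000),
    "memory": (2_000, 2_000),
    "persona": (1_000, 1_000),
    "skills_list": (2_000, 2_000),
}


def overhead_share(components, window):
    """Return (low, high) fraction of the window used by the given components."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low / window, high / window


tools_low, tools_high = overhead_share(
    {"tool_schemas": OVERHEAD["tool_schemas"]}, CONTEXT_WINDOW
)
total_low, total_high = overhead_share(OVERHEAD, CONTEXT_WINDOW)

print(f"tool schemas alone: {tools_low:.0%}-{tools_high:.0%}")  # 23%-31%
print(f"all overhead:       {total_low:.0%}-{total_high:.0%}")  # 35%-47%
```

The upper end lines up with the ~46% reported under Real-World Trigger.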

Proposed Solution

Add a model_profiles list to config.yaml that maps model+provider combos to toolset and memory overrides. When the active model matches a profile, those overrides are applied at session startup (and on /model switch without requiring /reset).

Match semantics

Match on model (exact string or glob), provider, or both
First matching profile wins
No match → current behavior (full toolset, no overrides)
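
The three rules above can be sketched as a single lookup. `find_profile` is a hypothetical helper, not existing Hermes code. It assumes profiles arrive as plain dicts, that globbing is case-sensitive and follows Python's `fnmatch` rules, and that a profile with an empty `match` block is skipped rather than treated as a catch-all.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional


def find_profile(
    profiles: List[Dict[str, Any]], model: str, provider: str
) -> Optional[Dict[str, Any]]:
    """Return the first profile whose match block fits, or None."""
    for profile in profiles:
        match = profile.get("match") or {}
        model_pattern = match.get("model")
        wanted_provider = match.get("provider")
        if model_pattern is None and wanted_provider is None:
            continue  # assumption: an empty match block never matches
        if model_pattern is not None and not fnmatchcase(model, model_pattern):
            continue  # exact strings and globs both go through fnmatchcase
        if wanted_provider is not None and provider != wanted_provider:
            continue
        return profile  # first matching profile wins
    return None  # no match: caller keeps current behavior


PROFILES = [
    {"name": "local-flash",
     "match": {"model": "mlx-community/DeepSeek-V4-Flash", "provider": "exo"}},
    {"name": "cloud-default", "match": {"provider": "anthropic"}},
]

hit = find_profile(PROFILES, "mlx-community/DeepSeek-V4-Flash", "exo")
print(hit["name"] if hit else None)
```

One consequence of first-match-wins is that ordering matters: a broad glob such as `mlx-community/*` placed before a more specific entry would shadow it.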

Activation

Applied at session build time (system prompt construction)
Applied immediately on /model switch without requiring /reset
Should be surfaced in /usage or /status so users can confirm which profile is active, e.g. Model profile: local-flash (5 toolsets disabled)
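
A sketch of how the status line might be built from the active profile. `profile_status_line` is a hypothetical helper, and the wording for the no-profile and unnamed cases is an assumption, since the proposal only gives the format for a named profile.

```python
from typing import Any, Dict, Optional


def profile_status_line(profile: Optional[Dict[str, Any]]) -> str:
    """Format the line shown in /usage or /status for the active profile."""
    if profile is None:
        # Assumption: wording for the no-match case is not specified.
        return "Model profile: none (full toolset)"
    name = profile.get("name") or "(unnamed)"  # name is optional in the schema
    count = len(profile.get("disabled_toolsets") or [])
    noun = "toolset" if count == 1 else "toolsets"
    return f"Model profile: {name} ({count} {noun} disabled)"


local_flash = {
    "name": "local-flash",
    "disabled_toolsets": ["computer_use", "vision", "image_gen", "video", "video_gen"],
}
print(profile_status_line(local_flash))
```

For the local-flash example, which lists five disabled toolsets, this prints `Model profile: local-flash (5 toolsets disabled)`.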

Implementation Notes

Session override (/model switch) is handled in cli.py / run_agent.py around where enabled_toolsets/disabled_toolsets are resolved — the profile lookup should happen there
hermes_cli/config.py's DEFAULT_CONFIG would need a model_profiles: [] key
The AIAgent.__init__ already accepts enabled_toolsets and disabled_toolsets — the profile just needs to inject into those before tool discovery runs
This is not a custom_providers extension — it lives in a separate top-level config key so it applies regardless of which provider mechanism is used
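
As a rough illustration of the injection step, the sketch below merges a matched profile into the keyword arguments that would be passed to the agent. `apply_profile` is a hypothetical helper. Only the `enabled_toolsets` and `disabled_toolsets` parameter names come from the notes above; the merge policy (union with any toolsets already disabled, and removal from an explicit allow-list) is an assumption, and the `web` toolset name is made up for the example.

```python
from typing import Any, Dict, Optional


def apply_profile(
    agent_kwargs: Dict[str, Any], profile: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of agent_kwargs with the profile's toolset overrides applied."""
    merged = dict(agent_kwargs)  # never mutate the caller's dict
    if profile is None:
        return merged  # no match: current behavior

    profile_disabled = list(profile.get("disabled_toolsets") or [])

    # Union, preserving order: existing disables first, then the profile's.
    disabled = list(merged.get("disabled_toolsets") or [])
    for toolset in profile_disabled:
        if toolset not in disabled:
            disabled.append(toolset)
    merged["disabled_toolsets"] = disabled

    # If an explicit allow-list is in use, drop anything the profile disables.
    enabled = merged.get("enabled_toolsets")
    if enabled is not None:
        merged["enabled_toolsets"] = [t for t in enabled if t not in profile_disabled]
    return merged


kwargs = apply_profile(
    {"enabled_toolsets": ["web", "vision"], "disabled_toolsets": ["video"]},
    {"name": "local-flash", "disabled_toolsets": ["vision", "video"]},
)
print(kwargs)
```

Returning a new dict keeps the base config untouched, so a later /model switch can re-run the lookup against the original values instead of accumulating disables from earlier profiles.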

Why Not a Separate Profile?

A second profile requires: separate bot token, separate gateway process, separate Telegram bot, separate memory store, and separate cron config. That's 5x the surface area for what is fundamentally a "use fewer tools when I'm on a smaller model" preference. model_profiles is the minimal, correct abstraction.

Real-World Trigger

This was surfaced while running DeepSeek V4-Flash (128K ctx) on a 6-node Apple Silicon Exo cluster. With full toolsets enabled, ~46% of context was consumed before the first user turn. Manually disabling heavy toolsets after each /model switch (tedious and not sticky) brought this to ~23%. A model_profiles config block would make this zero-friction.
