hermes - 💡(How to fix) Fix feat(agent): allow pre_llm_call plugins to override model/provider/system

Fix Action

Fix / Workaround

For model-routing plugins, this means the hook exists at the correct lifecycle point, but its return value cannot affect the actual API kwargs used for the LLM call. Plugins therefore have to monkey-patch AIAgent._run_agent_loop() or similar internals, which is brittle across Hermes releases.

repo: https://github.com/ShockShoot/hermes-arc
monkey-patch reference: https://github.com/ShockShoot/hermes-arc/blob/main/patch_run_agent.py

Because the current pre_llm_call result is not applied to the actual runtime/API kwargs, ARC has to patch Hermes core behavior externally. This works, but it is not a good long-term extension pattern.

Code Example

return {
    "context": "Routing note: this turn is a software engineering task.",
    "runtime_override": {
        "provider": "openrouter",
        "model": "anthropic/claude-sonnet-4",
        "system_prompt": "You are an expert software engineer. Be concise and practical.",
    },
}

---

return {
    "runtime_override": {
        "restore_main": True,
    },
}

---

pre_results = invoke_hook("pre_llm_call", ...)
plugin_context_parts = []
runtime_override = {}

for result in pre_results:
    if isinstance(result, str):
        plugin_context_parts.append(result)
    elif isinstance(result, dict):
        if result.get("context"):
            plugin_context_parts.append(str(result["context"]))
        override = result.get("runtime_override") or {}
        # optionally also accept direct top-level model/provider keys
        runtime_override.update(validate_runtime_override(override, result))

if plugin_context_parts:
    inject_ephemeral_user_context(plugin_context_parts)

if runtime_override:
    apply_runtime_override(runtime_override)  # preferably via switch_model/provider resolver

api_kwargs = self._build_api_kwargs(api_messages)
response = client.chat.completions.create(**api_kwargs)

Problem

Hermes already exposes a pre_llm_call plugin hook around the point where the agent prepares to call the LLM. The hook fires successfully and is documented as a request-scoped extension point, but today the core loop only treats hook results as optional user-message context.

That makes it impossible for a plugin to cleanly override request-time runtime parameters such as:

model
provider
base_url
api_key / resolved provider credentials
api_mode
request-scoped system_prompt / persona overlay

Real-world use case

I maintain a community plugin, Hermes ARC / topic_detect, that routes conversations to different models based on topic/domain:

repo: https://github.com/ShockShoot/hermes-arc
monkey-patch reference: https://github.com/ShockShoot/hermes-arc/blob/main/patch_run_agent.py

The plugin classifies the latest conversation into domains such as software, math, science, finance, legal, healthcare, writing/language, or general/default. It then needs to switch the active model/provider before the LLM request is made.

Proposed interface

Allow pre_llm_call hooks to return dict | str | None, where existing behavior remains compatible:

str: append as ephemeral user-message context, as today
{"context": "..."}: append as ephemeral user-message context, as today
{"runtime_override": {...}}: apply request/runtime overrides before building/calling the LLM API
optionally, direct top-level keys could be accepted as shorthand if maintainers prefer:
- model
- provider
- base_url
- api_key
- api_mode
- system_prompt
- restore_main

Example plugin return:

return {
    "context": "Routing note: this turn is a software engineering task.",
    "runtime_override": {
        "provider": "openrouter",
        "model": "anthropic/claude-sonnet-4",
        "system_prompt": "You are an expert software engineer. Be concise and practical.",
    },
}

For a fallback/default route:

return {
    "runtime_override": {
        "restore_main": True,
    },
}

Desired semantics

Invoke pre_llm_call before the tool-calling loop, where it already fires.
Preserve current context-injection behavior for strings and context keys.
Merge validated runtime overrides before the LLM client/API kwargs are built or before the request is sent.
Use Hermes' existing provider/model resolution and switch_model() path where possible, instead of plugins mutating agent fields directly.
Keep overrides request-scoped or explicitly restorable so a specialist route does not accidentally leak into unrelated future turns.
Do not persist injected context or runtime-only plugin metadata into the session DB.
Log invalid/unsupported override keys and continue safely rather than crashing the agent.

Why this belongs in core

This is not only useful for ARC/topic routing. It enables a general plugin pattern for:

topic/domain model routing
cost-aware routing
latency-aware routing
provider failover experiments
temporary persona overlays
compliance/safety routing
A/B testing different models by session/platform/topic

Today all of these require either duplicating the agent loop or monkey-patching private internals.

Compatibility

This can be additive and non-breaking:

existing pre_llm_call hooks returning None, str, or {"context": ...} keep working
runtime override support only activates when a recognized override object/key is returned
unsupported keys can be ignored with a warning/debug log

Implementation sketch

Pseudo-flow:

pre_results = invoke_hook("pre_llm_call", ...)
plugin_context_parts = []
runtime_override = {}

for result in pre_results:
    if isinstance(result, str):
        plugin_context_parts.append(result)
    elif isinstance(result, dict):
        if result.get("context"):
            plugin_context_parts.append(str(result["context"]))
        override = result.get("runtime_override") or {}
        # optionally also accept direct top-level model/provider keys
        runtime_override.update(validate_runtime_override(override, result))

if plugin_context_parts:
    inject_ephemeral_user_context(plugin_context_parts)

if runtime_override:
    apply_runtime_override(runtime_override)  # preferably via switch_model/provider resolver

api_kwargs = self._build_api_kwargs(api_messages)
response = client.chat.completions.create(**api_kwargs)

ARC's current monkey-patch can serve as a concrete reference for the behavior needed, but the upstream implementation can be smaller and cleaner.

Questions for maintainers

Would maintainers prefer runtime_override as a nested key, or direct top-level keys in the hook return dict?
Should system_prompt overrides be allowed, or should the first version only support model/provider/client routing?
Should overrides be one-turn scoped by default, or should plugins explicitly return restore_main when they want to revert?

If this direction sounds acceptable, I can open a PR with tests and docs following CONTRIBUTING.md.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix feat(agent): allow pre_llm_call plugins to override model/provider/system_prompt at runtime

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Problem

Real-world use case

Proposed interface

Desired semantics

Why this belongs in core

Compatibility

Implementation sketch

Questions for maintainers

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix feat(agent): allow pre_llm_call plugins to override model/provider/system_prompt at runtime

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Problem

Real-world use case

Proposed interface

Desired semantics

Why this belongs in core

Compatibility

Implementation sketch

Questions for maintainers

Still need to ship something?

RELATED_DISCOVERY

TRENDING