hermes - feat(agent): allow pre_llm_call plugins to override model/provider/system_prompt at runtime

Problem

Hermes already exposes a pre_llm_call plugin hook at the point where the agent prepares to call the LLM. The hook fires successfully and is documented as a request-scoped extension point, but today the core loop treats hook results only as optional user-message context.

That makes it impossible for a plugin to cleanly override request-time runtime parameters such as:

  • model
  • provider
  • base_url
  • api_key / resolved provider credentials
  • api_mode
  • request-scoped system_prompt / persona overlay

For model-routing plugins, this means the hook exists at the correct lifecycle point, but its return value cannot affect the actual API kwargs used for the LLM call. Plugins therefore have to monkey-patch AIAgent._run_agent_loop() or similar internals, which is brittle across Hermes releases.

Real-world use case

I maintain a community plugin, Hermes ARC / topic_detect, that routes conversations to different models based on topic/domain.

The plugin classifies the latest conversation into domains such as software, math, science, finance, legal, healthcare, writing/language, or general/default. It then needs to switch the active model/provider before the LLM request is made.

Because the current pre_llm_call result is not applied to the actual runtime/API kwargs, ARC has to patch Hermes core behavior externally. This works, but it is not a good long-term extension pattern.
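
As a rough illustration of the classification step described above, here is a minimal keyword-based sketch. The keyword lists, function names, and route table are hypothetical and are not taken from ARC's actual implementation; only the openrouter / anthropic/claude-sonnet-4 pair appears elsewhere in this issue.

```python
from typing import Optional

# Hypothetical route table: domain -> desired provider/model.
# Only the "software" entry reuses values mentioned in this issue.
ROUTES = {
    "software": {"provider": "openrouter", "model": "anthropic/claude-sonnet-4"},
}

# Hypothetical keyword lists; a real classifier would be far more robust.
KEYWORDS = {
    "software": ("python", "traceback", "compile", "refactor"),
    "math": ("integral", "theorem", "prove"),
}


def classify_domain(text: str) -> str:
    """Return the first domain whose keywords appear in the text, else 'general'."""
    lowered = text.lower()
    for domain, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return domain
    return "general"


def route_for(text: str) -> Optional[dict]:
    """Return the route for the text's domain, or None when no route is configured."""
    return ROUTES.get(classify_domain(text))
```

A domain with no entry in the route table (here "math" and "general") yields None, which a plugin could interpret as "stay on the main model".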

Proposed interface

Allow pre_llm_call hooks to return dict | str | None, where existing behavior remains compatible:

  • str: append as ephemeral user-message context, as today
  • {"context": "..."}: append as ephemeral user-message context, as today
  • {"runtime_override": {...}}: apply request/runtime overrides before building/calling the LLM API
  • optionally, direct top-level keys could be accepted as shorthand if maintainers prefer:
    • model
    • provider
    • base_url
    • api_key
    • api_mode
    • system_prompt
    • restore_main

Example plugin return:

return {
    "context": "Routing note: this turn is a software engineering task.",
    "runtime_override": {
        "provider": "openrouter",
        "model": "anthropic/claude-sonnet-4",
        "system_prompt": "You are an expert software engineer. Be concise and practical.",
    },
}

For a fallback/default route:

return {
    "runtime_override": {
        "restore_main": True,
    },
}
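
To make the accepted return shapes concrete, the sketch below normalizes a single hook result into a (context, runtime_override) pair. The function name and the precedence rule (nested runtime_override wins over top-level shorthand keys) are assumptions for illustration, not decided behavior; filtering of unsupported keys is deliberately left out of this sketch.

```python
from typing import Any, Optional, Tuple

# Shorthand keys listed in this proposal. Treating this as the complete
# set is an assumption.
OVERRIDE_KEYS = (
    "model", "provider", "base_url", "api_key",
    "api_mode", "system_prompt", "restore_main",
)


def normalize_hook_result(result: Any) -> Tuple[Optional[str], dict]:
    """Split one pre_llm_call result into (context, runtime_override)."""
    if result is None:
        return None, {}
    if isinstance(result, str):
        # Plain strings are context only; empty strings are dropped.
        return (result or None), {}
    if not isinstance(result, dict):
        # Unknown result types are ignored rather than raising.
        return None, {}

    context = result.get("context")
    # Collect shorthand top-level keys first, so the nested form wins on conflict.
    override = {key: result[key] for key in OVERRIDE_KEYS if key in result}
    nested = result.get("runtime_override")
    if isinstance(nested, dict):
        override.update(nested)
    return (str(context) if context else None), override
```

With this shape, hooks that return None, a string, or {"context": ...} always produce an empty override, so the existing behavior is unaffected.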

Desired semantics

  1. Invoke pre_llm_call before the tool-calling loop, where it already fires.
  2. Preserve current context-injection behavior for strings and context keys.
  3. Merge validated runtime overrides before the LLM client/API kwargs are built or before the request is sent.
  4. Use Hermes' existing provider/model resolution and switch_model() path where possible, instead of plugins mutating agent fields directly.
  5. Keep overrides request-scoped or explicitly restorable so a specialist route does not accidentally leak into unrelated future turns.
  6. Do not persist injected context or runtime-only plugin metadata into the session DB.
  7. Log invalid/unsupported override keys and continue safely rather than crashing the agent.
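
One way to read point 5 is a context manager that applies an override for a single request and always restores the previous values, even if the request raises. The sketch below uses a stand-in agent class with assumed attribute names and sets the attributes directly only to stay self-contained; in Hermes itself the apply and restore steps would presumably go through the existing provider/model resolution path, as point 4 asks.

```python
from contextlib import contextmanager


class FakeAgent:
    """Stand-in for the real agent; the attribute names are assumptions."""

    def __init__(self):
        self.model = "main-model"
        self.provider = "main-provider"


@contextmanager
def scoped_override(agent, override):
    """Apply override keys to the agent for one request, then restore them."""
    # Only touch attributes that already exist on the agent.
    saved = {key: getattr(agent, key) for key in override if hasattr(agent, key)}
    try:
        for key in saved:
            setattr(agent, key, override[key])
        yield agent
    finally:
        # Runs on both normal exit and exceptions, so nothing leaks
        # into the next turn.
        for key, value in saved.items():
            setattr(agent, key, value)
```

A caller would wrap the single LLM request in `with scoped_override(agent, override):` so the specialist route ends when the block exits.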

Why this belongs in core

This is not only useful for ARC/topic routing. It enables a general plugin pattern for:

  • topic/domain model routing
  • cost-aware routing
  • latency-aware routing
  • provider failover experiments
  • temporary persona overlays
  • compliance/safety routing
  • A/B testing different models by session/platform/topic

Today all of these require either duplicating the agent loop or monkey-patching private internals.

Compatibility

This can be additive and non-breaking:

  • existing pre_llm_call hooks returning None, str, or {"context": ...} keep working
  • runtime override support only activates when a recognized override object/key is returned
  • unsupported keys can be ignored with a warning/debug log
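
The "ignore with a warning" behavior could look roughly like the filter below. The function name, logger name, and per-key expected types are assumptions made for this sketch; the key set is the one listed in this proposal. Only key names are logged, never values, so credentials such as api_key do not end up in logs.

```python
import logging

logger = logging.getLogger("pre_llm_call")  # hypothetical logger name

# Expected type per key. The keys come from this proposal; the types are assumed.
ALLOWED = {
    "model": str,
    "provider": str,
    "base_url": str,
    "api_key": str,
    "api_mode": str,
    "system_prompt": str,
    "restore_main": bool,
}


def filter_runtime_override(override):
    """Return only recognized, well-typed keys; log and skip everything else."""
    if not isinstance(override, dict):
        logger.warning("ignoring runtime override: expected a dict")
        return {}
    clean = {}
    for key, value in override.items():
        expected = ALLOWED.get(key)
        if expected is None:
            logger.warning("ignoring unsupported override key %r", key)
        elif not isinstance(value, expected):
            logger.warning("ignoring override key %r: expected %s", key, expected.__name__)
        else:
            clean[key] = value
    return clean
```

Because the filter returns an empty dict instead of raising, a malformed plugin result degrades to today's behavior rather than crashing the agent.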

Implementation sketch

Pseudo-flow:

# Collect the results from every registered pre_llm_call hook.
pre_results = invoke_hook("pre_llm_call", ...)
plugin_context_parts = []
runtime_override = {}

for result in pre_results:
    if isinstance(result, str):
        # Plain strings keep today's behavior: ephemeral user-message context.
        plugin_context_parts.append(result)
    elif isinstance(result, dict):
        if result.get("context"):
            plugin_context_parts.append(str(result["context"]))
        override = result.get("runtime_override") or {}
        # optionally also accept direct top-level model/provider keys
        # Later hooks win when two plugins set the same key.
        runtime_override.update(validate_runtime_override(override, result))

if plugin_context_parts:
    inject_ephemeral_user_context(plugin_context_parts)

if runtime_override:
    apply_runtime_override(runtime_override)  # preferably via switch_model/provider resolver

# Build the kwargs only after overrides are applied, so they reach the request.
api_kwargs = self._build_api_kwargs(api_messages)
response = client.chat.completions.create(**api_kwargs)

ARC's current monkey-patch can serve as a concrete reference for the behavior needed, but the upstream implementation can be smaller and cleaner.

Questions for maintainers

  1. Would maintainers prefer runtime_override as a nested key, or direct top-level keys in the hook return dict?
  2. Should system_prompt overrides be allowed, or should the first version only support model/provider/client routing?
  3. Should overrides be one-turn scoped by default, or should plugins explicitly return restore_main when they want to revert?

If this direction sounds acceptable, I can open a PR with tests and docs following CONTRIBUTING.md.
