hermes - ✅(Solved) Fix platforms: split storage from LLM-invocation gate (group-chat 'observe but don't invoke' mode) [1 pull requests, 1 participants]

hermes • 2026-04-25 12:02:39

GitHub stats

NousResearch/hermes-agent#15621 • Fetched 2026-04-26 05:26:07

Author: jscholz
Participants: jscholz
Timeline (top): referenced ×4, labeled ×3, cross-referenced ×1

Group-chat platforms (WhatsApp, Slack, Telegram, Discord) today have a single gate (require_mention / equivalent) that conflates two semantically distinct concerns:

- Should this message be persisted as conversation history?
- Should the LLM be invoked for this turn?

When require_mention=true, untagged messages are dropped entirely and never reach state.db, so when the bot is later @-mentioned the agent has no context for the conversation that preceded the mention. When require_mention=false, the LLM runs on every message, which is expensive in tokens, widens the prompt-injection surface, and is often unwanted because the user explicitly does NOT want the bot replying to ambient banter.

The natural mode for group chats is "observe always, invoke on tag":

- Untagged group messages → store in conversation history, so the agent has context for the next mention
- @-mentioned messages → store + invoke LLM as today

Neither current setting expresses this. Filing this issue to discuss the design before putting up a PR.

Fix / Workaround

The dispatcher branches on the HandlingMode returned by _classify (see Code Example). For STORE_ONLY:

- Append [<sender_name>] <body> to the session's conversation_history (using the existing sender-prefix patch from #15413)
- Persist to state.db/messages via the same SessionState write path the agent loop uses
- Skip agent.run / LLM invocation entirely
- No SSE event flows to clients (no reply is being computed)
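
The STORE_ONLY steps above can be sketched roughly as follows. This is a minimal illustration, not the project's dispatcher: `Session`, `persist`, `dispatch`, and the `run_agent` callable are hypothetical stand-ins for the real SessionState write path and agent loop, and SSE handling is reduced to "nothing is returned".

```python
from dataclasses import dataclass, field
from enum import Enum


class HandlingMode(Enum):
    DROP = "drop"
    STORE_ONLY = "store_only"
    PROCESS = "process"


@dataclass
class Session:
    """Hypothetical stand-in for the real session state and its state.db write path."""
    conversation_history: list = field(default_factory=list)

    def persist(self, role: str, content: str) -> None:
        # The real path would also write to state.db/messages.
        self.conversation_history.append({"role": role, "content": content})


def dispatch(mode, session, sender_name, body, run_agent):
    """Route one inbound message; return the agent reply, or None if no reply is computed."""
    if mode is HandlingMode.DROP:
        return None  # filtered: nothing stored, nothing invoked
    # Both STORE_ONLY and PROCESS record the sender-prefixed message.
    session.persist("user", f"[{sender_name}] {body}")
    if mode is HandlingMode.STORE_ONLY:
        return None  # stored for context; no LLM call, so nothing to stream
    return run_agent(session)
```

Under these assumptions, an untagged message only grows the history, and the next @-mention reaches the agent with that history already in place.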

PR fix notes

PR #15633: fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry

Repository: NousResearch/hermes-agent
Author: teknium1
State: closed | merged: True
Link: https://github.com/NousResearch/hermes-agent/pull/15633

Description (problem / solution / changelog)

Summary

Generalizes the temperature-specific 400 retry that landed in PR #15621 so the same reactive strategy covers any rejected request parameter, and fixes a latent bug in the pre-existing max_tokens → max_completion_tokens retry branch.

Credit @nicholasrae (PR #15416) for the generalization pattern. His PR also proposed the temperature retry, which landed independently via #15621 + #15623.

Changes

agent/auxiliary_client.py:
- New _is_unsupported_parameter_error(exc, param) — matches the same phrasings as the old temperature detector plus unrecognized parameter / invalid parameter, against any named param.
- _is_unsupported_temperature_error is now a back-compat wrapper (existing imports/tests unchanged).
- max_tokens retry branch now gates on max_tokens is not None (was silently assigning max_completion_tokens = None on the retry) and ALSO matches via the generic helper, so phrasings like Unknown parameter: max_tokens no longer slip through.
tests/agent/test_unsupported_parameter_retry.py: 18 new tests.
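
To make the described changes concrete, here is a rough sketch of a generic unsupported-parameter detector and a guarded max_tokens retry. It is not the code from agent/auxiliary_client.py: the function names without a leading underscore, the phrase list, and the `retry_kwargs_for_max_tokens` helper are assumptions made for illustration.

```python
from typing import Optional

# Assumed phrase list; the real detector may match different wording.
_REJECTION_PHRASES = (
    "unsupported parameter",
    "unknown parameter",
    "unrecognized parameter",
    "invalid parameter",
    "not supported",
)


def is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
    """True when the error text names `param` together with a rejection phrase."""
    text = str(exc).lower()
    if param.lower() not in text:
        return False
    return any(phrase in text for phrase in _REJECTION_PHRASES)


def is_unsupported_temperature_error(exc: Exception) -> bool:
    """Back-compat style wrapper around the generic detector."""
    return is_unsupported_parameter_error(exc, "temperature")


def retry_kwargs_for_max_tokens(kwargs: dict, exc: Exception) -> Optional[dict]:
    """Return kwargs for one retry with max_completion_tokens, or None if no retry applies."""
    max_tokens = kwargs.get("max_tokens")
    # Guard on None so the retry never sends max_completion_tokens=None.
    if max_tokens is None or not is_unsupported_parameter_error(exc, "max_tokens"):
        return None
    retry = {k: v for k, v in kwargs.items() if k != "max_tokens"}
    retry["max_completion_tokens"] = max_tokens
    return retry
```

The None guard mirrors the latent bug described above: without it, a request that never set max_tokens would be retried with an explicit null value.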

Validation

Test file	Before	After
`tests/agent/test_unsupported_temperature_retry.py`	19 pass	19 pass
`tests/agent/test_unsupported_parameter_retry.py`	n/a	18 pass
`tests/run_agent/test_flush_memories_codex.py` + `tests/agent/test_auxiliary_client.py`	86 pass	86 pass

No behavior change for the reported bug (that's fixed by #15621 + #15623 on main). This PR only hardens the surrounding retry ladder for future provider quirks and the latent None-max_tokens edge case.

Changed files

agent/auxiliary_client.py (modified, +40/-15)
tests/agent/test_unsupported_parameter_retry.py (added, +201/-0)

Code Example

from enum import Enum


class HandlingMode(Enum):
    DROP = "drop"               # filtered (allowlist fail, bot's own echo, etc.)
    STORE_ONLY = "store_only"   # append to history, DON'T invoke agent loop
    PROCESS = "process"         # full path: store + invoke


# Method on the platform adapter class (WhatsApp adapter shown).
def _classify(self, data) -> HandlingMode:
    if not self._is_allowed(data):
        return HandlingMode.DROP
    # Direct messages always get the full path.
    if not self._is_group(data):
        return HandlingMode.PROCESS
    # Group chats explicitly configured for free responses.
    if data["chatId"] in self._whatsapp_free_response_chats():
        return HandlingMode.PROCESS
    if self._whatsapp_require_mention():
        if self._message_addresses_bot(data):
            return HandlingMode.PROCESS
        return HandlingMode.STORE_ONLY    # ← NEW: today this is DROP
    return HandlingMode.PROCESS
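
A self-contained way to exercise the same decision order is sketched below. `StubAdapter` is hypothetical, and so are the `allowed`, `fromMe`, `isGroup`, and `body` fields and the `@bot` handle; only `chatId` and the branch order come from the snippet above.

```python
from enum import Enum


class HandlingMode(Enum):
    DROP = "drop"
    STORE_ONLY = "store_only"
    PROCESS = "process"


class StubAdapter:
    """Hypothetical adapter; the real predicates live in the platform adapter."""

    def __init__(self, require_mention=True, free_response_chats=(), bot_handle="@bot"):
        self.require_mention = require_mention
        self.free_response_chats = set(free_response_chats)
        self.bot_handle = bot_handle

    def classify(self, data: dict) -> HandlingMode:
        # Allowlist failure or the bot's own echo: drop outright.
        if data.get("fromMe") or not data.get("allowed", True):
            return HandlingMode.DROP
        if not data.get("isGroup", False):
            return HandlingMode.PROCESS
        if data["chatId"] in self.free_response_chats:
            return HandlingMode.PROCESS
        # Untagged group message under require_mention: keep it, skip the LLM.
        if self.require_mention and self.bot_handle not in data.get("body", ""):
            return HandlingMode.STORE_ONLY
        return HandlingMode.PROCESS


adapter = StubAdapter(require_mention=True, free_response_chats={"g2"})
untagged = adapter.classify({"chatId": "g1", "isGroup": True, "body": "anyone up for lunch?"})
tagged = adapter.classify({"chatId": "g1", "isGroup": True, "body": "@bot summarize"})
```

With these inputs, `untagged` is `HandlingMode.STORE_ONLY` and `tagged` is `HandlingMode.PROCESS`, which is the "observe always, invoke on tag" behaviour the issue asks for.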

Proposed shape

Tri-state gate in gateway/platforms/<adapter>.py: the HandlingMode enum and _classify method shown under Code Example above. The dispatcher branches on the result; the STORE_ONLY steps are listed under Fix / Workaround above.

The same abstraction applies cleanly to Slack, Telegram, and Discord: their adapters all have the same should_process shape today and would benefit identically.

Why this matters cross-platform

Every group-chat bot deployment hits this. Discord bots, Slack bots, even IRC bots have the same problem — you want context awareness across the channel, but you don't want the model running on every message. The current binary is the wrong abstraction.

Backward compatibility

- Existing require_mention=false behavior is unchanged: every message still triggers PROCESS.
- Existing require_mention=true behavior CHANGES: untagged messages are stored instead of dropped. A new WHATSAPP_DROP_UNTAGGED=true setting (default false) preserves the old drop-entirely behavior for users who explicitly want it (cost minimization, privacy, etc.).

If maintainers prefer not to flip the default, an explicit WHATSAPP_GROUP_MODE: drop|store|process enum is also fine.
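
One way the two configuration options could be reconciled is sketched below. The precedence (explicit WHATSAPP_GROUP_MODE first, then the boolean flags), the WHATSAPP_REQUIRE_MENTION variable name, the false default for it, and reading everything from environment variables are all assumptions; the issue only names WHATSAPP_DROP_UNTAGGED and WHATSAPP_GROUP_MODE and presents them as alternatives.

```python
import os
from enum import Enum


class GroupMode(Enum):
    DROP = "drop"
    STORE = "store"
    PROCESS = "process"


def _truthy(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def resolve_group_mode(env=None) -> GroupMode:
    """Decide how untagged group messages are handled, from environment-style settings."""
    env = os.environ if env is None else env
    explicit = env.get("WHATSAPP_GROUP_MODE", "").strip().lower()
    if explicit:
        return GroupMode(explicit)  # raises ValueError on an unknown value
    # WHATSAPP_REQUIRE_MENTION is a hypothetical name for the require_mention setting.
    if not _truthy(env.get("WHATSAPP_REQUIRE_MENTION", "")):
        return GroupMode.PROCESS  # unchanged: every message is processed
    if _truthy(env.get("WHATSAPP_DROP_UNTAGGED", "")):
        return GroupMode.DROP  # opt back in to the old drop-entirely behavior
    return GroupMode.STORE  # proposed new default under require_mention
```

Passing a plain dict as `env` keeps the function easy to test without touching the process environment.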

PR coming

I'll put up a PR shortly with the implementation. Wanted to file the issue first so the design is debatable before code. The reference deployment is my own setup — group chat with two friends; today I have to choose between "agent has no context of any banter" (require_mention=true) and "agent runs on every troll message" (require_mention=false). The store-only mode resolves it cleanly.

extent analysis

TL;DR

Implement a tri-state gate in the gateway that classifies each message as "drop", "store_only", or "process", so that persisting a message to conversation history is decided separately from invoking the LLM.

Guidance

- Introduce a new HandlingMode enum with "drop", "store_only", and "process" values to replace the binary process-or-drop decision that require_mention drives today.
- Update the _classify method to return a HandlingMode based on the message data and platform-specific rules.
- Implement a dispatcher branch that handles "store_only" messages by appending them to conversation history and persisting them to state.db without invoking the LLM.
- Keep an opt-out for the old behavior: either WHATSAPP_DROP_UNTAGGED=true or, if the default should not flip, an explicit WHATSAPP_GROUP_MODE setting (drop|store|process).

Example

class HandlingMode(Enum):
    DROP = "drop"
    STORE_ONLY = "store_only"
    PROCESS = "process"

def _classify(self, data) -> HandlingMode:
    # ...
    if self._whatsapp_require_mention():
        if self._message_addresses_bot(data):
            return HandlingMode.PROCESS
        return HandlingMode.STORE_ONLY
    # ...

Notes

The proposed solution assumes that the existing sender-prefix patch from #15413 can be reused to append sender information to the conversation history.

Recommendation

Adopt the proposed tri-state gate: it gives finer-grained control over message handling and separates the storage decision from the LLM-invocation decision.
