hermes - Discord streaming can duplicate final responses across chunked delivery

NousResearch/hermes-agent#25349 (fetched 2026-05-14 03:47:08)
Comments: 0 · Participants: 1 · Timeline events: 5 (labeled ×4, cross-referenced ×1) · Reactions: 0


Summary

When Discord streaming/chunked delivery is enabled, Hermes can send the same final assistant response twice. In the observed case the agent generated one final answer, but Discord received two identical groups of response chunks about 30 seconds apart.

Observed behavior

  • One user message triggered one agent turn.
  • The local session/transcript showed a single assistant final response for that turn.
  • Discord received the same final response twice, split into the same chunk sequence.
  • The duplicate was not a second model generation; it appears to be delivery-level duplication.

Sanitized shape of the observed Discord output:

T+0s   bot chunk 1, hash A
T+1s   bot chunk 2, hash B
T+2s   bot chunk 3, hash C
T+30s  bot chunk 1, hash A
T+31s  bot chunk 2, hash B
T+32s  bot chunk 3, hash C
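
One way to flag this pattern in a delivery log is to hash each outgoing chunk and check whether the sequence is the same group sent twice. The sketch below is only an illustration: `chunk_hash` and `find_repeated_group` are hypothetical helpers, not part of Hermes, and the log is assumed to be a plain list of chunk bodies in send order.

```python
import hashlib
from typing import Optional, Sequence


def chunk_hash(text: str) -> str:
    """Short, stable fingerprint of one chunk body (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def find_repeated_group(hashes: Sequence[str]) -> Optional[int]:
    """Return n if `hashes` is one n-chunk group sent exactly twice, else None."""
    n, remainder = divmod(len(hashes), 2)
    if n == 0 or remainder != 0:
        return None
    first, second = list(hashes[:n]), list(hashes[n:])
    return n if first == second else None


# Shape of the observed output: chunks 1..3, then the same three again.
sent_bodies = ["part one", "part two", "part three"] * 2
observed = [chunk_hash(body) for body in sent_bodies]
group_size = find_repeated_group(observed)
```

A match here only says the content repeated; it still has to be compared against the session transcript to confirm that a single final response was generated.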

Expected behavior

A final assistant response should be delivered exactly once per session/generation/final-response. If streaming already delivered the final body, the normal final-send path should be suppressed. If a fallback/retry path runs, it should be idempotent and should not resend the exact same final content to the same Discord target.
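
The suppression rule described above can be sketched with a per-turn flag. This is a minimal illustration, not Hermes's actual code: `TurnDelivery` and its methods are hypothetical, and the flag name `final_response_sent` is borrowed from the issue text with assumed semantics.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnDelivery:
    """Hypothetical per-turn delivery state; not the real Hermes class."""

    final_response_sent: bool = False
    sent: List[str] = field(default_factory=list)  # stands in for Discord sends

    def deliver_streamed_final(self, body: str) -> None:
        # The streaming path delivered the final body: record that fact.
        self.sent.append(body)
        self.final_response_sent = True

    def deliver_normal_final(self, body: str) -> bool:
        # The normal final-send path must back off if streaming already
        # delivered the final body. Returns True only if it actually sent.
        if self.final_response_sent:
            return False
        self.sent.append(body)
        self.final_response_sent = True
        return True


turn = TurnDelivery()
turn.deliver_streamed_final("final answer")
was_sent_again = turn.deliver_normal_final("final answer")
```

The flag only helps if every delivery path reads and writes the same state; a fallback or retry path that keeps its own copy could still resend.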

Suspected area

This looks related to coordination between:

  • gateway/stream_consumer.py final delivery tracking;
  • gateway/run.py normal final-send suppression via already_sent / final_response_sent;
  • gateway/platforms/base.py _send_with_retry;
  • gateway/platforms/discord.py chunked send() behavior for messages over the 2000-char Discord limit.
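
For context on the last item, a chunked send typically splits the body deterministically, so sending the same final text twice yields the same chunk sequence twice, as observed. The splitter below is a generic sketch under that assumption; `split_for_discord` is a hypothetical name and does not reproduce the logic in `gateway/platforms/discord.py`.

```python
from typing import List

DISCORD_MAX_CHARS = 2000  # message length limit named in the issue


def split_for_discord(text: str, limit: int = DISCORD_MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to cut at the last newline that fits; falls back to a hard cut.
    """
    chunks: List[str] = []
    remaining = text
    while len(remaining) > limit:
        cut = remaining.rfind("\n", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable newline: hard cut at the limit
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip("\n")
    if remaining:
        chunks.append(remaining)
    return chunks


long_reply = "x" * 1500 + "\n" + "y" * 1500
first_send = split_for_discord(long_reply)
second_send = split_for_discord(long_reply)  # identical input, identical chunks
```

Because the split is a pure function of the text, a duplicated call to the send path is enough to reproduce the repeated chunk groups; no second model generation is needed.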

A robust fix may need a delivery idempotency guard keyed by platform, chat/thread, session/generation, response hash, and delivery kind, so streamed/fallback/normal final paths cannot deliver the same final response twice.
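
A minimal sketch of such a guard follows. All names (`DeliveryKey`, `DeliveryGuard`, `make_key`) are hypothetical, and it assumes that "delivery kind" means the type of message (for example `"final"`) rather than the code path that sends it; keying on the code path would let the streamed and normal paths each claim their own key and defeat the guard.

```python
import hashlib
import threading
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class DeliveryKey:
    """Hypothetical idempotency key built from the fields named in the issue."""

    platform: str
    chat_id: str        # chat or thread identifier
    generation_id: str  # session/generation identifier
    response_hash: str
    kind: str           # assumed to be the message type, e.g. "final"


def make_key(platform: str, chat_id: str, generation_id: str,
             body: str, kind: str = "final") -> DeliveryKey:
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return DeliveryKey(platform, chat_id, generation_id, digest, kind)


class DeliveryGuard:
    """In-memory guard: the first caller to claim a key may send."""

    def __init__(self) -> None:
        self._seen: Set[DeliveryKey] = set()
        self._lock = threading.Lock()

    def claim(self, key: DeliveryKey) -> bool:
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
            return True

    def release(self, key: DeliveryKey) -> None:
        # Call when a claimed send fails, so a retry can claim again.
        with self._lock:
            self._seen.discard(key)


guard = DeliveryGuard()
key = make_key("discord", "thread-1", "gen-1", "final answer")
streamed_may_send = guard.claim(key)  # streaming path claims first
normal_may_send = guard.claim(key)    # normal final path is refused
```

Every path (streamed, fallback, normal) would need to call `claim` before sending and `release` on failure. A real implementation would also need to expire old keys, which this sketch leaves out.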

Why this matters

The bug is noisy and confusing in Discord threads, especially for long assistant responses that are split into multiple chunks. It also makes it hard to distinguish a real second turn from a delivery duplicate.

Notes

This is not about duplicate model generation. The observed agent/session state had one final response; the duplication happened at the Discord delivery layer.
