hermes - ✅(Solved) Fix Race condition: /new during active agent session never sends response (Telegram gateway) [1 pull request, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#18912 (fetched 2026-05-03 04:53:37)
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): labeled ×4, commented ×1, cross-referenced ×1

Root Cause

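The /new command takes a different code path depending on whether an agent is running. When one is running, the confirmation is sent only after the running task is cancelled, and it never arrives.

Fix Action

Send the /new response before calling cancel_session_processing(), as PR #18915 does.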

Workaround

Wait for the current response to finish before typing /new, or use /stop first.

PR fix notes

PR #18915: fix(gateway): send /new response before cancel_session_processing to avoid race (#18912)

Description (problem / solution / changelog)

Summary

Fixes #18912

When /new is issued via Telegram while an agent is actively processing a message, the session resets correctly but the "✨ Session reset!" confirmation is never sent. The bot shows "typing..." indefinitely with no reply.

Root Cause

In _dispatch_active_session_command() (gateway/platforms/base.py lines 2491-2512), the response send happens after cancel_session_processing():

response = await self._message_handler(event)
await self.cancel_session_processing(...)  # ← cancel first
_text, _eph_ttl = self._unwrap_ephemeral(response)
if _text:
    _r = await self._send_with_retry(...)   # ← send after cancel

Task cancellation side effects (the 5-second asyncio.wait_for(asyncio.shield(task)) in cancel_session_processing, plus cleanup by the cancelled task's finally blocks) can interfere with the send, silently dropping the confirmation.
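
The cancel-then-wait pattern described above can be sketched in isolation. This is a minimal illustration, not the gateway's code: agent_run, cancel_and_wait, and the events list are hypothetical names, and the 0.05 s sleep stands in for whatever the cancelled task's finally blocks actually do. It shows only that a caller which runs task.cancel() followed by asyncio.wait_for(asyncio.shield(task), ...) does not reach its next statement until the cancelled task has finished unwinding (or the timeout expires), which is the window during which the confirmation was still unsent.

```python
import asyncio


async def agent_run(events):
    """Hypothetical long-running agent task with cleanup in a finally block."""
    try:
        await asyncio.sleep(60)
    finally:
        await asyncio.sleep(0.05)  # stand-in for real cleanup work
        events.append("cleanup done")


async def cancel_and_wait(task, timeout=5.0):
    """Cancel the task, then wait (bounded) for it to finish unwinding."""
    task.cancel()
    try:
        await asyncio.wait_for(asyncio.shield(task), timeout=timeout)
    except (asyncio.CancelledError, asyncio.TimeoutError):
        # The task ended cancelled, or cleanup exceeded the timeout.
        # A real implementation would also need to tell this apart from
        # the caller itself being cancelled.
        pass


async def main():
    events = []
    task = asyncio.create_task(agent_run(events))
    await asyncio.sleep(0)  # let the agent task start running
    await cancel_and_wait(task)
    events.append("send confirmation")  # only reached after the wait above
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch the send is merely delayed, not dropped; it does not reproduce the bug, only the ordering that leaves the send exposed to whatever happens during cancellation.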

In contrast, when no agent is running, the response goes through _process_message_background() (which logs "Sending response") and works correctly.

Fix

  1. Reorder: Send the response before cancelling the old task. This eliminates the race — by the time cancellation runs, the confirmation is already sent.

  2. Add logging: Added logger.info("[%s] Sending command '/%s' response (%d chars) to %s", ...) at the send point, matching the pattern at line 2800 in _process_message_background. This makes future send failures visible in logs instead of silently disappearing.
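
The reordering can be sketched with a stub adapter. Everything here except the method names quoted in the PR and the log format string is an assumption: SketchAdapter, its dict-shaped event, and the simplified signatures are hypothetical, and the _unwrap_ephemeral step is omitted. The sketch records call order to show the send completing before cancellation starts.

```python
import asyncio
import logging

logger = logging.getLogger("gateway.sketch")


class SketchAdapter:
    """Hypothetical stand-in for the platform adapter; records call order."""

    name = "telegram"

    def __init__(self):
        self.calls = []

    async def _message_handler(self, event):
        return "✨ Session reset!"  # stand-in for the reset confirmation

    async def _send_with_retry(self, chat_id, text):
        self.calls.append(("send", chat_id, text))

    async def cancel_session_processing(self, session_key):
        self.calls.append(("cancel", session_key))

    async def dispatch_active_session_command(self, event, command):
        response = await self._message_handler(event)
        if response:
            # Log at the send point so a missing send is visible in the logs.
            logger.info(
                "[%s] Sending command '/%s' response (%d chars) to %s",
                self.name, command, len(response), event["chat_id"],
            )
            await self._send_with_retry(event["chat_id"], response)  # send first
        await self.cancel_session_processing(event["session_key"])  # then cancel


async def run_sketch():
    adapter = SketchAdapter()
    event = {"chat_id": "chat-1", "session_key": "session-1"}  # made-up values
    await adapter.dispatch_active_session_command(event, "new")
    return [call[0] for call in adapter.calls]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(asyncio.run(run_sketch()))
```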

Testing

Manual reproduction:

  1. Send a long-running request to the bot on Telegram
  2. While the agent is still processing, type /new
  3. Before fix: Session resets but confirmation never sent (bot shows "typing...")
  4. After fix: Confirmation "✨ Session reset!" is delivered immediately

Code Path Comparison

| Path | Agent running? | Dispatch | Response mechanism | Status |
| --- | --- | --- | --- | --- |
| A | No | _process_message_background | Normal send | ✅ Works |
| B | Yes | _dispatch_active_session_command | Send after cancel | ❌ Race condition |
| B (fixed) | Yes | _dispatch_active_session_command | Send before cancel | ✅ Fixed |

Changed files

  • gateway/platforms/base.py (modified, +19/-7)

Code Example

00:43:36 Invalidated run generation (session_reset)
00:43:37 Sending response (223 chars) to 95787569

---

01:17:40 Invalidated run generation (new_command)
01:17:40 Invalidated run generation (session_reset)
(no "Sending response" follows; the bot hangs indefinitely until the next user message)

Bug Description

When /new is issued via Telegram while an agent is actively processing a message (mid-response), the session is reset correctly but the confirmation response is never sent back to the user. The bot shows "typing..." indefinitely with no reply.

Steps to Reproduce

  1. Send a long-running request to the bot on Telegram (a prompt that takes 10+ seconds)
  2. While the agent is still processing and showing "typing...", type /new
  3. Observe: session resets (logs show Invalidated run generation (new_command) + (session_reset)) but the "✨ Session reset!" confirmation is never sent
  4. Any subsequent message sent after this works fine — the gateway is alive and responsive

Contrast with Working Case

When no agent is running and /new is issued, it works perfectly:

✅ 00:43:36 — Invalidated run generation (session_reset)
✅ 00:43:37 — Sending response (223 chars) to 95787569

When an agent IS running, the response goes missing:

❌ 01:17:40 — Invalidated run generation (new_command)
❌ 01:17:40 — Invalidated run generation (session_reset)
⏳ (no "Sending response" follows; the bot hangs indefinitely until the next user message)

Root Cause Analysis

The /new command takes a different code path depending on whether an agent is running:

Code Path A — Agent NOT running (works)

_handle_message() → normal command dispatch at gateway/run.py line 4992: if canonical == "new": return await self._handle_reset_command(event) → response sent via _process_message_background

Code Path B — Agent IS running (broken)

handle_message() → adapter detects active session at base.py line 2553 → routes to _dispatch_active_session_command() (line 2574) for new/reset/stop → inside _handle_message(), early intercept at run.py line 4666 fires → _interrupt_and_clear_session() (line 4668) → _handle_reset_command() (line 4676) → returns EphemeralReply → back in _dispatch_active_session_command, calls cancel_session_processing() (line 2494) → attempts _send_with_retry() (line 2501)

The response is generated (the EphemeralReply from _handle_reset_command) but never reaches Telegram.
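
The branching between the two paths can be summarized as a small decision function. This is a simplified sketch of the routing described above, not the adapter's logic: choose_path, ACTIVE_SESSION_COMMANDS, and the set-of-session-keys argument are hypothetical, and the case of other messages arriving during an active session is not described in the issue, so the function returns None for it.

```python
# Commands the issue says are routed to _dispatch_active_session_command().
ACTIVE_SESSION_COMMANDS = {"new", "reset", "stop"}


def choose_path(active_sessions, session_key, command):
    """Return "A" or "B" for the code paths named in the issue, else None."""
    if session_key not in active_sessions:
        return "A"  # no agent running: normal dispatch, background send
    if command in ACTIVE_SESSION_COMMANDS:
        return "B"  # agent running: active-session dispatch (the broken path)
    return None  # not covered by the issue's analysis


if __name__ == "__main__":
    print(choose_path(set(), "session-1", "new"))
    print(choose_path({"session-1"}, "session-1", "new"))
```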

Key observations:

  1. Invisible failure — The Sending response log (base.py line 2800) only fires in _process_message_background, not in the _dispatch_active_session_command path. This makes the failure invisible without deep code inspection.

  2. Race in _dispatch_active_session_command — After receiving the response at line 2491, it cancels the old session task at line 2494 via cancel_session_processing(). The interaction between this cancellation and the response send at line 2501 is suspect:

    • cancel_session_processing pops _session_tasks[session_key] and calls task.cancel() (line 2420)
    • asyncio.wait_for(asyncio.shield(task), timeout=5.0) awaits the cancelled task
  3. Interleaved state mutations: _interrupt_and_clear_session() (called from _handle_message at line 4668) already calls _release_running_agent_state(), adapter.interrupt_session_activity() (which sets the interrupt Event and calls stop_typing()), and adapter.get_pending_message(). These overlap with cancel_session_processing() called later by the adapter, creating a window for lost messages.

Key Code Locations

| File | Lines | Description |
| --- | --- | --- |
| gateway/run.py | 4666-4676 | Early intercept for /new when agent is running |
| gateway/run.py | 6579-6691 | _handle_reset_command(): works correctly in isolation |
| gateway/run.py | 11096-11119 | _interrupt_and_clear_session(): interrupts agent and clears state |
| gateway/platforms/base.py | 2460-2523 | _dispatch_active_session_command(): the race-prone dispatch |
| gateway/platforms/base.py | 2393-2441 | cancel_session_processing(): cancels old task after response received |

Suggested Fix

In _dispatch_active_session_command() (base.py lines 2491-2512):

  1. Add logging — add a logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id) matching the one at line 2800 so future failures are visible
  2. Ensure send is cancellation-safe — wrap _send_with_retry so it cannot be affected by the concurrent cancel_session_processing at line 2494. Consider sending the response before cancelling the old task (swap lines 2491-2494 and 2499-2512)
  3. Verify task isolation — confirm cancel_session_processing() targets only the background processing task and not the coroutine executing _dispatch_active_session_command()
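
Suggestions 2 and 3 can be illustrated with standard asyncio tools. This is a sketch under assumptions, not the gateway's implementation: send_shielded, targets_current_task, and the demo coroutines are hypothetical names. asyncio.shield keeps the send running if the coroutine awaiting it is cancelled, and comparing against asyncio.current_task() is one way to check that a task about to be cancelled is not the one doing the dispatch.

```python
import asyncio


async def send_shielded(send_coro):
    """Await a send so that cancelling the caller does not cancel the send."""
    send_task = asyncio.ensure_future(send_coro)
    try:
        return await asyncio.shield(send_task)
    except asyncio.CancelledError:
        await send_task  # caller was cancelled: let the send finish first
        raise


def targets_current_task(task):
    """True if `task` is the task executing the caller (the case to rule out)."""
    return task is asyncio.current_task()


async def demo():
    delivered = []

    async def slow_send():
        await asyncio.sleep(0.05)  # stand-in for a network round trip
        delivered.append("✨ Session reset!")

    async def dispatcher():
        await send_shielded(slow_send())

    dispatch_task = asyncio.create_task(dispatcher())
    await asyncio.sleep(0.01)  # dispatcher is now waiting on the send
    isolated = not targets_current_task(dispatch_task)
    dispatch_task.cancel()  # simulate a cancel landing mid-send
    try:
        await dispatch_task
    except asyncio.CancelledError:
        pass
    return delivered, isolated


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A second cancellation arriving while the except branch awaits the send would still cancel it, so shielding narrows the window rather than closing it. Sending before cancelling avoids the overlap altogether.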

Workaround

Wait for the current response to finish before typing /new, or use /stop first.

Environment

  • OS: macOS (Silicon)
  • Gateway: launchd service, Telegram polling mode
  • Adapter: telegram.py (python-telegram-bot v20+)
  • Provider: DeepSeek (deepseek-v4-flash)

extent analysis

TL;DR

The most likely fix involves modifying the _dispatch_active_session_command function to ensure the response is sent before cancelling the old task, and adding logging to make future failures visible.

Guidance

  • Modify the _dispatch_active_session_command function to send the response before cancelling the old task by swapping lines 2491-2494 and 2499-2512.
  • Add a log statement to track when a response is sent, similar to the one at line 2800, to make future failures visible.
  • Verify that cancel_session_processing only targets the background processing task and not the coroutine executing _dispatch_active_session_command.
  • Consider using a workaround by waiting for the current response to finish before typing /new, or using /stop first.

Example

No code snippet is provided as the issue is more related to the logic and sequence of operations rather than a specific code error.

Notes

The provided analysis suggests a race condition between sending the response and cancelling the old task. The suggested fix aims to address this by changing the order of operations and adding logging for better visibility.

Recommendation

Apply the workaround of waiting for the current response to finish before typing /new, or using /stop first, until the root cause can be fully addressed and the suggested fix can be implemented and tested.
