hermes - 💡(How to fix) Fix guardrail-halt-style silent stream close at partial_stream_recovery + fallback_prior_turn_content sites [4 pull requests]

Follow-up from #30770 / PR #31448 (which fixed the guardrail_halt case).

The fix added stream_delta_callback(final_response) to the guardrail-halt branch of agent/conversation_loop.py so SSE/TUI clients receive the synthesized halt message before the stream closes. Two structurally-identical sibling sites have the same gap and weren't touched by the original fix:

  1. partial_stream_recovery (~L3556) — final_response = _recovered (text reconstructed from a partial stream after a think block). The recovered text wasn't necessarily streamed in full this turn.
  2. fallback_prior_turn_content (~L3582) — final_response = agent._strip_think_blocks(fallback).strip() where fallback is the previous turn's content. The previous turn's content was streamed on the previous SSE response, so the current SSE writer drains an empty queue.

Both sites assign final_response to text that wasn't (fully) streamed this turn and then break out of the loop, so SSE clients see a finish chunk with zero content delta.
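The gap can be reproduced in isolation with a toy model. The sketch below is not the code in agent/conversation_loop.py; `drain` and `run_turn` are hypothetical names, and a plain `queue.Queue` stands in for whatever the real SSE writer reads from. It only illustrates that text assigned to final_response never reaches the writer unless it is also pushed through the delta callback.

```python
from queue import Queue, Empty


def drain(q: Queue) -> list:
    """Collect every queued delta, like an SSE writer flushing before the finish chunk."""
    deltas = []
    while True:
        try:
            deltas.append(q.get_nowait())
        except Empty:
            return deltas


def run_turn(emit_final: bool) -> dict:
    """Simulate one turn that ends on a recovery path (hypothetical stand-in)."""
    q = Queue()
    stream_delta_callback = q.put  # assumed: the callback feeds the writer's queue

    # Recovery path: the text is assigned but was not streamed this turn.
    final_response = "recovered text"
    if emit_final:
        stream_delta_callback(final_response)

    # The loop breaks here; the writer drains the queue and sends the finish chunk.
    return {"deltas": drain(q), "final_response": final_response}


without_emit = run_turn(emit_final=False)
with_emit = run_turn(emit_final=True)
print(without_emit["deltas"])  # no content deltas reach the client
print(with_emit["deltas"])     # the final text arrives as one delta
```

In the first call the turn has a non-empty final_response but the client-facing queue is empty, which matches the blank-bubble symptom described in this issue.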

Fix Action

Fixed

Impact

Lower than the guardrail case — these paths trigger on partial-stream / empty-response recovery rather than on every guardrail hit. But the failure shape is identical: blank Open WebUI bubble, indistinguishable from a crash.

Fix sketch

Mirror the guardrail-halt pattern: after assigning final_response and before break, fire it through agent.stream_delta_callback(final_response) if the callback exists. Optionally extract into a small helper since this is now a recurring pattern.
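One possible shape for such a helper is sketched here. The name `emit_final_response` and the choice to skip empty text are assumptions for illustration, not the code that was merged; only `stream_delta_callback` and `final_response` come from the issue. The `SimpleNamespace` agent is a stand-in so the example runs on its own.

```python
from types import SimpleNamespace


def emit_final_response(agent, final_response: str) -> bool:
    """Push final_response through the agent's delta callback, if one is set.

    Hypothetical helper: returns True when the text was emitted, False when
    there was no callback or nothing to send.
    """
    callback = getattr(agent, "stream_delta_callback", None)
    if callback is None or not final_response:
        # Assumption: an empty final response is not worth emitting.
        return False
    callback(final_response)
    return True


# Usage sketch: call the helper after assigning final_response, before `break`.
received = []
agent = SimpleNamespace(stream_delta_callback=received.append)

final_response = "recovered text"
emit_final_response(agent, final_response)
print(received)
```

This sketch sends the whole final_response. At the partial_stream_recovery site some of that text may already have been streamed this turn, so a real fix may need to decide whether to send only the missing part; the sketch does not attempt that.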

Why a separate issue

This keeps #31448 narrowly scoped to the reported bug. The recovery-path versions are hit less frequently and are worth a small extraction pass (a helper to dedupe the three emit sites) rather than a third inline copy of the same block.
