hermes - Silent failure: stream timeout after partial text delivery shows no error to user

Description

When a streaming API call times out after some text has already been delivered to the user but before any tool calls are generated, the system silently returns a truncated stub response with no user-facing error message. The conversation turn ends as if it completed normally, but the model's intended actions (tool calls) were never executed.

Root Cause

In agent/chat_completion_helpers.py, the interruptible_streaming_api_call function has this logic (around line 2074):

if result["error"] is not None:
    if deltas_were_sent["yes"]:
        # Some content was already delivered, so a retry would duplicate it
        if _partial_names:  # tool calls were in flight
            # GOOD: shows the warning "⚠ Stream stalled mid tool-call..."
            ...  # body elided in this excerpt
        else:  # text only, no tool calls
            # BUG: only logs a warning, then returns the stub silently
            logger.warning("Partial stream delivered before error...")
            # Returns a stub response with the partial text; the turn ends "normally"
    raise result["error"]  # only reached if nothing was sent yet

The else branch (text-only partial delivery) produces no user-facing signal that the stream failed. The stub response has finish_reason="stop", so the conversation loop treats it as a successful turn.
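To make the failure mode concrete, here is a minimal, self-contained model of that control flow. It is a sketch, not the Hermes code: `StubResponse`, `finish_stream`, and `turn_succeeded` are hypothetical names, and the only facts taken from the report are that the error is swallowed once text has been delivered and that the stub carries finish_reason="stop".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubResponse:
    """Hypothetical stand-in for the stub built from the partial text."""
    content: str
    finish_reason: str = "stop"


def finish_stream(partial_text: str, error: Optional[Exception], deltas_sent: bool) -> StubResponse:
    """Model of the text-only path: an error after partial delivery is swallowed."""
    if error is not None:
        if deltas_sent:
            # Text already reached the user, so there is no retry; the error is dropped.
            return StubResponse(content=partial_text)
        raise error  # nothing was delivered, so the caller is free to retry
    return StubResponse(content=partial_text)


def turn_succeeded(response: StubResponse) -> bool:
    """Hypothetical loop check: a finish_reason of 'stop' reads as a completed turn."""
    return response.finish_reason == "stop"


ok = finish_stream("Understood! I'll", None, True)
truncated = finish_stream("Understood! I'll", TimeoutError("read timeout"), True)

print(turn_succeeded(ok), turn_succeeded(truncated))  # True True
print(ok == truncated)  # True: the loop cannot tell the two apart
```

In this model the truncated response is equal to a genuine one, which is why nothing downstream can react to the timeout unless the streaming helper itself surfaces it.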

What the User Sees

  1. Model starts outputting text (e.g., "Understood! I'll...")
  2. User sees this text appear in the terminal
  3. Stream times out (120s read timeout exceeded)
  4. The turn silently ends — no error, no retry, no warning
  5. The timer stops, so the session appears "stuck"
  6. Tool calls the model was about to make are never executed
  7. User waits indefinitely thinking the model is still working

Expected Behavior

When a stream times out after partial text delivery (no tool calls in flight), the user should still see a warning message, similar to the existing tool-call-in-flight case:

⚠ Stream timed out after partial delivery. The response may be incomplete.
Ask me to retry if you want to continue.

Steps to Reproduce

  1. Use a reasoning model (e.g., mimo-v2.5-pro) with a large context (40K+ tokens)
  2. Send a request that requires tool calls (e.g., "help me redesign this webpage")
  3. The model starts reasoning, begins outputting text
  4. The stream read timeout (120s default) is exceeded before tool calls are generated
  5. The turn ends silently with truncated text
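Waiting for a real provider to stall for 120 s makes this slow to reproduce. The state described in the steps above can be simulated offline with a fake stream that yields some text and then raises. This is a sketch under assumptions: `stalling_stream` and `consume` are hypothetical helpers, and a plain `TimeoutError` stands in for whatever exception the real HTTP client raises.

```python
from typing import Callable, Dict, Iterator, List


def stalling_stream(chunks: List[str]) -> Iterator[str]:
    """Fake provider stream: yields some text, then hits a read timeout."""
    for chunk in chunks:
        yield chunk
    raise TimeoutError("simulated stream read timeout")


def consume(stream: Iterator[str], on_delta: Callable[[str], None]) -> Dict[str, object]:
    """Forward deltas to the display callback and record how the stream ended."""
    state: Dict[str, object] = {"text": "", "deltas_sent": False, "error": None}
    try:
        for chunk in stream:
            on_delta(chunk)
            state["text"] = str(state["text"]) + chunk
            state["deltas_sent"] = True
    except TimeoutError as exc:
        state["error"] = exc
    return state


shown: List[str] = []
state = consume(stalling_stream(["Understood! ", "I'll"]), shown.append)

# Text was shown to the user AND an error is pending: the text-only partial case.
print(state["deltas_sent"], type(state["error"]).__name__, "".join(shown))
```

The resulting state (deltas sent, error set, no tool calls) is exactly the combination that reaches the silent branch.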

Environment

  • Hermes Agent v0.14.0
  • Provider: xiaomi (mimo-v2.5-pro) / Z.AI (glm-5.1)
  • Default stream read timeout: 120s
  • Large context sessions (40K+ tokens)

Frequency

Observed 25+ times across 3 days (May 18, 24, 25) with both xiaomi and Z.AI providers. More frequent with reasoning models and large contexts.

Relevant Code

  • agent/chat_completion_helpers.py lines 2074-2138 — the silent stub path
  • agent/chat_completion_helpers.py lines 1378-1380 — HERMES_STREAM_READ_TIMEOUT default 120s
  • agent/conversation_loop.py lines 1140-1155 — spinner/callback cleanup after streaming
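Until the silent path is fixed, raising the read timeout may make the stall less frequent, although it does not address the missing warning. The sketch below assumes that HERMES_STREAM_READ_TIMEOUT is read from the environment as a number of seconds; the report only gives the name and the 120s default, so the parsing rules and the `stream_read_timeout` helper are illustrative rather than the actual implementation.

```python
import os
from typing import Mapping, Optional

DEFAULT_STREAM_READ_TIMEOUT = 120.0  # seconds, the default cited in this issue


def stream_read_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the configured read timeout, falling back to the default.

    Assumption: the value is a positive number of seconds; anything else
    (missing, non-numeric, zero, negative) falls back to the default.
    """
    source = os.environ if env is None else env
    raw = source.get("HERMES_STREAM_READ_TIMEOUT", "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_STREAM_READ_TIMEOUT
    return value if value > 0 else DEFAULT_STREAM_READ_TIMEOUT


print(stream_read_timeout({}))
print(stream_read_timeout({"HERMES_STREAM_READ_TIMEOUT": "300"}))
```

A longer timeout only delays the failure for slow reasoning models with large contexts; the user-visible warning is still needed for the cases that time out anyway.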

Suggested Fix

In the else branch (lines 2118-2125), add a user-visible warning similar to the tool-call case:

else:
    # Text-only partial delivery: warn the user that the response may be incomplete
    _warn = (
        "\n\n⚠ Stream timed out during response. "
        "The response may be incomplete. "
        "Ask me to retry if you want to continue."
    )
    # Append the warning to the partial text and push it to the live stream
    _partial_text = (_partial_text or "") + _warn
    try:
        agent._fire_stream_delta(_warn)
    except Exception:
        # Best effort: a failing display callback must not mask the stream error
        pass
    logger.warning(
        "Partial stream delivered before error; ...",
        len(_partial_text or ""), result["error"],
    )
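A regression test could pin this behavior down. The sketch below extracts the proposed branch into a standalone function so it can run without the rest of the agent. `FakeAgent` and `append_timeout_warning` are hypothetical names introduced for illustration; only `_fire_stream_delta` and the warning text come from the suggested fix.

```python
from typing import List, Optional

WARNING = (
    "\n\n⚠ Stream timed out during response. "
    "The response may be incomplete. "
    "Ask me to retry if you want to continue."
)


class FakeAgent:
    """Hypothetical test double that exposes only _fire_stream_delta."""

    def __init__(self, fail: bool = False) -> None:
        self.deltas: List[str] = []
        self.fail = fail

    def _fire_stream_delta(self, text: str) -> None:
        if self.fail:
            raise RuntimeError("display callback failed")
        self.deltas.append(text)


def append_timeout_warning(agent: FakeAgent, partial_text: Optional[str]) -> str:
    """Mirror of the proposed else branch, extracted so it can be tested."""
    text = (partial_text or "") + WARNING
    try:
        agent._fire_stream_delta(WARNING)
    except Exception:
        pass  # a broken display callback must not mask the stream error
    return text


agent = FakeAgent()
result = append_timeout_warning(agent, "Understood! I'll")
print(result.endswith("Ask me to retry if you want to continue."))  # True
print(len(agent.deltas))  # 1: the warning reached the live stream once
```

Two properties matter here: the warning ends up both in the stored text and in the live stream, and a failing callback still returns the annotated text instead of raising.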
