hermes: Stale stream + partial-stream-stub creates unrecoverable retry loop on large tool calls


Summary

When a streaming API call goes stale mid-tool-call (e.g. a large write_file), the partial-stream-stub recovery path sets finish_reason="stop" with tool_calls=None. This causes the conversation loop to treat the turn as a completed text response, returning only the ~116-char warning message. When the user says "continue", the model retries the same large tool call, hits the same stale stream, and the cycle repeats indefinitely.

Reproduction

  1. Ask the agent (via gateway) to produce a large write_file output (e.g. 15-30K+ tokens of HTML)
  2. The inference stream goes stale (no chunks for 180s)
  3. Stale-stream detector kills the connection
  4. Agent returns: "⚠ Stream stalled mid tool-call (write_file); the action was not executed. Ask me to retry if you want to continue."
  5. User says "continue" → same result, loops forever
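The idle check in steps 2 and 3 can be pictured with a minimal sketch. This is not the project's detector: the class name `StaleStreamWatchdog`, its methods, and the injectable `clock` parameter are hypothetical, and only the 180s window comes from the report. It assumes the stream reader calls `note_chunk()` on every chunk while a separate monitor polls `is_stale()`.

```python
import time
from typing import Callable


class StaleStreamWatchdog:
    """Hypothetical idle tracker: flags a stream with no chunks for too long."""

    def __init__(
        self,
        idle_timeout_s: float = 180.0,  # window taken from the report
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_chunk_at = clock()

    def note_chunk(self) -> None:
        """Call whenever a chunk arrives; resets the idle window."""
        self._last_chunk_at = self._clock()

    def idle_seconds(self) -> float:
        """Seconds elapsed since the most recent chunk (or since creation)."""
        return self._clock() - self._last_chunk_at

    def is_stale(self) -> bool:
        """True once the idle time has reached the timeout."""
        return self.idle_seconds() >= self._idle_timeout_s
```

A monitor would poll `is_stale()` and close the connection when it returns True. Note that generation time alone does not trip this check; only a gap between chunks does.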

Root Cause

In agent/chat_completion_helpers.py (~line 2186-2207):

  • When a partial stream has pending tool call names, _stub_finish_reason is set to "stop" (not "length")
  • The stub has tool_calls=None

In agent/conversation_loop.py:

  • finish_reason="stop" skips the finish_reason == "length" continuation/retry branch entirely
  • assistant_message.tool_calls is None, so it falls through to the text-response path
  • The 116-char warning becomes final_response and the turn ends
  • On retry, the model attempts the identical large tool call → same timeout → same result
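The branching described above can be reduced to a small illustration. This is a simplified model, not the code in agent/conversation_loop.py: `AssistantMessage`, `route_turn`, and the branch labels are hypothetical names chosen for the sketch. It only shows why a stub with finish_reason="stop" and tool_calls=None lands on the text-response path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssistantMessage:
    """Hypothetical stand-in for the assistant message the loop inspects."""
    content: str
    tool_calls: Optional[list] = None


def route_turn(finish_reason: str, message: AssistantMessage) -> str:
    """Return the branch a simplified conversation loop would take."""
    if finish_reason == "length":
        # Only truncated outputs reach the continuation/retry machinery.
        return "continuation"
    if message.tool_calls:
        # Tool calls present: execute them and keep looping.
        return "execute_tools"
    # Nothing to continue and nothing to execute: the text ends the turn.
    return "final_response"


# The partial-stream-stub as described in the report.
stub = AssistantMessage(
    content=(
        "⚠ Stream stalled mid tool-call (write_file); the action was not "
        "executed. Ask me to retry if you want to continue."
    ),
    tool_calls=None,
)

print(route_turn("stop", stub))  # final_response
```

Nothing in this path records that a tool call was dropped, so the next user turn starts from the same state and the model has no reason to change its approach.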

Design Issue

The finish_reason="stop" choice is intentional: the code comment says "the agent should hand control back rather than auto-retry a tool call that may have side-effects". That reasoning assumes the user will respond by changing the request in some meaningful way. In practice, users say "continue" and the model retries the same tool call unchanged.

Suggested Fixes

  1. Detect repeated stale-stream failures on the same tool call pattern. After 2+ consecutive partial-stream-stubs dropping the same tool name, inject a system message telling the model to break the output into smaller chunks (e.g. "Your previous write_file was too large and the stream timed out. Break the content into smaller pieces or use multiple patch calls.")

  2. Consider using finish_reason="length" for partial-stream-stubs with dropped tool calls so the existing continuation machinery can handle them, with appropriate guards against re-executing side-effectful tools.

  3. Upstream: Investigate why long-running streams (~60-130s generation time for large tool call arguments) go stale through the inference proxy. The 180s stale-stream timeout may be shorter than the proxy's own idle timeout, or TCP keepalives may not be reaching through.
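Suggested fix 1 could look roughly like the sketch below. It is an illustration under assumptions, not a patch: `RepeatedStallDetector`, `record_stall`, `record_success`, and `NUDGE_TEMPLATE` are hypothetical names, and how the returned text would be injected as a system message depends on the conversation loop. The threshold of 2 and the message wording follow the suggestion above.

```python
from typing import Optional

# Wording taken from suggested fix 1; the tool name is filled in per stall.
NUDGE_TEMPLATE = (
    "Your previous {tool} was too large and the stream timed out. "
    "Break the content into smaller pieces or use multiple patch calls."
)


class RepeatedStallDetector:
    """Hypothetical counter of consecutive stubs that dropped the same tool."""

    def __init__(self, threshold: int = 2) -> None:
        self._threshold = threshold
        self._last_tool: Optional[str] = None
        self._streak = 0

    def record_stall(self, tool_name: str) -> Optional[str]:
        """Record a partial-stream-stub that dropped `tool_name`.

        Returns the system message to inject once the streak reaches the
        threshold, otherwise None.
        """
        if tool_name == self._last_tool:
            self._streak += 1
        else:
            # A different tool stalled: start a new streak.
            self._last_tool = tool_name
            self._streak = 1
        if self._streak >= self._threshold:
            return NUDGE_TEMPLATE.format(tool=tool_name)
        return None

    def record_success(self) -> None:
        """Reset after any turn that completes without a stall."""
        self._last_tool = None
        self._streak = 0
```

Keying the streak on the tool name keeps the check cheap, at the cost of treating two unrelated large write_file calls as the same pattern. Resetting on success avoids nudging the model after an isolated stall.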

Affected Code

  • agent/chat_completion_helpers.py: interruptible_streaming_api_call(), partial-stream-stub construction
  • agent/conversation_loop.py: finish_reason branching, no detection of repeated stale-stream patterns

Related

  • #25689 (stale stream timeout does not trigger fallback chain)
  • #31128 (stale-stream handler tries to rebuild OpenAI client when provider is Anthropic)
  • #28161 (Anthropic streaming: stale/retry paths cause hangs)
