hermes - 💡(How to fix) Fix Stale stream + partial-stream-stub creates unrecoverable retry loop on large tool calls

StepCodex · 2026-05-25T10:04:30Z

[hermes] When a streaming API call goes stale mid-tool-call e.g. a large write file , the partial-stream-stub recovery path sets finish reason="stop" with tool… When a streaming API call goes stale mid-tool-call (e.g. a large `write_file`), the partial-stream-stub recovery path sets `finish_reason="stop"` with `tool_calls=None`. This causes the conversation loop to treat the turn as a completed text response, returning only the ~116-char warning message. When the user says "continue", the model retries the same large tool call, hits the same stale stream, and the cycle repeats indefinitely. ## Fix / Workaround 1. **Detect repeated stale-stream failures on the same tool call pattern.** After 2+ consecutive partial-stream-stubs dropping the same tool name, inject a system message telling the model to break the output into smaller chunks (e.g. "Your previous write_file was too large and the stream timed out. Break the content into smaller pieces or use multiple patch calls.") ## Summary When a streaming API call goes stale mid-tool-call (e.g. a large `write_file`), the partial-stream-stub recovery path sets `finish_reason="stop"` with `tool_calls=None`. This causes the conversation loop to treat the turn as a completed text response, returning only the ~116-char warning message. When the user says "continue", the model retries the same large tool call, hits the same stale stream, and the cycle repeats indefinitely. ## Reproduction 1. Ask the agent (via gateway) to produce a large `write_file` output (e.g. 15-30K+ tokens of HTML) 2. The inference stream goes stale (no chunks for 180s) 3. Stale-stream detector kills the connection 4. Agent returns: `"⚠ Stream stalled mid tool-call (write_file); the action was not executed. Ask me to retry if you want to continue."` 5. User says "continue" → same result, loops forever ## Root Cause In `agent/chat_completion_helpers.py` (~line 2186-2207): - When a partial stream has pending tool call names, `_stub_finish_reason` is set to `"stop"` (not `"length"`) - The stub has `tool_calls=None` In `agent/conversation_loop.py`: - `finish_reason="stop"` skips the `finish_reason == "length"` continuation/retry branch entirely - `assistant_message.tool_calls` is `None`, so it falls through to the text-response path - The 116-char warning becomes `final_response` and the turn ends - On retry, the model attempts the identical large tool call → same timeout → same result ## Design Issue The `finish_reason="stop"` choice is intentional (comment says "the agent should hand control back rather than auto-retry a tool call that may have side-effects"). But this assumes the user manually intervenes meaningfully. In practice, users say "continue" and the model just retries the same thing. ## Suggested Fixes 1. **Detect repeated stale-stream failures on the same tool call pattern.** After 2+ consecutive partial-stream-stubs dropping the same tool name, inject a system message telling the model to break the output into smaller chunks (e.g. "Your previous write_file was too large and the stream timed out. Break the content into smaller pieces or use multiple patch calls.") 2. **Consider using `finish_reason="length"` for partial-stream-stubs with dropped tool calls** so the existing continuation machinery can handle them, with appropriate guards against re-executing side-effectful tools. 3. **Upstream:** Investigate why long-running streams (~60-130s generation time for large tool call arguments) go stale through the inference proxy. The 180s stale-stream timeout may be shorter than the proxy's own idle timeout, or TCP keepalives may not be reaching through. ## Affected Code - `agent/chat_completion_helpers.py` — `interruptible_streaming_api_call()`, partial-stream-stub construction - `agent/conversation_loop.py` — finish_reason branching, no detection of repeated stale-stream patterns ## Related - #25689 (stale stream timeout does not trigger fallback chain) - #31128 (stale-stream handler tries to rebuild OpenAI client when provider is Anthropic) - #28161 (Anthropic streaming: stale/retry paths cause hangs)

hermes2026-05-25 10:04:30

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

When a streaming API call goes stale mid-tool-call (e.g. a large write_file), the partial-stream-stub recovery path sets finish_reason="stop" with tool_calls=None. This causes the conversation loop to treat the turn as a completed text response, returning only the ~116-char warning message. When the user says "continue", the model retries the same large tool call, hits the same stale stream, and the cycle repeats indefinitely.

Root Cause

In agent/chat_completion_helpers.py (~line 2186-2207):

When a partial stream has pending tool call names, _stub_finish_reason is set to "stop" (not "length")
The stub has tool_calls=None

In agent/conversation_loop.py:

finish_reason="stop" skips the finish_reason == "length" continuation/retry branch entirely
assistant_message.tool_calls is None, so it falls through to the text-response path
The 116-char warning becomes final_response and the turn ends
On retry, the model attempts the identical large tool call → same timeout → same result

Fix Action

Fix / Workaround

Detect repeated stale-stream failures on the same tool call pattern. After 2+ consecutive partial-stream-stubs dropping the same tool name, inject a system message telling the model to break the output into smaller chunks (e.g. "Your previous write_file was too large and the stream timed out. Break the content into smaller pieces or use multiple patch calls.")

RAW_BUFFERClick to expand / collapse

Summary

Reproduction

Ask the agent (via gateway) to produce a large write_file output (e.g. 15-30K+ tokens of HTML)
The inference stream goes stale (no chunks for 180s)
Stale-stream detector kills the connection
Agent returns: "⚠ Stream stalled mid tool-call (write_file); the action was not executed. Ask me to retry if you want to continue."
User says "continue" → same result, loops forever

Root Cause

In agent/chat_completion_helpers.py (~line 2186-2207):

When a partial stream has pending tool call names, _stub_finish_reason is set to "stop" (not "length")
The stub has tool_calls=None

In agent/conversation_loop.py:

finish_reason="stop" skips the finish_reason == "length" continuation/retry branch entirely
assistant_message.tool_calls is None, so it falls through to the text-response path
The 116-char warning becomes final_response and the turn ends
On retry, the model attempts the identical large tool call → same timeout → same result

Design Issue

The finish_reason="stop" choice is intentional (comment says "the agent should hand control back rather than auto-retry a tool call that may have side-effects"). But this assumes the user manually intervenes meaningfully. In practice, users say "continue" and the model just retries the same thing.

Suggested Fixes

Detect repeated stale-stream failures on the same tool call pattern. After 2+ consecutive partial-stream-stubs dropping the same tool name, inject a system message telling the model to break the output into smaller chunks (e.g. "Your previous write_file was too large and the stream timed out. Break the content into smaller pieces or use multiple patch calls.")
Consider using finish_reason="length" for partial-stream-stubs with dropped tool calls so the existing continuation machinery can handle them, with appropriate guards against re-executing side-effectful tools.
Upstream: Investigate why long-running streams (~60-130s generation time for large tool call arguments) go stale through the inference proxy. The 180s stale-stream timeout may be shorter than the proxy's own idle timeout, or TCP keepalives may not be reaching through.

Affected Code

agent/chat_completion_helpers.py — interruptible_streaming_api_call(), partial-stream-stub construction
agent/conversation_loop.py — finish_reason branching, no detection of repeated stale-stream patterns

#25689 (stale stream timeout does not trigger fallback chain)
#31128 (stale-stream handler tries to rebuild OpenAI client when provider is Anthropic)
#28161 (Anthropic streaming: stale/retry paths cause hangs)

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix Stale stream + partial-stream-stub creates unrecoverable retry loop on large tool calls

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Summary

Reproduction

Root Cause

Design Issue

Suggested Fixes

Affected Code

Related

Still need to ship something?

TRENDING