claude-code - 💡(How to fix) Fix [BUG] Stream idle timeout / partial response during long tool-use turns on Claude Code Web (Opus 4.7, 1M and non-1M) [1 comments, 2 participants]

masterzhuang · 2026-04-17T00:40:24Z


GitHub stats

anthropics/claude-code#49619 • Fetched 2026-04-17 08:36:03

Author

masterzhuang

Participants

github-actions[bot]

masterzhuang

Timeline (top)

labeled ×5, commented ×1

During turns where the assistant is about to produce a long text output (e.g. drafting a ~400-line markdown design doc after a few Read/Bash tool calls), the stream terminates with:

API Error: Stream idle timeout - partial response received

The error is not triggered by the tool calls themselves — tool results arrive normally. It consistently fires in the window between the last tool result and the start (or middle) of the long text reply.
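The failure mode described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `StreamIdleTimeout`, `read_with_idle_timeout`, and `fake_stream` are hypothetical names, and the issue does not say what the real threshold is or which component enforces it. The point is that an idle timer restarts on every event, so a single long silent gap is enough to abort a stream that has already delivered part of its content.

```python
import asyncio
from typing import AsyncIterator, List


class StreamIdleTimeout(Exception):
    """Hypothetical error raised when no event arrives within the idle window."""


async def read_with_idle_timeout(events: AsyncIterator[str], idle_s: float) -> List[str]:
    """Collect events, failing if the wait for any single event exceeds idle_s."""
    received: List[str] = []
    iterator = events.__aiter__()
    while True:
        try:
            # The timer restarts for every event, so only silence counts.
            event = await asyncio.wait_for(iterator.__anext__(), timeout=idle_s)
        except StopAsyncIteration:
            return received
        except asyncio.TimeoutError:
            raise StreamIdleTimeout(
                f"partial response received: {len(received)} events"
            ) from None
        received.append(event)


async def fake_stream(gaps_s: List[float]) -> AsyncIterator[str]:
    """Simulated stream that waits gaps_s[i] seconds before emitting event i."""
    for i, gap in enumerate(gaps_s):
        await asyncio.sleep(gap)
        yield f"event-{i}"
```

For example, `asyncio.run(read_with_idle_timeout(fake_stream([0.0, 1.0]), idle_s=0.05))` raises after one event, which mirrors a turn whose tool results arrive normally but whose long text reply starts too late.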

Error Message

API Error: Stream idle timeout - partial response received

Root Cause

Not confirmed in the issue. The reporter's follow-up observation (a ~210-line Edit call whose file write completed even though the stream timed out) points to missing keep-alive traffic between the accepted tool result and the next assistant token, rather than to tool execution.

Fix Action

No fix is confirmed in the issue.

Workarounds tried (none fully effective)

Slimmed CLAUDE.md to reduce per-turn context overhead: still fails.
Switched claude-opus-4-7[1m] → claude-opus-4-7: still fails.
Starting a fresh session helps temporarily, but the error returns after the session grows.

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

Environment

Platform: Claude Code Web (claude.ai/code)
Model(s) affected:
- claude-opus-4-7[1m] (1M context) — reproducible
- claude-opus-4-7 (standard context) — also reproducible after switching
OS: Linux sandbox (provided by the web harness)
Session type: long-running conversation with multiple tool calls (Read / Bash / Grep), working in a git repo
Approx. transcript length when error first appeared: mid-session, after several dozen tool calls and a few long assistant messages

Summary

During turns where the assistant is about to produce a long text output (e.g. drafting a ~400-line markdown design doc after a few Read/Bash tool calls), the stream terminates with:

API Error: Stream idle timeout - partial response received

Reproduction

1. Open a Claude Code Web session with Opus 4.7 (1M) on a non-trivial repo (mine: several hundred markdown/py files, custom CLAUDE.md).
2. Hold a long design discussion (tens of turns, many Read/Grep/Bash calls, several multi-paragraph replies).
3. Ask the assistant to draft a long markdown document (~400+ lines) into a file, preceded by 1–2 exploratory tool calls.
4. Observe the error "Stream idle timeout - partial response received" after the tool calls complete but before or during the long write.

Retry attempts (even after slimming CLAUDE.md and switching from claude-opus-4-7[1m] to plain claude-opus-4-7) reproduce the same error.

Actual behavior

Stream aborts mid-turn with an idle timeout; partial response is discarded from the user's perspective and the Write tool call never executes. The session is usable afterwards, but the same turn cannot be completed — it fails repeatedly at roughly the same point.

Impact

Blocks any workflow that involves drafting a long file in a single turn (design docs, protocol revisions, report packs).
Forces the user to manually split work into smaller chunks, losing the model's ability to produce a coherent document in one pass.
Switching to the non-1M model does not resolve it, so it does not appear to be strictly a 1M-context issue.
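The manual splitting mentioned above can be planned ahead when an outline or earlier draft already exists locally. The following is a minimal sketch, not a feature of Claude Code: `split_markdown_sections` is a hypothetical helper that groups markdown sections into chunks of at most `max_lines` lines, so each chunk can be requested or written in its own turn.

```python
from typing import List


def split_markdown_sections(text: str, max_lines: int = 80) -> List[str]:
    """Split markdown at heading lines, merging sections while a chunk fits max_lines."""
    sections: List[List[str]] = []
    for line in text.splitlines():
        # A heading starts a new section; text before the first heading
        # becomes its own section.
        if line.startswith("#") or not sections:
            sections.append([])
        sections[-1].append(line)

    chunks: List[List[str]] = []
    for section in sections:
        if chunks and len(chunks[-1]) + len(section) <= max_lines:
            chunks[-1].extend(section)
        else:
            chunks.append(list(section))
    return ["\n".join(chunk) for chunk in chunks]
```

Two limitations of this sketch: a `#` line inside a fenced code block is treated as a heading, and a single section longer than `max_lines` is kept whole rather than split further.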

Workarounds tried (none fully effective)

Slimmed CLAUDE.md to reduce per-turn context overhead — still fails.
Switched claude-opus-4-7[1m] → claude-opus-4-7 — still fails.
Starting a fresh session helps temporarily, but the error returns after the session grows.

Additional context

Errors occur most often when a long multi-turn conversation is followed by a request to generate long markdown.
The web version cannot call /help, and there is no way to report issues directly from the client, so I am submitting this issue manually.
Session ID: https://claude.ai/code/session_014AtytC2zxPcAqaoj9LwcGi

Ask

1. Is the idle timeout threshold tunable (e.g. via a server-side setting or client flag)?
2. Can the stream be kept alive with heartbeats during long text generation so long Write tool calls don't get cut off?
3. Is there a known interaction between 1M context and long tail-end text streaming that we should avoid?

What Should Happen?

Expected behavior

The assistant should finish streaming the long reply, or at minimum fail with a retriable error that preserves the in-progress Write/Edit tool call.

Claude Model

Opus

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

Web (claude.ai/code), encountered on 2026-04-17

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

Other

Additional Information

Additional observation (session 014AtytC2zxPcAqaoj9LwcGi, 2026-04-17): Stream timeout fired during a ~210-line Edit call. The file write itself completed successfully (verified via git status post-timeout); only the client-side stream terminated. This suggests the issue is in keep-alive / heartbeat between tool-result-accepted and next-assistant-token, not in the tool execution itself.
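The post-timeout check described in this observation can be scripted. The sketch below assumes `git` is on the PATH and that the session's working tree is reachable from where the script runs; `parse_porcelain` and `changed_paths` are hypothetical helper names. It lists the paths that `git status --porcelain` reports, so it is possible to see whether a write landed before retrying the turn.

```python
import subprocess
from typing import List


def parse_porcelain(output: str) -> List[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    # Each line is "XY <path>": two status characters, a space, then the path.
    return [line[3:] for line in output.splitlines() if line]


def changed_paths(repo_dir: str) -> List[str]:
    """Return paths git reports as modified, added, or untracked in repo_dir."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)
```

If the target file appears in `changed_paths(".")` after the error, the edit was applied and only the stream was lost. Renamed entries (`old -> new`) and quoted paths with special characters are returned as git prints them, without further parsing.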

extent analysis

TL;DR

A keep-alive or heartbeat during long text generation is the most plausible mitigation for the Stream idle timeout - partial response received error, but nothing in the issue confirms a fix.

Guidance

Ask whether the idle timeout threshold can be raised, through a server-side setting or a client flag, to accommodate long text generation.
A heartbeat or keep-alive during long text generation would stop the stream from being treated as idle. This would have to come from the service or the web harness; the issue does not identify any user-facing setting for it.
Until then, shorten each response by requesting the document in smaller parts (for example, one section per turn), at the cost of single-pass coherence.
The error also reproduces on the non-1M model, so an interaction specific to 1M context is unlikely to be the sole cause.

Example

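No code from the underlying implementation is available, so the following is only a hypothetical sketch. It throttles a caller-supplied send_keep_alive callback (an assumed hook; Claude Code Web is not known to expose one) to at most one call per interval while text chunks are produced. Because it pings only between chunks, it does not cover a long silence before the first chunk:

import time
from typing import Callable, Iterable, Iterator

def generate_long_text(
    chunks: Iterable[str],
    send_keep_alive: Callable[[], None],  # hypothetical hook, not a real API
    interval_s: float = 15.0,
) -> Iterator[str]:
    """Yield text chunks, pinging at most once every interval_s seconds."""
    last_ping = time.monotonic()
    for chunk in chunks:
        # Ping only when the interval has elapsed, rather than on every chunk.
        if time.monotonic() - last_ping >= interval_s:
            send_keep_alive()
            last_ping = time.monotonic()
        yield chunk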

Notes

The right fix depends on which component enforces the idle timeout, and the issue does not establish that. Any change needs to be tested against the reproduction: a long session followed by a ~400-line write.

Recommendation

Until a fix is available, split long file writes across several turns and start a fresh session when the error begins to recur (the reporter found this helps only temporarily). After a timeout, check git status before retrying, because the reporter found that at least one Edit had completed despite the error. A keep-alive or heartbeat during long generation remains the suggested permanent fix, but it would have to be implemented on the service side.
