On a mid-stream interrupt, the persisted assistant message should be left in an API-valid state. Any of: - Drop a trailing/dangling `thinking` block not followed by a `tool_use`/`text`. - Defer appending the new user turn until the in-flight assistant message is well-formed. - Detect the wedged state and auto-rewind (or prompt to rewind) instead of replaying the identical doomed request multiple times.

claude-code - 💡(How to fix) Fix Interrupting an interleaved-thinking turn mid-stream wedges the session: unrecoverable 400 "thinking blocks ... cannot be modified"

StepCodex · 2026-05-28T20:45:14Z

[claude-code] If the user submits a new prompt while a large assistant turn with interleaved thinking is still streaming, the harness can persist an assistant… If the user submits a new prompt while a large assistant turn with **interleaved thinking** is still streaming, the harness can persist an assistant message whose **final content block is a `thinking` block** — i.e. a tool-use turn that isn't terminated by a `tool_use`/`text` block. Every subsequent API request replays that malformed message and fails with: ``` API Error: 400 messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response. ``` The conversation is then **wedged**: `continue`, a new prompt, even asking about the error all return the *identical* 400, because the corrupt assistant turn is frozen in history and replayed on every request. The only recovery is a manual rewind past that turn or starting a fresh session (losing context). ### Environment - Claude Code: **2.1.154** - Model: **claude-opus-4-8** (1M context), max effort — extended/interleaved thinking active - Platform: macOS (darwin 25.2.0) ### Summary If the user submits a new prompt while a large assistant turn with **interleaved thinking** is still streaming, the harness can persist an assistant message whose **final content block is a `thinking` block** — i.e. a tool-use turn that isn't terminated by a `tool_use`/`text` block. Every subsequent API request replays that malformed message and fails with: ``` API Error: 400 messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response. ``` The conversation is then **wedged**: `continue`, a new prompt, even asking about the error all return the *identical* 400, because the corrupt assistant turn is frozen in history and replayed on every request. The only recovery is a manual rewind past that turn or starting a fresh session (losing context). ### Repro 1. Opus 4.8 with max effort (interleaved thinking on). 2. Trigger a turn that fans out **many tool calls in a single assistant message** — in my case 21 tool calls (MCP servers + `Bash` + `ToolSearch`) with **10 interleaved `thinking` blocks**, 32 content blocks total. 3. While the model is still streaming — specifically right after a `thinking` block and *before* its following `tool_use` — submit a new prompt (here also with an attachment). 4. The next request 400s, and never recovers. ### Evidence (reconstructed from the session transcript) The final assistant message (blocks regrouped by `message.id`) ended on a `thinking` block: ``` [24] tool_use mcp__ __ [25] tool_use mcp__ __ __ [30] tool_use mcp__ __ [31] thinking <- TERMINAL block: dangling, no tool_use/text after it ``` Millisecond timing shows the mid-stream submission race: ``` …:35.387 user tool_result (last tool result returns) …:00.903 assistant thinking (block 31) …:00.989 attachment (+86 ms — new user prompt + attachment submitted mid-stream) …:01.297 assistant API Error 400 …content.25 ``` All 21 tool calls had completed cleanly (21 `tool_use` / 21 `tool_result`, none missing) — so the turn was **not** interrupted during tool execution. It was interrupted *between* a `thinking` block and the next `tool_use`, leaving the trailing thinking block dangling. The malformed message then replayed on `continue` and three subsequent prompts, each returning the identical 400 (4 failed requests total before the user gave up and started a new session). Note: the error path names `content.25` (a `tool_use`) while the message text complains about `thinking` blocks — consistent with an off-by-one between the serialized request and raw block order; `thinking` blocks sit immediately around that index (content.23 and .26). ### Expected behavior On a mid-stream interrupt, the persisted assistant message should be left in an API-valid state. Any of: - Drop a trailing/dangling `thinking` block not followed by a `tool_use`/`text`. - Defer appending the new user turn until the in-flight assistant message is well-formed. - Detect the wedged state and auto-rewind (or prompt to rewind) instead of replaying the identical doomed request multiple times. ### Impact - **Hard conversation wedge** — unrecoverable without manual rewind or a new session; in-progress context is lost. - More likely with extended/interleaved thinking (Opus 4.8 max effort) combined with **large single-turn tool fan-outs**: the more interleaved `thinking` blocks in a turn, the larger the window for this race. - The client silently retries the identical failing request several times before surfacing the error to the user.

If the user submits a new prompt while a large assistant turn with interleaved thinking is still streaming, the harness can persist an assistant message whose final content block is a thinking block — i.e. a tool-use turn that isn't terminated by a tool_use/text block. Every subsequent API request replays that malformed message and fails with:

API Error: 400 messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were in the original response.

The conversation is then wedged: continue, a new prompt, even asking about the error all return the identical 400, because the corrupt assistant turn is frozen in history and replayed on every request. The only recovery is a manual rewind past that turn or starting a fresh session (losing context).

Code Example

API Error: 400 messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were in the original response.

---

[24] tool_use   mcp__<server>__<tool>
[25] tool_use   mcp__<server>__<tool>     <- API error anchored at content.25
[26] thinking
[27] tool_use   Bash
[28] thinking
[29] tool_use   mcp__<server>__<tool>
[30] tool_use   mcp__<server>__<tool>
[31] thinking                              <- TERMINAL block: dangling, no tool_use/text after it

---

…:35.387  user        tool_result    (last tool result returns)
…:00.903  assistant   thinking       (block 31)
…:00.989  attachment                 (+86 ms — new user prompt + attachment submitted mid-stream)
…:01.297  assistant   API Error 400 …content.25

Environment

Claude Code: 2.1.154
Model: claude-opus-4-8 (1M context), max effort — extended/interleaved thinking active
Platform: macOS (darwin 25.2.0)

Summary

API Error: 400 messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were in the original response.

Repro

Opus 4.8 with max effort (interleaved thinking on).
Trigger a turn that fans out many tool calls in a single assistant message — in my case 21 tool calls (MCP servers + Bash + ToolSearch) with 10 interleaved thinking blocks, 32 content blocks total.
While the model is still streaming — specifically right after a thinking block and before its following tool_use — submit a new prompt (here also with an attachment).
The next request 400s, and never recovers.

Evidence (reconstructed from the session transcript)

The final assistant message (blocks regrouped by message.id) ended on a thinking block:

[24] tool_use   mcp__<server>__<tool>
[25] tool_use   mcp__<server>__<tool>     <- API error anchored at content.25
[26] thinking
[27] tool_use   Bash
[28] thinking
[29] tool_use   mcp__<server>__<tool>
[30] tool_use   mcp__<server>__<tool>
[31] thinking                              <- TERMINAL block: dangling, no tool_use/text after it

Millisecond timing shows the mid-stream submission race:

…:35.387  user        tool_result    (last tool result returns)
…:00.903  assistant   thinking       (block 31)
…:00.989  attachment                 (+86 ms — new user prompt + attachment submitted mid-stream)
…:01.297  assistant   API Error 400 …content.25

All 21 tool calls had completed cleanly (21 tool_use / 21 tool_result, none missing) — so the turn was not interrupted during tool execution. It was interrupted between a thinking block and the next tool_use, leaving the trailing thinking block dangling. The malformed message then replayed on continue and three subsequent prompts, each returning the identical 400 (4 failed requests total before the user gave up and started a new session).

Note: the error path names content.25 (a tool_use) while the message text complains about thinking blocks — consistent with an off-by-one between the serialized request and raw block order; thinking blocks sit immediately around that index (content.23 and .26).

Expected behavior

On a mid-stream interrupt, the persisted assistant message should be left in an API-valid state. Any of:

Drop a trailing/dangling thinking block not followed by a tool_use/text.
Defer appending the new user turn until the in-flight assistant message is well-formed.
Detect the wedged state and auto-rewind (or prompt to rewind) instead of replaying the identical doomed request multiple times.

Impact

Hard conversation wedge — unrecoverable without manual rewind or a new session; in-progress context is lost.
More likely with extended/interleaved thinking (Opus 4.8 max effort) combined with large single-turn tool fan-outs: the more interleaved thinking blocks in a turn, the larger the window for this race.
The client silently retries the identical failing request several times before surfacing the error to the user.

FAQ

Expected behavior

On a mid-stream interrupt, the persisted assistant message should be left in an API-valid state. Any of:

Drop a trailing/dangling thinking block not followed by a tool_use/text.
Defer appending the new user turn until the in-flight assistant message is well-formed.
Detect the wedged state and auto-rewind (or prompt to rewind) instead of replaying the identical doomed request multiple times.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Interrupting an interleaved-thinking turn mid-stream wedges the session: unrecoverable 400 "thinking blocks ... cannot be modified"

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

Environment

Summary

Repro

Evidence (reconstructed from the session transcript)

Expected behavior

Impact

FAQ

Expected behavior

Still need to ship something?

TRENDING