claude-code - Interrupting an interleaved-thinking turn mid-stream wedges the session: unrecoverable 400 "thinking blocks ... cannot be modified"

If the user submits a new prompt while a large assistant turn with interleaved thinking is still streaming, the harness can persist an assistant message whose final content block is a thinking block — i.e. a tool-use turn that isn't terminated by a tool_use/text block. Every subsequent API request replays that malformed message and fails with:

API Error: 400 messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were in the original response.

The conversation is then wedged: continue, a new prompt, even asking about the error all return the identical 400, because the corrupt assistant turn is frozen in history and replayed on every request. The only recovery is a manual rewind past that turn or starting a fresh session (losing context).
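
The malformed shape described above can be checked mechanically. The following is a minimal sketch, assuming messages are plain dicts in the Messages API shape (a `role` plus a `content` list of blocks that each carry a `type`). The helper name `ends_on_thinking` is hypothetical and is not part of Claude Code.

```python
# Block types the 400 error message names explicitly.
THINKING_TYPES = {"thinking", "redacted_thinking"}


def ends_on_thinking(message: dict) -> bool:
    """Return True if an assistant message's final content block is a thinking block.

    Hypothetical helper: it flags the dangling shape described in this report,
    i.e. a turn that is not terminated by a tool_use or text block.
    """
    if message.get("role") != "assistant":
        return False
    content = message.get("content")
    # String content or an empty list cannot end on a thinking block.
    if not isinstance(content, list) or not content:
        return False
    return content[-1].get("type") in THINKING_TYPES
```

A harness could run a check like this before persisting an interrupted turn, so the dangling state is caught once rather than replayed on every request.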

Environment

  • Claude Code: 2.1.154
  • Model: claude-opus-4-8 (1M context), max effort — extended/interleaved thinking active
  • Platform: macOS (darwin 25.2.0)

Repro

  1. Opus 4.8 with max effort (interleaved thinking on).
  2. Trigger a turn that fans out many tool calls in a single assistant message — in my case 21 tool calls (MCP servers + Bash + ToolSearch) with 10 interleaved thinking blocks, 32 content blocks total.
  3. While the model is still streaming — specifically right after a thinking block and before its following tool_use — submit a new prompt (here also with an attachment).
  4. The next request 400s, and never recovers.

Evidence (reconstructed from the session transcript)

The final assistant message (blocks regrouped by message.id) ended on a thinking block:

[24] tool_use   mcp__<server>__<tool>
[25] tool_use   mcp__<server>__<tool>     <- API error anchored at content.25
[26] thinking
[27] tool_use   Bash
[28] thinking
[29] tool_use   mcp__<server>__<tool>
[30] tool_use   mcp__<server>__<tool>
[31] thinking                              <- TERMINAL block: dangling, no tool_use/text after it
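
The regrouping used for this listing can be sketched as follows. It assumes each transcript record is a dict with a `message` object holding `id`, `role`, and a `content` list; that record shape is an assumption for illustration, and the real on-disk transcript format may differ. Both function names are hypothetical.

```python
def regroup_by_message_id(records: list) -> dict:
    """Concatenate assistant content blocks that share a message id, in transcript order."""
    grouped = {}
    for record in records:
        message = record.get("message") or {}
        if message.get("role") != "assistant" or "id" not in message:
            continue
        content = message.get("content")
        if isinstance(content, list):
            # dict preserves insertion order, so messages stay in first-seen order.
            grouped.setdefault(message["id"], []).extend(content)
    return grouped


def terminal_block_types(records: list) -> dict:
    """Map each assistant message id to the type of its last content block."""
    return {
        message_id: blocks[-1].get("type")
        for message_id, blocks in regroup_by_message_id(records).items()
        if blocks
    }
```

Any message id that maps to `thinking` or `redacted_thinking` here has the dangling terminal block shown in the listing.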

Millisecond timing shows the mid-stream submission race:

…:35.387  user        tool_result    (last tool result returns)
…:00.903  assistant   thinking       (block 31)
…:00.989  attachment                 (+86 ms — new user prompt + attachment submitted mid-stream)
…:01.297  assistant   API Error 400 …content.25

All 21 tool calls had completed cleanly (21 tool_use / 21 tool_result, none missing) — so the turn was not interrupted during tool execution. It was interrupted between a thinking block and the next tool_use, leaving the trailing thinking block dangling. The malformed message then replayed on continue and three subsequent prompts, each returning the identical 400 (4 failed requests total before the user gave up and started a new session).
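
The pairing check behind the "21 tool_use / 21 tool_result" count can be sketched like this. It assumes Messages API block shapes, where a `tool_use` block carries an `id` and a `tool_result` block carries a matching `tool_use_id`. The function name `unmatched_tool_calls` is hypothetical.

```python
def unmatched_tool_calls(messages: list) -> set:
    """Return the ids of tool_use blocks that never received a tool_result."""
    called = set()
    answered = set()
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content holds no tool blocks
        for block in content:
            if block.get("type") == "tool_use":
                called.add(block["id"])
            elif block.get("type") == "tool_result":
                answered.add(block["tool_use_id"])
    return called - answered
```

An empty result rules out an interrupt during tool execution, which is what narrows the fault to the gap between a thinking block and the next tool_use.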

Note: the error path names content.25 (a tool_use) while the message text complains about thinking blocks. This is consistent with an off-by-one between the serialized request and the raw block order reconstructed here: thinking blocks sit close to that index, at content.23 and content.26.

Expected behavior

On a mid-stream interrupt, the persisted assistant message should be left in an API-valid state. Any of:

  • Drop a trailing/dangling thinking block not followed by a tool_use/text.
  • Defer appending the new user turn until the in-flight assistant message is well-formed.
  • Detect the wedged state and auto-rewind (or prompt to rewind) instead of replaying the identical doomed request multiple times.
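
The first option above could look roughly like the sketch below. It assumes messages are plain dicts in the Messages API shape, and `drop_trailing_thinking` is a hypothetical name, not an existing Claude Code function. Whether the API accepts the trimmed message is also an assumption: this report does not verify it.

```python
import copy

THINKING_TYPES = {"thinking", "redacted_thinking"}


def drop_trailing_thinking(message: dict) -> dict:
    """Return a copy of an assistant message without trailing thinking blocks.

    Thinking blocks that are followed by a tool_use or text block are left
    untouched; only the dangling tail is removed. The input is not mutated.
    """
    cleaned = copy.deepcopy(message)
    content = cleaned.get("content")
    if cleaned.get("role") != "assistant" or not isinstance(content, list):
        return cleaned
    while content and content[-1].get("type") in THINKING_TYPES:
        content.pop()
    return cleaned
```

If trimming leaves the content list empty, the caller would still need to discard the message entirely, since an assistant message with no content is not useful to replay.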

Impact

  • Hard conversation wedge — unrecoverable without manual rewind or a new session; in-progress context is lost.
  • More likely with extended/interleaved thinking (Opus 4.8 max effort) combined with large single-turn tool fan-outs: the more interleaved thinking blocks in a turn, the larger the window for this race.
  • The client silently retries the identical failing request several times before surfacing the error to the user.
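
Because the replayed request is identical each time, retrying it cannot succeed. A client could recognize this specific 400 and stop instead. The sketch below matches only the error text quoted in this report; the function name `parse_wedge_error` and the idea of passing in a status code and body string are assumptions for illustration, not the client's actual retry interface.

```python
import re

# Matches the error text quoted in this report, with or without backticks
# and across the line break in the wrapped message.
WEDGE_PATTERN = re.compile(
    r"messages\.(\d+)\.content\.(\d+):\s*`?thinking`?\s+or\s+`?redacted_thinking`?"
    r"\s+blocks\b.*?cannot\s+be\s+modified",
    re.DOTALL,
)


def parse_wedge_error(status: int, body: str):
    """Return (message_index, content_index) for the wedge error, else None."""
    if status != 400:
        return None
    match = WEDGE_PATTERN.search(body)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

When this returns a pair of indices, the caller could skip further retries and offer a rewind past the reported message instead.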
