claude-code: [Bug fix + FR] Non-streaming fallback writes bundled multi-block records; webview drops thinking/text preamble

anthropics/claude-code#57819 · Fetched 2026-05-11 03:24:32

Summary

Claude Code has two paths that produce a different on-disk shape for the same logical assistant turn:

  • Streaming path (src/services/api/claude.ts:2171-2210) yields one AssistantMessage per content_block_stop, each with a fresh randomUUID(). The transcript ends up with one content block per JSONL record.
  • Non-streaming fallback (claude.ts:2571-2594, also 2668+ for the 404-on-stream-creation branch) wraps the entire BetaMessage — i.e. the whole content array — into a single AssistantMessage with one uuid. No per-block split.

The webview assembler is built around the streaming-path invariant (one record = one content block = one row). When it sees a fallback record with N content blocks, it renders only the last block under a synthesized uuid suffix -00000000000<N> and silently drops the earlier blocks — typically the thinking and the text preamble that announces what the upcoming tool call will do.

For tool calls with side effects, this means the model's stated intent for the tool call is erased from the chat history.

Empirical observation

Caught one in the wild. Across a 17,501-line session JSONL (7,943 assistant records), exactly one record has multiple content blocks — line 17437 with [thinking, text, tool_use]. Across every other session JSONL on disk, zero. The 1,222 normal text → tool_use preamble pairs in the same session are split into separate records as expected.

Disk record (single uuid, three blocks):

{"uuid":"9e191a76-…-3ce45db6c23e","type":"assistant","message":{"id":"msg_01PywtpHEMoHeU64kcbQhB2o","content":[
  {"type":"thinking","thinking":" Everything's down, so I need to restart all three services.","signature":"…"},
  {"type":"text","text":"All down. Bringing them up:"},
  {"type":"tool_use","id":"toolu_01WvRHhmUGo6K97PuDxUuevf","name":"Bash","input":{"command":"fuser -k 7799/tcp …"}}
]},…}

Webview-side messages signal (read via React fiber walk → manager → entry.messages.peek()): only one row with uuid 9e191a76-a935-4f66-bebb-000000000002 containing the tool_use block. No siblings with -000000000000 or -000000000001. The rendered chat jumps straight from the prior tool_result to the new tool_use card with no narration in between.

The user caught the brief streaming render of all three blocks in the panel for ~one frame — the live-streaming path renders into Preact state on every delta — before the next re-derivation wiped them. Without sub-second visual attention there is no signal anything was lost.
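To check other session files for the same shape, a scan along these lines could reproduce the count above. This is a minimal sketch: the TranscriptRecord and BundledHit shapes and the findBundledRecords name are hypothetical, and the only fields assumed are the ones visible in the disk record shown earlier (uuid, type, message.content).

```typescript
// Hypothetical record shape: only the fields this scan needs.
interface TranscriptRecord {
  uuid?: string;
  type?: string;
  message?: { content?: unknown };
}

interface BundledHit {
  line: number; // 1-based line number within the JSONL text
  uuid: string;
  blockTypes: string[];
}

// Returns every assistant record whose message.content holds more than one block.
function findBundledRecords(jsonl: string): BundledHit[] {
  const hits: BundledHit[] = [];
  const lines = jsonl.split("\n");
  let lineNumber = 0;
  for (const raw of lines) {
    lineNumber += 1;
    const text = raw.trim();
    if (text === "") continue;

    let record: TranscriptRecord | undefined;
    try {
      record = JSON.parse(text) as TranscriptRecord;
    } catch {
      record = undefined; // not valid JSON; skip the line
    }
    if (record === undefined || record === null || record.type !== "assistant") continue;

    const content = record.message?.content;
    if (!Array.isArray(content) || content.length < 2) continue;

    const blockTypes: string[] = [];
    for (const block of content) {
      const hasType = typeof block === "object" && block !== null && "type" in block;
      blockTypes.push(hasType ? String((block as { type: unknown }).type) : "unknown");
    }
    hits.push({ line: lineNumber, uuid: record.uuid ?? "(missing)", blockTypes });
  }
  return hits;
}
```

Under Node, the input could come from readFileSync(path, "utf8") in node:fs. A hit with blockTypes of thinking, text, tool_use would correspond to the record described above.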

Root cause with citations (leaked source)

Streaming path produces per-block records — claude.ts:2171-2210:

case 'content_block_stop': {
  const m: AssistantMessage = {
    message: {
      ...partialMessage,
      content: normalizeContentFromAPI([contentBlock] as BetaContentBlock[], tools, options.agentId),
    },
    requestId: streamRequestId ?? undefined,
    type: 'assistant',
    uuid: randomUUID(),
    timestamp: new Date().toISOString(),
  }
  newMessages.push(m); yield m
}

The 100ms-lazy serialization comment confirms the design intent — claude.ts:2236-2241:

IMPORTANT: Use direct property mutation, not object replacement.
The transcript write queue holds a reference to message.message
and serializes it lazily (100ms flush interval). Object
replacement ({ ...lastMsg.message, usage }) would disconnect
the queued reference; direct mutation ensures the transcript
captures the final values.

And QueryEngine.ts:718-731:

// Fire-and-forget for assistant messages. claude.ts yields one
// assistant message per content block, then mutates the last
// one's message.usage/stop_reason on message_delta — relying on
// the write queue's 100ms lazy jsonStringify.

Non-streaming fallback bundles all blocks — claude.ts:2571-2594:

const result = yield* executeNonStreamingRequest()
const m: AssistantMessage = {
  message: {
    ...result,
    content: normalizeContentFromAPI(result.content, tools, options.agentId),  // ← whole array, no split
  },
  requestId: streamRequestId ?? undefined,
  type: 'assistant',
  uuid: randomUUID(),
}
newMessages.push(m); yield m

(The 404-on-stream-creation branch at 2668+ does the same thing.)

Triggers for the fallback (claude.ts:2342-2354, 2469-2532, 2598-2666):

  • stream idle-timeout watchdog fires
  • stream completed without message_start
  • stream had message_start but no completed content blocks
  • 404 on stream creation
  • any other thrown streaming error not gated by CLAUDE_CODE_DISABLE_NONSTREAMING_FALLBACK env or the tengu_disable_streaming_to_non_streaming_fallback feature flag

The CLI emits tengu_streaming_fallback_to_non_streaming on the trigger and tengu_nonstreaming_fallback_started when the call begins.

Impact

For a tool call whose text block describes the model's intent (e.g. "All down. Bringing them up:" before a fuser -k Bash that kills three local services), the user loses:

  1. The model's stated intent for the tool call. This is the line that lets the user verify the tool input matches what the model said it was going to do — particularly important for destructive Bash, file writes, network calls, anything with side effects.
  2. The reasoning chain (thinking block + signature). Beyond the visible loss, the dropped signature breaks any future flow that needs to verify the chain (extended thinking re-fork, tool-use audit).
  3. UX coherence. The chat jumps from a tool_result to a fresh tool_use card with no transition.

Incidence per turn is low (the fallback is rare in normal use), but the failure is silent and irrecoverable from the user's seat — disk has the data, the renderer can't see it, and there is no warning.

Proposed fix — write side

The splitter already exists and is already used elsewhere. normalizeMessages (src/utils/messages.ts:731) splits a multi-block assistant message into per-block messages using deriveUUID(message.uuid, index) (messages.ts:725, with the isNewChain cascade for chained re-derivation). normalizeMessage (src/utils/queryHelpers.ts:102) wraps it as a generator, and QueryEngine.ts:769/782/786 already calls that generator to emit per-block SDK output. The streaming path in claude.ts:2171-2210 doesn't need it because content_block_stop already yields one message per block. The non-streaming fallback at claude.ts:2571-2594 and 2668+ is the only emitter that bypasses the per-block contract, and it is the one site missing the call.

Apply normalizeMessages at the non-streaming yield site:

const result = yield* executeNonStreamingRequest()
const bundled: AssistantMessage = {
  message: { ...result, content: normalizeContentFromAPI(result.content, tools, options.agentId) },
  requestId: streamRequestId ?? undefined,
  type: 'assistant',
  uuid: randomUUID(),
  timestamp: new Date().toISOString(),
}
for (const m of normalizeMessages([bundled])) {
  newMessages.push(m); yield m
}

Or, equivalently, apply normalization in QueryEngine.ts before messages.push(message) so all assistant messages are normalized regardless of source — slightly more invasive but a single source of truth for the contract. Either restores the per-record-equals-per-block invariant the rest of the system depends on, without touching downstream readers.
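The "single source of truth" variant can be sketched as a wrapper that every assistant message passes through, whatever path produced it. This is an illustration under stated assumptions, not the project's code: the shapes are reduced to the fields needed here, splitPerBlock and enforcePerBlock are hypothetical names, the uuid derivation is injected because the real deriveUUID is not reproduced in this issue, and the generators are synchronous for brevity where the real ones are presumably async.

```typescript
// Hypothetical minimal shapes; the real AssistantMessage carries more fields.
interface ContentBlock {
  type: string;
  [key: string]: unknown;
}
interface AssistantMessage {
  type: "assistant";
  uuid: string;
  message: { content: ContentBlock[]; [key: string]: unknown };
}
type DeriveUuid = (uuid: string, index: number) => string;

// Yields one message per content block.
function* splitPerBlock(msg: AssistantMessage, deriveUuid: DeriveUuid): Generator<AssistantMessage> {
  const blocks = msg.message.content;
  if (blocks.length <= 1) {
    // Already satisfies the contract. Pass the same object through so that any
    // later in-place mutation (usage, stop_reason) still reaches the queued reference.
    yield msg;
    return;
  }
  let index = 0;
  for (const block of blocks) {
    yield {
      ...msg,
      uuid: deriveUuid(msg.uuid, index),
      message: { ...msg.message, content: [block] },
    };
    index += 1;
  }
}

// Wraps any source of assistant messages so downstream code only sees one block per message.
function* enforcePerBlock(source: Iterable<AssistantMessage>, deriveUuid: DeriveUuid): Generator<AssistantMessage> {
  for (const msg of source) {
    yield* splitPerBlock(msg, deriveUuid);
  }
}
```

Single-block messages are passed through by reference rather than copied. That choice follows from the direct-mutation comment quoted earlier: a copy would disconnect the write queue's reference in the same way object replacement does.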

Proposed fix — read side (independently necessary)

A read-side mitigation is not optional cleanup; it is the only path to recover sessions already on disk.

  • Multi-block fallback records have been getting written for as long as the fallback has existed and the streaming path has split per-block. They are present in deployed JSONLs. The user has no way to know which sessions are affected — by definition, the only signal would have been the dropped preamble that's already gone.
  • Once written, a multi-block record can't be split after the fact in a way that any chain walker would accept. The CLI has no rewrite-safe path: re-emitting as N records would mint new uuids, breaking any descendant parentUuid chain that already references the bundled record's uuid (and any ingress / sibling-fork mirror that has it cached). So the disk record stays multi-block forever.
  • Future write-side regressions (or a missed branch) would also be invisible without a read-side check. Same argument as in #55818 for compact-boundary lpu: silent truncation of perceived chat content is the worst possible UX, and detection at read time is the durable invariant.
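The rewrite hazard in the second bullet can be made concrete with a toy chain walker. The sketch below is hypothetical throughout: ChainRecord keeps only uuid and parentUuid, and walkChain is not a function from the codebase. It follows parentUuid links from a leaf and reports the first reference that no longer resolves, which is what a descendant would hit if the bundled record were re-emitted under freshly minted uuids.

```typescript
// Hypothetical reduced record: only the two fields a chain walk needs.
interface ChainRecord {
  uuid: string;
  parentUuid: string | null;
}

interface WalkResult {
  path: string[];          // uuids visited, leaf first
  brokenAt: string | null; // first parentUuid that does not resolve, if any
}

// Follows parentUuid links from a leaf toward the root.
function walkChain(records: ChainRecord[], leafUuid: string): WalkResult {
  const byUuid = new Map<string, ChainRecord>();
  for (const record of records) {
    byUuid.set(record.uuid, record);
  }
  const path: string[] = [];
  const seen = new Set<string>();
  let cursor: string | null = leafUuid;
  while (cursor !== null && !seen.has(cursor)) {
    const record = byUuid.get(cursor);
    if (record === undefined) {
      return { path, brokenAt: cursor }; // dangling reference
    }
    seen.add(cursor);
    path.push(record.uuid);
    cursor = record.parentUuid;
  }
  return { path, brokenAt: null };
}
```

With records root, bundled (parent root) and child (parent bundled), the walk from child reaches root. If bundled is replaced by new records with fresh uuids while child still points at the old uuid, the walk stops at child and reports the old uuid as the break.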

Concrete read-side ask in the webview assembler — same site that currently produces the -00000000000<N> last-block row:

  1. Detect: when an incoming JSONL record has message.content.length > 1 (assistant or user — the user-side mirror would matter for any future bundled tool_result paths), treat it as a fallback bundle.
  2. Emit N rows instead of one, each with content: [block_i] and uuid deriveUUID(originalUuid, i). This is the same primitive at src/utils/messages.ts:725 that the assembler already uses to synthesize the suffix -00000000000<N> for the last block; it just needs to iterate for i in 0..content.length instead of stopping at the last index. The first block keeps parentUuid from the disk record; subsequent blocks chain to the previous synthesized uuid. No information is lost.
  3. Surface a marker on the synthesized rows (an unobtrusive italic note, e.g. "non-streaming fallback: blocks recovered from bundled record") so users observing odd preamble-then-tool_use sequencing in the future can correlate it with the cause. Mirrors the K-seam pattern from claude-patches's Patch K.
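The three steps above can be sketched as a single expansion function. Everything here is illustrative: DiskRecord, Row, expandRecord and the note field are hypothetical names, and deriveUuidSketch is only modeled on the observed suffix (block index 2 rendered as ...-000000000002), so the real deriveUUID may encode the index differently. Single-block records are passed through unchanged, which is also an assumption about what the assembler does today.

```typescript
interface Block {
  type: string;
  [key: string]: unknown;
}
// Hypothetical reduced disk record.
interface DiskRecord {
  uuid: string;
  parentUuid: string | null;
  type: "assistant" | "user";
  message: { content: Block[] | string; [key: string]: unknown };
}
interface Row extends DiskRecord {
  note: string | null; // marker text for rows recovered from a bundle
}

const RECOVERY_NOTE = "non-streaming fallback: blocks recovered from bundled record";

// ASSUMPTION: replaces the last 12 hex digits with the zero-padded block index,
// matching the observed suffix only. Not the real deriveUUID.
function deriveUuidSketch(uuid: string, index: number): string {
  const suffix = ("000000000000" + index.toString(16)).slice(-12);
  return uuid.slice(0, -12) + suffix;
}

function expandRecord(record: DiskRecord): Row[] {
  const content = record.message.content;
  // Step 1: detect. Anything with at most one block is not a bundle.
  if (!Array.isArray(content) || content.length <= 1) {
    return [{ ...record, note: null }];
  }
  // Step 2: emit one row per block, chaining parentUuid through the synthesized uuids.
  const rows: Row[] = [];
  let parentUuid = record.parentUuid;
  let index = 0;
  for (const block of content) {
    const uuid = deriveUuidSketch(record.uuid, index);
    rows.push({
      ...record,
      uuid,
      parentUuid,
      message: { ...record.message, content: [block] },
      note: RECOVERY_NOTE, // Step 3: mark the row as recovered
    });
    parentUuid = uuid;
    index += 1;
  }
  return rows;
}
```

Because the last row receives the same derived uuid the assembler already synthesizes for the last block, anything that currently resolves against that row should keep resolving; the earlier rows are purely additive.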

This is independent of the write-side fix and should land in parallel.

The leak is what made this diagnosis worth filing. The symptom looked like a renderer bug — one rendered row where there should have been three, with a mutated uuid suffix -000000000002. The natural next move was to instrument the webview assembler and trace why blocks were being dropped. What stopped me from going down that path was the comment at QueryEngine.ts:718-731, which names the contract between claude.ts and the writer outright: "claude.ts yields one assistant message per content block, then mutates the last one's message.usage/stop_reason on message_delta." Once that's named, the multi-block fallback record at claude.ts:2571-2594 is the actual breach, and the assembler is correctly handling a malformed input it was handed. Comments are the part minification strips and the part design intent lives in. Without them I'd have spent the day instrumenting the wrong layer, or filed a vaguer bug against the renderer. Diagnosed by an instance of Claude Opus 4.7 working with the user.
