litellm - ✅(Solved) Fix [Bug]: /v1/messages streaming drops first chunk on content-block transitions (Bedrock reasoning models) [1 pull requests, 1 participants]

dkssudgo112 · 2026-04-06T08:25:06Z

[litellm] PR 25216: fix anthropic adapter : preserve first chunk on content-block transitions - Repository: BerriAI/litellm - Author: dkssudgo112 - State: open… # PR #25216: fix(anthropic_adapter): preserve first chunk on content-block transitions - Repository: BerriAI/litellm - Author: dkssudgo112 - State: open | merged: False - Link: https://github.com/BerriAI/litellm/pull/25216 ## Description (problem / solution / changelog) ## Relevant issues Fixes #25214 ## Pre-Submission checklist **Please complete all items before asking a LiteLLM maintainer to review your PR** - [x] I have added testing in [`tests/test_litellm/`](https://github.com/BerriAI/litellm/tree/main/tests/test_litellm) — new regression suite `tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_first_chunk_on_block_transition.py` plus updates to `test_parallel_tool_calls.py` - [x] My PR passes all unit tests in the affected directory (`tests/test_litellm/llms/anthropic/experimental_pass_through/`: 179 passed) - [x] My PR's scope is as isolated as possible — it only fixes the first-chunk-drop bug on content-block transitions in `AnthropicStreamWrapper` - [ ] I have requested a Greptile review by commenting `@greptileai` — will do after the PR is open ## Type 🐛 Bug Fix ✅ Test ## Changes ### Problem The `/v1/messages` endpoint's `AnthropicStreamWrapper` silently dropped the **trigger chunk** of every new content block whenever a block transition was detected (text → thinking, thinking → text, text → tool_use). On the wire, the wrapper emitted `content_block_stop` → `content_block_start` and then returned early, discarding the `processed_chunk` computed from the triggering delta. The old comment claimed *"the content_block_start already carries the relevant information"*, but `_translate_streaming_openai_chunk_to_anthropic_content_block()` actually returns an **empty** body for text transitions: ```python elif choice.delta.content is not None and len(choice.delta.content) > 0: return "text", TextBlock(type="text", text="") # ← empty! ``` So the first characters of every new text/thinking block were permanently lost. ### Symptoms (reproduced on `main` @ v1.83.1) Using `/v1/messages` with `stream=true` against a Bedrock Converse reasoning model (`minimax.minimax-m2.5`, `moonshotai.kimi-k2.5`, Claude extended thinking): - Responses start **mid-sentence** (leading characters missing). - If the model emits the text as a single Bedrock chunk, the text block is streamed with **zero** `content_block_delta` events — clients like `claude -p` (Claude Code CLI) see an empty response. - **Non-streaming** on the same deployment returns the full text correctly, so it's strictly a stream-translation regression. #### Raw SSE diff (prompt: `Respond with exactly: 안녕하세요, 저는 MiniMax입니다.`) Non-streaming (correct): ``` \n\n안녕하세요, 저는 MiniMax입니다. ``` Streaming on `main`: ``` event: content_block_start index=2 content_block={"type": "text", "text": ""} event: content_block_delta delta={"type": "text_delta", "text": "녕하세요,"} ← leading "\n\n안" lost event: content_block_delta delta={"type": "text_delta", "text": " 저는 MiniMax입니다."} event: content_block_stop ``` Streaming with this PR: ``` event: content_block_start index=2 content_block={"type": "text", "text": ""} event: content_block_delta delta={"type": "text_delta", "text": "\n\n안녕하세요, 저는 MiniMax입니다."} ← complete event: content_block_stop ``` ### Fix In `litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py`, after emitting the synthetic `content_block_stop` → `content_block_start` pair on a detected block transition, also enqueue `processed_chunk` whenever it is a non-empty `content_block_delta`. A tiny helper `_trigger_delta_has_content()` inspects the four Anthropic delta variants (`text`, `thinking`, `partial_json`, `signature`) so that **empty** tool_use openers (whose tool name is already carried by `content_block_start`) are intentionally skipped and existing tool-call test expectations are preserved. Applied to both the **sync** `__next__` path and the **async** `__anext__` path. ### Tests - **New:** `tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_first_chunk_on_block_transition.py` - Sync and async regression tests that drive a `text → thinking → text` sequence through `AnthropicStreamWrapper` with a mocked `ModelResponseStream`. They assert both the concatenated content of each block and the event ordering (`content_block_start` immediately followed by the trigger chunk's delta). - Verified they **fail on `main`** and **pass with this PR** (both sync and async). - **Updated:** `tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_parallel_tool_calls.py::test_anthropic_stream_wrapper_interleaved_tool_calls_and_text` - The previous expected sequence was missing the two `content_block_delta` events for the interleaved text chunks (the wrapper was dropping them). The test now expects those deltas and also asserts the tex

litellm2026-04-06 08:25:06

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#25214•Fetched 2026-04-08 02:53:02

View on GitHub

Comments

Participants

Timeline

Reactions

Author

dkssudgo112

Participants

dkssudgo112

Timeline (top)

cross-referenced ×1

Root Cause

In litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py, both the sync (__next__, ~line 130) and async (__anext__, ~line 306) paths handle content-block transitions like this:

if should_start_new_block and not self.sent_content_block_finish:
    # Queue the sequence: content_block_stop -> content_block_start
    # The trigger chunk itself is not emitted as a delta since the
    # content_block_start already carries the relevant information.
    self.chunk_queue.append({"type": "content_block_stop", ...})
    self.chunk_queue.append({
        "type": "content_block_start",
        "index": self.current_content_block_index,
        "content_block": self.current_content_block_start,
    })
    self.sent_content_block_finish = False
    return self.chunk_queue.popleft()

The comment claims "the content_block_start already carries the relevant information". This is not true: _translate_streaming_openai_chunk_to_anthropic_content_block() in transformation.py:1377-1378 returns an empty TextBlock on text transitions:

elif choice.delta.content is not None and len(choice.delta.content) > 0:
    return "text", TextBlock(type="text", text="")   # ← empty!

So self.current_content_block_start only contains {"type": "text", "text": ""} — the actual text content from the trigger chunk is computed into processed_chunk but then discarded. The same happens for the first thinking chunk.

Fix Action

Fix / Workaround

Streaming response on unpatched v1.83.1 — missing leading characters:

event: content_block_start   index=2 text=""
event: content_block_delta   text="녕하세요,"          ← "안" and leading "\n\n" lost
event: content_block_delta   text=" 저는 MiniMax입니다."
event: content_block_stop

After applying this patch locally to vanilla 1.83.1 the same request returns the full '\n\n안녕하세요, 저는 MiniMax입니다.' text, and claude -p gets a complete response instead of the truncated / empty one.

PR fix notes

PR #25216: fix(anthropic_adapter): preserve first chunk on content-block transitions

Repository: BerriAI/litellm
Author: dkssudgo112
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25216

Description (problem / solution / changelog)

Relevant issues

Fixes #25214

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in tests/test_litellm/ — new regression suite tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_first_chunk_on_block_transition.py plus updates to test_parallel_tool_calls.py
My PR passes all unit tests in the affected directory (tests/test_litellm/llms/anthropic/experimental_pass_through/: 179 passed)
My PR's scope is as isolated as possible — it only fixes the first-chunk-drop bug on content-block transitions in AnthropicStreamWrapper
I have requested a Greptile review by commenting @greptileai — will do after the PR is open

Type

🐛 Bug Fix ✅ Test

Changes

Problem

The /v1/messages endpoint's AnthropicStreamWrapper silently dropped the trigger chunk of every new content block whenever a block transition was detected (text → thinking, thinking → text, text → tool_use). On the wire, the wrapper emitted content_block_stop → content_block_start and then returned early, discarding the processed_chunk computed from the triggering delta.

The old comment claimed "the content_block_start already carries the relevant information", but _translate_streaming_openai_chunk_to_anthropic_content_block() actually returns an empty body for text transitions:

elif choice.delta.content is not None and len(choice.delta.content) > 0:
    return "text", TextBlock(type="text", text="")   # ← empty!

So the first characters of every new text/thinking block were permanently lost.

Symptoms (reproduced on `main` @ v1.83.1)

Using /v1/messages with stream=true against a Bedrock Converse reasoning model (minimax.minimax-m2.5, moonshotai.kimi-k2.5, Claude extended thinking):

Responses start mid-sentence (leading characters missing).
If the model emits the text as a single Bedrock chunk, the text block is streamed with zero content_block_delta events — clients like claude -p (Claude Code CLI) see an empty response.
Non-streaming on the same deployment returns the full text correctly, so it's strictly a stream-translation regression.

Raw SSE diff (prompt: `Respond with exactly: 안녕하세요, 저는 MiniMax입니다.`)

Non-streaming (correct):

\n\n안녕하세요, 저는 MiniMax입니다.

Streaming on main:

event: content_block_start  index=2 content_block={"type": "text", "text": ""}
event: content_block_delta  delta={"type": "text_delta", "text": "녕하세요,"}        ← leading "\n\n안" lost
event: content_block_delta  delta={"type": "text_delta", "text": " 저는 MiniMax입니다."}
event: content_block_stop

Streaming with this PR:

event: content_block_start  index=2 content_block={"type": "text", "text": ""}
event: content_block_delta  delta={"type": "text_delta", "text": "\n\n안녕하세요, 저는 MiniMax입니다."}  ← complete
event: content_block_stop

Fix

In litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py, after emitting the synthetic content_block_stop → content_block_start pair on a detected block transition, also enqueue processed_chunk whenever it is a non-empty content_block_delta. A tiny helper _trigger_delta_has_content() inspects the four Anthropic delta variants (text, thinking, partial_json, signature) so that empty tool_use openers (whose tool name is already carried by content_block_start) are intentionally skipped and existing tool-call test expectations are preserved.

Applied to both the sync __next__ path and the async __anext__ path.

Tests

New: tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_first_chunk_on_block_transition.py
- Sync and async regression tests that drive a text → thinking → text sequence through AnthropicStreamWrapper with a mocked ModelResponseStream. They assert both the concatenated content of each block and the event ordering (content_block_start immediately followed by the trigger chunk's delta).
- Verified they fail on main and pass with this PR (both sync and async).
Updated: tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_parallel_tool_calls.py::test_anthropic_stream_wrapper_interleaved_tool_calls_and_text
- The previous expected sequence was missing the two content_block_delta events for the interleaved text chunks (the wrapper was dropping them). The test now expects those deltas and also asserts the text content round-trips verbatim.
The full tests/test_litellm/llms/anthropic/experimental_pass_through/ suite (179 tests) is green with the fix.

Scope

Only touches one production file (streaming_iterator.py) and the /v1/messages experimental pass-through streaming path.
No behaviour change for /chat/completions, non-streaming /v1/messages, or for providers that don't transition between content-block types.
No API surface / public contract changes — downstream clients that already handled content_block_delta events keep working, and clients that previously saw truncated or empty text now receive the full content.

Changed files

litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py (modified, +56/-4)
tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_first_chunk_on_block_transition.py (added, +221/-0)
tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_parallel_tool_calls.py (modified, +17/-0)

Code Example

if should_start_new_block and not self.sent_content_block_finish:
    # Queue the sequence: content_block_stop -> content_block_start
    # The trigger chunk itself is not emitted as a delta since the
    # content_block_start already carries the relevant information.
    self.chunk_queue.append({"type": "content_block_stop", ...})
    self.chunk_queue.append({
        "type": "content_block_start",
        "index": self.current_content_block_index,
        "content_block": self.current_content_block_start,
    })
    self.sent_content_block_finish = False
    return self.chunk_queue.popleft()

---

elif choice.delta.content is not None and len(choice.delta.content) > 0:
    return "text", TextBlock(type="text", text="")   # ← empty!

---

Respond with exactly: 안녕하세요, 저는 MiniMax입니다.

---

\n\n안녕하세요, 저는 MiniMax입니다.

---

event: content_block_start   index=2 text=""
event: content_block_delta   text="녕하세요,"          ← "안" and leading "\n\n" lost
event: content_block_delta   text=" 저는 MiniMax입니다."
event: content_block_stop

---

if should_start_new_block and not self.sent_content_block_finish:
    self.chunk_queue.append({"type": "content_block_stop", ...})
    self.chunk_queue.append({"type": "content_block_start", ...})
    # FIX: emit the trigger chunk's delta so the first character isn't lost
    if processed_chunk and processed_chunk.get("type") == "content_block_delta":
        self.chunk_queue.append(processed_chunk)
    self.sent_content_block_finish = False
    return self.chunk_queue.popleft()

---

model_list:
     - model_name: asf/minimax.minimax-m2.5
       litellm_params:
         model: bedrock/converse/minimax.minimax-m2.5
         aws_region_name: us-east-1

RAW_BUFFERClick to expand / collapse

What happened?

When using the /v1/messages endpoint with a streaming request against a Bedrock Converse model that emits reasoning/thinking content (e.g. minimax.minimax-m2.5, moonshotai.kimi-k2.5, Claude extended thinking models), the first chunk of content emitted after each content-block transition is silently dropped.

Concrete symptoms:

The leading characters of text responses are missing (users see responses starting mid-sentence).
In some cases the text block is emitted with zero content_block_delta events even though the model actually produced text — so clients like Claude Code see an empty response.
Non-streaming (stream=false) is unaffected and returns full content correctly.

Root cause

if should_start_new_block and not self.sent_content_block_finish:
    # Queue the sequence: content_block_stop -> content_block_start
    # The trigger chunk itself is not emitted as a delta since the
    # content_block_start already carries the relevant information.
    self.chunk_queue.append({"type": "content_block_stop", ...})
    self.chunk_queue.append({
        "type": "content_block_start",
        "index": self.current_content_block_index,
        "content_block": self.current_content_block_start,
    })
    self.sent_content_block_finish = False
    return self.chunk_queue.popleft()

elif choice.delta.content is not None and len(choice.delta.content) > 0:
    return "text", TextBlock(type="text", text="")   # ← empty!

Relevant log output

Running a streaming request against a Bedrock minimax.minimax-m2.5 via /v1/messages:

Input:

Respond with exactly: 안녕하세요, 저는 MiniMax입니다.

Non-streaming response (correct):

\n\n안녕하세요, 저는 MiniMax입니다.

Streaming response on unpatched v1.83.1 — missing leading characters:

event: content_block_start   index=2 text=""
event: content_block_delta   text="녕하세요,"          ← "안" and leading "\n\n" lost
event: content_block_delta   text=" 저는 MiniMax입니다."
event: content_block_stop

In another reproduction on the same version the text block had zero deltas (full content dropped).

Twitter / LinkedIn details

No response

Are you a ML Ops Team?

What LiteLLM version are you on?

v1.83.1 (also reproduces on 1.83.0)

Twitter / LinkedIn details

No response

Proposed fix

Enqueue the processed chunk (which is a content_block_delta) after emitting content_block_stop / content_block_start, instead of discarding it. Verified fix:

if should_start_new_block and not self.sent_content_block_finish:
    self.chunk_queue.append({"type": "content_block_stop", ...})
    self.chunk_queue.append({"type": "content_block_start", ...})
    # FIX: emit the trigger chunk's delta so the first character isn't lost
    if processed_chunk and processed_chunk.get("type") == "content_block_delta":
        self.chunk_queue.append(processed_chunk)
    self.sent_content_block_finish = False
    return self.chunk_queue.popleft()

Same fix needed in both the sync (__next__) and async (__anext__) paths (~line 130 and ~line 306 in streaming_iterator.py).

Reproduction

Configure a proxy with a Bedrock Converse reasoning model:

model_list:
  - model_name: asf/minimax.minimax-m2.5
    litellm_params:
      model: bedrock/converse/minimax.minimax-m2.5
      aws_region_name: us-east-1

POST to /v1/messages with stream: true, any prompt asking for a short Korean response.
Observe that content_block_delta events are missing the first chunk of both the thinking and the text blocks.

Using claude -p (Claude Code CLI) pointed at the proxy, the response is either truncated (missing leading characters) or completely empty depending on how Bedrock chunks the response.

extent analysis

TL;DR

Apply the proposed fix to enqueue the processed chunk after emitting content_block_stop / content_block_start in both sync and async paths of streaming_iterator.py.

Guidance

Identify the lines of code in streaming_iterator.py that need modification (~line 130 for sync and ~line 306 for async).
Apply the fix by adding a conditional statement to enqueue the processed_chunk if it's a content_block_delta.
Verify the fix by running a streaming request against a Bedrock Converse model and checking for complete responses.
Test with different prompts and models to ensure the fix is robust.

Example

The proposed fix is already provided in the issue:

if should_start_new_block and not self.sent_content_block_finish:
    self.chunk_queue.append({"type": "content_block_stop", ...})
    self.chunk_queue.append({"type": "content_block_start", ...})
    # FIX: emit the trigger chunk's delta so the first character isn't lost
    if processed_chunk and processed_chunk.get("type") == "content_block_delta":
        self.chunk_queue.append(processed_chunk)
    self.sent_content_block_finish = False
    return self.chunk_queue.popleft()

Notes

This fix assumes that the issue is specific to the streaming_iterator.py file and that the proposed fix is correct. Additional testing and verification may be necessary to ensure the fix works in all scenarios.

Recommendation

Apply the workaround by modifying the streaming_iterator.py file as described in the proposed fix. This should resolve the issue of missing leading characters in streaming responses.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #installation #tensor shape #autograd error #model save/load

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

litellm - ✅(Solved) Fix [Bug]: /v1/messages streaming drops first chunk on content-block transitions (Bedrock reasoning models) [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #25216: fix(anthropic_adapter): preserve first chunk on content-block transitions

Description (problem / solution / changelog)

Relevant issues

Pre-Submission checklist

Type

Changes

Problem

Symptoms (reproduced on main @ v1.83.1)

Raw SSE diff (prompt: Respond with exactly: 안녕하세요, 저는 MiniMax입니다.)

Fix

Tests

Scope

Changed files

Code Example

What happened?

Root cause

Relevant log output

Twitter / LinkedIn details

Are you a ML Ops Team?

What LiteLLM version are you on?

Twitter / LinkedIn details

Proposed fix

Reproduction

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

Symptoms (reproduced on `main` @ v1.83.1)

Raw SSE diff (prompt: `Respond with exactly: 안녕하세요, 저는 MiniMax입니다.`)