litellm - ✅(Solved) Fix [Bug]: Bedrock Claude extended thinking streams mixed choice indices (0 and 1), breaking OpenAI compatibility [2 pull requests, 1 comments, 2 participants]

GitHub stats
BerriAI/litellm#23178 · Fetched 2026-04-08 00:38:08
Comments: 1
Participants: 2
Timeline: 9
Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×2, referenced ×2, commented ×1

Root Cause

Standard OpenAI-compatible streaming clients (which concatenate chunks by asserting chunk.choices[0].index remains consistent, or by using zip()) crash because they expect a single generation to maintain a consistent index (usually 0) throughout the stream.
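
The failure mode described above can be illustrated with a minimal sketch of a strict client-side accumulator. The chunk objects here are hypothetical stand-ins built with SimpleNamespace, not litellm types, and accumulate_strict is an illustrative name rather than any real client's API.

```python
from types import SimpleNamespace


def make_chunk(index, content=None, reasoning_content=None):
    """Build a stand-in for a streamed chunk (hypothetical shape, not a litellm type)."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(index=index, delta=delta)])


def accumulate_strict(chunks):
    """Concatenate a single-choice stream, failing if the choice index changes."""
    expected_index = None
    reasoning, text = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        if expected_index is None:
            expected_index = choice.index
        elif choice.index != expected_index:
            raise ValueError(
                f"choice index changed from {expected_index} to {choice.index}"
            )
        if choice.delta.reasoning_content:
            reasoning.append(choice.delta.reasoning_content)
        if choice.delta.content:
            text.append(choice.delta.content)
    return "".join(reasoning), "".join(text)


# Stream as described in the issue: thinking on index 0, text on index 1.
mixed = [
    make_chunk(0, reasoning_content="453 = 400 + 50 + 3"),
    make_chunk(1, content="27 * 453 = 12,231"),
]
# The same stream with every chunk kept on index 0.
normalised = [
    make_chunk(0, reasoning_content="453 = 400 + 50 + 3"),
    make_chunk(0, content="27 * 453 = 12,231"),
]
```

Calling accumulate_strict(normalised) returns the reasoning and the answer text, while accumulate_strict(mixed) raises a ValueError at the first chunk that arrives on index 1.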

Fix Action

Fixed

PR fix notes

PR #23248: fix(bedrock): normalise streaming choice index=0 for extended-thinking blocks (issue #23178)

Description (problem / solution / changelog)

When Claude extended thinking is enabled on Bedrock, the Converse API emits two content-block types in the same response:

  • contentBlockIndex=0 → reasoning / thinking block
  • contentBlockIndex=1 → text block

The existing converse_chunk_parser already hardcodes StreamingChoices(index=0) for every event (tool-calls fix from #22867), so the normalisation is already in place for the converse path. The AmazonAnthropicClaudeStreamDecoder (invoke/anthropic path) likewise always sets index=0 via AnthropicModelResponseIterator.chunk_parser.

This commit adds explicit regression tests for both paths covering the full thinking-block event sequence (start, delta, signature, stop) and the subsequent text-block events that arrive on contentBlockIndex=1, ensuring choices[0].index is always 0 and OpenAI-compatible clients do not crash.
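
A regression test of the kind described might look roughly like the sketch below. It does not call litellm: toy_converse_parser is a hypothetical stand-in for a Converse chunk parser, and the event dictionaries are assumed, simplified shapes (contentBlockIndex, delta.text, delta.reasoningContent) rather than captured Bedrock output.

```python
def toy_converse_parser(event):
    """Hypothetical stand-in for a Converse chunk parser (not litellm's code).

    Reads the block type from the delta and ignores contentBlockIndex,
    so every emitted choice carries index 0.
    """
    delta = event.get("delta", {})
    reasoning = delta.get("reasoningContent", {})
    return {
        "index": 0,  # fixed, regardless of event["contentBlockIndex"]
        "content": delta.get("text"),
        "reasoning_content": reasoning.get("text"),
        "signature": reasoning.get("signature"),
    }


# Assumed event sequence: thinking block (start, delta, signature, stop),
# followed by a text block that arrives on contentBlockIndex=1.
EVENTS = [
    {"start": {}, "contentBlockIndex": 0},
    {"delta": {"reasoningContent": {"text": "453 = 400 + 50 + 3"}}, "contentBlockIndex": 0},
    {"delta": {"reasoningContent": {"signature": "sig..."}}, "contentBlockIndex": 0},
    {"contentBlockIndex": 0},
    {"start": {}, "contentBlockIndex": 1},
    {"delta": {"text": "27 * 453 = 12,231"}, "contentBlockIndex": 1},
    {"contentBlockIndex": 1},
]


def test_choice_index_is_always_zero():
    choices = [toy_converse_parser(e) for e in EVENTS]
    # No event, including those from block 1, may leak a non-zero choice index.
    assert {c["index"] for c in choices} == {0}
    # Text and reasoning stay in separate fields on the same choice.
    assert "".join(c["content"] or "" for c in choices) == "27 * 453 = 12,231"
    assert "".join(c["reasoning_content"] or "" for c in choices) == "453 = 400 + 50 + 3"
```

The point of asserting over the whole sequence, rather than a single delta, is that the index switch only shows up once the text block starts.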

Relevant issues

<!-- e.g. "Fixes #000" -->

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

<!-- Select the type of Pull Request --> <!-- Keep only the necessary ones -->

🆕 New Feature 🐛 Bug Fix 🧹 Refactoring 📖 Documentation 🚄 Infrastructure ✅ Test

Changes

Changed files

  • tests/test_litellm/llms/bedrock/chat/test_streaming_choice_index.py (modified, +135/-1)

PR #23249: fix(bedrock): add regression tests for extended-thinking streaming choice index (#23178)

Description (problem / solution / changelog)


Changes

Changed files

  • tests/test_litellm/llms/bedrock/chat/test_streaming_choice_index.py (modified, +135/-1)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When streaming responses using Bedrock's Claude Sonnet 4.5 with extended thinking enabled (e.g., reasoning_effort="low" or using the thinking param), the stream emits ChoiceDelta chunks with mixed indices.

Specifically, the stream yields chunks with index=0 for the reasoning/thinking blocks, but then switches to index=1 for the actual text content blocks.

Standard OpenAI-compatible streaming clients (which concatenate chunks by asserting chunk.choices[0].index remains consistent, or by using zip()) crash because they expect a single generation to maintain a consistent index (usually 0) throughout the stream.

Note: This discrepancy appears specific to the Bedrock routing. If we use the direct Anthropic provider (anthropic/claude-sonnet-4-5-20250929), LiteLLM correctly keeps all chunks on index=0.

Expected Behavior

LiteLLM should normalize the Bedrock stream to ensure all chunks for a single generated response belong to index=0 to maintain strict OpenAI compatibility.

Thinking blocks should map to the reasoning fields (e.g., delta.reasoning_content or provider_specific_fields), and standard text should map to delta.content—but both should remain under chunk.choices[0].index == 0.
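
The mapping described above could be sketched as follows for Anthropic-style stream events. The payloads are abridged from the log output in this issue; to_openai_delta and the returned dictionary layout are hypothetical and only illustrate the intended field mapping, not LiteLLM's implementation.

```python
import json


def to_openai_delta(sse_data):
    """Map one Anthropic-style `data:` payload to an OpenAI-style choice dict.

    Returns None for events that carry no thinking or text delta.
    """
    event = json.loads(sse_data)
    if event.get("type") != "content_block_delta":
        return None
    delta = event["delta"]
    if delta["type"] == "thinking_delta":
        fields = {"reasoning_content": delta["thinking"]}
    elif delta["type"] == "text_delta":
        fields = {"content": delta["text"]}
    else:
        return None  # e.g. signature_delta, not covered by this sketch
    # event["index"] is the content block position; it is deliberately
    # not propagated to the choice index.
    return {"index": 0, "delta": fields}


SAMPLE = [
    r'{"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}',
    r'{"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n2. 453 = 400 + 50 + 3"}}',
    r'{"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}',
    r'{"type": "content_block_stop", "index": 1}',
]

deltas = [d for d in map(to_openai_delta, SAMPLE) if d is not None]
```

Here both surviving deltas carry index 0, with the thinking text under reasoning_content and the answer under content.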

Steps to Reproduce

This can be reproduced with a simple completion request:

import litellm

# Target Bedrock model supporting extended thinking
model = "bedrock/anthropic.claude-sonnet-4-5-20250929-v1:0"

print("Starting stream...")

response = litellm.completion(
    model=model,
    messages=[{"role": "user", "content": "What is 27 * 453? Think step by step."}],
    max_tokens=4000, # Must be larger than budget_tokens
    stream=True,
    # Explicitly trigger Bedrock extended thinking
    thinking={
        "type": "enabled",
        "budget_tokens": 2048
    }
)

indices_seen = set()

for chunk in response:
    if chunk.choices:
        idx = chunk.choices[0].index
        if idx not in indices_seen:
            print(f"\n--- Switched to Choice Index: {idx} ---")
            indices_seen.add(idx)
        
        # Print a snippet of the content to show the transition
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None) or getattr(delta, "reasoning_content", "")
        if content:
            print(content, end="", flush=True)

print(f"\n\nTest Finished. Unique Choice Indices seen (n=1): {indices_seen}")

Relevant log output

From https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-use-extended-thinking

event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-3-7-sonnet-20250219", "stop_reason": null, "stop_sequence": null}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me solve this step by step:\n\n1. First break down 27 * 453"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n2. 453 = 400 + 50 + 3"}}

// Additional thinking deltas...

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}

// Additional text deltas...

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}

event: message_stop
data: {"type": "message_stop"}

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

1.81.12-stable

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To keep every chunk of a single generated response on index=0, the Bedrock stream handling must not carry the provider's content block index (the index field on content_block_start and content_block_delta events) over into the OpenAI choice index.

Here are the steps:

  • Read the index on each content_block_start event as a content block position, not as a choice index.
  • Emit every chunk derived from the subsequent content_block_delta events with choice index 0, whatever the block index is.
  • Ensure that chunk.choices[0].index remains consistent throughout the stream.

Example code snippet:

def normalize_bedrock_stream(response):
    """Client-side workaround: force every streamed choice onto index 0."""
    for chunk in response:
        # Only valid for n=1 requests, where a single choice is expected.
        for choice in chunk.choices or []:
            choice.index = 0
        # Yield every chunk, including ones that carry no choices.
        yield chunk

You can use this function to wrap the litellm.completion response:

response = litellm.completion(
    # ... (rest of the code remains the same)
)

normalized_response = normalize_bedrock_stream(response)
for chunk in normalized_response:
    # Process the normalized chunks
    if chunk.choices:
        idx = chunk.choices[0].index
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None) or getattr(delta, "reasoning_content", "")
        if content:
            print(content, end="", flush=True)

Verification

To verify that the fix worked, you can check the chunk.choices[0].index value throughout the stream. It should remain consistent at 0.
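
One way to run that check is to collect the set of indices over the whole stream, as in this minimal sketch. collect_choice_indices is a hypothetical helper, and the fake stream below uses SimpleNamespace stand-ins so the snippet runs without Bedrock credentials.

```python
from types import SimpleNamespace


def collect_choice_indices(stream):
    """Return the set of choice indices seen across all chunks of a stream."""
    seen = set()
    for chunk in stream:
        for choice in chunk.choices or []:
            seen.add(choice.index)
    return seen


# Stand-in chunks: two with a single choice on index 0, one with no choices.
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(index=0)]),
    SimpleNamespace(choices=[SimpleNamespace(index=0)]),
    SimpleNamespace(choices=[]),
]

indices = collect_choice_indices(fake_stream)
```

Against a real stream, the same helper can be applied to the object returned by litellm.completion(..., stream=True); for an n=1 request the expectation is that the resulting set equals {0}.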

Extra Tips

Make sure to test the modified code with different input scenarios to ensure that it works correctly in all cases. Additionally, you may want to consider adding error handling to handle any unexpected events or errors that may occur during the stream processing.
