litellm - ✅(Solved) Fix [Bug]: Bedrock Claude extended thinking streams mixed choice indices (0 and 1), breaking OpenAI compatibility [2 pull requests, 1 comment, 2 participants]

litellm · 2026-03-09 21:01:49


GitHub stats

BerriAI/litellm#23178•Fetched 2026-04-08 00:38:08


Author

rodlaf

Participants

rodlaf

weiguangli-io

Timeline (top)

labeled ×3, cross-referenced ×2, referenced ×2, commented ×1

Root Cause

Standard OpenAI-compatible streaming clients expect a single generation (n=1) to keep the same choice index, usually 0, for the whole stream. Clients that concatenate chunks by asserting that chunk.choices[0].index stays constant, or that rely on zip(), crash when the Bedrock stream switches from index 0 for thinking chunks to index 1 for text chunks.
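To illustrate the failure mode, here is a minimal sketch of a client-side accumulator that groups text by choice index. The chunk objects are hypothetical stand-ins built with SimpleNamespace rather than real LiteLLM types, and make_chunk and accumulate are illustrative helpers, not part of any library.

```python
from collections import defaultdict
from types import SimpleNamespace


def make_chunk(index, content=None, reasoning_content=None):
    """Build a hypothetical stand-in for an OpenAI-style streaming chunk."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(index=index, delta=delta)])


def accumulate(chunks):
    """Concatenate delta.content per choice index, as many clients do."""
    text_by_index = defaultdict(str)
    for chunk in chunks:
        for choice in chunk.choices:
            if choice.delta.content:
                text_by_index[choice.index] += choice.delta.content
    return dict(text_by_index)


# Stream shaped like the reported Bedrock behaviour: thinking on 0, text on 1.
mixed = [
    make_chunk(0, reasoning_content="Let me solve this step by step"),
    make_chunk(1, content="27 * 453 = "),
    make_chunk(1, content="12,231"),
]

result = accumulate(mixed)
# A client that reads result[0] for an n=1 request finds nothing there.
print(result.get(0))  # None
print(result.get(1))  # 27 * 453 = 12,231
```

The answer text ends up under index 1, so any client that only reads choice 0 for a single-generation request sees an empty response.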


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When streaming responses using Bedrock's Claude Sonnet 4.5 with extended thinking enabled (e.g., reasoning_effort="low" or using the thinking param), the stream emits ChoiceDelta chunks with mixed indices.

Specifically, the stream yields chunks with index=0 for the reasoning/thinking blocks, but then switches to index=1 for the actual text content blocks.

Note: This discrepancy appears specific to the Bedrock routing. If we use the direct Anthropic provider (anthropic/claude-sonnet-4-5-20250929), LiteLLM correctly keeps all chunks on index=0.

Expected Behavior

LiteLLM should normalize the Bedrock stream to ensure all chunks for a single generated response belong to index=0 to maintain strict OpenAI compatibility.

Thinking blocks should map to the reasoning fields (e.g., delta.reasoning_content or provider_specific_fields), and standard text should map to delta.content—but both should remain under chunk.choices[0].index == 0.
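The expected mapping can be sketched as follows. This is not LiteLLM's implementation: to_openai_chunk is a hypothetical helper, the chunks are plain dicts rather than LiteLLM objects, and the input events are assumed to be shaped like Anthropic's content_block_delta events.

```python
def to_openai_chunk(event):
    """Map one Anthropic-style content_block_delta event to an OpenAI-style chunk dict.

    Hypothetical helper for illustration. The content-block index carried by
    the event is deliberately ignored so the choice index is always 0.
    """
    delta = event["delta"]
    if delta["type"] == "thinking_delta":
        out = {"reasoning_content": delta["thinking"]}
    elif delta["type"] == "text_delta":
        out = {"content": delta["text"]}
    else:
        # Other delta types (e.g. signature_delta) are outside this sketch.
        return None
    return {"choices": [{"index": 0, "delta": out}]}


events = [
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "453 = 400 + 50 + 3"}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}},
]

chunks = [to_openai_chunk(e) for e in events]
for chunk in chunks:
    print(chunk["choices"][0])
```

Both chunks report index 0; only the field inside delta differs between thinking and text.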

Steps to Reproduce

This can be reproduced with a simple completion request:

import litellm

# Target Bedrock model supporting extended thinking
model = "bedrock/anthropic.claude-sonnet-4-5-20250929-v1:0"

print("Starting stream...")

response = litellm.completion(
    model=model,
    messages=[{"role": "user", "content": "What is 27 * 453? Think step by step."}],
    max_tokens=4000, # Must be larger than budget_tokens
    stream=True,
    # Explicitly trigger Bedrock extended thinking
    thinking={
        "type": "enabled",
        "budget_tokens": 2048
    }
)

indices_seen = set()

for chunk in response:
    if chunk.choices:
        idx = chunk.choices[0].index
        if idx not in indices_seen:
            print(f"\n--- Switched to Choice Index: {idx} ---")
            indices_seen.add(idx)
        
        # Print a snippet of the content to show the transition
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None) or getattr(delta, "reasoning_content", "")
        if content:
            print(content, end="", flush=True)

print(f"\n\nTest Finished. Unique Choice Indices seen (n=1): {indices_seen}")

Relevant log output

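Example event stream from the AWS documentation (https://docs.aws.amazon.com/bedrock/latest/userguide/claude-messages-extended-thinking.html#claude-messages-use-extended-thinking). Note that the thinking block uses content-block index 0 and the text block uses index 1: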

event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-3-7-sonnet-20250219", "stop_reason": null, "stop_sequence": null}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "Let me solve this step by step:\n\n1. First break down 27 * 453"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n2. 453 = 400 + 50 + 3"}}

// Additional thinking deltas...

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "27 * 453 = 12,231"}}

// Additional text deltas...

event: content_block_stop
data: {"type": "content_block_stop", "index": 1}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}

event: message_stop
data: {"type": "message_stop"}
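As a rough aid for reading the events above, the sketch below extracts the content-block index of each block type from SSE text. The sample is trimmed to the two content_block_start events, and block_types_by_index is a hypothetical helper. These content-block indices match the 0 and 1 that show up as choice indices, although the issue itself does not confirm that this is the mechanism.

```python
import json

# Trimmed copy of the two content_block_start events from the log above.
SSE_SAMPLE = '''\
event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": ""}}

event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}
'''


def block_types_by_index(sse_text):
    """Return {content-block index: block type} from content_block_start events."""
    blocks = {}
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data: "):])
        if payload.get("type") == "content_block_start":
            blocks[payload["index"]] = payload["content_block"]["type"]
    return blocks


print(block_types_by_index(SSE_SAMPLE))  # {0: 'thinking', 1: 'text'}
```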

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

1.81.12-stable


extent analysis

Fix Plan

To normalize the Bedrock stream and ensure all chunks for a single generated response belong to index=0, we need to modify the LiteLLM proxy to handle the content_block_start and content_block_delta events.

Here are the steps:

1. Identify the content_block_start event and check its index.
2. If the index is not 0, update the index to 0 in the subsequent content_block_delta events.
3. Ensure that chunk.choices[0].index remains consistent throughout the stream.

Example code snippet:

def normalize_bedrock_stream(response):
    """Yield every chunk, forcing the choice index to 0 (assumes n=1)."""
    for chunk in response:
        if chunk.choices:
            # With n=1 there is a single generation, so thinking and text
            # deltas must share index 0.
            chunk.choices[0].index = 0
        # Yield outside the `if` so chunks without choices (e.g. usage-only
        # chunks) pass through instead of being dropped.
        yield chunk

You can use this function to wrap the litellm.completion response:

response = litellm.completion(
    # ... (rest of the code remains the same)
)

normalized_response = normalize_bedrock_stream(response)
for chunk in normalized_response:
    # Process the normalized chunks
    if chunk.choices:
        # Every chunk should now report index 0
        assert chunk.choices[0].index == 0
        delta = chunk.choices[0].delta
        content = getattr(delta, "content", None) or getattr(delta, "reasoning_content", "")
        if content:
            print(content, end="", flush=True)

Verification

To verify that the fix worked, you can check the chunk.choices[0].index value throughout the stream. It should remain consistent at 0.
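One way to make this check mechanical is to collect every choice index seen in a stream and compare the set against {0}. The sketch below uses hypothetical SimpleNamespace stand-ins instead of a live Bedrock call; choice_indices and fake_chunk are illustrative names, not LiteLLM APIs.

```python
from types import SimpleNamespace


def choice_indices(stream):
    """Collect every choice index seen across a stream of chunks."""
    seen = set()
    for chunk in stream:
        for choice in chunk.choices or []:
            seen.add(choice.index)
    return seen


def fake_chunk(index):
    """Hypothetical stand-in for a streaming chunk with one choice."""
    return SimpleNamespace(choices=[SimpleNamespace(index=index)])


# A normalized stream, including a trailing chunk with no choices.
fixed_stream = [fake_chunk(0), fake_chunk(0), SimpleNamespace(choices=[])]
# A stream shaped like the reported bug.
broken_stream = [fake_chunk(0), fake_chunk(1)]

assert choice_indices(fixed_stream) == {0}
assert choice_indices(broken_stream) == {0, 1}
```

Against a real stream, passing the litellm.completion(..., stream=True) response to choice_indices should return {0} once the fix is in place.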

Extra Tips

Make sure to test the modified code with different input scenarios to ensure that it works correctly in all cases. Additionally, you may want to consider adding error handling to handle any unexpected events or errors that may occur during the stream processing.


PR fix notes

PR #23248: fix(bedrock): normalise streaming choice index=0 for extended-thinkin…


PR #23249: fix(bedrock): add regression tests for extended-thinking streaming ch…