litellm - ✅(Solved) Fix [Bug]: Thinking blocks corrupted on round-trip when assistant performs multiple web searches [3 pull requests, 4 comments, 2 participants]

GitHub stats
BerriAI/litellm#23047 · Fetched 2026-04-08 00:38:42
Comments: 4 · Participants: 2 · Timeline: 15 · Reactions: 0
Timeline (top): commented ×4, cross-referenced ×4, labeled ×3, referenced ×3

Error Message

from litellm import completion

m = 'claude-sonnet-4-6'
msgs = [{'role': 'user', 'content': 'Search the web for the latest news about fast.ai and answer.ai'}]
r = completion(m, msgs, web_search_options={"search_context_size": "low"}, reasoning_effort='low')

# Confirm thinking + multiple tool calls present
m1 = r.choices[0].message
assert m1.thinking_blocks, "No thinking blocks — retry until model thinks"
assert len(m1.tool_calls) >= 2, f"Need 2+ web searches, got {len(m1.tool_calls or [])}"

# Round-trip: pass message back unmodified
msgs.append(m1)
msgs.append({'role': 'user', 'content': 'Now search for news about solveit'})
r2 = completion(m, msgs, web_search_options={"search_context_size": "low"}, reasoning_effort='low')
# ^^^ raises BadRequestError: thinking blocks cannot be modified

Root Cause

In litellm/litellm_core_utils/prompt_templates/factory.py, anthropic_messages_pt rebuilds the assistant content array by:

  1. Prepending all thinking_blocks first
  2. Then appending text blocks
  3. Then appending server_tool_use + web_search_tool_result blocks

But the original Anthropic response interleaves thinking blocks between tool use/result blocks (e.g. [thinking_1, server_tool_use_1, web_search_result_1, thinking_2, text, server_tool_use_2, web_search_result_2]). The reconstructed order differs, which breaks Anthropic's thinking block signature verification.

With a single web search, thinking blocks happen to end up in the right relative position. With 2+, the reordering is detected.
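
The reordering described above can be reproduced with a toy model. This is a minimal sketch, not LiteLLM's code: old_rebuild and thinking_positions are hypothetical helpers, blocks are represented by plain label strings, and the single-search layout is an assumption (the issue only spells out the two-search layout).

```python
def old_rebuild(blocks):
    """Toy model of the pre-fix ordering: thinking first, then text, then server tool blocks."""
    thinking = [b for b in blocks if b.startswith("thinking")]
    text = [b for b in blocks if b == "text"]
    tools = [b for b in blocks if b.startswith(("server_tool_use", "web_search_result"))]
    return thinking + text + tools


def thinking_positions(blocks):
    """Indices at which thinking blocks sit in a content array."""
    return [i for i, b in enumerate(blocks) if b.startswith("thinking")]


# Layout taken from the issue description (two web searches).
two_searches = [
    "thinking_1", "server_tool_use_1", "web_search_result_1",
    "thinking_2", "text", "server_tool_use_2", "web_search_result_2",
]
# Assumed layout for a single web search (not given in the issue).
one_search = ["thinking_1", "server_tool_use_1", "web_search_result_1", "text"]

for name, original in [("one search", one_search), ("two searches", two_searches)]:
    before = thinking_positions(original)
    after = thinking_positions(old_rebuild(original))
    print(name, before, after, "unchanged" if before == after else "moved")
```

Under these assumptions the lone thinking block of the single-search case stays at index 0, while in the two-search case thinking_2 moves from index 3 to index 1.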

Fix Action

Fixed

PR fix notes

PR #23093: fix: preserve thinking block order with multiple web searches

Description (problem / solution / changelog)

Note: This PR was authored by Claude (AI), operated by @maxwellcalkin.

Summary

Fixes #23047

When using Claude with extended thinking and web search, if the model performs 2+ web searches in a single turn, the next completion() call fails with:

thinking or redacted_thinking blocks in the latest assistant message cannot be modified

Root cause

In anthropic_messages_pt, assistant content was reconstructed by:

  1. Prepending all thinking_blocks first
  2. Then appending text blocks
  3. Then appending server_tool_use + web_search_tool_result blocks

But Anthropic's original response interleaves thinking blocks between tool use/result blocks:

[thinking_1, server_tool_use_1, web_search_result_1, thinking_2, text, server_tool_use_2, web_search_result_2]

The reconstructed order differs, which breaks Anthropic's thinking block signature verification.

Fix

When both thinking_blocks and server tool calls (srvtoolu_*) are present on an assistant message, the code now interleaves them instead of separating them:

  • Each thinking block is paired with its corresponding server tool use group (server_tool_use + tool_result)
  • Extra thinking blocks (if more than tool groups) are emitted before the text block
  • Extra tool groups (if more than thinking blocks) are emitted without a preceding thinking block
  • Regular (non-server) tool calls are appended at the end
  • When no server tool calls are present, the existing sequential behavior is preserved

Changes

  • litellm/litellm_core_utils/prompt_templates/factory.py: Added interleaved mode for thinking blocks + server tool calls in anthropic_messages_pt
  • tests/llm_translation/test_prompt_factory.py: Added 3 tests covering the interleaving fix, backward compatibility, and edge cases

Test plan

  • test_anthropic_messages_pt_interleave_thinking_with_server_tool_calls — verifies correct interleaved order with 2 web searches
  • test_anthropic_messages_pt_thinking_blocks_no_server_tools_unchanged — verifies existing behavior preserved when only regular tool_use
  • test_anthropic_messages_pt_interleave_more_thinking_than_tool_groups — verifies handling of more thinking blocks than tool groups

Changed files

  • litellm/litellm_core_utils/prompt_templates/factory.py (modified, +241/-58)
  • tests/llm_translation/test_prompt_factory.py (modified, +343/-0)

PR #23137: fix(anthropic): preserve interleaved thinking block order on round-trip (#23047)

Description (problem / solution / changelog)

Summary

Fixes https://github.com/BerriAI/litellm/issues/23047

When Anthropic responses contain interleaved thinking blocks, tool use, and web search results, the anthropic_messages_pt function was corrupting the block order by unconditionally prepending thinking_blocks before processing the content list. This caused duplicate thinking blocks and incorrect ordering on multi-turn round-trips.

Problem

The anthropic_messages_pt function in factory.py always prepended thinking_blocks (from the top-level field) before iterating over the content list. When the content list already contained interleaved thinking, server_tool_use, and *_tool_result blocks (as returned by Anthropic for web search / tool use with extended thinking), this resulted in:

  1. Duplicate thinking blocks — the same thinking block appeared twice (once from thinking_blocks field, once from content list)
  2. Corrupted ordering — all thinking blocks were moved to the front, breaking the original interleaved sequence

Solution

  • Added a check: if the content list already contains type: "thinking" blocks, skip prepending thinking_blocks to avoid duplication and preserve the original interleaved order.
  • Added pass-through handling for server_tool_use and *_tool_result content blocks (e.g. web_search_tool_result, tool_search_tool_result, bash_code_execution_tool_result).
  • Added handling for redacted_thinking blocks.
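
The checks listed above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in the PR: rebuild_content and is_pass_through_block are hypothetical names, blocks are modelled as plain dicts, and the sample field values are made up.

```python
def is_pass_through_block(block):
    """True for block types the PR description says are passed through unchanged."""
    block_type = block.get("type", "")
    return block_type in ("server_tool_use", "redacted_thinking") or block_type.endswith("_tool_result")


def rebuild_content(content_list, thinking_blocks):
    """Prepend thinking_blocks only when the content list has no inline thinking block."""
    has_inline_thinking = any(b.get("type") == "thinking" for b in content_list)
    rebuilt = [] if has_inline_thinking else list(thinking_blocks or [])
    rebuilt.extend(content_list)  # the original interleaved order is kept as-is
    return rebuilt


thinking_1 = {"type": "thinking", "thinking": "plan first search", "signature": "sig-1"}
thinking_2 = {"type": "thinking", "thinking": "plan second search", "signature": "sig-2"}
content = [
    thinking_1,
    {"type": "server_tool_use", "id": "srvtoolu_1", "name": "web_search", "input": {"query": "fast.ai"}},
    {"type": "web_search_tool_result", "tool_use_id": "srvtoolu_1", "content": []},
    thinking_2,
    {"type": "text", "text": "Here is what I found."},
]

rebuilt = rebuild_content(content, thinking_blocks=[thinking_1, thinking_2])
print([b["type"] for b in rebuilt])
# prints ['thinking', 'server_tool_use', 'web_search_tool_result', 'thinking', 'text']
```

Because the content list already holds both thinking blocks, nothing is prepended, so there are no duplicates and the order is untouched. When the content list has no thinking block, the sketch falls back to prepending thinking_blocks, which corresponds to the thinking_blocks-only case the PR's tests mention.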

Changes

  • litellm/litellm_core_utils/prompt_templates/factory.py: Modified anthropic_messages_pt to detect and skip duplicate thinking block insertion; added pass-through for server_tool_use, *_tool_result, and redacted_thinking block types.
  • tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_factory.py: Added 3 new unit tests covering interleaved thinking + tool use ordering, thinking_blocks-only fallback, and single web search with thinking.

Testing

All 3 new tests pass:

pytest tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_factory.py::test_anthropic_messages_pt_preserves_interleaved_thinking_and_tool_use_order
pytest tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_factory.py::test_anthropic_messages_pt_thinking_blocks_field_only_no_content_list
pytest tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_factory.py::test_anthropic_messages_pt_single_web_search_with_thinking

Changed files

  • litellm/litellm_core_utils/prompt_templates/factory.py (modified, +22/-1)
  • tests/test_litellm/litellm_core_utils/prompt_templates/test_litellm_core_utils_prompt_templates_factory.py (modified, +257/-0)

PR #23276: Litellm oss staging 03 10 2026

Description (problem / solution / changelog)


Changed files

  • CLAUDE.md (modified, +7/-0)
  • docs/my-website/docs/apply_guardrail.md (modified, +1/-0)
  • docs/my-website/docs/completion/output.md (modified, +22/-0)
  • docs/my-website/docs/contributing/adding_openai_compatible_providers.md (modified, +40/-2)
  • docs/my-website/docs/mcp_guardrail.md (modified, +1/-0)
  • docs/my-website/docs/provider_registration/add_model_pricing.md (modified, +26/-1)
  • docs/my-website/docs/proxy/guardrails/panw_prisma_airs.md (modified, +129/-457)
  • litellm-proxy-extras/litellm_proxy_extras/migrations/20260309115809_add_missing_indexes/migration.sql (added, +13/-0)
  • litellm-proxy-extras/litellm_proxy_extras/schema.prisma (modified, +6/-0)
  • litellm/caching/dual_cache.py (modified, +0/-4)
  • litellm/completion_extras/litellm_responses_transformation/transformation.py (modified, +1/-21)
  • litellm/constants.py (modified, +1/-5)
  • litellm/google_genai/adapters/transformation.py (modified, +0/-2)
  • litellm/litellm_core_utils/core_helpers.py (modified, +50/-40)
  • litellm/litellm_core_utils/duration_parser.py (modified, +6/-4)
  • litellm/litellm_core_utils/get_model_cost_map.py (modified, +60/-5)
  • litellm/litellm_core_utils/prompt_templates/factory.py (modified, +241/-58)
  • litellm/litellm_core_utils/redact_messages.py (modified, +0/-70)
  • litellm/llms/azure/chat/gpt_5_transformation.py (modified, +5/-19)
  • litellm/llms/bedrock/chat/converse_transformation.py (modified, +2/-26)
  • litellm/llms/fireworks_ai/chat/transformation.py (modified, +1/-4)
  • litellm/llms/openai/chat/gpt_5_transformation.py (modified, +18/-61)
  • litellm/llms/openai/image_edit/transformation.py (modified, +1/-0)
  • litellm/llms/openai_like/README.md (modified, +35/-3)
  • litellm/llms/openai_like/dynamic_config.py (modified, +60/-0)
  • litellm/llms/openai_like/json_loader.py (modified, +9/-0)
  • litellm/llms/openai_like/responses/__init__.py (added, +5/-0)
  • litellm/llms/openai_like/responses/transformation.py (added, +51/-0)
  • litellm/llms/perplexity/responses/transformation.py (modified, +63/-427)
  • litellm/llms/sagemaker/completion/handler.py (modified, +36/-20)
  • litellm/llms/snowflake/chat/transformation.py (modified, +11/-10)
  • litellm/llms/vertex_ai/common_utils.py (modified, +23/-0)
  • litellm/llms/vertex_ai/gemini/transformation.py (modified, +0/-6)
  • litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py (modified, +36/-27)
  • litellm/proxy/_experimental/mcp_server/openapi_to_mcp_generator.py (modified, +1/-18)
  • litellm/proxy/_experimental/mcp_server/server.py (modified, +4/-7)
  • litellm/proxy/auth/model_checks.py (modified, +6/-16)
  • litellm/proxy/credential_endpoints/endpoints.py (modified, +45/-52)
  • litellm/proxy/guardrails/guardrail_hooks/panw_prisma_airs/__init__.py (modified, +1/-1)
  • litellm/proxy/guardrails/guardrail_hooks/panw_prisma_airs/panw_prisma_airs.py (modified, +943/-96)
  • litellm/proxy/management_endpoints/team_endpoints.py (modified, +18/-0)
  • litellm/proxy/pass_through_endpoints/pass_through_endpoints.py (modified, +1/-2)
  • litellm/proxy/schema.prisma (modified, +6/-0)
  • litellm/proxy/spend_tracking/spend_management_endpoints.py (modified, +5/-15)
  • litellm/responses/litellm_completion_transformation/transformation.py (modified, +21/-89)
  • litellm/responses/main.py (modified, +6/-6)
  • litellm/router.py (modified, +0/-22)
  • litellm/router_strategy/lowest_latency.py (modified, +3/-5)
  • litellm/types/images/main.py (modified, +1/-0)
  • litellm/types/llms/openai.py (modified, +9/-1)
  • litellm/types/proxy/guardrails/guardrail_hooks/panw_prisma_airs.py (modified, +7/-0)
  • litellm/types/utils.py (modified, +5/-1)
  • litellm/utils.py (modified, +61/-21)
  • provider_endpoints_support.json (modified, +0/-18)
  • schema.prisma (modified, +6/-0)
  • tests/llm_translation/test_prompt_factory.py (modified, +343/-0)
  • tests/llm_translation/test_skills_api.py (modified, +5/-11)
  • tests/local_testing/test_custom_callback_input.py (modified, +3/-5)
  • tests/logging_callback_tests/test_logging_redaction_e2e_test.py (modified, +5/-10)
  • tests/test_litellm/caching/test_dual_cache.py (modified, +0/-103)
  • tests/test_litellm/completion_extras/litellm_responses_transformation/test_completion_extras_litellm_responses_transformation_transformation.py (modified, +1/-216)
  • tests/test_litellm/litellm_core_utils/test_core_helpers.py (modified, +106/-1)
  • tests/test_litellm/llms/azure/chat/test_azure_gpt5_transformation.py (modified, +0/-17)
  • tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py (modified, +0/-50)
  • tests/test_litellm/llms/fireworks_ai/chat/test_fireworks_ai_chat_transformation.py (modified, +0/-54)
  • tests/test_litellm/llms/openai/chat/test_openai_gpt_transformation.py (modified, +0/-193)
  • tests/test_litellm/llms/openai/test_gpt5_transformation.py (modified, +6/-71)
  • tests/test_litellm/llms/openai/test_openai_image_edit_transformation.py (modified, +48/-0)
  • tests/test_litellm/llms/openai_like/responses/__init__.py (added, +0/-0)
  • tests/test_litellm/llms/openai_like/responses/test_openai_like_responses.py (added, +341/-0)
  • tests/test_litellm/llms/perplexity/responses/test_perplexity_responses_transformation.py (modified, +257/-46)
  • tests/test_litellm/llms/sagemaker/test_sagemaker_embedding_role_assumption.py (removed, +0/-243)
  • tests/test_litellm/llms/snowflake/chat/test_snowflake_chat_transformation.py (modified, +6/-2)
  • tests/test_litellm/llms/vertex_ai/gemini/test_vertex_ai_gemini_transformation.py (modified, +0/-69)
  • tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py (modified, +14/-16)
  • tests/test_litellm/llms/vertex_ai/test_vertex_ai_common_utils.py (modified, +91/-0)
  • tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server.py (modified, +0/-147)
  • tests/test_litellm/proxy/auth/test_model_checks.py (modified, +0/-134)
  • tests/test_litellm/proxy/guardrails/guardrail_hooks/test_panw_prisma_airs.py (modified, +4112/-294)
  • tests/test_litellm/proxy/pass_through_endpoints/test_pass_through_endpoints.py (modified, +0/-39)
  • tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py (modified, +2/-3)
  • tests/test_litellm/proxy/test_openapi_schema_validation.py (removed, +0/-142)
  • tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py (modified, +0/-125)
  • tests/test_litellm/test_model_cost_aliases.py (added, +238/-0)
  • tests/test_litellm/test_router_retry_non_retryable_errors.py (removed, +0/-251)
  • tests/test_litellm/types/test_types_utils.py (modified, +54/-0)
  • ui/litellm-dashboard/src/components/VirtualKeysPage/VirtualKeysTable.test.tsx (modified, +3/-79)
  • ui/litellm-dashboard/src/components/VirtualKeysPage/VirtualKeysTable.tsx (modified, +13/-35)

Code Example

messages.N.content.1: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.

---

from litellm import completion

m = 'claude-sonnet-4-6'
msgs = [{'role': 'user', 'content': 'Search the web for the latest news about fast.ai and answer.ai'}]
r = completion(m, msgs, web_search_options={"search_context_size": "low"}, reasoning_effort='low')

# Confirm thinking + multiple tool calls present
m1 = r.choices[0].message
assert m1.thinking_blocks, "No thinking blocks — retry until model thinks"
assert len(m1.tool_calls) >= 2, f"Need 2+ web searches, got {len(m1.tool_calls or [])}"

# Round-trip: pass message back unmodified
msgs.append(m1)
msgs.append({'role': 'user', 'content': 'Now search for news about solveit'})
r2 = completion(m, msgs, web_search_options={"search_context_size": "low"}, reasoning_effort='low')
# ^^^ raises BadRequestError: thinking blocks cannot be modified

---

BadRequestError: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.1.content.1: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response."},"request_id":"req_011CYpHNUZA6pBuJhf3r4uPa"}

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using Claude with extended thinking (reasoning_effort) and web search (web_search_options), if the model performs 2+ web searches in a single turn, the next completion() call fails with:

messages.N.content.1: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response.

This happens even when passing r.choices[0].message back unmodified — litellm's internal anthropic_messages_pt reconstructs the content array in the wrong order.

Root cause

In litellm/litellm_core_utils/prompt_templates/factory.py, anthropic_messages_pt rebuilds the assistant content array by:

  1. Prepending all thinking_blocks first
  2. Then appending text blocks
  3. Then appending server_tool_use + web_search_tool_result blocks

But the original Anthropic response interleaves thinking blocks between tool use/result blocks (e.g. [thinking_1, server_tool_use_1, web_search_result_1, thinking_2, text, server_tool_use_2, web_search_result_2]). The reconstructed order differs, which breaks Anthropic's thinking block signature verification.

With a single web search, thinking blocks happen to end up in the right relative position. With 2+, the reordering is detected.

Environment

  • litellm version: 1.82.0
  • Model: claude-sonnet-4-6 (also affects other Claude models with thinking + web search)
  • Python 3.12

Steps to Reproduce

from litellm import completion

m = 'claude-sonnet-4-6'
msgs = [{'role': 'user', 'content': 'Search the web for the latest news about fast.ai and answer.ai'}]
r = completion(m, msgs, web_search_options={"search_context_size": "low"}, reasoning_effort='low')

# Confirm thinking + multiple tool calls present
m1 = r.choices[0].message
assert m1.thinking_blocks, "No thinking blocks — retry until model thinks"
assert len(m1.tool_calls) >= 2, f"Need 2+ web searches, got {len(m1.tool_calls or [])}"

# Round-trip: pass message back unmodified
msgs.append(m1)
msgs.append({'role': 'user', 'content': 'Now search for news about solveit'})
r2 = completion(m, msgs, web_search_options={"search_context_size": "low"}, reasoning_effort='low')
# ^^^ raises BadRequestError: thinking blocks cannot be modified

Expected behavior

Round-tripping r.choices[0].message should preserve the original content array ordering so Anthropic accepts it.

Relevant log output

BadRequestError: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.1.content.1: `thinking` or `redacted_thinking` blocks in the latest assistant message cannot be modified. These blocks must remain as they were in the original response."},"request_id":"req_011CYpHNUZA6pBuJhf3r4uPa"}

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

v1.82.0


extent analysis

Fix Plan

To resolve the issue, anthropic_messages_pt in litellm/litellm_core_utils/prompt_templates/factory.py must emit thinking blocks and tool use/result blocks in the order Anthropic originally returned them.

  • Rebuild the assistant content by walking the original blocks in order, so each thinking block stays between the tool use/result blocks it originally separated (pseudocode; original_message is a placeholder name, not a LiteLLM identifier):
def anthropic_messages_pt(messages):
    # ...
    content = []
    for block in original_message.content:
        # Append every block (thinking, server_tool_use, web_search_tool_result, text)
        # at its original position; never hoist thinking blocks to the front.
        content.append(block)
    # ...
  • Alternatively, if blocks are gathered from separate fields, record each block's position when the response is parsed and sort on it. Blocks are ordered by their place in the content array rather than by an index attribute, so the position must be stored explicitly (positioned_blocks is a hypothetical list of (position, block) pairs):
def anthropic_messages_pt(messages):
    # ...
    content = [block for _, block in sorted(positioned_blocks, key=lambda pair: pair[0])]
    # ...

Verification

To verify that the fix worked, you can run the same test code that reproduces the issue:

from litellm import completion

m = 'claude-sonnet-4-6'
msgs = [{'role': 'user', 'content': 'Search the web for the latest news about fast.ai and answer.ai'}]
r = completion(m, msgs, web_search_options={"search_context_size": "low"}, reasoning_effort='low')

# Confirm thinking + multiple tool calls present
m1 = r.choices[0].message
assert m1.thinking_blocks, "No thinking blocks — retry until model thinks"
assert len(m1.tool_calls) >= 2, f"Need 2+ web searches, got {len(m1.tool_calls or [])}"

# Round-trip: pass message back unmodified
msgs.append(m1)
msgs.append({'role': 'user', 'content': 'Now search for news about solveit'})
r2 = completion(m, msgs, web_search_options={"search_context_size": "low"}, reasoning_effort='low')

If the fix is correct, the code should no longer raise a BadRequestError.

Extra Tips

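  • Test the fix against the cases the PRs cover: a single web search with thinking, two or more web searches in one turn, more thinking blocks than tool groups, and assistant messages with no server tool calls.
  • PRs #23093 and #23137 already target this issue, so check whether your installed LiteLLM version includes one of them before patching locally.
  • If the error persists after upgrading, report it on the LiteLLM issue tracker (BerriAI/litellm#23047).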
