litellm - ✅(Solved) Fix [Bug]: `anthropic_messages()` converts thinking blocks to text blocks in requests to OpenAI Responses API [1 pull requests, 1 participants]

GitHub stats
BerriAI/litellm#26916 (fetched 2026-05-01 05:34:24)
Comments: 0
Participants: 1
Timeline events: 4
Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1

Root Cause

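The Responses adapter serialized prior assistant thinking blocks as ordinary assistant-visible output_text when replaying message history to an OpenAI model.

Impact

This can leak reasoning/thinking content into normal conversation history and send it back to OpenAI as assistant-visible text. It may also change model behavior, because the model receives prior private reasoning as if it were ordinary assistant output.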

Fix Action

Fixed

PR fix notes

PR #26927: Omit Anthropic thinking blocks from Responses input

Description (problem / solution / changelog)

Summary

  • stop replaying Anthropic assistant thinking blocks as OpenAI Responses output_text
  • also omit redacted_thinking blocks from the Responses input history
  • preserve adjacent visible assistant text and existing tool-use translation
  • update adapter regression coverage for thinking-only, thinking+text, and redacted-thinking histories

Why

Anthropic thinking blocks are hidden reasoning state. The Responses adapter was serializing prior assistant thinking content as normal assistant-visible output_text, which can leak reasoning text back into a later OpenAI request and alter model behavior.

Fixes #26916.
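
The behavior described in this PR summary can be illustrated with a minimal sketch. It is not the code from transformation.py: the helper names `assistant_blocks_to_output_parts` and `assistant_message_to_input_item` are hypothetical, tool-use translation is left out, and dropping an assistant message that contains only hidden blocks is an assumption of this sketch rather than something the PR notes confirm.

```python
from typing import Any, Optional

# Block types treated as hidden reasoning state and never replayed.
HIDDEN_BLOCK_TYPES = {"thinking", "redacted_thinking"}


def assistant_blocks_to_output_parts(blocks: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Translate Anthropic assistant content blocks into Responses-style text parts.

    Hypothetical helper for illustration only; tool_use blocks are ignored here.
    """
    parts: list[dict[str, str]] = []
    for block in blocks:
        block_type = block.get("type")
        if block_type in HIDDEN_BLOCK_TYPES:
            continue  # omit hidden reasoning instead of emitting output_text
        if block_type == "text":
            parts.append({"type": "output_text", "text": block["text"]})
    return parts


def assistant_message_to_input_item(blocks: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Build a Responses input message, or None when no visible text remains."""
    parts = assistant_blocks_to_output_parts(blocks)
    if not parts:
        return None  # assumption: a thinking-only turn is dropped entirely
    return {"type": "message", "role": "assistant", "content": parts}


# The assistant turn from the issue's reproduction, written as plain dicts.
history_blocks = [
    {"type": "thinking", "thinking": "**THINKING MESSAGE**", "signature": ""},
    {"type": "text", "text": "Some other AI message"},
]
item = assistant_message_to_input_item(history_blocks)
```

With the history from the issue, `item` keeps only the visible "Some other AI message" part, so the thinking text never reaches the request body.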

Validation

  • PYTHONPATH=. /tmp/litellm-26913-venv/bin/python -m pytest tests/test_litellm/llms/anthropic/experimental_pass_through/responses_adapters/test_responses_adapters_transformation.py -q
    • 83 passed, 2 warnings
  • UV_NO_PROJECT=1 uvx black litellm/llms/anthropic/experimental_pass_through/responses_adapters/transformation.py tests/test_litellm/llms/anthropic/experimental_pass_through/responses_adapters/test_responses_adapters_transformation.py --check --diff
    • 2 files would be left unchanged

Note: project uv run is blocked locally because installed uv is 0.10.4 and this branch requires >=0.10.9; targeted tests ran with Python 3.13 via an existing local venv.

Changed files

  • litellm/llms/anthropic/experimental_pass_through/responses_adapters/transformation.py (modified, +4/-6)
  • litellm/proxy/_lazy_openapi_snapshot.py (modified, +30/-2)
  • tests/test_litellm/llms/anthropic/experimental_pass_through/responses_adapters/test_responses_adapters_transformation.py (modified, +42/-4)
  • tests/test_litellm/proxy/test_lazy_openapi_snapshot.py (added, +53/-0)

Code Example

{"type": "thinking", "thinking": "...", "signature": ""}

---

{
  "type": "output_text",
  "text": "**THINKING MESSAGE**"
}

---

{
  "input": [
    {
      "content": [
        {
          "text": "Some user message",
          "type": "input_text"
        }
      ],
      "role": "user",
      "type": "message"
    },
    {
      "content": [
        {
          "text": "**THINKING MESSAGE**",
          "type": "output_text"
        },
        {
          "text": "Some other AI message",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "type": "message"
    },
    {
      "content": [
        {
          "text": "Another user message",
          "type": "input_text"
        }
      ],
      "role": "user",
      "type": "message"
    }
  ],
  "instructions": "Some system prompt",
  "max_output_tokens": 1024,
  "model": "gpt-5.5",
  "reasoning": {
    "effort": "medium",
    "summary": "detailed"
  }
}

---
Original issue body

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Title

OpenAI Responses adapter converts Anthropic thinking blocks to output_text when replaying message history

Summary

When using anthropic_messages() with an OpenAI model, prior assistant thinking blocks in the conversation history are serialized into regular output_text blocks instead of reasoning blocks.

This causes hidden/internal thinking content returned by LiteLLM as an Anthropic-style thinking block to be sent back to OpenAI as normal assistant-visible text on the next request.

Expected Behavior

An Anthropic-style assistant content block like:

{"type": "thinking", "thinking": "...", "signature": ""}

should either:

  1. Be converted into the appropriate OpenAI Responses reasoning representation, or
  2. Be omitted/redacted from the next OpenAI request (I think this was done in 1.81.x versions)

It should not be sent as regular assistant text.

Actual Behavior

LiteLLM serializes the thinking block as:

{
  "type": "output_text",
  "text": "**THINKING MESSAGE**"
}

inside the assistant message in the OpenAI Responses request body.

Minimal Reproduction

I created a minimal standalone repro that hooks httpx.AsyncClient.send, calls anthropic_messages() with a previous assistant thinking block in the history, and prints the final OpenAI /responses request body before any network I/O.

Observed Request Body

Relevant part of the captured OpenAI request:

{
  "input": [
    {
      "content": [
        {
          "text": "Some user message",
          "type": "input_text"
        }
      ],
      "role": "user",
      "type": "message"
    },
    {
      "content": [
        {
          "text": "**THINKING MESSAGE**",
          "type": "output_text"
        },
        {
          "text": "Some other AI message",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "type": "message"
    },
    {
      "content": [
        {
          "text": "Another user message",
          "type": "input_text"
        }
      ],
      "role": "user",
      "type": "message"
    }
  ],
  "instructions": "Some system prompt",
  "max_output_tokens": 1024,
  "model": "gpt-5.5",
  "reasoning": {
    "effort": "medium",
    "summary": "detailed"
  }
}

Impact

This can leak reasoning/thinking content into normal conversation history and send it back to OpenAI as assistant-visible text. It may also change model behavior, because the model receives prior private reasoning as if it were ordinary assistant output.

Steps to Reproduce


from __future__ import annotations

import asyncio
import json
from importlib.metadata import version as package_version
from typing import Any

import httpx
import litellm
from litellm import anthropic_messages
from litellm.types.llms.anthropic_messages.anthropic_response import AnthropicResponseThinkingBlock

_CAPTURED_BODY: object | None = None


class CapturedRequestError(RuntimeError):
    """Stops the request after LiteLLM has serialized the OpenAI body."""


async def _request_body(request: httpx.Request) -> bytes:
    try:
        return request.content
    except httpx.RequestNotRead:
        return await request.aread()


def _install_httpx_capture() -> None:
    original_send = httpx.AsyncClient.send

    async def hooked_send(client: httpx.AsyncClient, request: httpx.Request, *args: Any, **kwargs: Any):
        global _CAPTURED_BODY  # noqa: PLW0603 -- script-level capture state
        if request.url.path.rstrip("/").endswith(("/messages", "/responses")):
            body = await _request_body(request)
            try:
                _CAPTURED_BODY = json.loads(body.decode("utf-8", errors="replace"))
            except json.JSONDecodeError:
                _CAPTURED_BODY = body.decode("utf-8", errors="replace")
            raise CapturedRequestError("captured OpenAI /responses request")
        return await original_send(client, request, *args, **kwargs)

    httpx.AsyncClient.send = hooked_send  # ty: ignore[invalid-assignment] -- intentional request capture hook


async def main() -> None:
    litellm.suppress_debug_info = True  # ty: ignore[invalid-assignment] -- litellm annotates this as Literal[False]
    _install_httpx_capture()

    system = "Some system prompt"
    messages = [
        {
            "role": "user",
            "content": "Some user message",
        },
        {
            "role": "assistant",
            "content": [
                # This simulates a response block from a previous call to `anthropic_messages`
                AnthropicResponseThinkingBlock(thinking="**THINKING MESSAGE**", type="thinking"),
                {
                    "type": "text",
                    "text": "Some other AI message",
                },
            ],
        },
        {
            "role": "user",
            "content": "Another user message",
        },
    ]

    print(f"litellm {package_version('litellm')}")

    print("ANTHROPIC MODEL: WORKS -> thinking block is sent as a thinking block")
    global _CAPTURED_BODY  # noqa: PLW0603 -- script-level capture state
    _CAPTURED_BODY = None
    try:
        await anthropic_messages(
            model="anthropic/claude-sonnet-4-6",
            api_key="sk-capture-only",
            system=system,
            messages=messages,
            max_tokens=1024,
            thinking={"type": "enabled", "budget_tokens": 5000, "summary": "detailed"},
        )
    except Exception:
        if _CAPTURED_BODY is None:
            raise
        print(json.dumps(_CAPTURED_BODY, indent=2, sort_keys=True))

    print("GPT MODEL: DOES NOT WORK -> thinking block is sent as a text block instead of OpenAI reasoning block")
    _CAPTURED_BODY = None
    try:
        await anthropic_messages(
            model="openai/gpt-5.5",
            api_key="sk-capture-only",
            system=system,
            messages=messages,
            max_tokens=1024,
            thinking={"type": "enabled", "budget_tokens": 5000, "summary": "detailed"},
        )
    except Exception:
        if _CAPTURED_BODY is None:
            raise
        print(json.dumps(_CAPTURED_BODY, indent=2, sort_keys=True))


if __name__ == "__main__":
    asyncio.run(main())

Relevant log output

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

1.83.13


extent analysis

TL;DR

The issue was fixed in PR #26927: the Responses adapter used by anthropic_messages() now omits Anthropic-style thinking and redacted_thinking blocks from the history it sends to OpenAI models, instead of replaying them as output_text.

Guidance

  • Inspect how the Responses adapter behind anthropic_messages() (litellm/llms/anthropic/experimental_pass_through/responses_adapters/transformation.py) serializes the conversation history into the OpenAI request body.
  • Either convert thinking blocks into the appropriate OpenAI reasoning representation or omit them from the request body. PR #26927 omits them, along with redacted_thinking blocks.
  • Verify the behavior by running the provided minimal reproduction and confirming that no output_text part in the captured request carries the thinking text.
  • Test with different OpenAI models and conversation histories (thinking-only, thinking plus text, redacted thinking) to ensure adjacent visible assistant text is preserved.
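
The verification step can be automated with a small check over the captured request body. The following is a sketch under stated assumptions: `find_leaked_thinking` is a hypothetical helper, not a LiteLLM function, and it assumes the body has the shape shown under Observed Request Body (an `input` list of messages whose `content` is a list of typed parts).

```python
from typing import Any


def find_leaked_thinking(request_body: dict[str, Any], thinking_texts: list[str]) -> list[str]:
    """Return every known thinking string that shows up as assistant output_text."""
    leaked: list[str] = []
    for item in request_body.get("input", []):
        if not isinstance(item, dict) or item.get("role") != "assistant":
            continue
        content = item.get("content")
        if not isinstance(content, list):
            continue  # string content cannot carry typed parts
        for part in content:
            if not isinstance(part, dict):
                continue
            if part.get("type") == "output_text" and part.get("text") in thinking_texts:
                leaked.append(part["text"])
    return leaked


# Trimmed copy of the buggy request body captured in the issue.
buggy_body = {
    "input": [
        {"role": "user", "type": "message",
         "content": [{"type": "input_text", "text": "Some user message"}]},
        {"role": "assistant", "type": "message",
         "content": [
             {"type": "output_text", "text": "**THINKING MESSAGE**"},
             {"type": "output_text", "text": "Some other AI message"},
         ]},
    ],
}
leaks = find_leaked_thinking(buggy_body, ["**THINKING MESSAGE**"])
```

In the reproduction script, the same function could be called on the captured body right before it is printed; an empty result indicates the thinking text was not replayed as visible output.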

Example

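# Client-side workaround sketch: drop thinking blocks from the history before
# calling anthropic_messages() with an OpenAI model.
# strip_thinking_blocks is a hypothetical helper, not part of LiteLLM.
from typing import Any

HIDDEN_BLOCK_TYPES = {"thinking", "redacted_thinking"}


def _block_type(block: Any) -> Any:
    # Blocks may be plain dicts or typed objects exposing a `type` attribute.
    return block.get("type") if isinstance(block, dict) else getattr(block, "type", None)


def strip_thinking_blocks(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    filtered_messages = []
    for message in messages:
        content = message.get("content")
        # String content has no blocks; only assistant block lists need filtering.
        if message.get("role") == "assistant" and isinstance(content, list):
            visible = [block for block in content if _block_type(block) not in HIDDEN_BLOCK_TYPES]
            # Note: a thinking-only turn ends up with an empty content list.
            message = {**message, "content": visible}
        filtered_messages.append(message)
    return filtered_messages


# Usage: pass the filtered history to the real call, for example
# await anthropic_messages(model="openai/gpt-5.5", messages=strip_thinking_blocks(messages), ...)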

Notes

The example above is a client-side workaround, not the upstream fix. The root cause was in LiteLLM's Responses adapter, which serialized prior assistant thinking blocks as output_text. PR #26927 fixes it in the adapter by omitting thinking and redacted_thinking blocks from the Responses input while preserving adjacent visible assistant text and tool-use translation.

Recommendation

Upgrade to a LiteLLM version that includes PR #26927. Until then, filter thinking blocks out of the message history before calling anthropic_messages() with an OpenAI model, so that reasoning content is not replayed as assistant-visible text.
