litellm: Responses API drops cache_control on input_text content blocks (inconsistent with tool cache_control)

Description

When using the OpenAI Responses API endpoint (/v1/responses) with cache_control on input_text content blocks, the cache_control field is silently dropped during the Responses → Chat Completions transformation. This means prompt caching directives sent via the Responses API never reach the underlying provider (e.g. Anthropic/Bedrock).

This is inconsistent: cache_control on tools is correctly preserved through the same transformation.

Reproduction (unit test — no API call needed)

from litellm.responses.litellm_completion_transformation.transformation import (
    LiteLLMCompletionResponsesConfig,
)

# Case 1: input_text with cache_control — DROPPED
result = LiteLLMCompletionResponsesConfig.transform_responses_api_request_to_chat_completion_request(
    model="claude-sonnet-4-6",
    input=[{
        "type": "message",
        "role": "user",
        "content": [{
            "type": "input_text",
            "text": "Hello",
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }],
    }],
    responses_api_request={},
)
messages = result.get("messages", [])
content = messages[-1].get("content", [])
print(content[0].get("cache_control"))  # None — cache_control was silently dropped

# Case 2: tool with cache_control — PRESERVED (works correctly)
result2 = LiteLLMCompletionResponsesConfig.transform_responses_api_request_to_chat_completion_request(
    model="claude-sonnet-4-6",
    input=[],
    responses_api_request={
        "tools": [{"type": "function", "name": "f", "cache_control": {"type": "ephemeral"}}]
    },
)
print(result2["tools"][0].get("cache_control"))  # {"type": "ephemeral"} — preserved correctly

Root Cause

File: litellm/responses/litellm_completion_transformation/transformation.py

In _transform_responses_api_content_to_chat_completion_content() (around line 1292), the input_text handler builds the output dict with only type and text, ignoring any cache_control on the block:

# BUGGY (current):
content_list.append({
    "type": LiteLLMCompletionResponsesConfig._get_chat_completion_request_content_type(
        item.get("type") or "text"
    ),
    "text": text_value,
    # cache_control from item is never read
})

By contrast, tools preserve cache_control explicitly at lines ~1400–1401:

if tool.get("cache_control"):
    chat_completion_tool["cache_control"] = tool.get("cache_control")

Expected Behavior

cache_control on input_text content blocks should be forwarded to the transformed chat completion message content block, the same way the direct Anthropic adapter does in litellm/litellm_core_utils/prompt_templates/factory.py via add_cache_control_to_content().
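The expected behavior can be phrased as a check that runs against any content transform. The sketch below is illustrative only: `dropped_cache_control`, `current_behavior`, and `expected_behavior` are hypothetical stand-ins written for this example, not LiteLLM functions, and the check assumes the transform emits exactly one output block per input block.

```python
from typing import Any, Callable, Dict, List

ContentBlock = Dict[str, Any]
Transform = Callable[[List[ContentBlock]], List[ContentBlock]]


def dropped_cache_control(transform: Transform, content: List[ContentBlock]) -> List[int]:
    """Return indices of input blocks whose cache_control did not survive `transform`.

    Assumes a one-to-one mapping between input and output blocks.
    """
    out = transform(content)
    return [
        i
        for i, (src, dst) in enumerate(zip(content, out))
        if src.get("cache_control") is not None
        and dst.get("cache_control") != src["cache_control"]
    ]


def current_behavior(content: List[ContentBlock]) -> List[ContentBlock]:
    # Hypothetical stand-in for the reported behavior: only type and text are copied.
    return [{"type": "text", "text": b["text"]} for b in content]


def expected_behavior(content: List[ContentBlock]) -> List[ContentBlock]:
    # Hypothetical stand-in for the expected behavior: cache_control is forwarded.
    out = []
    for b in content:
        block = {"type": "text", "text": b["text"]}
        if b.get("cache_control") is not None:
            block["cache_control"] = b["cache_control"]
        out.append(block)
    return out


sample = [
    {"type": "input_text", "text": "long preamble", "cache_control": {"type": "ephemeral", "ttl": "1h"}},
    {"type": "input_text", "text": "question"},
]

print(dropped_cache_control(current_behavior, sample))   # [0]
print(dropped_cache_control(expected_behavior, sample))  # []
```

An empty list means every cache_control directive reached the output; blocks that never carried one are ignored by the check.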

Suggested Fix

# In _transform_responses_api_content_to_chat_completion_content():
text_block = {
    "type": LiteLLMCompletionResponsesConfig._get_chat_completion_request_content_type(
        item.get("type") or "text"
    ),
    "text": text_value,
}
if item.get("cache_control") is not None:
    text_block["cache_control"] = item["cache_control"]
content_list.append(text_block)

The same fix should be applied to the input_image handler (lines ~1284–1291), which has the same omission.
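Since both handlers need the same copy step, one option is to share it in a small helper. The following is a minimal sketch under stated assumptions: `with_cache_control` is a hypothetical name that does not exist in LiteLLM, and the text and image block shapes are simplified Chat Completions-style dicts chosen for illustration.

```python
from typing import Any, Dict


def with_cache_control(block: Dict[str, Any], item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `block` that carries `item`'s cache_control, if it has one.

    Hypothetical helper for illustration; not part of LiteLLM.
    """
    cache_control = item.get("cache_control")
    if cache_control is None:
        return dict(block)
    return {**block, "cache_control": cache_control}


# input_text item carrying a caching directive
text_item = {
    "type": "input_text",
    "text": "Hello",
    "cache_control": {"type": "ephemeral", "ttl": "1h"},
}
text_block = with_cache_control({"type": "text", "text": text_item["text"]}, text_item)

# input_image item carrying a caching directive (simplified shape, example URL)
image_item = {
    "type": "input_image",
    "image_url": "https://example.com/cat.png",
    "cache_control": {"type": "ephemeral"},
}
image_block = with_cache_control(
    {"type": "image_url", "image_url": {"url": image_item["image_url"]}},
    image_item,
)

# An item without cache_control produces a block without the key
plain_block = with_cache_control(
    {"type": "text", "text": "Hi"},
    {"type": "input_text", "text": "Hi"},
)

print(text_block["cache_control"])
print(image_block["cache_control"])
print("cache_control" in plain_block)
```

Testing `is not None` rather than truthiness mirrors the suggested fix above; whether an empty dict should be forwarded is a judgment call the maintainers would need to make.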

Environment

  • LiteLLM version: 1.83.14
  • Provider: Anthropic (via AWS Bedrock)
  • Verified against: v1.83.14-stable tag
