litellm - ✅(Solved) Fix [Bug] Multiple system messages sent to non-OpenAI providers when using Responses API with developer role messages [2 pull requests, 1 participant]

GitHub stats
BerriAI/litellm#26879 · Fetched 2026-05-01 05:34:35
Comments: 0 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): cross-referenced ×2, labeled ×2

Error Message

400: System message must be at the beginning.

Root Cause

Two separate code paths each produce a system message, and they are never merged:

  1. LiteLLMCompletionResponsesConfig.transform_instructions_to_system_message() in transformation.py converts the instructions field into messages[0] as a system role.
  2. map_developer_role_to_system_role() in base_utils.py iterates over all messages and converts each developer role one-by-one into additional system messages.

The result is a message list like:

system (from instructions)
system (from developer message 1)
system (from developer message 2)
...
user
user

This is invalid for any non-OpenAI provider that enforces a single leading system message.
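To make the failure concrete, here is a minimal simulation of the two paths. The functions `instructions_to_system` and `developer_to_system` are simplified, hypothetical stand-ins written for this illustration. They are not LiteLLM's actual implementations, and the sample message texts are invented.

```python
from typing import Any, Dict, List, Optional

Message = Dict[str, Any]


def instructions_to_system(instructions: Optional[str], messages: List[Message]) -> List[Message]:
    """Stand-in for path 1: prepend `instructions` as a system message."""
    if not instructions:
        return list(messages)
    return [{"role": "system", "content": instructions}] + list(messages)


def developer_to_system(messages: List[Message]) -> List[Message]:
    """Stand-in for path 2: relabel each developer message as system, one by one."""
    return [
        {**m, "role": "system"} if m.get("role") == "developer" else m
        for m in messages
    ]


input_messages = [
    {"role": "developer", "content": "Use the sandbox."},
    {"role": "developer", "content": "Answer concisely."},
    {"role": "user", "content": "List the files."},
]

# Neither step knows about the other, so nothing merges the results.
messages = developer_to_system(
    instructions_to_system("You are a coding agent.", input_messages)
)
roles = [m["role"] for m in messages]
print(roles)  # ['system', 'system', 'system', 'user']
```

Each step is reasonable in isolation. The duplication only appears when both run on the same request.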

Affected files (verified on 1.82.6 and 1.83.14, both identical):

  • litellm/responses/litellm_completion_transformation/transformation.py: transform_instructions_to_system_message()
  • litellm/llms/base_llm/base_utils.py: map_developer_role_to_system_role()

Fix Action

Workaround

A custom callback hook can be used as a temporary fix. Place merge_system_messages.py in the proxy working directory:

from typing import Literal, Optional, Union, Any
from litellm.integrations.custom_logger import CustomLogger
from litellm.proxy._types import UserAPIKeyAuth
from litellm.caching.caching import DualCache


class MergeSystemMessagesHook(CustomLogger):
    def _extract_text(self, content: Any) -> str:
        if isinstance(content, str):
            return content
        if isinstance(content, list):
            parts = []
            for block in content:
                if isinstance(block, dict):
                    t = block.get("type", "")
                    if t in ("text", "input_text"):
                        parts.append(block.get("text", ""))
            return "\n\n".join(p for p in parts if p)
        return ""

    async def async_pre_call_hook(
        self,
        user_api_key_dict: UserAPIKeyAuth,
        cache: DualCache,
        data: dict,
        call_type: Literal[
            "completion", "text_completion", "embeddings", "image_generation",
            "moderation", "audio_transcription", "pass_through_endpoint",
            "rerank", "mcp_call", "anthropic_messages",
            "aresponses",  # compared against below, so it belongs in the type hint
        ],
    ) -> Optional[Union[Exception, str, dict]]:
        if call_type == "aresponses":
            return self._merge_responses_api(data)
        if call_type == "completion":
            return self._merge_completion_messages(data)
        return data

    def _merge_responses_api(self, data: dict) -> dict:
        input_items = data.get("input") or []
        if not isinstance(input_items, list):
            return data
        developer_parts, new_input = [], []
        for item in input_items:
            if isinstance(item, dict) and item.get("role") == "developer":
                text = self._extract_text(item.get("content", ""))
                if text:
                    developer_parts.append(text)
            else:
                new_input.append(item)
        if not developer_parts:
            return data
        existing = data.get("instructions", "") or ""
        all_parts = ([existing] if existing else []) + developer_parts
        data["instructions"] = "\n\n".join(all_parts)
        data["input"] = new_input
        return data

    def _merge_completion_messages(self, data: dict) -> dict:
        messages = data.get("messages")
        if not messages:
            return data
        system_parts, rest = [], []
        for msg in messages:
            if msg.get("role") == "system":
                text = self._extract_text(msg.get("content", ""))
                if text:
                    system_parts.append(text)
            else:
                rest.append(msg)
        if len(system_parts) <= 1:
            return data
        data["messages"] = [{"role": "system", "content": "\n\n".join(system_parts)}] + rest
        return data


merge_system_messages_hook = MergeSystemMessagesHook()

Then add to config.yaml:

litellm_settings:
  callbacks:
    - merge_system_messages.merge_system_messages_hook

PR fix notes

PR #26884: fix: merge responses developer messages into system prompt

Description (problem / solution / changelog)

Relevant issues

Fixes #26879

Pre-Submission checklist

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Screenshots / Proof of Fix

uv run pytest tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py -q
# 65 passed in 0.24s

uv run pytest tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py --cov=litellm.responses.litellm_completion_transformation.transformation --cov-report=term-missing -q
# 65 passed in 0.30s
# The new merge helper lines are not listed as missing.

make lint-ruff
# All checks passed!

uv run black --check litellm/responses/litellm_completion_transformation/transformation.py tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py
# 2 files would be left unchanged.

Type

🐛 Bug Fix ✅ Test

Changes

  • Merge Responses API instructions and developer role input items into one leading chat-completion system message before the Responses-to-chat bridge forwards the request.
  • Extract text from string and input_text/text content blocks so Codex-style developer messages are preserved in the merged system prompt.
  • Add regression tests covering instructions plus multiple developer messages, requests without system content, and supported/ignored text block extraction so non-OpenAI providers no longer receive multiple system messages.
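The behavior described in the first two bullets can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `extract_text` and `build_chat_messages` are hypothetical names, and non-developer items are passed through unchanged, whereas a real bridge would also convert them to chat-completion format.

```python
from typing import Any, Dict, List, Optional


def extract_text(content: Any) -> str:
    """Return text from a string, or from `text` / `input_text` content blocks."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") in ("text", "input_text")
        ]
        return "\n\n".join(p for p in parts if p)
    return ""  # unsupported content shapes contribute nothing


def build_chat_messages(
    instructions: Optional[str], input_items: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Fold `instructions` and developer items into one leading system message."""
    system_parts = [instructions] if instructions else []
    rest: List[Dict[str, Any]] = []
    for item in input_items:
        if item.get("role") == "developer":
            text = extract_text(item.get("content", ""))
            if text:
                system_parts.append(text)
        else:
            rest.append(item)  # assumption: left as-is in this sketch
    if not system_parts:
        return rest  # no system content: emit no system message at all
    return [{"role": "system", "content": "\n\n".join(system_parts)}] + rest


merged = build_chat_messages(
    "You are a coding agent.",
    [
        {
            "role": "developer",
            "content": [
                {"type": "input_text", "text": "Use the sandbox."},
                {"type": "input_image", "image_url": "ignored-in-this-sketch"},
            ],
        },
        {"role": "user", "content": "List the files."},
    ],
)
print(merged[0]["content"])
```

The three cases named in the regression-test bullet map onto this sketch directly: instructions plus developer messages, no system content at all, and a mix of supported and ignored block types.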

Notes

make format-check currently reports unrelated formatting drift in existing enterprise files outside this PR's diff. The two changed files pass Black directly.

Changed files

  • litellm/responses/litellm_completion_transformation/transformation.py (modified, +75/-1)
  • tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py (modified, +92/-0)

PR #26888: Fix Responses API developer/system message merging

Description (problem / solution / changelog)

Summary

  • merge developer-role messages into the leading system message for non-OpenAI providers
  • preserve existing behavior when no developer role is present
  • cover the Responses API bridge path where instructions and developer input previously produced multiple system messages
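A rough sketch of the approach this summary describes, operating on an already-translated chat message list. The function name `merge_developer_into_system` is hypothetical and the code is not taken from the PR. It assumes string content only and raises on anything else rather than dropping it silently.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def merge_developer_into_system(messages: List[Message]) -> List[Message]:
    """Fold system and developer messages into one leading system message."""
    if not any(m.get("role") == "developer" for m in messages):
        return messages  # no developer role: leave the list untouched

    system_parts: List[str] = []
    rest: List[Message] = []
    for m in messages:
        if m.get("role") in ("system", "developer"):
            content = m.get("content")
            if not isinstance(content, str):
                # Assumption: list-form content would need text extraction first.
                raise TypeError("this sketch only handles string content")
            if content:
                system_parts.append(content)
        else:
            rest.append(m)

    if not system_parts:
        return rest
    return [{"role": "system", "content": "\n\n".join(system_parts)}] + rest


before = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "developer", "content": "Use the sandbox."},
    {"role": "developer", "content": "Answer concisely."},
    {"role": "user", "content": "List the files."},
]
after = merge_developer_into_system(before)
print([m["role"] for m in after])  # ['system', 'user']
```

Returning the input list unchanged when no developer role is present mirrors the second bullet: requests that never triggered the bug take the same path as before.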

Fixes #26879

Tests

  • uv run --extra proxy pytest tests/llm_translation/test_base_llm_base_utils.py -q
  • uv run --extra proxy black --check litellm/llms/base_llm/base_utils.py tests/llm_translation/test_base_llm_base_utils.py
  • uv run --extra proxy ruff check litellm/llms/base_llm/base_utils.py tests/llm_translation/test_base_llm_base_utils.py

Changed files

  • litellm/llms/base_llm/base_utils.py (modified, +66/-8)
  • tests/test_litellm/litellm_core_utils/test_base_llm_base_utils.py (added, +115/-0)

Raw issue body

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug Description

When using LiteLLM Proxy's /responses endpoint with a client that sends both an instructions field and developer role messages (e.g., Codex CLI), LiteLLM generates multiple system messages in the final chat completion request. Non-OpenAI backends (e.g., Qwen, Llama) that only accept a single system message at the beginning will reject the request with a 400 error: System message must be at the beginning.

Steps to Reproduce

  1. Configure LiteLLM Proxy with a non-OpenAI backend (e.g., custom_openai/Qwen3.5-27B)
  2. Use Codex CLI (or any client using the Responses API) to send a request with both instructions and multiple developer role messages in input
  3. Observe the request forwarded to the backend contains multiple system messages
  4. Backend returns 400: System message must be at the beginning.
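For step 2, a request body of roughly this shape should trigger the problem when sent as a POST to the proxy's /responses endpoint. The message texts are invented, and the model name is assumed to match whatever alias the proxy config exposes for the backend. The proxy URL and API key depend on the deployment and are left out.

```python
import json

# Hypothetical reproduction payload: one `instructions` field plus two
# developer-role items, followed by a user item.
payload = {
    "model": "custom_openai/Qwen3.5-27B",  # assumption: use your own model alias
    "instructions": "You are a coding agent.",
    "input": [
        {"role": "developer", "content": [{"type": "input_text", "text": "Use the sandbox."}]},
        {"role": "developer", "content": [{"type": "input_text", "text": "Answer concisely."}]},
        {"role": "user", "content": [{"type": "input_text", "text": "List the files."}]},
    ],
}

body = json.dumps(payload, indent=2)
print(body)
```

With this input, the unpatched bridge would be expected to forward three system messages: one from `instructions` and one per developer item.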

Expected Behavior

All system-equivalent content (instructions + developer role messages) should be merged into a single system message before being sent to the backend.

Suggested Fix

In map_developer_role_to_system_role() or transform_responses_api_input_to_messages(), after all role translations are complete, merge consecutive/multiple system messages into one before returning.

Environment

  • LiteLLM version: 1.82.6 (also verified identical behavior on 1.83.14)
  • Backend: custom_openai/Qwen3.5-27B
  • Client: Codex CLI codex-tui/0.122.0
  • Endpoint: /responses


extent analysis

TL;DR

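Merge all system-equivalent content (the instructions field plus developer role messages) into a single leading system message before the request is sent to the backend. Non-OpenAI providers that enforce this rule otherwise reject the request with a 400 error.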

Guidance

  • Identify the code paths that produce system messages, specifically transform_instructions_to_system_message() and map_developer_role_to_system_role().
  • Modify these functions to merge multiple system messages into a single message.
  • Alternatively, use the provided custom callback hook MergeSystemMessagesHook as a temporary fix.
  • Verify the fix by checking the request sent to the backend and ensuring it only contains a single system message.
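The verification step in the last bullet can be automated with a small check like the one below. `has_single_leading_system` is a hypothetical helper, and it assumes you have captured the forwarded chat-completion `messages` list, for example from debug logs. It treats "no system message" and "exactly one system message at index 0" as valid.

```python
from typing import Any, Dict, List


def has_single_leading_system(messages: List[Dict[str, Any]]) -> bool:
    """True if there is no system message, or exactly one and it comes first."""
    system_indexes = [i for i, m in enumerate(messages) if m.get("role") == "system"]
    return system_indexes in ([], [0])


broken = [
    {"role": "system", "content": "from instructions"},
    {"role": "system", "content": "from developer message 1"},
    {"role": "user", "content": "hello"},
]
fixed = [
    {"role": "system", "content": "from instructions\n\nfrom developer message 1"},
    {"role": "user", "content": "hello"},
]

print(has_single_leading_system(broken))  # False
print(has_single_leading_system(fixed))   # True
```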

Example

The provided MergeSystemMessagesHook class can be used as an example of how to merge system messages:

def _merge_responses_api(self, data: dict) -> dict:
    # ...
    data["instructions"] = "\n\n".join(all_parts)
    # ...

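This code appends the text of the developer role messages to the instructions field and removes those messages from input, so only one system message is produced when the request is converted to a chat completion.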

Notes

This fix addresses only the 400 error caused by multiple system messages reaching the backend. If the backend still rejects the request after the merge, inspect the forwarded payload for other causes.

Recommendation

Apply the MergeSystemMessagesHook workaround as a temporary fix. It lets you keep using LiteLLM Proxy with non-OpenAI backends until you can upgrade to a release that includes one of the linked fixes (PR #26884 or PR #26888).
