litellm - ✅(Solved) Fix [Bug] Multiple system messages sent to non-OpenAI providers when using Responses API with developer role messages [2 pull requests, 1 participant]

litellm · 2026-04-30 10:13:38


GitHub stats

BerriAI/litellm#26879•Fetched 2026-05-01 05:34:35


Author

goldgengilgamesh-cloud


Error Message

Non-OpenAI backends that accept only a single system message at the beginning reject the forwarded chat completion request with a 400 error:

System message must be at the beginning.

Root Cause

Two separate code paths each produce a system message, and they are never merged:

LiteLLMCompletionResponsesConfig.transform_instructions_to_system_message() in transformation.py converts the instructions field into messages[0] as a system role.
map_developer_role_to_system_role() in base_utils.py iterates over all messages and converts each developer role one-by-one into additional system messages.

The result is a message list like:

system (from instructions)
system (from developer message 1)
system (from developer message 2)
...
user
user

This is invalid for any non-OpenAI provider that enforces a single leading system message.
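The interaction between the two code paths can be illustrated with a small standalone sketch. The functions below are hypothetical stand-ins written for this illustration, not LiteLLM's actual implementations; they only mimic the behavior described above (one system message from `instructions`, plus one per developer message, with no merge step).

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def instructions_to_system_message(instructions: str) -> Message:
    # Hypothetical stand-in for transform_instructions_to_system_message().
    return {"role": "system", "content": instructions}


def map_developer_to_system(messages: List[Message]) -> List[Message]:
    # Hypothetical stand-in for map_developer_role_to_system_role():
    # each developer message is renamed on its own, nothing is merged.
    return [
        {**m, "role": "system"} if m.get("role") == "developer" else m
        for m in messages
    ]


def bridge(instructions: str, input_items: List[Message]) -> List[Message]:
    # Path 1 prepends the instructions; path 2 then renames developer roles.
    messages = [instructions_to_system_message(instructions)] + list(input_items)
    return map_developer_to_system(messages)


result = bridge(
    "You are a coding agent.",
    [
        {"role": "developer", "content": "Sandbox rules."},
        {"role": "developer", "content": "Repo conventions."},
        {"role": "user", "content": "Fix the failing test."},
    ],
)
roles = [m["role"] for m in result]
print(roles)  # ['system', 'system', 'system', 'user']
```

Three system messages reach the backend where a strict provider expects at most one.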

Affected files (verified on 1.82.6 and 1.83.14, both identical):

litellm/responses/litellm_completion_transformation/transformation.py — transform_instructions_to_system_message()
litellm/llms/base_llm/base_utils.py — map_developer_role_to_system_role()

Fix Action

Workaround

A custom callback hook can be used as a temporary fix. Place merge_system_messages.py in the proxy working directory:

from typing import Literal, Optional, Union, Any
from litellm.integrations.custom_logger import CustomLogger
from litellm.proxy._types import UserAPIKeyAuth
from litellm.caching.caching import DualCache


class MergeSystemMessagesHook(CustomLogger):
    def _extract_text(self, content: Any) -> str:
        if isinstance(content, str):
            return content
        if isinstance(content, list):
            parts = []
            for block in content:
                if isinstance(block, dict):
                    t = block.get("type", "")
                    if t in ("text", "input_text"):
                        parts.append(block.get("text", ""))
            return "\n\n".join(p for p in parts if p)
        return ""

    async def async_pre_call_hook(
        self,
        user_api_key_dict: UserAPIKeyAuth,
        cache: DualCache,
        data: dict,
        call_type: Literal[
            "completion", "text_completion", "embeddings", "image_generation",
            "moderation", "audio_transcription", "pass_through_endpoint",
            "rerank", "mcp_call", "anthropic_messages",
            "aresponses",  # Responses API call type, checked below
        ],
    ) -> Optional[Union[Exception, str, dict]]:
        # Responses API requests: fold developer items into `instructions`.
        if call_type == "aresponses":
            return self._merge_responses_api(data)
        # Chat completion requests: collapse multiple system messages into one.
        if call_type == "completion":
            return self._merge_completion_messages(data)
        return data

    def _merge_responses_api(self, data: dict) -> dict:
        input_items = data.get("input") or []
        if not isinstance(input_items, list):
            return data
        developer_parts, new_input = [], []
        for item in input_items:
            if isinstance(item, dict) and item.get("role") == "developer":
                text = self._extract_text(item.get("content", ""))
                if text:
                    developer_parts.append(text)
            else:
                new_input.append(item)
        if not developer_parts:
            return data
        existing = data.get("instructions", "") or ""
        all_parts = ([existing] if existing else []) + developer_parts
        data["instructions"] = "\n\n".join(all_parts)
        data["input"] = new_input
        return data

    def _merge_completion_messages(self, data: dict) -> dict:
        messages = data.get("messages")
        if not messages:
            return data
        system_parts, rest = [], []
        for msg in messages:
            if msg.get("role") == "system":
                text = self._extract_text(msg.get("content", ""))
                if text:
                    system_parts.append(text)
            else:
                rest.append(msg)
        if len(system_parts) <= 1:
            return data
        data["messages"] = [{"role": "system", "content": "\n\n".join(system_parts)}] + rest
        return data


merge_system_messages_hook = MergeSystemMessagesHook()

Then add to config.yaml:

litellm_settings:
  callbacks:
    - merge_system_messages.merge_system_messages_hook
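To see what the hook does to a request without starting the proxy, its Responses API branch can be exercised as a plain function. The sketch below is a standalone copy of the `_merge_responses_api` logic, simplified to string content only; the function name and the sample payload are made up for illustration.

```python
from typing import Any, Dict, List


def merge_developer_into_instructions(data: Dict[str, Any]) -> Dict[str, Any]:
    """Standalone copy of the hook's Responses API branch (string content only)."""
    input_items = data.get("input") or []
    if not isinstance(input_items, list):
        return data
    developer_parts: List[str] = []
    new_input: List[Any] = []
    for item in input_items:
        if isinstance(item, dict) and item.get("role") == "developer":
            content = item.get("content", "")
            if isinstance(content, str) and content:
                developer_parts.append(content)
        else:
            new_input.append(item)
    if not developer_parts:
        return data
    existing = data.get("instructions") or ""
    all_parts = ([existing] if existing else []) + developer_parts
    data["instructions"] = "\n\n".join(all_parts)
    data["input"] = new_input
    return data


# Illustrative payload, not taken from a real Codex CLI request.
payload = {
    "instructions": "Base rules.",
    "input": [
        {"role": "developer", "content": "Dev 1."},
        {"role": "developer", "content": "Dev 2."},
        {"role": "user", "content": "Hi"},
    ],
}
merged = merge_developer_into_instructions(payload)
print(repr(merged["instructions"]))  # 'Base rules.\n\nDev 1.\n\nDev 2.'
print(merged["input"])  # [{'role': 'user', 'content': 'Hi'}]
```

After the merge, the only system-equivalent content left is the `instructions` string, so only one system message is produced from it.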

PR fix notes

PR #26884: fix: merge responses developer messages into system prompt

Repository: BerriAI/litellm
Author: samrusani
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26884

Description (problem / solution / changelog)

Relevant issues

Fixes #26879

Pre-Submission checklist

I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Screenshots / Proof of Fix

uv run pytest tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py -q
# 65 passed in 0.24s

uv run pytest tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py --cov=litellm.responses.litellm_completion_transformation.transformation --cov-report=term-missing -q
# 65 passed in 0.30s
# The new merge helper lines are not listed as missing.

make lint-ruff
# All checks passed!

uv run black --check litellm/responses/litellm_completion_transformation/transformation.py tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py
# 2 files would be left unchanged.

Type

🐛 Bug Fix ✅ Test

Changes

Merge Responses API instructions and developer role input items into one leading chat-completion system message before the Responses-to-chat bridge forwards the request.
Extract text from string and input_text/text content blocks so Codex-style developer messages are preserved in the merged system prompt.
Add regression tests covering instructions plus multiple developer messages, requests without system content, and supported/ignored text block extraction so non-OpenAI providers no longer receive multiple system messages.
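The behavior described in this change list can be sketched roughly as follows. This is not the PR's code: `extract_text` and `build_chat_messages` are hypothetical names, and the sketch assumes input items are plain dicts.

```python
from typing import Any, Dict, List, Optional

# Block types whose text is kept; everything else is ignored (assumption
# based on the "input_text/text content blocks" wording in the change list).
SUPPORTED_TEXT_BLOCKS = ("text", "input_text")


def extract_text(content: Any) -> str:
    """Return the text of a string or of supported content blocks."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        texts = [
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") in SUPPORTED_TEXT_BLOCKS
        ]
        return "\n\n".join(t for t in texts if t)
    return ""


def build_chat_messages(
    instructions: Optional[str], input_items: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Merge instructions and developer items into one leading system message."""
    system_parts: List[str] = [instructions] if instructions else []
    rest: List[Dict[str, Any]] = []
    for item in input_items:
        if item.get("role") == "developer":
            text = extract_text(item.get("content"))
            if text:
                system_parts.append(text)
        else:
            rest.append(item)
    if not system_parts:
        return rest  # no system content: nothing to prepend
    return [{"role": "system", "content": "\n\n".join(system_parts)}] + rest


messages = build_chat_messages(
    "Be concise.",
    [
        {
            "role": "developer",
            "content": [
                {"type": "input_text", "text": "Use tabs."},
                {"type": "input_image", "image_url": "https://example.com/x.png"},
            ],
        },
        {"role": "developer", "content": "Never push to main."},
        {"role": "user", "content": "Rename the module."},
    ],
)
print([m["role"] for m in messages])  # ['system', 'user']
```

The three cases named in the regression tests (instructions plus several developer messages, no system content, supported versus ignored blocks) map onto the three branches of this sketch.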

Notes

make format-check currently reports unrelated formatting drift in existing enterprise files outside this PR's diff. The two changed files pass Black directly.

Changed files

litellm/responses/litellm_completion_transformation/transformation.py (modified, +75/-1)
tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py (modified, +92/-0)

PR #26888: Fix Responses API developer/system message merging

Repository: BerriAI/litellm
Author: Genmin
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26888

Description (problem / solution / changelog)

Summary

merge developer-role messages into the leading system message for non-OpenAI providers
preserve existing behavior when no developer role is present
cover the Responses API bridge path where instructions and developer input previously produced multiple system messages

Fixes #26879
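A rough sketch of the approach summarized above, operating on an already-built chat message list. The function name is hypothetical and this is not the PR's implementation; it merges only string content and falls back to a plain role rename otherwise.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]
SYSTEM_LIKE = ("system", "developer")


def merge_developer_into_leading_system(messages: List[Message]) -> List[Message]:
    # No developer role present: return the input untouched.
    if not any(m.get("role") == "developer" for m in messages):
        return messages
    system_like = [m for m in messages if m.get("role") in SYSTEM_LIKE]
    if not all(isinstance(m.get("content"), str) for m in system_like):
        # Structured content is out of scope for this sketch:
        # rename the role and leave the messages unmerged.
        return [
            {**m, "role": "system"} if m.get("role") == "developer" else m
            for m in messages
        ]
    rest = [m for m in messages if m.get("role") not in SYSTEM_LIKE]
    merged = "\n\n".join(m["content"] for m in system_like if m["content"])
    if not merged:
        return rest
    return [{"role": "system", "content": merged}] + rest


with_dev = [
    {"role": "system", "content": "From instructions."},
    {"role": "developer", "content": "From developer input."},
    {"role": "user", "content": "Hello"},
]
no_dev = [
    {"role": "system", "content": "Only one."},
    {"role": "user", "content": "Hello"},
]
merged_messages = merge_developer_into_leading_system(with_dev)
print([m["role"] for m in merged_messages])  # ['system', 'user']
print(merge_developer_into_leading_system(no_dev) is no_dev)  # True
```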

Tests

uv run --extra proxy pytest tests/llm_translation/test_base_llm_base_utils.py -q
uv run --extra proxy black --check litellm/llms/base_llm/base_utils.py tests/llm_translation/test_base_llm_base_utils.py
uv run --extra proxy ruff check litellm/llms/base_llm/base_utils.py tests/llm_translation/test_base_llm_base_utils.py

Changed files

litellm/llms/base_llm/base_utils.py (modified, +66/-8)
tests/test_litellm/litellm_core_utils/test_base_llm_base_utils.py (added, +115/-0)


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug Description

When using LiteLLM Proxy's /responses endpoint with a client that sends both an instructions field and developer role messages (e.g., Codex CLI), LiteLLM generates multiple system messages in the final chat completion request. Non-OpenAI backends (e.g., Qwen, Llama) that only accept a single system message at the beginning will reject the request with a 400 error: System message must be at the beginning.


Steps to Reproduce

Configure LiteLLM Proxy with a non-OpenAI backend (e.g., custom_openai/Qwen3.5-27B)
Use Codex CLI (or any client using the Responses API) to send a request with both instructions and multiple developer role messages in input
Observe the request forwarded to the backend contains multiple system messages
Backend returns 400: System message must be at the beginning.
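For readers without Codex CLI at hand, a request body of the shape described in these steps can be built by hand. The sketch below only constructs and prints the JSON; the message texts are invented, and the model name is the backend from this report, which may differ from the alias configured in a given proxy. The body would then be POSTed to the proxy's /responses endpoint with whatever HTTP client and credentials the deployment uses.

```python
import json
from typing import Any, Dict


def build_repro_payload(model: str) -> Dict[str, Any]:
    """Build a Responses API body with instructions plus two developer items."""
    return {
        "model": model,
        "instructions": "You are a coding agent.",
        "input": [
            # Two developer items: each becomes its own system message
            # on the affected versions.
            {"role": "developer", "content": "Follow the sandbox policy."},
            {"role": "developer", "content": "Prefer small diffs."},
            {"role": "user", "content": "List the files in this repo."},
        ],
    }


payload = build_repro_payload("custom_openai/Qwen3.5-27B")
print(json.dumps(payload, indent=2))
```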

Expected Behavior

All system-equivalent content (instructions + developer role messages) should be merged into a single system message before being sent to the backend.


Suggested Fix

In map_developer_role_to_system_role() or transform_responses_api_input_to_messages(), after all role translations are complete, merge consecutive/multiple system messages into one before returning.

Environment

LiteLLM version: 1.82.6 (also verified identical behavior on 1.83.14)
Backend: custom_openai/Qwen3.5-27B
Client: Codex CLI codex-tui/0.122.0
Endpoint: /responses


What LiteLLM version are you on ?

1.82.6


extent analysis

TL;DR

Merge consecutive system messages into one before sending the request to the backend to fix the issue with non-OpenAI providers.

Guidance

Identify the code paths that produce system messages, specifically transform_instructions_to_system_message() and map_developer_role_to_system_role().
Modify these functions to merge multiple system messages into a single message.
Alternatively, use the provided custom callback hook MergeSystemMessagesHook as a temporary fix.
Verify the fix by checking the request sent to the backend and ensuring it only contains a single system message.
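The verification step can be automated with a small check on the forwarded message list. This is a minimal sketch: how the forwarded request is captured (proxy debug logs, a mock backend, and so on) depends on the deployment and is not shown, and the function name is made up.

```python
from typing import Any, Dict, List


def has_single_leading_system(messages: List[Dict[str, Any]]) -> bool:
    """True if there is no system message, or exactly one at index 0."""
    system_indexes = [i for i, m in enumerate(messages) if m.get("role") == "system"]
    return system_indexes in ([], [0])


broken = [
    {"role": "system", "content": "From instructions."},
    {"role": "system", "content": "From developer message 1."},
    {"role": "user", "content": "Hello"},
]
fixed = [
    {"role": "system", "content": "From instructions.\n\nFrom developer message 1."},
    {"role": "user", "content": "Hello"},
]
print(has_single_leading_system(broken))  # False
print(has_single_leading_system(fixed))  # True
```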

Example

The provided MergeSystemMessagesHook class can be used as an example of how to merge system messages:

def _merge_responses_api(self, data: dict) -> dict:
    # ...
    data["instructions"] = "\n\n".join(all_parts)
    # ...

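This code folds the text of the developer role messages into the instructions field. Because instructions is the only system-equivalent content left in the request, it is then converted into a single system message.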

Notes

This fix assumes that the issue is caused by the multiple system messages being sent to the backend. If the issue persists after applying this fix, further investigation may be needed.

Recommendation

Apply the workaround using the MergeSystemMessagesHook class, as it provides a temporary fix for the issue. This will allow you to continue using the LiteLLM Proxy with non-OpenAI backends until a permanent fix is implemented.
