litellm - 💡(How to fix) Fix [Bug]: Gemini CLI error when calling to LiteLLM - "Please ensure that function response turn comes immediately after a function call turn" [1 participants]

litellm · 2026-04-29 06:42:34

BerriAI/litellm#26755•Fetched 2026-04-30 06:20:20

Author

boatkabig

Error Message

{"message": "Exception in ASGI application\n", "level": "ERROR", "timestamp": "2026-04-29T06:25:29.222656", "component": "uvicorn.error", "logger": "h11_impl.py:408", "stacktrace": "Traceback (most recent call last):
  File \"/app/.venv/lib/python3.13/site-packages/litellm/llms/custom_httpx/llm_http_handler.py\", line 10232, in async_generate_content_handler
    response = await async_httpx_client.post(...)
  ...
litellm.llms.custom_httpx.http_handler.MaskedHTTPStatusError: Client error '400 Bad Request' for url 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent'

During handling of the above exception, another exception occurred:

  File \"/app/.venv/lib/python3.13/site-packages/litellm/google_genai/main.py\", line 263, in agenerate_content
    response = await init_response
litellm.llms.base_llm.chat.transformation.BaseLLMException: {
  \"error\": {
    \"code\": 400,
    \"message\": \"Please ensure that function response turn comes immediately after a function call turn.\",
    \"status\": \"INVALID_ARGUMENT\"
  }
}

During handling of the above exception, another exception occurred:

  File \"/app/.venv/lib/python3.13/site-packages/litellm/proxy/google_endpoints/endpoints.py\", line 75, in google_generate_content
    response = await llm_router.agenerate_content(**data)
  ...
  File \"/app/.venv/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 1416, in exception_type
    raise BadRequestError(...)
litellm.exceptions.BadRequestError: litellm.BadRequestError: GeminiException BadRequestError - {
  \"error\": {
    \"code\": 400,
    \"message\": \"Please ensure that function response turn comes immediately after a function call turn.\",
    \"status\": \"INVALID_ARGUMENT\"
  }
}
. Received Model Group=gemini-2.5-flash-lite
Available Model Group Fallbacks=None"}

Fix Action

Fix / Workaround

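In the Gemini provider transformation layer (likely under litellm/llms/vertex_ai/google_genai/ or the generate-content transformation path), add a normalization step before dispatching to the upstream API:

- Merge consecutive user turns into a single turn whose parts are concatenated.
- If a functionResponse part does not directly follow a model turn containing the matching functionCall (matched by id), either reorder the turns so the pairing is adjacent, or return a structured 400 from the proxy that names the offending turn index.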

This would bring the Gemini provider in line with the more forgiving behavior already seen in the OpenAI / Anthropic providers, where conversation history is normalized before dispatch.

[Bug]: Gemini `generateContent` returns 400 when payload violates `functionCall` → `functionResponse` ordering

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When calling Gemini models through the LiteLLM proxy's /v1beta/models/{model}:generateContent endpoint, requests fail with a 400 Bad Request if the contents array does not strictly follow Gemini's tool-calling sequence rules.

Gemini requires that:

- A functionResponse part must appear in a user turn that comes immediately after a model turn containing the matching functionCall.
- Two consecutive turns with the same role (e.g. two user turns in a row) are not permitted when one of them carries a functionResponse.

When these rules are violated — which can happen organically in agentic workflows where tool history accumulates across multiple turns — Gemini rejects the request with:

"Please ensure that function response turn comes immediately after a function call turn."

LiteLLM forwards this 400 directly to the client without normalizing the message sequence, so any client that builds Gemini-formatted payloads incrementally (e.g. agent loops with interleaved reasoning) will hit this error.
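
The two ordering rules described above can be checked locally before a request is sent. The following is a minimal sketch, not LiteLLM or Gemini code: the function name `find_ordering_violations` and the wording of the reasons are hypothetical, and it assumes that each functionResponse is matched to a functionCall by its `id` field, as in the reproduction payload.

```python
from typing import Any

Turn = dict[str, Any]


def find_ordering_violations(contents: list[Turn]) -> list[tuple[int, str]]:
    """Return (turn_index, reason) pairs for turns that break the ordering rules."""
    violations: list[tuple[int, str]] = []
    for i, turn in enumerate(contents):
        prev = contents[i - 1] if i > 0 else None
        parts = turn.get("parts", [])
        responses = [p["functionResponse"] for p in parts if "functionResponse" in p]

        # Rule 2: two consecutive same-role turns, one of which carries a functionResponse.
        if prev is not None and prev.get("role") == turn.get("role"):
            prev_has_response = any(
                "functionResponse" in p for p in prev.get("parts", [])
            )
            if responses or prev_has_response:
                violations.append(
                    (i, f"consecutive '{turn.get('role')}' turns around a functionResponse")
                )

        # Rule 1: every functionResponse must answer a functionCall in the
        # immediately preceding model turn (matched by id; an assumption).
        call_ids = set()
        if prev is not None and prev.get("role") == "model":
            call_ids = {
                p["functionCall"].get("id")
                for p in prev.get("parts", [])
                if "functionCall" in p
            }
        for response in responses:
            if response.get("id") not in call_ids:
                violations.append(
                    (i, f"orphan functionResponse (id={response.get('id')})")
                )
    return violations
```

Run against a `contents` list shaped like the reproduction payload in this issue, the sketch flags turn 0 (a functionResponse with no preceding functionCall) and turn 3 (a second consecutive user turn after a functionResponse), which is the kind of turn-level detail the generic Gemini message omits.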

Expected behavior

One of the following:

(A) LiteLLM normalizes/repairs the contents sequence before forwarding to Gemini (e.g. merges consecutive same-role turns, reorders orphan functionResponse parts), or
(B) LiteLLM returns a more actionable error message indicating which turn index violates the ordering rule, instead of forwarding Gemini's generic message.

Actual behavior

The proxy forwards the request as-is and surfaces a 400 from Gemini wrapped in litellm.BadRequestError, with no indication of which turn caused the violation.

Steps to Reproduce

1. Prerequisites

- LiteLLM proxy v1.83.10 running on http://localhost:4000
- gemini-2.5-flash-lite configured as a model in config.yaml
- A valid LiteLLM virtual key

2. Reproduction payload

The payload below intentionally violates Gemini's ordering rules in two ways:

- The first turn contains a functionResponse with no preceding functionCall.
- The last two turns are both role: "user" (a functionResponse followed by a plain text message).

3. cURL

curl --location 'http://localhost:4000/v1beta/models/gemini-2.5-flash-lite:generateContent' \
  --header 'Content-Type: application/json' \
  --header 'x-goog-api-key: sk-123456' \
  --header 'x-gemini-api-privileged-user-id: xxx-xxx-xxx-xxx-xxx' \
  --data-raw '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "functionResponse": {
              "id": "call_001",
              "name": "read_file",
              "response": {
                "output": "def hello():\n    return \"world\""
              }
            }
          }
        ]
      },
      {
        "role": "model",
        "parts": [
          {
            "text": "Analyzing the code...",
            "thought": true
          },
          {
            "functionCall": {
              "id": "call_002",
              "name": "read_file",
              "args": {
                "file_path": "src/main.py"
              }
            }
          }
        ]
      },
      {
        "role": "user",
        "parts": [
          {
            "functionResponse": {
              "id": "call_002",
              "name": "read_file",
              "response": {
                "output": "print(\"hello\")"
              }
            }
          }
        ]
      },
      {
        "role": "user",
        "parts": [
          {
            "text": "create plan to review"
          }
        ]
      }
    ],
    "systemInstruction": {
      "role": "user",
      "parts": [
        {
          "text": "You are a Task Routing AI. Return a JSON object with complexity_reasoning (string) and complexity_score (integer 1-100)."
        }
      ]
    },
    "generationConfig": {
      "temperature": 0,
      "topP": 1,
      "maxOutputTokens": 1024,
      "responseMimeType": "application/json",
      "responseJsonSchema": {
        "type": "OBJECT",
        "properties": {
          "complexity_reasoning": { "type": "STRING" },
          "complexity_score": { "type": "INTEGER" }
        },
        "required": ["complexity_reasoning", "complexity_score"]
      },
      "thinkingConfig": {
        "thinkingBudget": 512
      }
    }
  }'

4. Python (requests)

import os
import requests

PROXY_URL = "http://localhost:4000/v1beta/models/gemini-2.5-flash-lite:generateContent"

headers = {
    "Content-Type": "application/json",
    "x-goog-api-key": os.environ["LITELLM_API_KEY"],
    "x-gemini-api-privileged-user-id": os.environ["PRIVILEGED_USER_ID"],
}

payload = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "functionResponse": {
                        "id": "call_001",
                        "name": "read_file",
                        "response": {"output": "def hello():\n    return \"world\""},
                    }
                }
            ],
        },
        {
            "role": "model",
            "parts": [
                {"text": "Analyzing the code...", "thought": True},
                {
                    "functionCall": {
                        "id": "call_002",
                        "name": "read_file",
                        "args": {"file_path": "src/main.py"},
                    }
                },
            ],
        },
        {
            "role": "user",
            "parts": [
                {
                    "functionResponse": {
                        "id": "call_002",
                        "name": "read_file",
                        "response": {"output": "print(\"hello\")"},
                    }
                }
            ],
        },
        {
            "role": "user",
            "parts": [{"text": "create plan to review"}],
        },
    ],
    "systemInstruction": {
        "role": "user",
        "parts": [
            {
                "text": (
                    "You are a Task Routing AI. Return a JSON object with "
                    "complexity_reasoning (string) and complexity_score (integer 1-100)."
                )
            }
        ],
    },
    "generationConfig": {
        "temperature": 0,
        "topP": 1,
        "maxOutputTokens": 1024,
        "responseMimeType": "application/json",
        "responseJsonSchema": {
            "type": "OBJECT",
            "properties": {
                "complexity_reasoning": {"type": "STRING"},
                "complexity_score": {"type": "INTEGER"},
            },
            "required": ["complexity_reasoning", "complexity_score"],
        },
        "thinkingConfig": {"thinkingBudget": 512},
    },
}

response = requests.post(PROXY_URL, headers=headers, json=payload, timeout=30)
print("Status:", response.status_code)
print("Body:", response.text)

5. Observed result

Status: 400
Body: {"error":{"code":400,"message":"Please ensure that function response turn comes immediately after a function call turn.","status":"INVALID_ARGUMENT"}}

Relevant log output

{"message": "Exception in ASGI application\n", "level": "ERROR", "timestamp": "2026-04-29T06:25:29.222656", "component": "uvicorn.error", "logger": "h11_impl.py:408", "stacktrace": "Traceback (most recent call last):
  File \"/app/.venv/lib/python3.13/site-packages/litellm/llms/custom_httpx/llm_http_handler.py\", line 10232, in async_generate_content_handler
    response = await async_httpx_client.post(...)
  ...
litellm.llms.custom_httpx.http_handler.MaskedHTTPStatusError: Client error '400 Bad Request' for url 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent'

During handling of the above exception, another exception occurred:

  File \"/app/.venv/lib/python3.13/site-packages/litellm/google_genai/main.py\", line 263, in agenerate_content
    response = await init_response
litellm.llms.base_llm.chat.transformation.BaseLLMException: {
  \"error\": {
    \"code\": 400,
    \"message\": \"Please ensure that function response turn comes immediately after a function call turn.\",
    \"status\": \"INVALID_ARGUMENT\"
  }
}

During handling of the above exception, another exception occurred:

  File \"/app/.venv/lib/python3.13/site-packages/litellm/proxy/google_endpoints/endpoints.py\", line 75, in google_generate_content
    response = await llm_router.agenerate_content(**data)
  ...
  File \"/app/.venv/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 1416, in exception_type
    raise BadRequestError(...)
litellm.exceptions.BadRequestError: litellm.BadRequestError: GeminiException BadRequestError - {
  \"error\": {
    \"code\": 400,
    \"message\": \"Please ensure that function response turn comes immediately after a function call turn.\",
    \"status\": \"INVALID_ARGUMENT\"
  }
}
. Received Model Group=gemini-2.5-flash-lite
Available Model Group Fallbacks=None"}

Suggested fix

- Merge consecutive user turns into a single turn whose parts are concatenated.
- If a functionResponse part appears in a user turn that does not directly follow a model turn containing a matching functionCall (matched by id), either:
  - reorder the turns so the pairing is adjacent, or
  - return a structured 400 from the proxy itself with a message like "Orphan functionResponse at turn index N (id=call_xxx)" so callers can debug their conversation builder.

This would bring the Gemini provider in line with the more forgiving behavior already seen in the OpenAI / Anthropic providers, where conversation history is normalized before dispatch.

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.83.10

Twitter / LinkedIn details

No response

extent analysis

TL;DR

To fix the Gemini generateContent 400 error, normalize the contents sequence by merging consecutive same-role turns and reordering orphan functionResponse parts before forwarding to Gemini.

Guidance

- Verify that the contents array in the request payload follows Gemini's tool-calling sequence rules.
- Implement a normalization step in the Gemini provider transformation layer to merge consecutive user turns and reorder functionResponse parts.
- Consider returning a structured 400 error from the proxy with a message indicating the turn index that violates the ordering rule.
- Review the conversation builder to ensure it constructs the contents array in a way that adheres to Gemini's sequence rules.
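
On the client side, the last point can be approached by making the conversation builder enforce the ordering as turns are appended, rather than repairing the list afterwards. The class below is only an illustrative sketch: `ContentsBuilder` and its method names are hypothetical, and it assumes (as the issue's suggested merge does) that Gemini accepts a single user turn mixing functionResponse and text parts.

```python
from typing import Any


class ContentsBuilder:
    """Hypothetical helper that keeps a Gemini-style `contents` list well ordered."""

    def __init__(self) -> None:
        self.contents: list[dict[str, Any]] = []
        self._pending_call_ids: set[Any] = set()

    def _append(self, role: str, parts: list[dict[str, Any]]) -> None:
        # Fold the parts into the previous turn when the role repeats, so the
        # list never holds two consecutive turns with the same role.
        if self.contents and self.contents[-1]["role"] == role:
            self.contents[-1]["parts"].extend(parts)
        else:
            self.contents.append({"role": role, "parts": list(parts)})

    def add_model_turn(self, parts: list[dict[str, Any]]) -> None:
        if self._pending_call_ids:
            raise ValueError(f"unanswered function calls: {sorted(map(str, self._pending_call_ids))}")
        # Remember which calls the next user turn has to answer.
        self._pending_call_ids = {
            p["functionCall"].get("id") for p in parts if "functionCall" in p
        }
        self._append("model", parts)

    def add_function_response(self, call_id: str, name: str, response: dict[str, Any]) -> None:
        if call_id not in self._pending_call_ids:
            raise ValueError(f"no pending functionCall with id={call_id}")
        self._pending_call_ids.discard(call_id)
        part = {"functionResponse": {"id": call_id, "name": name, "response": response}}
        self._append("user", [part])

    def add_user_text(self, text: str) -> None:
        if self._pending_call_ids:
            raise ValueError("answer the pending function calls before adding user text")
        self._append("user", [{"text": text}])


builder = ContentsBuilder()
builder.add_user_text("read src/main.py")
builder.add_model_turn(
    [{"functionCall": {"id": "call_002", "name": "read_file", "args": {"file_path": "src/main.py"}}}]
)
builder.add_function_response("call_002", "read_file", {"output": "print(\"hello\")"})
builder.add_user_text("create plan to review")
```

With this approach an orphan functionResponse is rejected at the moment it is added, and the trailing text message ends up in the same user turn as the functionResponse instead of in a second consecutive user turn.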

Example

import copy

def normalize_contents(contents):
    # Merge consecutive user turns (on a deep copy, so the caller's list is untouched)
    merged_contents = []
    for turn in copy.deepcopy(contents):
        if merged_contents and turn['role'] == 'user' and merged_contents[-1]['role'] == 'user':
            merged_contents[-1]['parts'].extend(turn['parts'])
        else:
            merged_contents.append(turn)

    # Check every functionResponse against the functionCall ids of the preceding model turn
    for i, turn in enumerate(merged_contents):
        if turn['role'] != 'user':
            continue
        prev_turn = merged_contents[i - 1] if i > 0 else None
        call_ids = set()
        if prev_turn and prev_turn['role'] == 'model':
            call_ids = {part['functionCall'].get('id')
                        for part in prev_turn['parts'] if part.get('functionCall')}
        for part in turn['parts']:
            response = part.get('functionResponse')
            if response and response.get('id') not in call_ids:
                # Reordering an orphan is ambiguous, so report it instead.
                # Note: i is the index in the merged list, not in the original contents.
                raise ValueError(
                    f"Orphan functionResponse at turn index {i} (id={response.get('id')})")
    return merged_contents

#api #conversation history #prompt template #agent execution #callback error
