litellm - [Bug]: Gemini CLI error when calling LiteLLM - "Please ensure that function response turn comes immediately after a function call turn"

GitHub stats
BerriAI/litellm#26755 · Fetched 2026-04-30 06:20:20
Comments: 0 · Participants: 1 · Timeline events: 3 (labeled ×3) · Reactions: 0

[Bug]: Gemini generateContent returns 400 when payload violates functionCall/functionResponse ordering

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When calling Gemini models through the LiteLLM proxy's /v1beta/models/{model}:generateContent endpoint, requests fail with a 400 Bad Request if the contents array does not strictly follow Gemini's tool-calling sequence rules.

Gemini requires that:

  1. A functionResponse part must appear in a user turn that comes immediately after a model turn containing the matching functionCall.
  2. Two consecutive turns with the same role (e.g. two user turns in a row) are not permitted when one of them carries a functionResponse.

When these rules are violated — which can happen organically in agentic workflows where tool history accumulates across multiple turns — Gemini rejects the request with:

"Please ensure that function response turn comes immediately after a function call turn."

LiteLLM forwards this 400 directly to the client without normalizing the message sequence, so any client that builds Gemini-formatted payloads incrementally (e.g. agent loops with interleaved reasoning) will hit this error.
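The two rules above can be checked on the client before a request is sent. The following is a minimal sketch, not LiteLLM code: `part_ids` and `find_ordering_violations` are hypothetical helper names, and the check assumes every functionCall and functionResponse part carries an `id`, as in the payloads shown in this issue.

```python
def part_ids(turn, key):
    """Ids of every part of one kind ("functionCall" or "functionResponse") in a turn."""
    return [part[key].get("id") for part in turn.get("parts", []) if key in part]


def find_ordering_violations(contents):
    """Return one message per violation of the two ordering rules (hypothetical helper)."""
    problems = []
    for index, turn in enumerate(contents):
        prev = contents[index - 1] if index > 0 else None
        response_ids = part_ids(turn, "functionResponse")

        # Rule 1: each functionResponse needs a matching functionCall
        # in the model turn directly before it.
        call_ids = []
        if prev is not None and prev.get("role") == "model":
            call_ids = part_ids(prev, "functionCall")
        for response_id in response_ids:
            if response_id not in call_ids:
                problems.append(
                    f"turn {index}: functionResponse id={response_id} "
                    "does not follow a matching functionCall"
                )

        # Rule 2: two same-role turns in a row are rejected when either
        # of them carries a functionResponse.
        if prev is not None and prev.get("role") == turn.get("role"):
            if response_ids or part_ids(prev, "functionResponse"):
                problems.append(
                    f"turn {index}: consecutive '{turn.get('role')}' turns "
                    "around a functionResponse"
                )
    return problems


if __name__ == "__main__":
    # Same turn structure as the reproduction payload in this issue.
    contents = [
        {"role": "user", "parts": [{"functionResponse": {"id": "call_001", "name": "read_file", "response": {"output": "..."}}}]},
        {"role": "model", "parts": [{"functionCall": {"id": "call_002", "name": "read_file", "args": {"file_path": "src/main.py"}}}]},
        {"role": "user", "parts": [{"functionResponse": {"id": "call_002", "name": "read_file", "response": {"output": "..."}}}]},
        {"role": "user", "parts": [{"text": "create plan to review"}]},
    ]
    for problem in find_ordering_violations(contents):
        print(problem)
```

Run against that structure, the sketch reports turn 0 (a functionResponse with no preceding functionCall) and turn 3 (a second user turn right after a functionResponse turn).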

Expected behavior

One of the following:

  • (A) LiteLLM normalizes/repairs the contents sequence before forwarding to Gemini (e.g. merges consecutive same-role turns, reorders orphan functionResponse parts), or
  • (B) LiteLLM returns a more actionable error message indicating which turn index violates the ordering rule, instead of forwarding Gemini's generic message.

Actual behavior

The proxy forwards the request as-is and surfaces a 400 from Gemini wrapped in litellm.BadRequestError, with no indication of which turn caused the violation.


Steps to Reproduce

1. Prerequisites

  • LiteLLM proxy v1.83.10 running on http://localhost:4000
  • gemini-2.5-flash-lite configured as a model in config.yaml
  • A valid LiteLLM virtual key

2. Reproduction payload

The payload below intentionally violates Gemini's ordering rules in two ways:

  • The first turn contains a functionResponse with no preceding functionCall.
  • The last two turns are both role: "user" (a functionResponse followed by a plain text message).
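To make both violations visible at a glance, the turn sequence can be printed as a compact outline before sending. This is only an illustrative sketch: `describe_turns` is a hypothetical debugging helper, not something LiteLLM or the Gemini SDK provides, and the `contents` list below is a shortened copy of the reproduction payload.

```python
def describe_turns(contents):
    """Return one line per turn: index, role, and the kind of each part."""
    lines = []
    for index, turn in enumerate(contents):
        kinds = []
        for part in turn.get("parts", []):
            if "functionCall" in part:
                kinds.append(f"functionCall({part['functionCall'].get('id')})")
            elif "functionResponse" in part:
                kinds.append(f"functionResponse({part['functionResponse'].get('id')})")
            elif part.get("thought"):
                kinds.append("thought")
            else:
                kinds.append("text")
        lines.append(f"{index} {turn.get('role')}: {', '.join(kinds)}")
    return lines


if __name__ == "__main__":
    contents = [
        {"role": "user", "parts": [{"functionResponse": {"id": "call_001", "name": "read_file", "response": {"output": "..."}}}]},
        {"role": "model", "parts": [
            {"text": "Analyzing the code...", "thought": True},
            {"functionCall": {"id": "call_002", "name": "read_file", "args": {"file_path": "src/main.py"}}},
        ]},
        {"role": "user", "parts": [{"functionResponse": {"id": "call_002", "name": "read_file", "response": {"output": "..."}}}]},
        {"role": "user", "parts": [{"text": "create plan to review"}]},
    ]
    print("\n".join(describe_turns(contents)))
    # 0 user: functionResponse(call_001)
    # 1 model: thought, functionCall(call_002)
    # 2 user: functionResponse(call_002)
    # 3 user: text
```

In the outline, line 0 shows the functionResponse with no functionCall before it, and lines 2 and 3 show the two user turns in a row.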

3. cURL

curl --location 'http://localhost:4000/v1beta/models/gemini-2.5-flash-lite:generateContent' \
  --header 'Content-Type: application/json' \
  --header 'x-goog-api-key: sk-123456' \
  --header 'x-gemini-api-privileged-user-id: xxx-xxx-xxx-xxx-xxx' \
  --data-raw '{
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "functionResponse": {
              "id": "call_001",
              "name": "read_file",
              "response": {
                "output": "def hello():\n    return \"world\""
              }
            }
          }
        ]
      },
      {
        "role": "model",
        "parts": [
          {
            "text": "Analyzing the code...",
            "thought": true
          },
          {
            "functionCall": {
              "id": "call_002",
              "name": "read_file",
              "args": {
                "file_path": "src/main.py"
              }
            }
          }
        ]
      },
      {
        "role": "user",
        "parts": [
          {
            "functionResponse": {
              "id": "call_002",
              "name": "read_file",
              "response": {
                "output": "print(\"hello\")"
              }
            }
          }
        ]
      },
      {
        "role": "user",
        "parts": [
          {
            "text": "create plan to review"
          }
        ]
      }
    ],
    "systemInstruction": {
      "role": "user",
      "parts": [
        {
          "text": "You are a Task Routing AI. Return a JSON object with complexity_reasoning (string) and complexity_score (integer 1-100)."
        }
      ]
    },
    "generationConfig": {
      "temperature": 0,
      "topP": 1,
      "maxOutputTokens": 1024,
      "responseMimeType": "application/json",
      "responseJsonSchema": {
        "type": "OBJECT",
        "properties": {
          "complexity_reasoning": { "type": "STRING" },
          "complexity_score": { "type": "INTEGER" }
        },
        "required": ["complexity_reasoning", "complexity_score"]
      },
      "thinkingConfig": {
        "thinkingBudget": 512
      }
    }
  }'

4. Python (requests)

import os
import requests

PROXY_URL = "http://localhost:4000/v1beta/models/gemini-2.5-flash-lite:generateContent"

headers = {
    "Content-Type": "application/json",
    "x-goog-api-key": os.environ["LITELLM_API_KEY"],
    "x-gemini-api-privileged-user-id": os.environ["PRIVILEGED_USER_ID"],
}

payload = {
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "functionResponse": {
                        "id": "call_001",
                        "name": "read_file",
                        "response": {"output": "def hello():\n    return \"world\""},
                    }
                }
            ],
        },
        {
            "role": "model",
            "parts": [
                {"text": "Analyzing the code...", "thought": True},
                {
                    "functionCall": {
                        "id": "call_002",
                        "name": "read_file",
                        "args": {"file_path": "src/main.py"},
                    }
                },
            ],
        },
        {
            "role": "user",
            "parts": [
                {
                    "functionResponse": {
                        "id": "call_002",
                        "name": "read_file",
                        "response": {"output": "print(\"hello\")"},
                    }
                }
            ],
        },
        {
            "role": "user",
            "parts": [{"text": "create plan to review"}],
        },
    ],
    "systemInstruction": {
        "role": "user",
        "parts": [
            {
                "text": (
                    "You are a Task Routing AI. Return a JSON object with "
                    "complexity_reasoning (string) and complexity_score (integer 1-100)."
                )
            }
        ],
    },
    "generationConfig": {
        "temperature": 0,
        "topP": 1,
        "maxOutputTokens": 1024,
        "responseMimeType": "application/json",
        "responseJsonSchema": {
            "type": "OBJECT",
            "properties": {
                "complexity_reasoning": {"type": "STRING"},
                "complexity_score": {"type": "INTEGER"},
            },
            "required": ["complexity_reasoning", "complexity_score"],
        },
        "thinkingConfig": {"thinkingBudget": 512},
    },
}

response = requests.post(PROXY_URL, headers=headers, json=payload, timeout=30)
print("Status:", response.status_code)
print("Body:", response.text)

5. Observed result

Status: 400
Body: {"error":{"code":400,"message":"Please ensure that function response turn comes immediately after a function call turn.","status":"INVALID_ARGUMENT"}}

Relevant log output

{"message": "Exception in ASGI application\n", "level": "ERROR", "timestamp": "2026-04-29T06:25:29.222656", "component": "uvicorn.error", "logger": "h11_impl.py:408", "stacktrace": "Traceback (most recent call last):
  File \"/app/.venv/lib/python3.13/site-packages/litellm/llms/custom_httpx/llm_http_handler.py\", line 10232, in async_generate_content_handler
    response = await async_httpx_client.post(...)
  ...
litellm.llms.custom_httpx.http_handler.MaskedHTTPStatusError: Client error '400 Bad Request' for url 'https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-lite:generateContent'

During handling of the above exception, another exception occurred:

  File \"/app/.venv/lib/python3.13/site-packages/litellm/google_genai/main.py\", line 263, in agenerate_content
    response = await init_response
litellm.llms.base_llm.chat.transformation.BaseLLMException: {
  \"error\": {
    \"code\": 400,
    \"message\": \"Please ensure that function response turn comes immediately after a function call turn.\",
    \"status\": \"INVALID_ARGUMENT\"
  }
}

During handling of the above exception, another exception occurred:

  File \"/app/.venv/lib/python3.13/site-packages/litellm/proxy/google_endpoints/endpoints.py\", line 75, in google_generate_content
    response = await llm_router.agenerate_content(**data)
  ...
  File \"/app/.venv/lib/python3.13/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py\", line 1416, in exception_type
    raise BadRequestError(...)
litellm.exceptions.BadRequestError: litellm.BadRequestError: GeminiException BadRequestError - {
  \"error\": {
    \"code\": 400,
    \"message\": \"Please ensure that function response turn comes immediately after a function call turn.\",
    \"status\": \"INVALID_ARGUMENT\"
  }
}
. Received Model Group=gemini-2.5-flash-lite
Available Model Group Fallbacks=None"}

Suggested fix

In the Gemini provider transformation layer (likely under litellm/llms/vertex_ai/google_genai/ or the generate-content transformation path), add a normalization step before dispatching to the upstream API:

  1. Merge consecutive user turns into a single turn whose parts are concatenated.
  2. If a functionResponse part appears in a user turn that does not directly follow a model turn containing a matching functionCall (matched by id), either:
    • reorder the turns so the pairing is adjacent, or
    • return a structured 400 from the proxy itself with a message like Orphan functionResponse at turn index N (id=call_xxx) so callers can debug their conversation builder.

This would bring the Gemini provider in line with the more forgiving behavior already seen in the OpenAI / Anthropic providers, where conversation history is normalized before dispatch.
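The structured-error option could look roughly like the sketch below. It is an assumption-laden illustration rather than LiteLLM's implementation: `first_orphan_response` and `build_orphan_error` are hypothetical names, the error envelope simply mirrors the shape of Gemini's own 400 body shown in the logs, and matching is done by `id` only.

```python
def first_orphan_response(contents):
    """Return (turn_index, response_id) for the first unmatched functionResponse, or None."""
    for index, turn in enumerate(contents):
        prev = contents[index - 1] if index > 0 else {}
        # Ids of the functionCall parts in the directly preceding model turn, if any.
        call_ids = {
            part["functionCall"].get("id")
            for part in prev.get("parts", [])
            if prev.get("role") == "model" and "functionCall" in part
        }
        for part in turn.get("parts", []):
            response = part.get("functionResponse")
            if response is not None and response.get("id") not in call_ids:
                return index, response.get("id")
    return None


def build_orphan_error(contents):
    """Build a 400-style error body naming the offending turn, or None if the order is valid."""
    orphan = first_orphan_response(contents)
    if orphan is None:
        return None
    index, response_id = orphan
    return {
        "error": {
            "code": 400,
            "message": f"Orphan functionResponse at turn index {index} (id={response_id})",
            "status": "INVALID_ARGUMENT",
        }
    }


if __name__ == "__main__":
    contents = [
        {"role": "user", "parts": [{"functionResponse": {"id": "call_001", "name": "read_file", "response": {"output": "..."}}}]},
        {"role": "model", "parts": [{"functionCall": {"id": "call_002", "name": "read_file", "args": {"file_path": "src/main.py"}}}]},
    ]
    print(build_orphan_error(contents))
    # {'error': {'code': 400, 'message': 'Orphan functionResponse at turn index 0 (id=call_001)', 'status': 'INVALID_ARGUMENT'}}
```

Returning `None` for a valid history lets a caller fall through to the normal dispatch path and only short-circuit when an orphan is found.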


What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.83.10


extent analysis

TL;DR

To fix the Gemini generateContent 400 error, normalize the contents sequence by merging consecutive same-role turns and reordering orphan functionResponse parts before forwarding to Gemini.

Guidance

  • Verify that the contents array in the request payload follows Gemini's tool-calling sequence rules.
  • Implement a normalization step in the Gemini provider transformation layer to merge consecutive user turns and reorder functionResponse parts.
  • Consider returning a structured 400 error from the proxy with a message indicating the turn index that violates the ordering rule.
  • Review the conversation builder to ensure it constructs the contents array in a way that adheres to Gemini's sequence rules.

Example

import copy


def _merge_user_turns(turns):
    # Merge consecutive user turns into one turn with concatenated parts
    merged = []
    for turn in turns:
        if merged and turn['role'] == 'user' and merged[-1]['role'] == 'user':
            merged[-1]['parts'].extend(turn['parts'])
        else:
            merged.append(turn)
    return merged


def _ids(turn, key):
    # Ids of every functionCall / functionResponse part in a turn
    return {part[key].get('id') for part in turn['parts'] if key in part}


def normalize_contents(contents):
    # Work on a deep copy so the caller's history is left untouched
    merged = _merge_user_turns(copy.deepcopy(contents))

    # Reorder functionResponse turns
    ordered = []
    for turn in merged:
        wanted = _ids(turn, 'functionResponse') if turn['role'] == 'user' else set()
        # Already valid: the previous turn is a model turn holding the matching functionCall ids
        if not wanted or (ordered and ordered[-1]['role'] == 'model'
                          and wanted <= _ids(ordered[-1], 'functionCall')):
            ordered.append(turn)
            continue
        # Otherwise move the turn directly after an earlier model turn that made the calls
        for j, earlier in enumerate(ordered):
            if earlier['role'] == 'model' and wanted <= _ids(earlier, 'functionCall'):
                ordered.insert(j + 1, turn)
                break
        else:
            # No matching functionCall earlier in the history: a true orphan, left in place
            ordered.append(turn)

    # Moving a turn can create new adjacent user turns, so merge once more
    return _merge_user_turns(ordered)
