litellm - ✅(Solved) Fix [Bug]: Gemini models degenerate in multi-turn tool-calling via /v1/messages — thoughtSignature not propagated from thought parts [1 pull requests, 1 participants]

litellm2026-04-08 05:16:01

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#25322•Fetched 2026-04-09 07:52:51

View on GitHub

Comments

Participants

Timeline

Reactions

Author

josh900

Participants

josh900

Timeline (top)

labeled ×4cross-referenced ×1referenced ×1

Root Cause

When using Gemini models through LiteLLM's /v1/messages endpoint for multi-turn tool-calling conversations (e.g. Claude Code), the model degenerates after several turns — producing garbage short responses ("ja"), repetition loops ("Executing. Done. Going. Now."), or stopping mid-task. This happens because Gemini's thoughtSignature is not properly round-tripped between turns.

Fix Action

Fix / Workaround

v1.82.3-stable.patch.2 (also reproduces on v1.81.14-stable and v1.83.4-nightly)

PR fix notes

PR #25357: fix(gemini): capture thoughtSignature from sibling thought parts

Repository: BerriAI/litellm
Author: Bahtya
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25357

Description (problem / solution / changelog)

Problem

When using Gemini models for multi-turn tool-calling, thoughtSignature may be placed on a separate thought part (thought: true) rather than the functionCall part. The current _transform_parts() only checks the functionCall part for thoughtSignature, so it is lost when placed on a sibling thought part.

Without the signature, Gemini loses multi-turn thinking coherence and degenerates (garbage responses, repetition loops).

Example Gemini response:

{
  "parts": [
    {"thought": true, "text": "I need to list files...", "thoughtSignature": "ErcNCrQN..."},
    {"functionCall": {"name": "Bash", "args": {"command": "ls -R"}}}
  ]
}

The thoughtSignature is on parts[0] but _transform_parts only checks parts[1].

Fix

Collect all thoughtSignature values from any part before iterating. When a functionCall part has no thoughtSignature of its own, fall back to the first available collected signature.

Addresses Failure 1 from #25322

Changed files

litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py (modified, +56/-40)
tests/test_litellm/llms/vertex_ai/gemini/test_thought_signature_in_tool_call_id.py (modified, +62/-6)

Code Example

# Current code in _transform_parts:
if "functionCall" in part:
    _function_chunk = { ... }
    thought_signature = part.get("thoughtSignature")  # ← only checks THIS part

---

{
  "candidates": [{
    "content": {
      "parts": [
        {"thought": true, "text": "I need to list files...", "thoughtSignature": "ErcNCrQN..."},
        {"functionCall": {"name": "Bash", "args": {"command": "ls -R"}}}
      ]
    }
  }]
}

---

# Collect all signatures from any part in the response
_all_signatures = [
    p["thoughtSignature"] for p in parts if p.get("thoughtSignature")
]

for part in parts:
    if "functionCall" in part:
        _function_chunk = { ... }
        # Fall back to signature from thought parts if functionCall has none
        thought_signature = part.get("thoughtSignature") or (
            _all_signatures[0] if _all_signatures else None
        )

---

elif content.get("type") == "thinking":
    thinking_block = ChatCompletionThinkingBlock(
        type="thinking",
        thinking=content.get("thinking") or "",
        signature=content.get("signature") or "",  # ← empty string for unsigned blocks
    )
    thinking_blocks.append(thinking_block)  # ← always appended, even without signature

---

elif content.get("type") == "thinking":
    _thinking_sig = content.get("signature") or ""
    if _thinking_sig:  # Only include signed thinking blocks
        thinking_block = ChatCompletionThinkingBlock(
            type="thinking",
            thinking=content.get("thinking") or "",
            signature=_thinking_sig,
        )
        thinking_blocks.append(thinking_block)

---

model_list:
     - model_name: gemini-3-flash-preview
       litellm_params:
         model: gemini/gemini-3-flash-preview
         api_key: os.environ/GEMINI_API_KEY
   litellm_settings:
     drop_params: true
     modify_params: true

---

export ANTHROPIC_BASE_URL="http://localhost:4000"
   export ANTHROPIC_AUTH_TOKEN="$LITELLM_MASTER_KEY"

---

claude --model gemini-3-flash-preview -p "create a hello world web app with index.html, styles.css, and script.js"

---

# Gemini response showing thoughtSignature on thought part (NOT on functionCall part):
parts: [
  {"thought": true, "text": "I need to...", "thoughtSignature": "ErcNCrQN..."},
  {"functionCall": {"name": "Bash", "args": {"command": "ls"}}}
]

# LiteLLM _transform_parts only checks functionCall part for signature:
thought_signature = part.get("thoughtSignature")  # → None (it's on the other part)

# Result: tool_call_id has no embedded signature, next turn sends no signature back
# Gemini loses thinking coherence and degenerates

# Failure 2 — unsigned thinking block in conversation history causes repetition:
# First thinking turn has signature="" (empty), gets echoed back on turn 2+
# Gemini sees unsigned thinking block and outputs reasoning as plain text:
"Executing. Done. Going. Now. Executing. Done. Going. Now..."
# (repeats until token budget exhausted)

RAW_BUFFERClick to expand / collapse

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

There are two related failures:

Failure 1: thoughtSignature on thought parts is never captured

Gemini's API requires that the thoughtSignature from the previous assistant turn be echoed back on the next turn for multi-turn thinking coherence. LiteLLM's _transform_parts() in vertex_and_google_ai_studio_gemini.py only reads thoughtSignature from the part that contains functionCall:

# Current code in _transform_parts:
if "functionCall" in part:
    _function_chunk = { ... }
    thought_signature = part.get("thoughtSignature")  # ← only checks THIS part

But Gemini often places thoughtSignature on a separate thought part (thought: true), not on the functionCall part. When the signature lives on the thought part, it's never captured, never embedded in the tool_call_id, and never sent back on the following turn. Without the signature, Gemini loses coherence and degenerates.

Example Gemini response with signature on thought part (not functionCall part):

{
  "candidates": [{
    "content": {
      "parts": [
        {"thought": true, "text": "I need to list files...", "thoughtSignature": "ErcNCrQN..."},
        {"functionCall": {"name": "Bash", "args": {"command": "ls -R"}}}
      ]
    }
  }]
}

The thoughtSignature is on parts[0] (thought part), but _transform_parts only checks parts[1] (the functionCall part) — so the signature is lost.

Fix: Before iterating parts, collect all thoughtSignature values from any part. When a functionCall part has no signature of its own, fall back to the first available collected signature:

# Collect all signatures from any part in the response
_all_signatures = [
    p["thoughtSignature"] for p in parts if p.get("thoughtSignature")
]

for part in parts:
    if "functionCall" in part:
        _function_chunk = { ... }
        # Fall back to signature from thought parts if functionCall has none
        thought_signature = part.get("thoughtSignature") or (
            _all_signatures[0] if _all_signatures else None
        )

Failure 2: Unsigned thinking blocks echoed back cause repetition loops

The very first thinking turn Gemini produces often has no thoughtSignature yet (the signature appears starting from the second thinking turn). When this unsigned thinking block is included in conversation history on subsequent turns, Gemini outputs its reasoning as visible plain text and then enters repetition loops — consuming the entire token budget with patterns like "Executing. Done. Going. Now." before making any tool calls.

This happens in adapters/transformation.py → translate_anthropic_messages_to_openai(), in the request path where conversation history is converted for Gemini. The thinking block handler unconditionally includes all thinking blocks:

elif content.get("type") == "thinking":
    thinking_block = ChatCompletionThinkingBlock(
        type="thinking",
        thinking=content.get("thinking") or "",
        signature=content.get("signature") or "",  # ← empty string for unsigned blocks
    )
    thinking_blocks.append(thinking_block)  # ← always appended, even without signature

Fix: Skip thinking blocks that have an empty or null signature:

elif content.get("type") == "thinking":
    _thinking_sig = content.get("signature") or ""
    if _thinking_sig:  # Only include signed thinking blocks
        thinking_block = ChatCompletionThinkingBlock(
            type="thinking",
            thinking=content.get("thinking") or "",
            signature=_thinking_sig,
        )
        thinking_blocks.append(thinking_block)

Impact

Any user routing Claude Code (or any Anthropic-format client with thinking enabled) through LiteLLM to Gemini models will experience progressive degeneration in multi-turn tool-calling conversations. The model works fine for the first 1-3 turns, then starts producing incoherent or repetitive output. This is particularly visible in agentic coding workflows where 10+ tool-calling turns are common.

Files affected

litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py — _transform_parts() (Failure 1)
litellm/llms/anthropic/experimental_pass_through/adapters/transformation.py — translate_anthropic_messages_to_openai() (Failure 2)

Steps to Reproduce

Configure LiteLLM with a Gemini model that supports thinking:

model_list:
  - model_name: gemini-3-flash-preview
    litellm_params:
      model: gemini/gemini-3-flash-preview
      api_key: os.environ/GEMINI_API_KEY
litellm_settings:
  drop_params: true
  modify_params: true

Point Claude Code at the proxy:

export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="$LITELLM_MASTER_KEY"

Run a multi-step coding task that requires 5+ tool calls:

claude --model gemini-3-flash-preview -p "create a hello world web app with index.html, styles.css, and script.js"

Observe: the model works for the first few turns, then starts producing:
- Very short garbage responses ("ja", single characters)
- Repetition loops ("Executing. Done. Going. Now. Executing. Done...")
- Visible reasoning text in the output instead of tool calls
- Stops making progress and consumes tokens without producing useful output
Inspect the request payloads sent to Gemini — the thoughtSignature from thought parts is missing in the conversation history, and unsigned thinking blocks are being echoed back.

Relevant log output

# Gemini response showing thoughtSignature on thought part (NOT on functionCall part):
parts: [
  {"thought": true, "text": "I need to...", "thoughtSignature": "ErcNCrQN..."},
  {"functionCall": {"name": "Bash", "args": {"command": "ls"}}}
]

# LiteLLM _transform_parts only checks functionCall part for signature:
thought_signature = part.get("thoughtSignature")  # → None (it's on the other part)

# Result: tool_call_id has no embedded signature, next turn sends no signature back
# Gemini loses thinking coherence and degenerates

# Failure 2 — unsigned thinking block in conversation history causes repetition:
# First thinking turn has signature="" (empty), gets echoed back on turn 2+
# Gemini sees unsigned thinking block and outputs reasoning as plain text:
"Executing. Done. Going. Now. Executing. Done. Going. Now..."
# (repeats until token budget exhausted)

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.3-stable.patch.2 (also reproduces on v1.81.14-stable and v1.83.4-nightly)

Twitter / LinkedIn details

No response

extent analysis

TL;DR

To fix the issue of Gemini models degenerating after several turns in multi-turn tool-calling conversations, update the _transform_parts function in vertex_and_google_ai_studio_gemini.py to collect all thoughtSignature values from any part and fall back to the first available collected signature when a functionCall part has none, and modify translate_anthropic_messages_to_openai in adapters/transformation.py to skip thinking blocks with empty or null signatures.

Guidance

Update _transform_parts: Collect all thoughtSignature values from any part in the response and fall back to the first available collected signature when a functionCall part has none.
Modify translate_anthropic_messages_to_openai: Skip thinking blocks that have an empty or null signature to prevent repetition loops.
Verify the fix: Run a multi-step coding task that requires 5+ tool calls and observe that the model no longer produces garbage responses, repetition loops, or stops making progress.
Check request payloads: Inspect the request payloads sent to Gemini to ensure that the thoughtSignature from thought parts is included in the conversation history and unsigned thinking blocks are not echoed back.

Example

# Collect all signatures from any part in the response
_all_signatures = [
    p["thoughtSignature"] for p in parts if p.get("thoughtSignature")
]

for part in parts:
    if "functionCall" in part:
        _function_chunk = { ... }
        # Fall back to signature from thought parts if functionCall has none
        thought_signature = part.get("thoughtSignature") or (
            _all_signatures[0] if _all_signatures else None
        )

Notes

The provided fixes assume that the issue is caused by the missing thoughtSignature in the conversation history and the inclusion of unsigned thinking blocks. If the issue persists after applying these fixes, further investigation may be necessary.

Recommendation

Apply the workaround by updating the _transform_parts function and modifying translate_anthropic_messages_to_openai to fix the issue, as the root cause is identified and a clear solution is provided.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #conversation history #API rate limit #retriever error #indexing error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.