litellm - [Bug]: Ollama Gemma 4 Infinite Tool Loop: Role mismatch ("tool" vs "tool_responses")



Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Description

When using the Gemma 4 cloud model (`gemma4:31b-cloud`) through LiteLLM's Ollama provider, the model enters an infinite tool-calling loop. Even after the tool result has been added to the conversation history, the model does not recognize it and repeatedly issues the same tool call.

This happens because Gemma 4 (via Ollama) expects tool results to use the role `tool_responses`, while LiteLLM normalizes them to the OpenAI-compatible `tool` role. Because of this mismatch, the model ignores the result and acts as though the tool was never executed.

Root Cause & Evidence

This is a known semantic mismatch between how Gemma 4 handles tool roles and the OpenAI standard. The model specifically looks for `tool_responses` to end the tool-use phase. When it receives the standardized `tool` role instead, it does not treat the message as a valid result and concludes that the tool was never called.

Reference: This issue was identified and fixed in the Google ADK project here: https://github.com/google/adk-python/pull/5655
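To make the claimed mechanism concrete, here is a toy model in Python. It is not Gemma's, Ollama's, or LiteLLM's actual logic; it only encodes the assumption stated above, namely that the model counts a tool call as answered when a later message carries the role it expects. The helper `unanswered_tool_calls`, the `call_0` id, and the dict-shaped history are illustrative. With the OpenAI-style history LiteLLM sends today, a reader expecting `tool` sees the `add` call answered, while one expecting `tool_responses` still sees it pending, which is consistent with the repeated `add` calls in the log output.

```python
from typing import Any

Message = dict[str, Any]


def unanswered_tool_calls(messages: list[Message], result_role: str) -> list[str]:
    """Return the names of tool calls that no `result_role` message answers.

    Toy model only: a result is recognised purely by its role, and each
    result message answers the oldest still-open call.
    """
    pending: list[str] = []
    for msg in messages:
        if msg.get("role") == "assistant":
            # Every call the assistant requests opens a pending slot.
            for call in msg.get("tool_calls", []):
                pending.append(call["function"]["name"])
        elif msg.get("role") == result_role and pending:
            pending.pop(0)  # this result closes the oldest open call
    return pending


# Dict approximation of the reproduction script's history after turn 1.
history: list[Message] = [
    {"role": "user", "content": "What is 2 + 2?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_0",
                "type": "function",
                "function": {"name": "add", "arguments": '{"a": 2, "b": 2}'},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_0", "name": "add", "content": '{"result": 4}'},
]

print(unanswered_tool_calls(history, result_role="tool"))            # []
print(unanswered_tool_calls(history, result_role="tool_responses"))  # ['add']
```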

Proposed Fix

In `litellm\llms\ollama\chat\transformation.py`, within the `transform_request` method, the `tool` role should be mapped to `tool_responses` when a Gemma 4 model is detected, so that the Ollama backend recognizes tool results.

Suggested Code Change:

```python
# Inside the message loop of transform_request
# (`m` is the current message and `model` is the model name passed to the method).
role = cast(str, m.get("role"))

# Fix for the Gemma 4 tool-calling loop:
# map the OpenAI-style 'tool' role to 'tool_responses' for Gemma 4 models.
if role == "tool" and "gemma4" in model.lower():
    role = "tool_responses"

ollama_message = OllamaChatCompletionMessage(
    role=role,
)
```

I have verified this fix locally by patching the library in my `.venv`. It resolves the infinite loop: the model recognizes the tool result and gives the final answer after a single tool execution.
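The proposed condition can also be pulled out into a standalone function, which makes it easy to unit-test without an Ollama backend. This is a minimal sketch assuming plain dict messages; `remap_tool_roles` is a hypothetical name, not a LiteLLM API, and it returns copies so the caller's history is not mutated.

```python
from typing import Any

Message = dict[str, Any]


def remap_tool_roles(messages: list[Message], model: str) -> list[Message]:
    """Copy `messages`, renaming role 'tool' to 'tool_responses' for Gemma 4 models.

    Uses the same test as the suggested patch: a case-insensitive
    substring match on "gemma4" in the model name.
    """
    use_gemma_role = "gemma4" in model.lower()
    remapped: list[Message] = []
    for m in messages:
        copy = dict(m)  # shallow copy: the input list and dicts stay untouched
        if use_gemma_role and copy.get("role") == "tool":
            copy["role"] = "tool_responses"
        remapped.append(copy)
    return remapped


history: list[Message] = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "tool", "tool_call_id": "call_0", "content": '{"result": 4}'},
]

print([m["role"] for m in remap_tool_roles(history, "gemma4:31b-cloud")])
# ['user', 'tool_responses']
print([m["role"] for m in remap_tool_roles(history, "gemma3:27b")])
# ['user', 'tool']
print(history[1]["role"])
# tool
```

Because the check is a plain substring test, it also matches a prefixed name such as `ollama_chat/gemma4:31b-cloud`, but it leaves every other model, including earlier Gemma versions such as `gemma3`, on the standard `tool` role.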

🛠️ Environment

  • Model: gemma4:31b-cloud (via Ollama)
  • LiteLLM Version: v1.83.0 and v1.83.7
  • OS: Windows
  • Backend: Ollama (local instance serving a cloud model)

Steps to Reproduce

I have attached a minimal reproduction script. When run against an unpatched LiteLLM, the model enters an infinite loop calling the `add` tool. After patching the role from `tool` to `tool_responses` in the `ollama_chat` provider (`litellm\llms\ollama\chat\transformation.py`), the loop is resolved and the model provides the final answer.

```python
import json

import litellm

# 1. Setup - use the exact model that failed
MODEL = "ollama_chat/gemma4:31b-cloud"

# 2. Define the tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two numbers together.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "First number"},
                    "b": {"type": "number", "description": "Second number"},
                },
                "required": ["a", "b"],
                "strict": True,
            },
        },
    }
]


def add_numbers(a, b):
    return {"result": a + b}


# 3. The conversation
messages = [{"role": "user", "content": "What is 2 + 2?"}]

print(f"Starting reproduction loop with model: {MODEL}\n")

# Limit to 5 turns. If the bug exists, the model requests the tool on every turn;
# if fixed, it answers right after the first tool result.
for i in range(5):
    print(f"--- Turn {i+1} ---")

    response = litellm.completion(
        model=MODEL,
        messages=messages,
        tools=tools,
        tool_choice="auto",
        think=True,
    )

    message = response.choices[0].message

    # Check for tool calls
    if message.tool_calls:
        print(f"Model requested tool: {message.tool_calls[0].function.name}")

        # Add the assistant message (with its tool_calls) to the history
        messages.append(message)

        # Execute each requested call and add its result to the history
        for tool in message.tool_calls:
            args = json.loads(tool.function.arguments)
            result = add_numbers(args["a"], args["b"])

            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool.id,
                    "name": tool.function.name,
                    "content": json.dumps(result),
                }
            )
            print(f"Tool result provided: {result}")
    else:
        print(f"Final Answer: {message.content}")
        print("\n✅ NO LOOP: The model responded correctly.")
        break  # Success! The loop stopped.
else:
    # for/else: runs only if every turn requested a tool (the loop never hit `break`).
    print("\n❌ BUG REPRODUCED: The model entered an infinite tool loop.")
```

🔄 Steps to Reproduce

  1. Setup: Ensure a local or cloud Ollama instance is running with the `gemma4:31b-cloud` model pulled.
  2. Run: Execute the minimal reproduction script above with `uv run` or a standard Python environment.
  3. Trigger: Use a prompt that requires a function call (e.g., "What is 2 + 2?").
  4. The loop sequence:
    • Turn 1: The model emits `tool_calls` → the tool is executed → the result is appended as `role: tool`.
    • Turn 2: The model ignores the `role: tool` message and emits the same `tool_calls` again.
    • Turns 3-N: The cycle repeats on every turn until the caller's iteration limit (e.g., `MAX_ITERATIONS` in an agent loop, or the 5-turn cap in the script) stops it.
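The loop sequence above also suggests a defensive client-side check that is independent of the role fix: stop when a turn re-issues exactly the same tool calls as the previous turn. This is a sketch of one possible mitigation under that assumption, not part of LiteLLM or the proposed patch; `RepeatedToolCallGuard` and `fingerprint` are hypothetical helpers that compare calls by name and canonicalised JSON arguments, so key order does not matter.

```python
import json
from typing import Iterable, Optional

Call = tuple[str, str]  # (tool name, JSON-encoded arguments)


def fingerprint(calls: Iterable[Call]) -> frozenset[Call]:
    """Canonicalise calls so JSON key order and call order do not matter."""
    return frozenset(
        (name, json.dumps(json.loads(args), sort_keys=True)) for name, args in calls
    )


class RepeatedToolCallGuard:
    """Hypothetical helper: flags a turn that repeats the previous turn's tool calls."""

    def __init__(self) -> None:
        self._previous: Optional[frozenset[Call]] = None

    def is_repeat(self, calls: Iterable[Call]) -> bool:
        current = fingerprint(calls)
        # A turn with no tool calls is never treated as a repeat.
        repeated = bool(current) and current == self._previous
        self._previous = current
        return repeated


guard = RepeatedToolCallGuard()
print(guard.is_repeat([("add", '{"a": 2, "b": 2}')]))  # False: first request
print(guard.is_repeat([("add", '{"b": 2, "a": 2}')]))  # True: same call again
```

In the reproduction script, the guard could be fed `[(t.function.name, t.function.arguments) for t in message.tool_calls]` on each turn, breaking out early when it returns `True`. It would also stop workflows that legitimately repeat a call (polling, for example), and it only bounds the damage; it does not make the model read the tool result.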

Relevant log output

Without the patch:

```
Starting reproduction loop with model: ollama_chat/gemma4:31b-cloud

--- Turn 1 ---
Model requested tool: add
Tool result provided: {'result': 4}
--- Turn 2 ---
Model requested tool: add
Tool result provided: {'result': 4}
--- Turn 3 ---
Model requested tool: add
Tool result provided: {'result': 4}
--- Turn 4 ---
Model requested tool: add
Tool result provided: {'result': 4}
--- Turn 5 ---
Model requested tool: add
Tool result provided: {'result': 4}

❌ BUG REPRODUCED: The model entered an infinite tool loop.
```

*After applying the patch in the local `litellm\llms\ollama\chat\transformation.py`, within the `transform_request` method:*

```
Starting reproduction loop with model: ollama_chat/gemma4:31b-cloud

--- Turn 1 ---
Model requested tool: add
Tool result provided: {'result': 4}
--- Turn 2 ---
Final Answer: 2 + 2 is 4.

✅ NO LOOP: The model responded correctly.
```

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

v1.83.7

