ollama - ✅(Solved) Fix Bug Report: gemma4:26b infinite tool call loop via LiteLLM proxy — Ollama 0.20.6/0.20.7 [2 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

After upgrading Ollama from 0.20.5 → 0.20.6/0.20.7, gemma4:26b enters an infinite loop when tool results are sent via a LiteLLM proxy 1.82.3 The model ignores the tool result and re-issues the identical tool call on every iteration.

The bug does not reproduce when calling Ollama directly (bypassing LiteLLM). Downgrading to Ollama 0.20.5 resolves the issue.


Root Cause

After upgrading Ollama from 0.20.5 → 0.20.6/0.20.7, gemma4:26b enters an infinite loop when tool results are sent via a LiteLLM proxy 1.82.3 The model ignores the tool result and re-issues the identical tool call on every iteration.

The bug does not reproduce when calling Ollama directly (bypassing LiteLLM). Downgrading to Ollama 0.20.5 resolves the issue.


Fix Action

Workaround

Downgrade Ollama to 0.20.5.


PR fix notes

PR #26121: fix(ollama): forward tool_calls and tool_call_id in transform_request

Description (problem / solution / changelog)

Summary

  • transform_request translated tool_calls on assistant messages to OllamaToolCall format but never copied them into the outgoing OllamaChatCompletionMessage — Ollama received {role: assistant, content: ''} with no tool_calls
  • The model had no record of having made a tool call and re-issued the same call on every turn, causing an infinite loop
  • tool_call_id on role: tool messages was also silently dropped; Ollama uses this field to resolve the tool function name from conversation history
  • Added tool_call_id to OllamaChatCompletionMessage TypedDict

Fixes #26094 (reported via https://github.com/ollama/ollama/issues/15719)

Test plan

  • TestOllamaToolCallTransformation::test_transform_request_preserves_tool_calls — asserts tool_calls survive the transform on assistant messages
  • TestOllamaToolCallTransformation::test_transform_request_forwards_tool_call_id — asserts tool_call_id is forwarded on tool response messages
  • Full test_ollama_chat_transformation.py suite: 24/24 pass

Changed files

  • litellm/llms/ollama/chat/transformation.py (modified, +5/-0)
  • litellm/types/llms/ollama.py (modified, +1/-0)
  • tests/test_litellm/llms/ollama/test_ollama_chat_transformation.py (modified, +95/-1)

PR #26122: fix(ollama): forward tool_calls and tool_call_id in transform_request

Description (problem / solution / changelog)

Summary

  • transform_request translated tool_calls on assistant messages to OllamaToolCall format but never copied them into the outgoing OllamaChatCompletionMessage — Ollama received {role: assistant, content: ''} with no tool_calls
  • The model had no record of having made a tool call and re-issued the same call on every turn, causing an infinite loop
  • tool_call_id on role: tool messages was also silently dropped; Ollama uses this field to resolve the tool function name from conversation history
  • Added tool_call_id to OllamaChatCompletionMessage TypedDict

Fixes #26094 (reported via https://github.com/ollama/ollama/issues/15719)

Test plan

  • TestOllamaToolCallTransformation::test_transform_request_preserves_tool_calls — asserts tool_calls survive the transform on assistant messages
  • TestOllamaToolCallTransformation::test_transform_request_forwards_tool_call_id — asserts tool_call_id is forwarded on tool response messages
  • Full test_ollama_chat_transformation.py suite: 24/24 pass

Changed files

  • litellm/llms/ollama/chat/transformation.py (modified, +7/-2)
  • litellm/types/llms/ollama.py (modified, +1/-0)
  • tests/test_litellm/llms/ollama/test_ollama_chat_transformation.py (modified, +93/-0)

Code Example

import json, urllib.request

LITELLM_URL = 'http://<litellm-host>/v1/chat/completions'
LITELLM_KEY = '<your-key>'

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_weather',
        'description': 'Get current weather for a city',
        'parameters': {
            'type': 'object',
            'properties': {'city': {'type': 'string'}},
            'required': ['city']
        }
    }
}]

def call(messages, tools=None):
    payload = {'model': 'ollama/gemma4:26b', 'messages': messages, 'stream': False}
    if tools:
        payload['tools'] = tools
    data = json.dumps(payload).encode()
    req = urllib.request.Request(LITELLM_URL, data=data, headers={
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {LITELLM_KEY}'
    })
    with urllib.request.urlopen(req, timeout=120) as r:
        return json.loads(r.read())['choices'][0]['message']

# Step 1: model generates tool call
m1 = call([{'role': 'user', 'content': 'Weather in Warsaw?'}], tools)
print('Step 1 tool_calls:', m1.get('tool_calls'))  # → [get_weather(city=Warsaw)]

# Step 2: send tool result, expect text answer
msgs = [
    {'role': 'user', 'content': 'Weather in Warsaw?'},
    {'role': 'assistant', 'content': '', 'tool_calls': m1['tool_calls']},
    {'role': 'tool', 'tool_call_id': m1['tool_calls'][0]['id'],
     'content': '{"temperature": "15C", "conditions": "sunny"}'}
]
m2 = call(msgs, tools)
print('Step 2 content:', m2.get('content'))        # → '' (BUG: should be text answer)
print('Step 2 tool_calls:', m2.get('tool_calls'))  # → [get_weather(city=Warsaw)] again!

---

import json, urllib.request

OLLAMA_URL = 'http://<ollama-host>:11434/api/chat'

def ollama_call(messages, tools):
    payload = {'model': 'gemma4:26b', 'stream': False, 'tools': tools, 'messages': messages}
    data = json.dumps(payload).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req, timeout=120) as r:
        return json.loads(r.read())['message']

# Same scenario — Ollama returns correct text answer on step 2
m1 = ollama_call([{'role': 'user', 'content': 'Weather in Warsaw?'}], tools)
msgs = [
    {'role': 'user', 'content': 'Weather in Warsaw?'},
    {'role': 'assistant', 'content': '', 'tool_calls': m1['tool_calls']},
    {'role': 'tool', 'content': '{"temperature": "15C", "conditions": "sunny"}'}
]
m2 = ollama_call(msgs, tools)
print('content:', m2['content'])  # → "The weather in Warsaw is 15°C and sunny."
---

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='', tool_calls=[get_weather(city=Warsaw)]BUG: ignores result, loops
Step 3: content='', tool_calls=[get_weather(city=Warsaw)]   ← same
Step 4: content='', tool_calls=[get_weather(city=Warsaw)]   ← same
Step 5: content='', tool_calls=[get_weather(city=Warsaw)]   ← same

---

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='The weather in Warsaw is 15°C and sunny.'  ← correct text answer ✓

---

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='The weather in Warsaw is 15°C and sunny.'  ← correct text answer ✓

---

logID  promptTokens  completionTokens  tool_called
2358   5232          18                calendar_read(overdue=true)
2359   5237          18                calendar_read(overdue=true)+5 tokens only
2360   5239          18                calendar_read(overdue=true)+2 tokens
2361   5241          18                calendar_read(overdue=true)+2 tokens
2362   5243          18                calendar_read(overdue=true)+2 tokens

---
RAW_BUFFERClick to expand / collapse

What is the issue?

Bug Report: gemma4:26b infinite tool call loop via LiteLLM proxy — Ollama 0.20.6/0.20.7

Date: 2026-04-17
Reporter: legalos.ai project
Severity: Critical — production regression, renders tool calling unusable

Summary

After upgrading Ollama from 0.20.5 → 0.20.6/0.20.7, gemma4:26b enters an infinite loop when tool results are sent via a LiteLLM proxy 1.82.3 The model ignores the tool result and re-issues the identical tool call on every iteration.

The bug does not reproduce when calling Ollama directly (bypassing LiteLLM). Downgrading to Ollama 0.20.5 resolves the issue.


Environment

ComponentVersion
Ollama0.20.7 (also 0.20.6)
Ollama modelgemma4:26b (pulled 2026-04-10)
LiteLLM1.82.3
OSLinux

Steps to Reproduce

Minimal reproducer (Python, via LiteLLM proxy)

import json, urllib.request

LITELLM_URL = 'http://<litellm-host>/v1/chat/completions'
LITELLM_KEY = '<your-key>'

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_weather',
        'description': 'Get current weather for a city',
        'parameters': {
            'type': 'object',
            'properties': {'city': {'type': 'string'}},
            'required': ['city']
        }
    }
}]

def call(messages, tools=None):
    payload = {'model': 'ollama/gemma4:26b', 'messages': messages, 'stream': False}
    if tools:
        payload['tools'] = tools
    data = json.dumps(payload).encode()
    req = urllib.request.Request(LITELLM_URL, data=data, headers={
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {LITELLM_KEY}'
    })
    with urllib.request.urlopen(req, timeout=120) as r:
        return json.loads(r.read())['choices'][0]['message']

# Step 1: model generates tool call
m1 = call([{'role': 'user', 'content': 'Weather in Warsaw?'}], tools)
print('Step 1 tool_calls:', m1.get('tool_calls'))  # → [get_weather(city=Warsaw)]

# Step 2: send tool result, expect text answer
msgs = [
    {'role': 'user', 'content': 'Weather in Warsaw?'},
    {'role': 'assistant', 'content': '', 'tool_calls': m1['tool_calls']},
    {'role': 'tool', 'tool_call_id': m1['tool_calls'][0]['id'],
     'content': '{"temperature": "15C", "conditions": "sunny"}'}
]
m2 = call(msgs, tools)
print('Step 2 content:', m2.get('content'))        # → '' (BUG: should be text answer)
print('Step 2 tool_calls:', m2.get('tool_calls'))  # → [get_weather(city=Warsaw)] again!

Direct Ollama call (works correctly)

import json, urllib.request

OLLAMA_URL = 'http://<ollama-host>:11434/api/chat'

def ollama_call(messages, tools):
    payload = {'model': 'gemma4:26b', 'stream': False, 'tools': tools, 'messages': messages}
    data = json.dumps(payload).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req, timeout=120) as r:
        return json.loads(r.read())['message']

# Same scenario — Ollama returns correct text answer on step 2
m1 = ollama_call([{'role': 'user', 'content': 'Weather in Warsaw?'}], tools)
msgs = [
    {'role': 'user', 'content': 'Weather in Warsaw?'},
    {'role': 'assistant', 'content': '', 'tool_calls': m1['tool_calls']},
    {'role': 'tool', 'content': '{"temperature": "15C", "conditions": "sunny"}'}
]
m2 = ollama_call(msgs, tools)
print('content:', m2['content'])  # → "The weather in Warsaw is 15°C and sunny." ✓

Observed vs Expected Behavior

Ollama 0.20.7 via LiteLLM — BUG

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='', tool_calls=[get_weather(city=Warsaw)]   ← BUG: ignores result, loops
Step 3: content='', tool_calls=[get_weather(city=Warsaw)]   ← same
Step 4: content='', tool_calls=[get_weather(city=Warsaw)]   ← same
Step 5: content='', tool_calls=[get_weather(city=Warsaw)]   ← same

Ollama 0.20.5 via LiteLLM — OK

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='The weather in Warsaw is 15°C and sunny.'  ← correct text answer ✓

Ollama 0.20.7 directly (no proxy) — OK

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='The weather in Warsaw is 15°C and sunny.'  ← correct text answer ✓

Evidence from Production Logs

Real production data showing the loop pattern (LLM call log, n8n.llm_log):

logID  promptTokens  completionTokens  tool_called
2358   5232          18                calendar_read(overdue=true)
2359   5237          18                calendar_read(overdue=true)   ← +5 tokens only
2360   5239          18                calendar_read(overdue=true)   ← +2 tokens
2361   5241          18                calendar_read(overdue=true)   ← +2 tokens
2362   5243          18                calendar_read(overdue=true)   ← +2 tokens

Key observation: promptTokens grows by only 2–5 per iteration, but the tool result JSON is ~80 tokens. This confirms the tool result content was not being injected into the model's context — only the tool_call metadata was appended.

The same loop pattern was observed across multiple tools:

  • calendar_read — identical args, empty result {events:[], tasks:[]}
  • law_lookup — identical args, non-empty result (model ignored rich legal articles)
  • dokuwiki_search — identical args, empty result (no wiki pages found)
  • deep_research — identical args (multi-step internal tool)

Hypothesis

Ollama 0.20.6 changelog states:

"Gemma 4 tool calling ability is improved and updated to use Google's latest post-launch fixes"

This likely changed the chat template used to format tool result messages for the model. LiteLLM 1.82.3 transforms OpenAI-format tool messages before forwarding to Ollama, and this transformation is now incompatible with the updated template.

When calling Ollama directly, the native /api/chat format is used correctly. When going through LiteLLM, the tool result message format sent to Ollama no longer matches what gemma4:26b's updated template expects — causing the model to not "see" the result and re-issue the same tool call.

Additional note: LiteLLM's model info for gemma4:26b reports "supports_function_calling": false, which may cause LiteLLM to apply a fallback prompt-injection path instead of passing tool messages natively.


Workaround

Downgrade Ollama to 0.20.5.


Related

Relevant log output

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.20.7

extent analysis

TL;DR

Downgrade Ollama to version 0.20.5 to resolve the infinite tool call loop issue when using the LiteLLM proxy.

Guidance

  • Verify that the issue is specific to Ollama versions 0.20.6 and 0.20.7, and that downgrading to 0.20.5 resolves the issue.
  • Investigate the compatibility of LiteLLM's tool message transformation with the updated chat template in Ollama 0.20.6 and later.
  • Check the LiteLLM model info for gemma4:26b to confirm that it reports "supports_function_calling": false, which may be causing the fallback prompt-injection path.
  • Review the Ollama changelog and LiteLLM bug report for more information on the issue and potential fixes.

Example

No code snippet is provided as the issue is related to the compatibility of LiteLLM's tool message transformation with the updated chat template in Ollama.

Notes

The issue seems to be specific to the combination of Ollama 0.20.6 or 0.20.7 and LiteLLM 1.82.3. Downgrading Ollama to 0.20.5 is a confirmed workaround, but a more permanent fix may require updates to either Ollama or LiteLLM.

Recommendation

Apply the workaround by downgrading Ollama to version 0.20.5, as this is a confirmed fix for the issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

ollama - ✅(Solved) Fix Bug Report: gemma4:26b infinite tool call loop via LiteLLM proxy — Ollama 0.20.6/0.20.7 [2 pull requests]