ollama - ✅(Solved) Fix Bug Report: gemma4:26b infinite tool call loop via LiteLLM proxy — Ollama 0.20.6/0.20.7 [2 pull requests]

ollama2026-04-20 10:09:19

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

After upgrading Ollama from 0.20.5 → 0.20.6/0.20.7, gemma4:26b enters an infinite loop when tool results are sent via a LiteLLM proxy 1.82.3 The model ignores the tool result and re-issues the identical tool call on every iteration.

The bug does not reproduce when calling Ollama directly (bypassing LiteLLM). Downgrading to Ollama 0.20.5 resolves the issue.

Root Cause

The bug does not reproduce when calling Ollama directly (bypassing LiteLLM). Downgrading to Ollama 0.20.5 resolves the issue.

Fix Action

Workaround

Downgrade Ollama to 0.20.5.

PR fix notes

PR #26121: fix(ollama): forward tool_calls and tool_call_id in transform_request

Repository: BerriAI/litellm
Author: mverrilli
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/26121

Description (problem / solution / changelog)

Summary

transform_request translated tool_calls on assistant messages to OllamaToolCall format but never copied them into the outgoing OllamaChatCompletionMessage — Ollama received {role: assistant, content: ''} with no tool_calls
The model had no record of having made a tool call and re-issued the same call on every turn, causing an infinite loop
tool_call_id on role: tool messages was also silently dropped; Ollama uses this field to resolve the tool function name from conversation history
Added tool_call_id to OllamaChatCompletionMessage TypedDict

Fixes #26094 (reported via https://github.com/ollama/ollama/issues/15719)

Test plan

TestOllamaToolCallTransformation::test_transform_request_preserves_tool_calls — asserts tool_calls survive the transform on assistant messages
TestOllamaToolCallTransformation::test_transform_request_forwards_tool_call_id — asserts tool_call_id is forwarded on tool response messages
Full test_ollama_chat_transformation.py suite: 24/24 pass

Changed files

litellm/llms/ollama/chat/transformation.py (modified, +5/-0)
litellm/types/llms/ollama.py (modified, +1/-0)
tests/test_litellm/llms/ollama/test_ollama_chat_transformation.py (modified, +95/-1)

PR #26122: fix(ollama): forward tool_calls and tool_call_id in transform_request

Repository: BerriAI/litellm
Author: mverrilli
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26122

Description (problem / solution / changelog)

Summary

transform_request translated tool_calls on assistant messages to OllamaToolCall format but never copied them into the outgoing OllamaChatCompletionMessage — Ollama received {role: assistant, content: ''} with no tool_calls
The model had no record of having made a tool call and re-issued the same call on every turn, causing an infinite loop
tool_call_id on role: tool messages was also silently dropped; Ollama uses this field to resolve the tool function name from conversation history
Added tool_call_id to OllamaChatCompletionMessage TypedDict

Fixes #26094 (reported via https://github.com/ollama/ollama/issues/15719)

Test plan

TestOllamaToolCallTransformation::test_transform_request_preserves_tool_calls — asserts tool_calls survive the transform on assistant messages
TestOllamaToolCallTransformation::test_transform_request_forwards_tool_call_id — asserts tool_call_id is forwarded on tool response messages
Full test_ollama_chat_transformation.py suite: 24/24 pass

Changed files

litellm/llms/ollama/chat/transformation.py (modified, +7/-2)
litellm/types/llms/ollama.py (modified, +1/-0)
tests/test_litellm/llms/ollama/test_ollama_chat_transformation.py (modified, +93/-0)

Code Example

import json, urllib.request

LITELLM_URL = 'http://<litellm-host>/v1/chat/completions'
LITELLM_KEY = '<your-key>'

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_weather',
        'description': 'Get current weather for a city',
        'parameters': {
            'type': 'object',
            'properties': {'city': {'type': 'string'}},
            'required': ['city']
        }
    }
}]

def call(messages, tools=None):
    payload = {'model': 'ollama/gemma4:26b', 'messages': messages, 'stream': False}
    if tools:
        payload['tools'] = tools
    data = json.dumps(payload).encode()
    req = urllib.request.Request(LITELLM_URL, data=data, headers={
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {LITELLM_KEY}'
    })
    with urllib.request.urlopen(req, timeout=120) as r:
        return json.loads(r.read())['choices'][0]['message']

# Step 1: model generates tool call
m1 = call([{'role': 'user', 'content': 'Weather in Warsaw?'}], tools)
print('Step 1 tool_calls:', m1.get('tool_calls'))  # → [get_weather(city=Warsaw)]

# Step 2: send tool result, expect text answer
msgs = [
    {'role': 'user', 'content': 'Weather in Warsaw?'},
    {'role': 'assistant', 'content': '', 'tool_calls': m1['tool_calls']},
    {'role': 'tool', 'tool_call_id': m1['tool_calls'][0]['id'],
     'content': '{"temperature": "15C", "conditions": "sunny"}'}
]
m2 = call(msgs, tools)
print('Step 2 content:', m2.get('content'))        # → '' (BUG: should be text answer)
print('Step 2 tool_calls:', m2.get('tool_calls'))  # → [get_weather(city=Warsaw)] again!

---

import json, urllib.request

OLLAMA_URL = 'http://<ollama-host>:11434/api/chat'

def ollama_call(messages, tools):
    payload = {'model': 'gemma4:26b', 'stream': False, 'tools': tools, 'messages': messages}
    data = json.dumps(payload).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req, timeout=120) as r:
        return json.loads(r.read())['message']

# Same scenario — Ollama returns correct text answer on step 2
m1 = ollama_call([{'role': 'user', 'content': 'Weather in Warsaw?'}], tools)
msgs = [
    {'role': 'user', 'content': 'Weather in Warsaw?'},
    {'role': 'assistant', 'content': '', 'tool_calls': m1['tool_calls']},
    {'role': 'tool', 'content': '{"temperature": "15C", "conditions": "sunny"}'}
]
m2 = ollama_call(msgs, tools)
print('content:', m2['content'])  # → "The weather in Warsaw is 15°C and sunny." ✓

---

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='', tool_calls=[get_weather(city=Warsaw)]   ← BUG: ignores result, loops
Step 3: content='', tool_calls=[get_weather(city=Warsaw)]   ← same
Step 4: content='', tool_calls=[get_weather(city=Warsaw)]   ← same
Step 5: content='', tool_calls=[get_weather(city=Warsaw)]   ← same

---

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='The weather in Warsaw is 15°C and sunny.'  ← correct text answer ✓

---

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='The weather in Warsaw is 15°C and sunny.'  ← correct text answer ✓

---

logID  promptTokens  completionTokens  tool_called
2358   5232          18                calendar_read(overdue=true)
2359   5237          18                calendar_read(overdue=true)   ← +5 tokens only
2360   5239          18                calendar_read(overdue=true)   ← +2 tokens
2361   5241          18                calendar_read(overdue=true)   ← +2 tokens
2362   5243          18                calendar_read(overdue=true)   ← +2 tokens

---

RAW_BUFFERClick to expand / collapse

What is the issue?

Bug Report: gemma4:26b infinite tool call loop via LiteLLM proxy — Ollama 0.20.6/0.20.7

Date: 2026-04-17
Reporter: legalos.ai project
Severity: Critical — production regression, renders tool calling unusable

Summary

The bug does not reproduce when calling Ollama directly (bypassing LiteLLM). Downgrading to Ollama 0.20.5 resolves the issue.

Environment

Component	Version
Ollama	0.20.7 (also 0.20.6)
Ollama model	`gemma4:26b` (pulled 2026-04-10)
LiteLLM	1.82.3
OS	Linux

Steps to Reproduce

Minimal reproducer (Python, via LiteLLM proxy)

import json, urllib.request

LITELLM_URL = 'http://<litellm-host>/v1/chat/completions'
LITELLM_KEY = '<your-key>'

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_weather',
        'description': 'Get current weather for a city',
        'parameters': {
            'type': 'object',
            'properties': {'city': {'type': 'string'}},
            'required': ['city']
        }
    }
}]

def call(messages, tools=None):
    payload = {'model': 'ollama/gemma4:26b', 'messages': messages, 'stream': False}
    if tools:
        payload['tools'] = tools
    data = json.dumps(payload).encode()
    req = urllib.request.Request(LITELLM_URL, data=data, headers={
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {LITELLM_KEY}'
    })
    with urllib.request.urlopen(req, timeout=120) as r:
        return json.loads(r.read())['choices'][0]['message']

# Step 1: model generates tool call
m1 = call([{'role': 'user', 'content': 'Weather in Warsaw?'}], tools)
print('Step 1 tool_calls:', m1.get('tool_calls'))  # → [get_weather(city=Warsaw)]

# Step 2: send tool result, expect text answer
msgs = [
    {'role': 'user', 'content': 'Weather in Warsaw?'},
    {'role': 'assistant', 'content': '', 'tool_calls': m1['tool_calls']},
    {'role': 'tool', 'tool_call_id': m1['tool_calls'][0]['id'],
     'content': '{"temperature": "15C", "conditions": "sunny"}'}
]
m2 = call(msgs, tools)
print('Step 2 content:', m2.get('content'))        # → '' (BUG: should be text answer)
print('Step 2 tool_calls:', m2.get('tool_calls'))  # → [get_weather(city=Warsaw)] again!

Direct Ollama call (works correctly)

import json, urllib.request

OLLAMA_URL = 'http://<ollama-host>:11434/api/chat'

def ollama_call(messages, tools):
    payload = {'model': 'gemma4:26b', 'stream': False, 'tools': tools, 'messages': messages}
    data = json.dumps(payload).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req, timeout=120) as r:
        return json.loads(r.read())['message']

# Same scenario — Ollama returns correct text answer on step 2
m1 = ollama_call([{'role': 'user', 'content': 'Weather in Warsaw?'}], tools)
msgs = [
    {'role': 'user', 'content': 'Weather in Warsaw?'},
    {'role': 'assistant', 'content': '', 'tool_calls': m1['tool_calls']},
    {'role': 'tool', 'content': '{"temperature": "15C", "conditions": "sunny"}'}
]
m2 = ollama_call(msgs, tools)
print('content:', m2['content'])  # → "The weather in Warsaw is 15°C and sunny." ✓

Observed vs Expected Behavior

Ollama 0.20.7 via LiteLLM — BUG

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='', tool_calls=[get_weather(city=Warsaw)]   ← BUG: ignores result, loops
Step 3: content='', tool_calls=[get_weather(city=Warsaw)]   ← same
Step 4: content='', tool_calls=[get_weather(city=Warsaw)]   ← same
Step 5: content='', tool_calls=[get_weather(city=Warsaw)]   ← same

Ollama 0.20.5 via LiteLLM — OK

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='The weather in Warsaw is 15°C and sunny.'  ← correct text answer ✓

Ollama 0.20.7 directly (no proxy) — OK

Step 1: content='', tool_calls=[get_weather(city=Warsaw)]   ← model calls tool
Step 2: content='The weather in Warsaw is 15°C and sunny.'  ← correct text answer ✓

Evidence from Production Logs

Real production data showing the loop pattern (LLM call log, n8n.llm_log):

logID  promptTokens  completionTokens  tool_called
2358   5232          18                calendar_read(overdue=true)
2359   5237          18                calendar_read(overdue=true)   ← +5 tokens only
2360   5239          18                calendar_read(overdue=true)   ← +2 tokens
2361   5241          18                calendar_read(overdue=true)   ← +2 tokens
2362   5243          18                calendar_read(overdue=true)   ← +2 tokens

Key observation: promptTokens grows by only 2–5 per iteration, but the tool result JSON is ~80 tokens. This confirms the tool result content was not being injected into the model's context — only the tool_call metadata was appended.

The same loop pattern was observed across multiple tools:

calendar_read — identical args, empty result {events:[], tasks:[]}
law_lookup — identical args, non-empty result (model ignored rich legal articles)
dokuwiki_search — identical args, empty result (no wiki pages found)
deep_research — identical args (multi-step internal tool)

Hypothesis

Ollama 0.20.6 changelog states:

"Gemma 4 tool calling ability is improved and updated to use Google's latest post-launch fixes"

This likely changed the chat template used to format tool result messages for the model. LiteLLM 1.82.3 transforms OpenAI-format tool messages before forwarding to Ollama, and this transformation is now incompatible with the updated template.

When calling Ollama directly, the native /api/chat format is used correctly. When going through LiteLLM, the tool result message format sent to Ollama no longer matches what gemma4:26b's updated template expects — causing the model to not "see" the result and re-issue the same tool call.

Additional note: LiteLLM's model info for gemma4:26b reports "supports_function_calling": false, which may cause LiteLLM to apply a fallback prompt-injection path instead of passing tool messages natively.

Workaround

Downgrade Ollama to 0.20.5.

LiteLLM bug report: see docs/bug-reports/litellm-gemma4-tool-messages.md
Ollama changelog: https://github.com/ollama/ollama/releases/tag/v0.20.6

Relevant log output

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.20.7

extent analysis

TL;DR

Downgrade Ollama to version 0.20.5 to resolve the infinite tool call loop issue when using the LiteLLM proxy.

Guidance

Verify that the issue is specific to Ollama versions 0.20.6 and 0.20.7, and that downgrading to 0.20.5 resolves the issue.
Investigate the compatibility of LiteLLM's tool message transformation with the updated chat template in Ollama 0.20.6 and later.
Check the LiteLLM model info for gemma4:26b to confirm that it reports "supports_function_calling": false, which may be causing the fallback prompt-injection path.
Review the Ollama changelog and LiteLLM bug report for more information on the issue and potential fixes.

Example

No code snippet is provided as the issue is related to the compatibility of LiteLLM's tool message transformation with the updated chat template in Ollama.

Notes

The issue seems to be specific to the combination of Ollama 0.20.6 or 0.20.7 and LiteLLM 1.82.3. Downgrading Ollama to 0.20.5 is a confirmed workaround, but a more permanent fix may require updates to either Ollama or LiteLLM.

Recommendation

Apply the workaround by downgrading Ollama to version 0.20.5, as this is a confirmed fix for the issue.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #GPU compatibility #latency issue #model loading #dependency error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

ollama - ✅(Solved) Fix Bug Report: gemma4:26b infinite tool call loop via LiteLLM proxy — Ollama 0.20.6/0.20.7 [2 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Workaround

PR fix notes

PR #26121: fix(ollama): forward tool_calls and tool_call_id in transform_request

Description (problem / solution / changelog)

Summary

Test plan

Changed files

PR #26122: fix(ollama): forward tool_calls and tool_call_id in transform_request

Description (problem / solution / changelog)

Summary

Test plan

Changed files

Code Example

What is the issue?

Bug Report: gemma4:26b infinite tool call loop via LiteLLM proxy — Ollama 0.20.6/0.20.7

Summary

Environment

Steps to Reproduce

Minimal reproducer (Python, via LiteLLM proxy)

Direct Ollama call (works correctly)

Observed vs Expected Behavior

Ollama 0.20.7 via LiteLLM — BUG

Ollama 0.20.5 via LiteLLM — OK

Ollama 0.20.7 directly (no proxy) — OK

Evidence from Production Logs

Hypothesis

Workaround

Related

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING