ollama - 💡(How to fix) Fix [Bug] gemma4 parser fails to extract tool_calls when combining system prompt + think:false + tools [6 comments, 2 participants]

GitHub stats
ollama/ollama#15539 · Fetched 2026-04-15 06:20:23
Comments: 6 · Participants: 2 · Timeline events: 8 · Reactions: 0
Timeline (top): commented ×6, assigned ×1, closed ×1

RAW_BUFFER
<h3>What is the issue?</h3>
<p>The <code>gemma4</code> parser in Ollama 0.20.6 fails to extract tool calls from the model response when a <strong>system prompt</strong> is combined with <strong><code>think: false</code></strong> and <strong>tools</strong>. The model correctly generates the tool call JSON, but the parser does not intercept it: the raw JSON leaks into the <code>content</code> field instead of being placed in the <code>tool_calls</code> field.</p>
<p>This breaks Home Assistant's Ollama integration, which always sends a system prompt (containing assistant instructions and exposed entity definitions) along with tool definitions.</p>
<h3>Environment</h3>
<ul>
<li><strong>Ollama version:</strong> 0.20.6</li>
<li><strong>Model:</strong> <code>gemma4:e4b</code> (official, pulled via <code>ollama pull gemma4:e4b</code>)</li>
<li><strong>OS:</strong> Ubuntu 24.04 (LXC container on Proxmox VE)</li>
<li><strong>Hardware:</strong> AMD Ryzen 7 8745HS, Radeon 780M iGPU (ROCm), 32 GB RAM</li>
<li><strong>Client:</strong> Home Assistant OS (Core 2026.4.2, Supervisor 2026.03.3, OS 17.2, Frontend 20260325.7) Ollama integration + direct curl testing</li>
</ul>
<h3>Reproduction steps</h3>
<p>Run the following three curl commands against a fresh <code>gemma4:e4b</code> model. They demonstrate that the bug only occurs with a specific combination.</p>
<p><strong>Test 1: No system prompt + <code>think: false</code> → ✅ WORKS</strong></p>
<pre><code class="language-bash">curl -s http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [{"role": "user", "content": "What is the weather in Talence?"}],
  "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather info", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}}],
  "stream": false,
  "think": false
}' | python3 -m json.tool
</code></pre>
<p><strong>Result:</strong> <code>content</code> is empty, <code>tool_calls</code> is correctly populated:</p>
<pre><code class="language-json">{
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "id": "call_g76u5xbz",
        "function": {
          "index": 0,
          "name": "get_weather",
          "arguments": {"location": "Talence"}
        }
      }
    ]
  }
}
</code></pre>
<p><strong>Test 2: System prompt + thinking active (default) → ⚠️ tool_calls OK but thinking leaks</strong></p>
<pre><code class="language-bash">curl -s http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [
    {"role": "system", "content": "Tu es Jarvis, assistant domotique. Réponds en français."},
    {"role": "user", "content": "Je veux la météo"}
  ],
  "tools": [{"type": "function", "function": {"name": "GetLiveContext", "description": "Get live context", "parameters": {"type": "object", "properties": {}, "required": []}}}],
  "stream": false
}' | python3 -m json.tool
</code></pre>
<p><strong>Result:</strong> <code>tool_calls</code> is correctly populated, but the <code>thinking</code> field contains a long reasoning chain (14 seconds). The tool calling itself works:</p>
<pre><code class="language-json">{
  "message": {
    "role": "assistant",
    "content": "",
    "thinking": "1. **Analyze the Request:** ... (long reasoning) ... 6. **Generate the tool call:** Call GetLiveContext.",
    "tool_calls": [
      {
        "id": "call_bgl0bmz2",
        "function": {
          "index": 0,
          "name": "GetLiveContext",
          "arguments": {}
        }
      }
    ]
  }
}
</code></pre>
<p><strong>Test 3: System prompt + <code>think: false</code> → ❌ BUG: tool_calls not parsed</strong></p>
<pre><code class="language-bash">curl -s http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [
    {"role": "system", "content": "Tu es Jarvis, assistant domotique. Réponds en français."},
    {"role": "user", "content": "Je veux la météo"}
  ],
  "tools": [{"type": "function", "function": {"name": "GetLiveContext", "description": "Get live context", "parameters": {"type": "object", "properties": {}, "required": []}}}],
  "stream": false,
  "think": false
}' | python3 -m json.tool
</code></pre>
<p><strong>Result:</strong> The model generates the correct tool call JSON, but the parser does NOT intercept it. The raw JSON leaks into <code>content</code> with a trailing <code>&lt;channel|&gt;</code> token:</p>
<pre><code class="language-json">{
  "message": {
    "role": "assistant",
    "content": "{\n \"tool_calls\": [\n {\n \"function\": \"GetLiveContext\",\n \"args\": {}\n }\n ]\n}\n&lt;channel|&gt;"
  }
}
</code></pre>
<p>No <code>tool_calls</code> field is present. No <code>thinking</code> field.</p>
<h3>Summary</h3>
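<table>
<thead>
<tr><th>Test</th><th>System prompt</th><th><code>think: false</code></th><th>tool_calls parsed</th><th>Duration</th></tr>
</thead>
<tbody>
<tr><td>1</td><td>❌ No</td><td>✅ Yes</td><td>✅ Yes</td><td>~2s</td></tr>
<tr><td>2</td><td>✅ Yes</td><td>❌ No</td><td>✅ Yes (but thinking leaks)</td><td>~14s</td></tr>
<tr><td>3</td><td>✅ Yes</td><td>✅ Yes</td><td>❌ No (JSON in content)</td><td>~2s</td></tr>
</tbody>
</table>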
<h3>Expected behavior</h3>
<p>Test 3 should produce the same structured <code>tool_calls</code> output as Test 1, since the only difference is the addition of a system prompt. The <code>think: false</code> flag should disable thinking without breaking tool call parsing.</p>
<h3>Impact</h3>
<p>This bug makes <code>gemma4:e4b</code> unusable with any client that sends a system prompt alongside tools and <code>think: false</code>, including:</p>
<ul>
<li><strong>Home Assistant</strong> Ollama integration (always sends a system prompt with entity definitions)</li>
<li>Any OpenAI-compatible client using system prompts with tool definitions</li>
</ul>
<p>The workaround of leaving thinking enabled (Test 2) works for tool calling but adds 10+ seconds of latency and causes thinking tokens to leak into streaming clients.</p>
<h3>Possibly related issues</h3>
<ul>
<li>#15241: gemma4 tool call parsing fails</li>
<li>#15315: gemma4:e4b tool parsing errors persist in 0.20.1</li>
<li>#15254: fix gemma4 arg parsing with quoted strings</li>
<li>#15306: rework gemma4 tool call handling</li>
</ul>

extent analysis

TL;DR

The gemma4 parser in Ollama 0.20.6 fails to extract tool calls when a system prompt is combined with think: false and tools, causing the raw JSON to leak into the content field.
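
The symptom described above can in principle be detected and patched up on the client side. The following is a minimal sketch of such a fallback, not Ollama's parser and not a fix from the issue. The function name `recover_leaked_tool_calls` is hypothetical, and the sketch assumes the leaked JSON always has the shape seen in Test 3 (a top-level `tool_calls` list whose items carry `function` as a string and `args` as an object, followed by a stray `<channel|>` token).

```python
import json

CHANNEL_TOKEN = "<channel|>"  # stray token observed at the end of `content` in Test 3


def recover_leaked_tool_calls(message: dict) -> dict:
    """Return `message` unchanged, or a new dict with tool calls recovered from `content`.

    Hypothetical client-side fallback; assumes the leaked shape shown in Test 3.
    """
    if message.get("tool_calls"):
        return message  # the server-side parser already did its job

    content = (message.get("content") or "").strip()
    if content.endswith(CHANNEL_TOKEN):
        content = content[: -len(CHANNEL_TOKEN)].rstrip()

    try:
        payload = json.loads(content)
    except json.JSONDecodeError:
        return message  # ordinary text reply, nothing to recover

    if not isinstance(payload, dict):
        return message
    leaked = payload.get("tool_calls")
    if not isinstance(leaked, list) or not leaked:
        return message

    calls = []
    for item in leaked:
        # Bail out on anything that does not match the assumed leaked shape.
        if not isinstance(item, dict) or not isinstance(item.get("function"), str):
            return message
        calls.append(
            {"function": {"name": item["function"], "arguments": item.get("args") or {}}}
        )

    # Mirror the structure of a correctly parsed response (Test 1): empty content,
    # structured tool_calls. No "id" is invented for the recovered calls.
    return {**message, "content": "", "tool_calls": calls}
```

A client would call this on `response["message"]` before dispatching tools. It deliberately returns the message untouched whenever the content does not match the assumed shape, so normal text replies pass through.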

Guidance

  • In the reported tests, the failure appears only when all three ingredients are present: a system prompt, think: false, and tools. Dropping the system prompt (Test 1) or leaving thinking enabled (Test 2) gives correctly parsed tool_calls.
  • To verify, run the three curl commands from the reproduction steps (Test 1, Test 2, and Test 3) against a fresh gemma4:e4b model and compare the content and tool_calls fields of each response.
  • As a temporary workaround, leave thinking enabled (as in Test 2). Tool calling then works, but the response takes about 14 s instead of about 2 s and thinking tokens leak into streaming clients.
  • Check the possibly related issues (#15241, #15315, #15254, #15306) for fixes or further detail on gemma4 tool call parsing.
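
The verification step can also be scripted. The sketch below builds the request bodies for the test matrix and classifies a response as parsed or leaked. It is an illustration under assumptions: the helper names `build_payload` and `classify` are hypothetical, and unlike the issue (where Test 1 uses a `get_weather` tool and an English prompt) it reuses the `GetLiveContext` tool and the French user message for all three cases so that only the system prompt and `think` vary.

```python
import json

# Tool definition copied from Tests 2 and 3 of the issue.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "GetLiveContext",
        "description": "Get live context",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

SYSTEM_PROMPT = "Tu es Jarvis, assistant domotique. Réponds en français."


def build_payload(system_prompt, think):
    """Build an /api/chat request body; think=None omits the field (model default)."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": "Je veux la météo"})
    payload = {"model": "gemma4:e4b", "messages": messages, "tools": TOOLS, "stream": False}
    if think is not None:
        payload["think"] = think
    return payload


def classify(response):
    """Label a non-streaming response: 'parsed', 'leaked', or 'no tool call'."""
    message = response.get("message", {})
    if message.get("tool_calls"):
        return "parsed"
    if '"tool_calls"' in (message.get("content") or ""):
        return "leaked"  # the Test 3 symptom: JSON text inside content
    return "no tool call"


# The three combinations from the summary table.
MATRIX = {
    "test1": build_payload(None, False),
    "test2": build_payload(SYSTEM_PROMPT, None),
    "test3": build_payload(SYSTEM_PROMPT, False),
}

if __name__ == "__main__":
    for name, payload in MATRIX.items():
        print(name, json.dumps(payload, ensure_ascii=False))
```

Each printed body can be sent with the same `curl -s http://localhost:11434/api/chat -d '...'` form used in the issue, and the decoded reply passed to `classify`.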

Example

The issue itself contains no patch or parser code; its runnable material is the three curl commands in the reproduction steps, which exercise the gemma4 parser with and without a system prompt and think: false.

Notes

The report tests only the gemma4:e4b model on Ollama 0.20.6, so the behavior may differ on other models or versions. The curl commands and their responses are the basis for understanding and reproducing the issue.

Recommendation

Until a fix is available, leave thinking enabled as in Test 2 and accept the trade-off: tool calling works, but each response takes roughly 14 s instead of 2 s and thinking tokens leak into streaming clients.
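
If thinking is left enabled, a streaming client can at least avoid showing the reasoning text. The sketch below merges streamed chunks while ignoring the `thinking` field. It assumes each streamed line is a JSON object with a `message` object (carrying `content`, and optionally `thinking` and `tool_calls`) and a `done` flag; that chunk shape is an assumption extrapolated from the non-streaming responses in the issue, and `collect_stream` is a hypothetical helper name. It does nothing about the added latency.

```python
import json


def collect_stream(lines):
    """Merge streamed chat chunks, keeping content and tool_calls and dropping thinking.

    `lines` is any iterable of JSON strings, one chunk per line (assumed shape).
    """
    content_parts = []
    tool_calls = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive or separator lines
        chunk = json.loads(line)
        message = chunk.get("message", {})
        content_parts.append(message.get("content") or "")
        tool_calls.extend(message.get("tool_calls") or [])
        # message.get("thinking") is intentionally never read or forwarded.
        if chunk.get("done"):
            break
    return {"role": "assistant", "content": "".join(content_parts), "tool_calls": tool_calls}
```

The returned dict has the same `content` and `tool_calls` fields as the Test 1 response, so downstream tool dispatch does not need to know that thinking was enabled.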
