ollama - 💡(How to fix) Fix Tool calls silently drop with large system prompts (~1600+ tokens) [5 comments, 2 participants]

GitHub stats
ollama/ollama#14958 (fetched 2026-04-08 01:03:59)
Comments: 5 · Participants: 2 · Timeline events: 10 · Reactions: 0
Timeline (top): commented ×5, closed ×1, cross-referenced ×1, labeled ×1


What is the issue?

I'm using:

$ ollama --version
ollama version is 0.17.6

OS: macOS (Apple M2 Max, 32GB)

Models tested: mistral-small3.1:24b, qwq:32b, qwen2.5:32b

When using the OpenAI-compatible /v1/chat/completions endpoint with tool_choice: "required" and a large system prompt (~1600+ tokens), Ollama generates completion tokens, but returns empty content with no tool_calls in the response. The same request with a shorter system prompt works correctly.

Repro: Works (~570 prompt tokens, short system prompt):

  curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
      "model": "mistral-small3.1:24b",
      "messages": [{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the weather?"}],
      "tools": [{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}}}}}],
      "tool_choice": "required"
  }'

Note: I generalized my prompt to a weather example for this report.

Fails: Same endpoint and tool definitions, but with a system prompt expanded to ~1600 tokens containing detailed multi-step agent instructions. The response returns "content":"", "finish_reason":"stop" with NO tool_calls field, despite "completion_tokens":31 proving the model generated output.

What I'm seeing:

  • The model generates 31 completion tokens, but they are not returned as tool_calls or as content
  • This happens across all tested models (mistral-small3.1, qwen2.5, qwq), which suggests it is not model-specific
  • The same tools work correctly with a short system prompt
  • num_ctx=4096 (default). The prompt is 1632 tokens, which leaves about 2464 tokens (4096 - 1632) for generation. That should be sufficient.

I expect: Tool calls should be returned in the response regardless of system prompt length, as long as the prompt fits within the context window.
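
The failure signature described above can be checked mechanically. The following is a minimal sketch, assuming the OpenAI-style response shape shown in this report; `is_silent_tool_drop` is a hypothetical helper name and not part of Ollama.

```python
def is_silent_tool_drop(response: dict) -> bool:
    """Return True when tokens were generated but neither content nor tool_calls came back."""
    choices = response.get("choices") or []
    if not choices:
        return False
    message = choices[0].get("message") or {}
    generated = (response.get("usage") or {}).get("completion_tokens", 0)
    has_content = bool(message.get("content"))
    has_tool_calls = bool(message.get("tool_calls"))
    return generated > 0 and not has_content and not has_tool_calls


# Response captured in this report, trimmed to the fields the check reads.
failing = {
    "choices": [{"index": 0,
                 "message": {"role": "assistant", "content": ""},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 1632, "completion_tokens": 31, "total_tokens": 1663},
}
print(is_silent_tool_drop(failing))
```

A client could run this check on every response and log or retry when it returns True, rather than treating the empty message as a valid answer.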

Relevant log output

sanitized logs:

ollama logs

  time=2026-03-19T09:43:38.139-05:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=7064 format=""                                                                                                                     
  time=2026-03-19T09:43:38.157-05:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=1632 used=0 remaining=1632                                                                                                       
  [GIN] 2026/03/19 - 09:43:47 | 200 | 18.676361417s | ::1 | POST "/v1/chat/completions" 


Raw response captured via a proxy:

  {"id":"chatcmpl-638","object":"chat.completion","created":1773931427,"model":"mistral-small3.1:24b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":""},"finish_reason":"stop"}],"usage":{"prompt_tokens":1632,"completion_tokens":31,"total_tokens":1663}}


Note: prompt_tokens:1632, completion_tokens:31 (tokens generated but not returned as tool_calls), content:"", no tool_calls field.
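
The two DEBUG lines report different `prompt=` values (7064 and 1632). The second matches `prompt_tokens` in the response. The logs do not state what unit the first value uses. When comparing a working run against a failing one, it can help to pull these fields out programmatically. The sketch below assumes the key=value layout seen in the excerpt; `parse_log_fields` is a hypothetical helper.

```python
import shlex


def parse_log_fields(line: str) -> dict:
    """Split a key=value log line into a dict; quoted values are kept as one field."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


request_line = ('time=2026-03-19T09:43:38.139-05:00 level=DEBUG source=server.go:1536 '
                'msg="completion request" images=0 prompt=7064 format=""')
cache_line = ('time=2026-03-19T09:43:38.157-05:00 level=DEBUG source=cache.go:151 '
              'msg="loading cache slot" id=0 cache=0 prompt=1632 used=0 remaining=1632')

for line in (request_line, cache_line):
    fields = parse_log_fields(line)
    print(fields["source"], fields["msg"], fields["prompt"])
```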

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.17.6

extent analysis

Fix Plan

The report does not identify a root cause, so the steps below are things to try and inspect rather than a confirmed fix:

  • Increase the context window size: Raise num_ctx above the 4096 default (for example, 8192) to rule out context pressure. By the report's own numbers, 1632 prompt tokens plus 31 completion tokens already fit within 4096, so this may not be the cause.
  • Inspect prompt processing: Check whether the long system prompt is truncated or whether the tool definitions are lost when the prompt is assembled. The report does not point to a specific code path.
  • Check the tool call extraction logic: 31 completion tokens with empty content and no tool_calls suggests the output was generated but then neither parsed as a tool call nor passed through as plain content. Confirm what the parser does with output it cannot parse.

Example code snippet to increase the context window size:

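import os
import subprocess

# Assumption: the server reads OLLAMA_CONTEXT_LENGTH for its default context length.
# The variable must be set for the server process; setting it in a client has no effect.
env = dict(os.environ, OLLAMA_CONTEXT_LENGTH="8192")
server = subprocess.Popen(["ollama", "serve"], env=env)

# Alternatively, build a model variant from a Modelfile that contains:
#   PARAMETER num_ctx 8192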
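
Context size can also be requested per call. The sketch below assumes the native /api/chat endpoint, which accepts an `options` object with `num_ctx`, rather than the OpenAI-compatible endpoint used in the report. `build_chat_payload` and `post_chat` are hypothetical helper names. Whether a larger context avoids the dropped tool call has not been tested here.

```python
import json
import urllib.request


def build_chat_payload(model, system_prompt, user_prompt, tools, num_ctx=8192):
    """Build a non-streaming chat request body that asks for a specific context size."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "tools": tools,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def post_chat(payload, url="http://localhost:11434/api/chat"):
    """POST the payload to a local server and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {"type": "object", "properties": {"location": {"type": "string"}}},
    },
}
payload = build_chat_payload("mistral-small3.1:24b", "...long system prompt...",
                             "What is the weather?", [weather_tool])
# reply = post_chat(payload)  # requires a running local server
```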

Verification

To verify that the fix worked, send a request with a large system prompt and check the response for the presence of tool calls. You can use the same curl command as before:

curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "mistral-small3.1:24b",
    "messages": [{"role":"system","content":"...long system prompt..."},{"role":"user","content":"What is the weather?"}],
    "tools": [{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}}}}}],
    "tool_choice": "required"
}'

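Check that the response contains a tool_calls array under choices[0].message. The content field may legitimately be empty when a tool call is returned, so an empty content field alone does not indicate failure.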
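
The check can be scripted. The following is a minimal sketch, assuming an OpenAI-style response in which each tool call carries a function name and JSON-encoded arguments. `extract_tool_calls` is a hypothetical helper, and the sample response is illustrative only, not captured from the issue.

```python
import json


def extract_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs from an OpenAI-style chat completion."""
    choices = response.get("choices") or []
    if not choices:
        return []
    message = choices[0].get("message") or {}
    calls = []
    for call in message.get("tool_calls") or []:
        function = call.get("function") or {}
        arguments = function.get("arguments")
        if isinstance(arguments, str):
            # Arguments usually arrive as a JSON string; decode them for inspection.
            arguments = json.loads(arguments) if arguments else {}
        calls.append((function.get("name"), arguments))
    return calls


# Illustrative response shape for a successful call (not taken from the report).
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}
print(extract_tool_calls(sample))
```

An empty list from this helper on a request sent with tool_choice set to "required" would indicate that the problem is still present.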

Extra Tips

  • Make sure to test the fix with different models and prompt lengths to ensure that it works consistently.
  • Consider adding logging or debugging statements to the code to help diagnose any further issues.
  • If you're still experiencing problems, try reducing the prompt length or simplifying the tool call extraction logic to isolate the issue.
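
To test across prompt lengths systematically, a small sweep can report the first size at which tool calls stop coming back. This is a sketch only: `first_failing_length` and `has_tool_calls` are hypothetical helpers, and `fake_send` is a synthetic stand-in that should be replaced with a function that pads the system prompt to roughly the given size and posts the request.

```python
def first_failing_length(lengths, send, is_ok):
    """Return the smallest tested length whose response fails is_ok, or None if all pass."""
    for n in sorted(lengths):
        if not is_ok(send(n)):
            return n
    return None


def has_tool_calls(response: dict) -> bool:
    """True when the first choice carries a non-empty tool_calls list."""
    choices = response.get("choices") or []
    message = choices[0].get("message", {}) if choices else {}
    return bool(message.get("tool_calls"))


def fake_send(n: int) -> dict:
    """Synthetic stand-in for a real request; the 1600 cutoff is made up for the demo."""
    message = {"role": "assistant", "content": ""}
    if n < 1600:
        message["tool_calls"] = [
            {"type": "function", "function": {"name": "get_weather", "arguments": "{}"}}
        ]
    return {"choices": [{"message": message}]}


print(first_failing_length([500, 1000, 1500, 2000], fake_send, has_tool_calls))
```

Running the same sweep against each model would show whether the threshold is shared or differs per model.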
