ollama - 💡(How to fix) Fix Tool calls silently drop with large system prompts (~1600+ tokens) [5 comments, 2 participants]

ollama | 2026-03-19 15:15:03


ollama/ollama#14958•Fetched 2026-04-08 01:03:59


Author

cicoyle

Participants

cicoyle

rick-github

Timeline (top)

commented ×5, closed ×1, cross-referenced ×1, labeled ×1


What is the issue?

I'm using:

$ ollama --version
ollama version is 0.17.6

OS: macOS (Apple M2 Max, 32GB)

Models tested: mistral-small3.1:24b, qwq:32b, qwen2.5:32b

When using the OpenAI-compatible /v1/chat/completions endpoint with tool_choice: "required" and a large system prompt (~1600+ tokens), Ollama generates completion tokens, but returns empty content with no tool_calls in the response. The same request with a shorter system prompt works correctly.

Repro: Works (~570 prompt tokens, short system prompt):

  curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
      "model": "mistral-small3.1:24b",
      "messages": [{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the weather?"}],
      "tools": [{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}}}}}],
      "tool_choice": "required"
  }'

Note: I generalized my actual prompt to a weather example.

Fails: Same endpoint and tool definitions, but with a system prompt expanded to ~1600 tokens containing detailed multi-step agent instructions. The response returns "content":"", "finish_reason":"stop" with NO tool_calls field, despite "completion_tokens":31 proving the model generated output.

What I'm seeing:

The model generates 31 completion tokens, but they are not returned as tool_calls.
This happens across all tested models (mistral-small3.1, qwen2.5, qwq), so it does not appear to be model-specific.
The same tools work correctly with a short system prompt.
num_ctx is 4096 (the default) and the prompt is 1632 tokens, which leaves ~2464 tokens for generation. That should be sufficient.

I expect: Tool calls should be returned in the response regardless of system prompt length, as long as the prompt fits within the context window.
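
The context budget argument above can be written out as a quick check. This is only an arithmetic sketch using the figures from this report; the helper name `remaining_context` is hypothetical and not part of Ollama.

```python
def remaining_context(num_ctx: int, prompt_tokens: int) -> int:
    """Tokens left for generation once the prompt has been placed in the context window."""
    return num_ctx - prompt_tokens


NUM_CTX = 4096            # default context size stated in the report
PROMPT_TOKENS = 1632      # usage.prompt_tokens from the failing response
COMPLETION_TOKENS = 31    # usage.completion_tokens from the failing response

budget = remaining_context(NUM_CTX, PROMPT_TOKENS)
print(budget)                       # 2464
print(COMPLETION_TOKENS <= budget)  # True: the generated output fits in the window
```

On these numbers the generated output is far below the remaining budget, which is why running out of context does not look like the obvious explanation.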

Relevant log output

sanitized logs:

ollama logs

  time=2026-03-19T09:43:38.139-05:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=7064 format=""
  time=2026-03-19T09:43:38.157-05:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=1632 used=0 remaining=1632
  [GIN] 2026/03/19 - 09:43:47 | 200 | 18.676361417s | ::1 | POST "/v1/chat/completions"

raw resp captured via a proxy:

  {"id":"chatcmpl-638","object":"chat.completion","created":1773931427,"model":"mistral-small3.1:24b","system_fingerprint":"fp_ollama","choices":[{"index":0,"message":{"role":"assistant","content":""},"finish_reason":"stop"}],"usage":{"prompt_tokens":1632,"completion_tokens":31,"total_tokens":1663}}


Note: prompt_tokens:1632, completion_tokens:31 (tokens generated but not returned as tool_calls), content:"", no tool_calls field.
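
The failure signature described in the note can be detected mechanically. The following is a minimal sketch, assuming the response has the OpenAI chat-completion shape shown in the captured body; the function name `is_silent_tool_drop` is hypothetical.

```python
import json


def is_silent_tool_drop(response: dict) -> bool:
    """True when tokens were generated but neither content nor tool_calls came back."""
    message = response["choices"][0]["message"]
    generated = response.get("usage", {}).get("completion_tokens", 0)
    has_tool_calls = bool(message.get("tool_calls"))
    has_content = bool(message.get("content"))
    return generated > 0 and not has_tool_calls and not has_content


# Response body from the report, reduced to the fields the check reads.
captured = json.loads(
    '{"choices":[{"index":0,"message":{"role":"assistant","content":""},'
    '"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":1632,"completion_tokens":31,"total_tokens":1663}}'
)

print(is_silent_tool_drop(captured))
```

A client could run this check on every response and log or retry when it fires, rather than treating the empty message as a valid answer.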

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.17.6

extent analysis

Fix Plan

The report does not identify a root cause, so the steps below are diagnostic rather than a confirmed fix:

Increase the context window size: try a higher num_ctx, e.g. 8192, to rule out context exhaustion. The reporter's numbers suggest about 2464 tokens were still free, so this may not change the result.
Inspect prompt handling: check whether the long system prompt is truncated or altered before it reaches the model.
Check the tool call extraction logic: verify that the generated completion tokens are parsed into tool_calls rather than discarded.

Example code snippet to increase the context window size:

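import os
import subprocess

# Ollama reads its default context length from OLLAMA_CONTEXT_LENGTH (not NUM_CTX).
# The variable must be in the environment of the server process, so it is set
# when starting `ollama serve`; setting it only in a client process has no effect.
env = dict(os.environ, OLLAMA_CONTEXT_LENGTH="8192")
subprocess.Popen(["ollama", "serve"], env=env)

# Alternatively, set the value per model with `PARAMETER num_ctx 8192`
# in a Modelfile and create a new model from it.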
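
The context size can also be requested per call on Ollama's native /api/chat endpoint through `options.num_ctx`. The sketch below is illustrative only: the helper names `build_payload`, `chat`, and `main` are hypothetical, the payload omits `tool_choice` on the assumption that the native endpoint does not use it, and the report does not confirm that a larger context resolves the problem.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # native endpoint on the default port

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {"type": "object", "properties": {"location": {"type": "string"}}},
    },
}


def build_payload(model: str, system_prompt: str, num_ctx: int = 8192) -> dict:
    """Build a non-streaming chat request that asks for a specific context size."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "What is the weather?"},
        ],
        "tools": [WEATHER_TOOL],
        "options": {"num_ctx": num_ctx},
    }


def chat(payload: dict) -> dict:
    """POST the payload to the local server and return the decoded JSON reply."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main() -> None:
    """Requires a running Ollama server with the model already pulled."""
    reply = chat(build_payload("mistral-small3.1:24b", "...long system prompt..."))
    print(reply["message"].get("tool_calls"))
```

Call `main()` with the server running. If tool calls come back here but not through /v1/chat/completions with the same prompt, that points at the compatibility layer rather than at the model.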

Verification

To verify that the fix worked, send a request with a large system prompt and check the response for the presence of tool calls. You can use the same curl command as before:

curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "mistral-small3.1:24b",
    "messages": [{"role":"system","content":"...long system prompt..."},{"role":"user","content":"What is the weather?"}],
    "tools": [{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"location":{"type":"string"}}}}}],
    "tool_choice": "required"
}'

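Check that choices[0].message.tool_calls is present and non-empty. An empty content field is normal when the model returns a tool call; the failure is an empty content field with no tool_calls at all.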

Extra Tips

Test with different models and prompt lengths to confirm that the behavior is consistent.
Enable debug logging on the server (for example with OLLAMA_DEBUG=1; the logs above are at level=DEBUG) to help diagnose further issues.
If the problem persists, shorten the system prompt step by step to find the length at which tool calls stop being returned.
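
Testing across models and prompt lengths can be organized as a small sweep. This sketch only builds the request bodies for the /v1/chat/completions endpoint and sends nothing. The filler sentence, the repeat counts, and the helper names are made up for illustration, and real token counts should be read from usage.prompt_tokens in each response rather than estimated from the text length.

```python
from itertools import product

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {"type": "object", "properties": {"location": {"type": "string"}}},
    },
}

# Hypothetical filler used to grow the system prompt in controlled steps.
FILLER = "Follow each step of the plan carefully and report progress. "


def padded_system_prompt(repeats: int) -> str:
    """Short base prompt followed by `repeats` copies of the filler sentence."""
    return "You are a helpful assistant. " + FILLER * repeats


def sweep_cases(models, repeat_counts):
    """Yield one request body per (model, prompt size) combination."""
    for model, repeats in product(models, repeat_counts):
        yield {
            "model": model,
            "messages": [
                {"role": "system", "content": padded_system_prompt(repeats)},
                {"role": "user", "content": "What is the weather?"},
            ],
            "tools": [WEATHER_TOOL],
            "tool_choice": "required",
        }


cases = list(
    sweep_cases(["mistral-small3.1:24b", "qwq:32b", "qwen2.5:32b"], [0, 50, 100, 200])
)
print(len(cases))  # 12 request bodies: 3 models x 4 prompt sizes
```

Sending each body and recording prompt_tokens alongside whether tool_calls came back shows where the behavior changes for each model.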
