ollama - 💡(How to fix) Fix qwen3-vl:8b missing thinking toggle template (think: false ignored) [1 comments, 2 participants]

GitHub stats
ollama/ollama#14798 (fetched 2026-04-08 00:43:22)
Comments: 1 · Participants: 2 · Timeline events: 3 (closed ×1, commented ×1, labeled ×1) · Reactions: 0

Description

The qwen3-vl:8b model ships with a bare {{ .Prompt }} template that lacks the $.IsThinkSet / $.Think thinking-control logic present in the qwen3:8b model template. As a result, "think": false in API calls is silently ignored for qwen3-vl:8b, while it works correctly for qwen3:8b.

qwen3-vl:8b template (current)

{{ .Prompt }}

No ChatML structure, no thinking toggle -- just a raw prompt passthrough.

qwen3:8b template (correct)

Full ChatML with $.IsThinkSet, $.Think, /think and /no_think toggle, and proper <think> block handling.

Impact

When using the chat API with "think": false, qwen3-vl:8b ignores the flag entirely. All tokens are consumed by thinking output, producing empty actual responses for any non-trivial prompt.

Example: A 4096 num_predict token budget produces 17,054 characters of thinking and 0 characters of response content.
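
To measure that split on a captured response, the thinking text can be separated from the answer. The following is a minimal sketch, assuming the thinking arrives inline in message.content wrapped in <think>...</think> and that a response cut off by num_predict may leave the block unclosed. The helper name split_thinking is hypothetical.

```python
import re

# Matches a <think> block; if the closing tag is missing (truncated output),
# the block is assumed to run to the end of the string.
THINK_BLOCK = re.compile(r"<think>(.*?)(?:</think>|\Z)", re.DOTALL)


def split_thinking(content: str) -> tuple[str, str]:
    """Return (thinking, answer) extracted from a response string."""
    thinking = "".join(m.group(1) for m in THINK_BLOCK.finditer(content))
    answer = THINK_BLOCK.sub("", content)
    return thinking.strip(), answer.strip()
```

Calling len() on each element of the returned pair gives character counts comparable to the figures quoted above.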

Steps to Reproduce

1. Confirm the template difference

# qwen3-vl:8b -- bare template
curl -s http://localhost:11434/api/show -d '{"name":"qwen3-vl:8b"}' | jq -r '.template'
# Output: {{ .Prompt }}

# qwen3:8b -- full ChatML template with thinking control
curl -s http://localhost:11434/api/show -d '{"name":"qwen3:8b"}' | jq -r '.template'
# Output: Full template with $.IsThinkSet, $.Think, /no_think logic
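
The same comparison can be scripted. This is a rough sketch that reuses the /api/show endpoint and localhost address from the commands above. The function names are hypothetical, and the substring check is only a heuristic for whether a template mentions the thinking variables.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # host used in the report


def fetch_template(model: str, base_url: str = OLLAMA_URL) -> str:
    """POST to /api/show and return the model's template string."""
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"name": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("template", "")


def has_think_toggle(template: str) -> bool:
    """Heuristic: does the template text reference IsThinkSet or .Think?"""
    return "IsThinkSet" in template or ".Think" in template
```

With a running server, has_think_toggle(fetch_template("qwen3-vl:8b")) would be expected to return False for the bare template shown above.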

2. Send a chat request with think: false

curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3-vl:8b",
  "messages": [{"role": "user", "content": "What is 2+2? Answer briefly."}],
  "think": false,
  "stream": false,
  "options": {"num_predict": 4096}
}'

Actual result: The response message.content is dominated by <think>...</think> blocks consuming the full token budget. The actual answer is empty or truncated.

Expected result: With "think": false, the model should skip thinking and return a direct response, as qwen3:8b does.

3. Compare with qwen3:8b (works correctly)

curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "What is 2+2? Answer briefly."}],
  "think": false,
  "stream": false,
  "options": {"num_predict": 4096}
}'

This correctly returns a direct answer without thinking blocks.
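
For a side-by-side run, both requests can be generated from one builder so that only the model name differs. This is a small sketch with a hypothetical helper name. It omits the think field entirely when no value is given, which is the case the template would see as $.IsThinkSet being false.

```python
def build_chat_payload(model, prompt, think=None, num_predict=4096):
    """Build an /api/chat request body; 'think' is left out when None."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"num_predict": num_predict},
    }
    if think is not None:
        payload["think"] = think
    return payload


# Identical requests for the two models compared in this report.
payloads = [
    build_chat_payload(m, "What is 2+2? Answer briefly.", think=False)
    for m in ("qwen3-vl:8b", "qwen3:8b")
]
```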

Workaround

Using raw: true with explicit ChatML formatting, appending /no_think to user messages, and prefilling <think>\n\n</think>\n\n in the assistant turn works correctly:

curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3-vl:8b",
  "messages": [
    {"role": "user", "content": "What is 2+2? Answer briefly. /no_think"},
    {"role": "assistant", "content": "<think>\n\n</think>\n\n"}
  ],
  "raw": true,
  "stream": false,
  "options": {"num_predict": 4096}
}'
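
The "explicit ChatML formatting" part of the workaround can be sketched as follows. Two assumptions go beyond the report: the prompt is sent to /api/generate (the endpoint where, to my knowledge, raw and prompt are documented) rather than /api/chat as in the curl command above, and only a single user turn is formatted. The helper names are hypothetical.

```python
def build_raw_prompt(user_text: str) -> str:
    """Format one user turn as ChatML with thinking disabled."""
    return (
        "<|im_start|>user\n"
        f"{user_text} /no_think<|im_end|>\n"   # soft switch appended to the user turn
        "<|im_start|>assistant\n"
        "<think>\n\n</think>\n\n"               # empty think block prefilled
    )


def build_generate_payload(model: str, user_text: str, num_predict: int = 4096) -> dict:
    """Request body for a raw-mode call (assumed endpoint: /api/generate)."""
    return {
        "model": model,
        "prompt": build_raw_prompt(user_text),
        "raw": True,
        "stream": False,
        "options": {"num_predict": num_predict},
    }
```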

Expected Behavior

qwen3-vl:8b should ship with the same ChatML template structure as qwen3:8b, including full $.IsThinkSet / $.Think thinking toggle support. Both models are Qwen3-family and support the same thinking/no-thinking modes.

Version Info

  • Ollama 0.16.2 (tested on spark-1)
  • Ollama 0.16.3 (tested on spark-2)
  • Models: qwen3-vl:8b, qwen3:8b
  • OS: Linux (ARM64, DGX Spark)

extent analysis

Fix Plan

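To fix the issue, the qwen3-vl:8b model template needs the same $.IsThinkSet / $.Think thinking-control logic that the qwen3:8b template has. The steps are:

  • Update the qwen3-vl:8b template to follow the qwen3:8b structure: ChatML turns, a /think or /no_think suffix on user messages when the flag is set, and an empty <think> block prefilled when thinking is disabled. A simplified sketch (illustrative only, not the shipped qwen3:8b template; it assumes the template receives .Messages with .Role and .Content):
{{- /* qwen3-vl:8b template (sketch) */ -}}
{{- range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}
{{- if and $.IsThinkSet (eq .Role "user") }}{{ if $.Think }} /think{{ else }} /no_think{{ end }}{{ end }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ if and $.IsThinkSet (not $.Think) }}<think>

</think>

{{ end }}
  • The $.IsThinkSet and $.Think variables are derived from the optional "think" field of the API request. The mapping looks like this (Go sketch; thinkReq is a hypothetical *bool that is nil when the field is omitted):
// Derive the template variables from the optional "think" field.
isThinkSet := thinkReq != nil
think := true // default when "think" is omitted
if isThinkSet {
	think = *thinkReq
}
  • Because "think": false already works for qwen3:8b, the server presumably passes these variables to templates already, so no handler change should be needed. For reference, the rendering step amounts to the following (hypothetical names, not Ollama's actual code):
// renderPrompt executes the model template with the thinking variables.
func renderPrompt(w io.Writer, tmpl *template.Template, msgs []Message, isThinkSet, think bool) error {
	return tmpl.Execute(w, map[string]any{
		"Messages":   msgs,
		"IsThinkSet": isThinkSet,
		"Think":      think,
	})
}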

Verification

To verify that the fix worked, send a chat request with "think": false and check that the response does not contain thinking blocks:

curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3-vl:8b",
  "messages": [{"role": "user", "content": "What is 2+2? Answer briefly."}],
  "think": false,
  "stream": false,
  "options": {"num_predict": 4096}
}'

The response should contain a direct answer without thinking blocks.
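
That check can be automated on the parsed JSON response. This is a minimal sketch with a hypothetical function name. It assumes the answer is in message.content, as in the report, and also treats a non-empty message.thinking field as thinking output, on the assumption that the server may return thinking separately.

```python
def answered_without_thinking(response: dict) -> bool:
    """True if the response has a non-empty answer and no thinking output."""
    message = response.get("message", {})
    content = message.get("content", "")
    has_answer = bool(content.strip())
    has_inline_think = "<think>" in content
    has_separate_think = bool(message.get("thinking"))  # assumed optional field
    return has_answer and not has_inline_think and not has_separate_think
```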

Extra Tips

  • Test the updated template with "think": true, "think": false, and the field omitted, so that every $.IsThinkSet / $.Think combination is covered.
  • Re-run the /api/show check from step 1 after the update to confirm the template is no longer a bare {{ .Prompt }}.
  • Consider adding logging or debugging statements to help diagnose any remaining issues.
