ollama - 💡(How to fix) Fix qwen3-vl:8b missing thinking toggle template (think: false ignored) [1 comment, 2 participants]

ollama · 2026-03-12 13:31:14


GitHub stats

ollama/ollama#14798 • Fetched 2026-04-08 00:43:22

Author: tankbottoms

Participants: rick-github, tankbottoms

Timeline: closed ×1, commented ×1, labeled ×1

The qwen3-vl:8b model ships with a bare {{ .Prompt }} template that lacks the $.IsThinkSet / $.Think thinking-control logic present in the qwen3:8b model template. As a result, "think": false in API calls is silently ignored for qwen3-vl:8b, while it works correctly for qwen3:8b.


Description

qwen3-vl:8b template (current)

{{ .Prompt }}

No ChatML structure and no thinking toggle: just a raw prompt passthrough.

qwen3:8b template (correct)

Full ChatML structure with $.IsThinkSet / $.Think checks, the /think and /no_think toggles, and proper <think> block handling.

Impact

When using the chat API with "think": false, qwen3-vl:8b ignores the flag entirely. The whole token budget is consumed by thinking output, leaving the actual response empty for any non-trivial prompt.

Example: with num_predict set to 4096 tokens, the model produced 17,054 characters of thinking and 0 characters of response content.
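Those figures work out to roughly 17,054 / 4,096 ≈ 4.2 characters per token, which is consistent with the entire budget going to reasoning. The sketch below shows one way to measure that split from message.content; splitThinking is a hypothetical helper, and it assumes thinking arrives inline as a single <think>...</think> block (a block cut off by the token limit has no closing tag).

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

const (
	openTag  = "<think>"
	closeTag = "</think>"
)

// splitThinking separates the first <think>...</think> block from the answer text.
// If the closing tag never arrives (generation stopped mid-thought), everything
// after <think> counts as thinking and only text before it counts as answer.
func splitThinking(content string) (thinking, answer string) {
	start := strings.Index(content, openTag)
	if start < 0 {
		return "", strings.TrimSpace(content)
	}
	before := content[:start]
	rest := content[start+len(openTag):]
	end := strings.Index(rest, closeTag)
	if end < 0 {
		return strings.TrimSpace(rest), strings.TrimSpace(before)
	}
	return strings.TrimSpace(rest[:end]), strings.TrimSpace(before + rest[end+len(closeTag):])
}

func main() {
	// Hypothetical truncated reply: the budget ran out before </think>.
	content := "<think>\nOkay, the user is asking what 2+2 is..."
	thinking, answer := splitThinking(content)
	fmt.Printf("thinking: %d chars, answer: %d chars\n",
		utf8.RuneCountInString(thinking), utf8.RuneCountInString(answer))
}
```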

Steps to Reproduce

1. Confirm the template difference

# qwen3-vl:8b -- bare template
curl -s http://localhost:11434/api/show -d '{"name":"qwen3-vl:8b"}' | jq -r '.template'
# Output: {{ .Prompt }}

# qwen3:8b -- full ChatML template with thinking control
curl -s http://localhost:11434/api/show -d '{"name":"qwen3:8b"}' | jq -r '.template'
# Output: Full template with $.IsThinkSet, $.Think, /no_think logic
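The same comparison can be scripted. This is a sketch that assumes a local server at http://localhost:11434 and reads only the template field of the /api/show reply; fetchTemplate and hasThinkToggle are hypothetical helpers, and the substring check is a heuristic rather than a real template parse.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
	"time"
)

// hasThinkToggle reports whether a template references both thinking-control values.
func hasThinkToggle(tmpl string) bool {
	return strings.Contains(tmpl, ".IsThinkSet") && strings.Contains(tmpl, ".Think")
}

// fetchTemplate asks /api/show for a model's template string.
func fetchTemplate(baseURL, model string) (string, error) {
	body, err := json.Marshal(map[string]string{"model": model})
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(baseURL+"/api/show", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("%s: HTTP %d", model, resp.StatusCode)
	}
	var show struct {
		Template string `json:"template"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&show); err != nil {
		return "", err
	}
	return show.Template, nil
}

func main() {
	for _, m := range []string{"qwen3-vl:8b", "qwen3:8b"} {
		tmpl, err := fetchTemplate("http://localhost:11434", m)
		if err != nil {
			fmt.Println(m, "error:", err) // e.g. server not running
			continue
		}
		fmt.Printf("%s: thinking toggle in template = %v\n", m, hasThinkToggle(tmpl))
	}
}
```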

2. Send a chat request with think: false

curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3-vl:8b",
  "messages": [{"role": "user", "content": "What is 2+2? Answer briefly."}],
  "think": false,
  "stream": false,
  "options": {"num_predict": 4096}
}'

Actual result: The response message.content is dominated by <think>...</think> blocks consuming the full token budget. The actual answer is empty or truncated.

Expected result: With "think": false, the model should skip thinking and return a direct response, as qwen3:8b does.

3. Compare with qwen3:8b (works correctly)

curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3:8b",
  "messages": [{"role": "user", "content": "What is 2+2? Answer briefly."}],
  "think": false,
  "stream": false,
  "options": {"num_predict": 4096}
}'

This correctly returns a direct answer without thinking blocks.

Workaround

Setting raw: true, appending /no_think to the user message, and prefilling an empty <think>\n\n</think>\n\n block as the assistant turn works correctly:

curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3-vl:8b",
  "messages": [
    {"role": "user", "content": "What is 2+2? Answer briefly. /no_think"},
    {"role": "assistant", "content": "<think>\n\n</think>\n\n"}
  ],
  "raw": true,
  "stream": false,
  "options": {"num_predict": 4096}
}'
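The workaround's message list can be generated programmatically. This sketch mirrors the curl body above; noThinkMessages is a hypothetical helper, and raw is copied from the issue's request as-is. It uses an encoder with HTML escaping disabled because encoding/json escapes < and > as \u003c and \u003e by default, which is still valid JSON but makes the <think> prefill hard to read.

```go
package main

import (
	"encoding/json"
	"os"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// emptyThinkPrefill is the closed, empty reasoning block used as the assistant prefill.
const emptyThinkPrefill = "<think>\n\n</think>\n\n"

// noThinkMessages builds the two-message workaround: the user turn ends with the
// /no_think soft switch and the assistant turn is prefilled with an empty block.
func noThinkMessages(prompt string) []chatMessage {
	return []chatMessage{
		{Role: "user", Content: prompt + " /no_think"},
		{Role: "assistant", Content: emptyThinkPrefill},
	}
}

func main() {
	req := map[string]any{
		"model":    "qwen3-vl:8b",
		"messages": noThinkMessages("What is 2+2? Answer briefly."),
		"raw":      true, // copied from the issue's request
		"stream":   false,
		"options":  map[string]any{"num_predict": 4096},
	}
	enc := json.NewEncoder(os.Stdout)
	enc.SetEscapeHTML(false) // keep <think> readable instead of \u003cthink\u003e
	enc.SetIndent("", "  ")
	if err := enc.Encode(req); err != nil {
		panic(err)
	}
}
```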

Expected Behavior

qwen3-vl:8b should ship with the same ChatML template structure as qwen3:8b, including full $.IsThinkSet / $.Think thinking toggle support. Both models are Qwen3-family and support the same thinking/no-thinking modes.

Version Info

Ollama 0.16.2 (tested on spark-1)
Ollama 0.16.3 (tested on spark-2)
Models: qwen3-vl:8b, qwen3:8b
OS: Linux (ARM64, DGX Spark)

Extended analysis

Fix Plan

One way to fix the issue is to give the qwen3-vl:8b model template the same $.IsThinkSet / $.Think thinking-control logic that the qwen3:8b template uses. The steps below are a sketch, not a confirmed upstream fix:

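Update the qwen3-vl:8b template to follow the qwen3:8b ChatML structure. The sketch below is simplified (system prompt, tools and image handling are omitted): it appends /think or /no_think to the final user turn when the request sets think, and prefills an empty <think> block when think is false:

{{- /* qwen3-vl:8b template (sketch modeled on qwen3:8b) */ -}}
{{- range $i, $m := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq $m.Role "user" }}<|im_start|>user
{{ $m.Content }}
{{- if and $.IsThinkSet $last }}{{ if $.Think }} /think{{ else }} /no_think{{ end }}{{ end }}<|im_end|>
{{ else if eq $m.Role "assistant" }}<|im_start|>assistant
{{ $m.Content }}{{ if not $last }}<|im_end|>
{{ end }}
{{- end }}
{{- if and $last (ne $m.Role "assistant") }}<|im_start|>assistant
{{ if and $.IsThinkSet (not $.Think) }}<think>

</think>

{{ end }}
{{- end }}
{{- end }}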

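Make sure IsThinkSet and Think are populated from the API request. Because think: false already works for qwen3:8b, the server already passes these values to templates; the fragment below only illustrates the mapping, with req.Think standing in for a hypothetical *bool request field:

// Map the optional "think" request field onto the template values (illustrative)
isThinkSet := req.Think != nil
think := true // default when the request omits "think"; the template only reads it when isThinkSet is true
if isThinkSet {
	think = *req.Think
}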

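Render the template into the prompt that is sent to the model, not into the HTTP response. getTemplate, messages, isThinkSet, think and runModel are hypothetical names in this fragment:

// API handler fragment (hypothetical names; needs "bytes", "net/http" and "text/template")
func handleChatRequest(w http.ResponseWriter, r *http.Request) {
	// ... decode r into messages, isThinkSet and think ...
	tmpl, err := template.New("chat").Parse(getTemplate("qwen3-vl:8b"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	data := map[string]any{
		"Messages":   messages,
		"IsThinkSet": isThinkSet,
		"Think":      think,
	}
	var prompt bytes.Buffer
	if err := tmpl.Execute(&prompt, data); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	runModel(w, prompt.String()) // sends the rendered prompt to the model and streams the reply to w
}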

Verification

To verify that the fix worked, send a chat request with "think": false and check that the response does not contain thinking blocks:

curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen3-vl:8b",
  "messages": [{"role": "user", "content": "What is 2+2? Answer briefly."}],
  "think": false,
  "stream": false,
  "options": {"num_predict": 4096}
}'

The response should contain a direct answer without thinking blocks.
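The check can also be automated. This sketch assumes the non-streaming /api/chat response shape (message.content, message.thinking, done_reason); checkNoThinking is a hypothetical helper, and the sample reply in main is made up to resemble the failure described in this issue, not captured output.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// chatResponse holds the subset of a non-streaming /api/chat reply checked here.
type chatResponse struct {
	Message struct {
		Content  string `json:"content"`
		Thinking string `json:"thinking"`
	} `json:"message"`
	DoneReason string `json:"done_reason"`
}

// checkNoThinking returns nil when the reply looks like a direct answer with no reasoning.
func checkNoThinking(raw []byte) error {
	var r chatResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return err
	}
	switch {
	case strings.Contains(r.Message.Content, "<think>"):
		return errors.New("content still contains a <think> block")
	case r.Message.Thinking != "":
		return fmt.Errorf("thinking field is populated (%d bytes)", len(r.Message.Thinking))
	case strings.TrimSpace(r.Message.Content) == "":
		return fmt.Errorf("empty answer (done_reason=%q)", r.DoneReason)
	}
	return nil
}

func main() {
	// Made-up reply shaped like the reported failure: reasoning inline, budget exhausted.
	sample := []byte(`{"message":{"role":"assistant","content":"<think>\nOkay, the user asks for 2+2..."},"done":true,"done_reason":"length"}`)
	if err := checkNoThinking(sample); err != nil {
		fmt.Println("FAIL:", err)
	} else {
		fmt.Println("PASS")
	}
}
```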

Extra Tips

Double-check that IsThinkSet and Think are populated from the API request before the template runs.
Test the updated template with think set to true, set to false, and omitted, since each case takes a different branch.
While debugging, log the rendered prompt so you can confirm whether /no_think and the empty <think> block actually reach the model.
