ollama - ✅(Solved) Fix format is ignored when think is disabled for qwen3.5 series [1 pull requests, 7 comments, 4 participants]

ollama2026-03-05 18:42:09

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

ollama/ollama#14645•Fetched 2026-04-08 00:33:21

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

subscribed ×12cross-referenced ×8commented ×7referenced ×2

Fix Action

Fixed

Fixed by PR: server: apply format constraint when thinking is disabled (https://github.com/ollama/ollama/pull/14660)

PR fix notes

PR #14660: server: apply format constraint when thinking is disabled

Repository: ollama/ollama
Author: majiayu000
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/14660

Description (problem / solution / changelog)

Summary

Fix format/structured outputs being silently ignored when think=false on thinking-capable models (e.g. qwen3.5)

Problem

When sending think=false + format=json, the structured outputs logic still defers format masking (sets currentFormat = nil) because the condition only checks whether the model has a builtin parser or thinking capability, not whether thinking is actually enabled for the request. Since no thinking content is produced, the restart signal never fires, and format masking is never applied.

Fix

Add a thinkEnabled check so that format masking is only deferred when thinking will actually produce content.

Test plan

Added format applied when think disabled test to TestChatWithPromptEndingInThinkTag
Verifies that with think=false + format, the completion receives the format constraint directly (single call, format not nil)
All existing structured outputs tests pass (think=true behavior unchanged)

Fixes #14645

Changed files

server/routes.go (modified, +2/-1)
server/routes_generate_test.go (modified, +63/-0)

Code Example

response = client.chat(
    model = 'qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=True,
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

---

Thinking exists? True
===
The sky is blue due to a phenomenon called **Rayleigh scattering**. Here is a simple breakdown of how it works:

**1. Sunlight looks white, but isn't**
...

---

response = client.chat(
    model = 'qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=False,
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

---

Thinking exists? False
===
The sky appears blue due to a phenomenon called **Rayleigh scattering**.

Here is how it works:
...

---

response = client.chat(
    model = 'qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=True,
    format='json',
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

---

Thinking exists? True
===
{"answer":"The sky is blue due to a phenomenon called
...

---

response = client.chat(
    model = 'qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=False,
    format='json',
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

---

Thinking exists? False
===
The sky appears blue due to a phenomenon called **Rayleigh scattering**.

Here is how it works:
...

RAW_BUFFERClick to expand / collapse

What is the issue?

Format is ignored when think is disabled for qwen3.5 series

I put an example here, and set temperature to 0, so that anyone can try to reproduce. Ollama version: 0.17.6 Model: qwen3.5:35b-a3b (3460ffeede54)

I believe this can be fixed with 1) a proper output token probability masking, and 2) an empty thinking tag <think>\n\n</think>\n\n in template when thinking is disabled. https://huggingface.co/Qwen/Qwen3.5-35B-A3B/blob/main/chat_template.jinja#L149

It appears to be ollama is expecting the end of thinking token, before it engages the probability masking for formatting. But since the tag is already closed in the template, the model actually never outputs that. As result, the masking is never applied.

Relevant output

[think = True, format = None] Normal since format is not enabled.

response = client.chat(
    model = 'qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=True,
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

Thinking exists? True
===
The sky is blue due to a phenomenon called **Rayleigh scattering**. Here is a simple breakdown of how it works:

**1. Sunlight looks white, but isn't**
...

[think = False, format = None] Again, normal since format is not enabled.

response = client.chat(
    model = 'qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=False,
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

Thinking exists? False
===
The sky appears blue due to a phenomenon called **Rayleigh scattering**.

Here is how it works:
...

[think = True, format = 'json'] Normal, which proves format alone is working if thinking enabled.

response = client.chat(
    model = 'qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=True,
    format='json',
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

Thinking exists? True
===
{"answer":"The sky is blue due to a phenomenon called
...

[think = False, format = 'json'] It is not returning json in this case, which shows format is ignored only when thinking is disabled.

response = client.chat(
    model = 'qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=False,
    format='json',
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

Thinking exists? False
===
The sky appears blue due to a phenomenon called **Rayleigh scattering**.

Here is how it works:
...

Ollama version

0.17.6

extent analysis

Fix Plan

To fix the issue of format being ignored when thinking is disabled for the qwen3.5 series, we need to modify the template to include an empty thinking tag when thinking is disabled. We also need to apply proper output token probability masking.

Step-by-Step Solution

Modify the template: Update the chat_template.jinja file to include an empty thinking tag when thinking is disabled.

{% if think %}
    <think>\n\n</think>\n\n
{% else %}
    <think></think>
{% endif %}

Apply output token probability masking: Modify the client.chat function to apply probability masking for formatting when thinking is disabled.

def chat(self, model, messages, think, format, options):
    # ...
    if not think and format:
        # Apply probability masking for formatting
        response['message']['content'] = self.apply_formatting_mask(response['message']['content'], format)
    # ...

Implement the apply_formatting_mask function: Create a new function to apply the probability masking for formatting.

def apply_formatting_mask(self, content, format):
    if format == 'json':
        # Convert the content to JSON format
        return json.dumps({'answer': content})
    # ...

Verification

To verify that the fix worked, run the following test cases:

response = client.chat(
    model='qwen3.5:35b-a3b',
    messages=[{'role': 'user', 'content': 'why is the sky blue'}],
    think=False,
    format='json',
    options={
        'temperature': 0
    }
)

print('Thinking exists?', 'thinking' in response['message'])
print('===')
print(response['message']['content'])

This should output the response in JSON format when thinking is disabled.

Extra Tips

Make sure to update the chat_template.jinja file correctly to include the empty thinking tag when thinking is disabled.
Test the fix thoroughly to ensure that it works as expected for different formats and thinking settings.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #autograd error #authentication setup #request error #file not found #serialization error #model compatibility

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

ollama - ✅(Solved) Fix format is ignored when think is disabled for qwen3.5 series [1 pull requests, 7 comments, 4 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fixed

PR fix notes

PR #14660: server: apply format constraint when thinking is disabled

Description (problem / solution / changelog)

Summary

Problem

Fix

Test plan

Changed files

Code Example

What is the issue?

Relevant output

Ollama version

extent analysis

Fix Plan

Step-by-Step Solution

Verification

Extra Tips

Still need to ship something?

TRENDING

ollama - ✅(Solved) Fix format is ignored when think is disabled for qwen3.5 series [1 pull requests, 7 comments, 4 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fixed

PR fix notes

PR #14660: server: apply format constraint when thinking is disabled

Description (problem / solution / changelog)

Summary

Problem

Fix

Test plan

Changed files

Code Example

What is the issue?

Relevant output

Ollama version

extent analysis

Fix Plan

Step-by-Step Solution

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING