ollama - 💡(How to fix) Fix Add enable_thinking parameter to disable CoT/Reasoning generation [2 comments, 2 participants]

ollama2026-03-20 02:53:58

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

ollama/ollama#14972•Fetched 2026-04-08 01:03:42

View on GitHub

Comments

Participants

Timeline

Reactions

Author

HavenCTO

Participants

HavenCTO

rick-github

Timeline (top)

commented ×2closed ×1labeled ×1

Fix Action

Fix / Workaround

Current Issues:
Unreliable Workarounds: System prompts ("Do not think") are often ignored.
Performance Hit: Models generate 5-10x more tokens on "thinking" than the actual answer, drastically slowing
inference (e.g., 80 t/s → 10 t/s).
Truncation: Users must set extremely low num_predict limits, cutting off valid answers mid-sentence.
Stability: High reasoning settings frequently cause infinite loops in agents.

RAW_BUFFERClick to expand / collapse

Requesting a native boolean flag (enable_thinking: false) to explicitly disable Chain-of-Thought (CoT) reasoning for models like Qwen 3.5 and GPT-OSS.

Current Issues:
Unreliable Workarounds: System prompts ("Do not think") are often ignored.
Performance Hit: Models generate 5-10x more tokens on "thinking" than the actual answer, drastically slowing
inference (e.g., 80 t/s → 10 t/s).
Truncation: Users must set extremely low num_predict limits, cutting off valid answers mid-sentence.
Stability: High reasoning settings frequently cause infinite loops in agents.

Proposed Solution:

Add a dedicated parameter to the API and Modelfile to toggle reasoning generation off, ensuring direct, fast, and stable responses without relying on fragile prompt engineering.

extent analysis

Fix Plan

To address the issue, we will implement a native boolean flag enable_thinking in the API and model file. This flag will allow users to explicitly disable Chain-of-Thought (CoT) reasoning for models like Qwen 3.5 and GPT-OSS.

Implementation Steps

Add a new parameter enable_thinking to the API with a default value of True.
Update the model file to accept the enable_thinking parameter and toggle reasoning generation accordingly.
Modify the inference logic to bypass reasoning generation when enable_thinking is False.

Example Code

# API update
def generate_text(prompt, enable_thinking=True):
    # ...

# Model file update
class Qwen35Model:
    def __init__(self, enable_thinking=True):
        self.enable_thinking = enable_thinking

    def generate_text(self, prompt):
        if not self.enable_thinking:
            # Bypass reasoning generation
            return self.direct_generate_text(prompt)
        # ...

# Inference logic update
def inference(model, prompt):
    if not model.enable_thinking:
        # Direct generation
        return model.direct_generate_text(prompt)
    # ...

Verification

To verify the fix, test the API and model with the enable_thinking flag set to False. Measure the performance and response quality to ensure that the fix addresses the current issues.

Extra Tips

Document the enable_thinking parameter in the API and model documentation to ensure users understand its purpose and usage.
Consider adding additional logging or monitoring to track the usage and effectiveness of the enable_thinking flag.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #autograd error #serialization error #model compatibility #GPU setup #container setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

ollama - 💡(How to fix) Fix Add enable_thinking parameter to disable CoT/Reasoning generation [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fix / Workaround

extent analysis

Fix Plan

Implementation Steps

Example Code

Verification

Extra Tips

Still need to ship something?

TRENDING

ollama - 💡(How to fix) Fix Add enable_thinking parameter to disable CoT/Reasoning generation [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fix / Workaround

extent analysis

Fix Plan

Implementation Steps

Example Code

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING