ollama - 💡(How to fix) Fix Add enable_thinking parameter to disable CoT/Reasoning generation [2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
ollama/ollama#14972Fetched 2026-04-08 01:03:42
View on GitHub
Comments
2
Participants
2
Timeline
4
Reactions
0
Author
Timeline (top)
commented ×2closed ×1labeled ×1

Fix Action

Fix / Workaround

  • Current Issues:
  • Unreliable Workarounds: System prompts ("Do not think") are often ignored.
  • Performance Hit: Models generate 5-10x more tokens on "thinking" than the actual answer, drastically slowing
  • inference (e.g., 80 t/s → 10 t/s).
  • Truncation: Users must set extremely low num_predict limits, cutting off valid answers mid-sentence.
  • Stability: High reasoning settings frequently cause infinite loops in agents.
RAW_BUFFERClick to expand / collapse

Requesting a native boolean flag (enable_thinking: false) to explicitly disable Chain-of-Thought (CoT) reasoning for models like Qwen 3.5 and GPT-OSS.

  • Current Issues:
  • Unreliable Workarounds: System prompts ("Do not think") are often ignored.
  • Performance Hit: Models generate 5-10x more tokens on "thinking" than the actual answer, drastically slowing
  • inference (e.g., 80 t/s → 10 t/s).
  • Truncation: Users must set extremely low num_predict limits, cutting off valid answers mid-sentence.
  • Stability: High reasoning settings frequently cause infinite loops in agents.

Proposed Solution:

Add a dedicated parameter to the API and Modelfile to toggle reasoning generation off, ensuring direct, fast, and stable responses without relying on fragile prompt engineering.

extent analysis

Fix Plan

To address the issue, we will implement a native boolean flag enable_thinking in the API and model file. This flag will allow users to explicitly disable Chain-of-Thought (CoT) reasoning for models like Qwen 3.5 and GPT-OSS.

Implementation Steps

  • Add a new parameter enable_thinking to the API with a default value of True.
  • Update the model file to accept the enable_thinking parameter and toggle reasoning generation accordingly.
  • Modify the inference logic to bypass reasoning generation when enable_thinking is False.

Example Code

# API update
def generate_text(prompt, enable_thinking=True):
    # ...

# Model file update
class Qwen35Model:
    def __init__(self, enable_thinking=True):
        self.enable_thinking = enable_thinking

    def generate_text(self, prompt):
        if not self.enable_thinking:
            # Bypass reasoning generation
            return self.direct_generate_text(prompt)
        # ...

# Inference logic update
def inference(model, prompt):
    if not model.enable_thinking:
        # Direct generation
        return model.direct_generate_text(prompt)
    # ...

Verification

To verify the fix, test the API and model with the enable_thinking flag set to False. Measure the performance and response quality to ensure that the fix addresses the current issues.

Extra Tips

  • Document the enable_thinking parameter in the API and model documentation to ensure users understand its purpose and usage.
  • Consider adding additional logging or monitoring to track the usage and effectiveness of the enable_thinking flag.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING