ollama - ✅(Solved) Fix generate API ignores think=false for qwen3.5 (chat API works) [2 pull requests, 4 comments, 3 participants]

Q: Expected behavior

`think: false` should disable thinking on BOTH `generate` and `chat` endpoints. When thinking is disabled, all `num_predict` tokens should be available for the visible response.

andreamoro · 2026-03-12T09:29:51Z

[ollama] PR 3: Extend syllabus - Repository: sebadp/wasap-accademy - Author: sebadp - State: closed | merged: True - Link: https://github.com/sebadp/wasap-acca… # PR #3: Extend syllabus - Repository: sebadp/wasap-accademy - Author: sebadp - State: closed | merged: True - Link: https://github.com/sebadp/wasap-accademy/pull/3 ## Description (problem / solution / changelog) ## Summary by cubic Expands the curriculum to 23 modules (~115 lessons) with the full v2 syllabus, and marks the expansion as Done across plans and features. Foundation modules (0–2) are rewritten for LocalForge with `Ollama`, a cleaner webhook flow, Telegram support, and clearer onboarding. - **New Features** - Added `docs/curriculum/SYLLABUS.md` and PRD `docs/exec-plans/04-curriculum-expansion_prd.md`; updated `PRODUCT_PLAN.md`, `docs/exec-plans/README.md`, and `docs/features/README.md` to mark Curriculum Expansion v2 and Challenges/Gamification as Done. - Rewrote modules 0–2 for LocalForge: onboarding and project structure; Docker with `Ollama`, `Langfuse` and `ngrok`; env setup; first-message walkthrough; webhook pipeline; and LLM integration via `OllamaClient` (async `httpx`) with resilient error handling. - Added MDX for modules 3–22 covering: DB/state and repository pattern; memory + context builder and token budgeting; multimodal & formatting (audio with `faster-whisper`, vision via `Ollama` llava, WhatsApp/Telegram formatting and message splitting); guardrails & observability (`Langfuse` v3, trace context/recorder/scores); eval, prompt versioning, and auto-evolution; agents (reactive loop, planner–worker, robustness/fallbacks); security (policy engine, audit trail, HITL, shell controls); platform abstraction (`PlatformClient`) with Telegram client; MCP with hot-reload; knowledge graph; rule engine; and performance/ops (asyncio, caching, SQLite tuning, Docker, CI, eval in CI, monitoring). - **Bug Fixes** - Updated WhatsApp Graph API to v22.0, tightened HMAC verification guidance, and fixed Telegram HTML formatting/escape order. - Minor copy/formatting fixes across docs (README heading, module titles, `LocalForge` references). Written for commit 24a07831951e46a72bf2a5ebfa4834dc86138f7f. Summary will update on new commits. ## Changed files - `PRODUCT_PLAN.md` (modified, +73/-27) - `README.md` (modified, +1/-1) - `docs/curriculum/SYLLABUS.md` (added, +364/-0) - `docs/exec-plans/04-curriculum-expansion_prd.md` (added, +136/-0) - `docs/exec-plans/README.md` (modified, +2/-2) - `docs/features/README.md` (modified, +4/-1) - `frontend/content/modules/module-0/00-bienvenida.mdx` (modified, +88/-13) - `frontend/content/modules/module-0/01-estructura-proyecto.mdx` (modified, +182/-44) - `frontend/content/modules/module-0/02-docker-setup.mdx` (modified, +210/-56) - `frontend/content/modules/module-0/03-entorno-local.mdx` (modified, +191/-54) - `frontend/content/modules/module-0/04-primer-mensaje.mdx` (modified, +205/-37) - `frontend/content/modules/module-1/00-intro-webhooks.mdx` (modified, +140/-30) - `frontend/content/modules/module-1/01-firma-hmac.mdx` (modified, +187/-40) - `frontend/content/modules/module-1/02-parsear-payload.mdx` (modified, +216/-57) - `frontend/content/modules/module-1/03-responder-mensajes.mdx` (modified, +230/-68) - `frontend/content/modules/module-1/04-testing-webhook.mdx` (modified, +300/-80) - `frontend/content/modules/module-10/00-presupuesto-de-tokens.mdx` (added, +201/-0) - `frontend/content/modules/module-10/01-context-builder.mdx` (added, +234/-0) - `frontend/content/modules/module-10/02-conversation-context-build.mdx` (added, +276/-0) - `frontend/content/modules/module-10/03-fact-extraction.mdx` (added, +252/-0) - `frontend/content/modules/module-10/04-ensamblando-el-prompt.mdx` (added, +299/-0) - `frontend/content/modules/module-11/00-audio-faster-whisper.mdx` (added, +208/-0) - `frontend/content/modules/module-11/01-imagenes-vision-llm.mdx` (added, +254/-0) - `frontend/content/modules/module-11/02-markdown-a-whatsapp.mdx` (added, +208/-0) - `frontend/content/modules/module-11/03-markdown-a-telegram.mdx` (added, +234/-0) - `frontend/content/modules/module-11/04-message-splitting.mdx` (added, +252/-0) - `frontend/content/modules/module-12/00-por-que-guardrails.mdx` (added, +122/-0) - `frontend/content/modules/module-12/01-checks-deterministicos.mdx` (added, +164/-0) - `frontend/content/modules/module-12/02-checks-llm.mdx` (added, +168/-0) - `frontend/content/modules/module-12/03-pipeline-y-remediacion.mdx` (added, +190/-0) - `frontend/content/modules/module-12/04-guardrails-a-dataset.mdx` (added, +151/-0) - `frontend/content/modules/module-13/00-intro-observabilidad.mdx` (added, +132/-0) - `frontend/content/modules/module-13/01-langfuse-v3-sdk.mdx` (added, +138/-0) - `frontend/content/modules/module-13/02-trace-context.mdx` (added, +149/-0) - `frontend/content/modules/module-13/03-trace-reco

ollama2026-03-12 09:29:51

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

ollama/ollama#14793•Fetched 2026-04-08 00:31:39

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

commented ×4cross-referenced ×4closed ×1labeled ×1

Error Message

This asymmetry causes silent failures for applications using the generate endpoint with thinking models — they get empty responses with no error.

Fix Action

Workaround

Switch from generate to chat API and pass think=False as a top-level parameter (not inside options).

PR fix notes

PR #3: Extend syllabus

Repository: sebadp/wasap-accademy
Author: sebadp
State: closed | merged: True
Link: https://github.com/sebadp/wasap-accademy/pull/3

Description (problem / solution / changelog)

Summary by cubic

Expands the curriculum to 23 modules (~115 lessons) with the full v2 syllabus, and marks the expansion as Done across plans and features. Foundation modules (0–2) are rewritten for LocalForge with Ollama, a cleaner webhook flow, Telegram support, and clearer onboarding.

New Features
- Added docs/curriculum/SYLLABUS.md and PRD docs/exec-plans/04-curriculum-expansion_prd.md; updated PRODUCT_PLAN.md, docs/exec-plans/README.md, and docs/features/README.md to mark Curriculum Expansion v2 and Challenges/Gamification as Done.
- Rewrote modules 0–2 for LocalForge: onboarding and project structure; Docker with Ollama, Langfuse and ngrok; env setup; first-message walkthrough; webhook pipeline; and LLM integration via OllamaClient (async httpx) with resilient error handling.
- Added MDX for modules 3–22 covering: DB/state and repository pattern; memory + context builder and token budgeting; multimodal & formatting (audio with faster-whisper, vision via Ollama llava, WhatsApp/Telegram formatting and message splitting); guardrails & observability (Langfuse v3, trace context/recorder/scores); eval, prompt versioning, and auto-evolution; agents (reactive loop, planner–worker, robustness/fallbacks); security (policy engine, audit trail, HITL, shell controls); platform abstraction (PlatformClient) with Telegram client; MCP with hot-reload; knowledge graph; rule engine; and performance/ops (asyncio, caching, SQLite tuning, Docker, CI, eval in CI, monitoring).
Bug Fixes
- Updated WhatsApp Graph API to v22.0, tightened HMAC verification guidance, and fixed Telegram HTML formatting/escape order.
- Minor copy/formatting fixes across docs (README heading, module titles, LocalForge references).

<sup>Written for commit 24a07831951e46a72bf2a5ebfa4834dc86138f7f. Summary will update on new commits.</sup>

Changed files

PRODUCT_PLAN.md (modified, +73/-27)
README.md (modified, +1/-1)
docs/curriculum/SYLLABUS.md (added, +364/-0)
docs/exec-plans/04-curriculum-expansion_prd.md (added, +136/-0)
docs/exec-plans/README.md (modified, +2/-2)
docs/features/README.md (modified, +4/-1)
frontend/content/modules/module-0/00-bienvenida.mdx (modified, +88/-13)
frontend/content/modules/module-0/01-estructura-proyecto.mdx (modified, +182/-44)
frontend/content/modules/module-0/02-docker-setup.mdx (modified, +210/-56)
frontend/content/modules/module-0/03-entorno-local.mdx (modified, +191/-54)
frontend/content/modules/module-0/04-primer-mensaje.mdx (modified, +205/-37)
frontend/content/modules/module-1/00-intro-webhooks.mdx (modified, +140/-30)
frontend/content/modules/module-1/01-firma-hmac.mdx (modified, +187/-40)
frontend/content/modules/module-1/02-parsear-payload.mdx (modified, +216/-57)
frontend/content/modules/module-1/03-responder-mensajes.mdx (modified, +230/-68)
frontend/content/modules/module-1/04-testing-webhook.mdx (modified, +300/-80)
frontend/content/modules/module-10/00-presupuesto-de-tokens.mdx (added, +201/-0)
frontend/content/modules/module-10/01-context-builder.mdx (added, +234/-0)
frontend/content/modules/module-10/02-conversation-context-build.mdx (added, +276/-0)
frontend/content/modules/module-10/03-fact-extraction.mdx (added, +252/-0)
frontend/content/modules/module-10/04-ensamblando-el-prompt.mdx (added, +299/-0)
frontend/content/modules/module-11/00-audio-faster-whisper.mdx (added, +208/-0)
frontend/content/modules/module-11/01-imagenes-vision-llm.mdx (added, +254/-0)
frontend/content/modules/module-11/02-markdown-a-whatsapp.mdx (added, +208/-0)
frontend/content/modules/module-11/03-markdown-a-telegram.mdx (added, +234/-0)
frontend/content/modules/module-11/04-message-splitting.mdx (added, +252/-0)
frontend/content/modules/module-12/00-por-que-guardrails.mdx (added, +122/-0)
frontend/content/modules/module-12/01-checks-deterministicos.mdx (added, +164/-0)
frontend/content/modules/module-12/02-checks-llm.mdx (added, +168/-0)
frontend/content/modules/module-12/03-pipeline-y-remediacion.mdx (added, +190/-0)
frontend/content/modules/module-12/04-guardrails-a-dataset.mdx (added, +151/-0)
frontend/content/modules/module-13/00-intro-observabilidad.mdx (added, +132/-0)
frontend/content/modules/module-13/01-langfuse-v3-sdk.mdx (added, +138/-0)
frontend/content/modules/module-13/02-trace-context.mdx (added, +149/-0)
frontend/content/modules/module-13/03-trace-recorder.mdx (added, +180/-0)
frontend/content/modules/module-13/04-scores-y-metricas.mdx (added, +153/-0)
frontend/content/modules/module-14/00-dataset-vivo.mdx (added, +249/-0)
frontend/content/modules/module-14/01-senales-de-usuario.mdx (added, +270/-0)
frontend/content/modules/module-14/02-llm-as-judge.mdx (added, +285/-0)
frontend/content/modules/module-14/03-prompt-versioning.mdx (added, +314/-0)
frontend/content/modules/module-14/04-auto-evolucion.mdx (added, +335/-0)
frontend/content/modules/module-15/00-que-es-un-agente.mdx (added, +201/-0)
frontend/content/modules/module-15/01-reactive-loop.mdx (added, +248/-0)
frontend/content/modules/module-15/02-session-persistence.mdx (added, +245/-0)
frontend/content/modules/module-15/03-loop-detection.mdx (added, +213/-0)
frontend/content/modules/module-15/04-planner-orchestrator.mdx (added, +352/-0)
frontend/content/modules/module-16/00-patron-planner-worker.mdx (added, +202/-0)
frontend/content/modules/module-16/01-workers-especializados.mdx (added, +234/-0)
frontend/content/modules/module-16/02-replanificacion.mdx (added, +226/-0)
frontend/content/modules/module-16/03-synthesis-y-task-memory.mdx (added, +226/-0)
frontend/content/modules/module-16/04-robustez-y-fallbacks.mdx (added, +239/-0)
frontend/content/modules/module-17/00-defense-in-depth.mdx (added, +121/-0)
frontend/content/modules/module-17/01-policy-engine.mdx (added, +263/-0)
frontend/content/modules/module-17/02-audit-trail.mdx (added, +226/-0)
frontend/content/modules/module-17/03-human-in-the-loop.mdx (added, +259/-0)
frontend/content/modules/module-17/04-shell-controls.mdx (added, +231/-0)
frontend/content/modules/module-18/00-platform-protocol.mdx (added, +193/-0)
frontend/content/modules/module-18/01-telegram-client.mdx (added, +260/-0)
frontend/content/modules/module-18/02-mcp-basico.mdx (added, +216/-0)
frontend/content/modules/module-18/03-hot-reload-mcp.mdx (added, +217/-0)
frontend/content/modules/module-18/04-integracion-completa.mdx (added, +219/-0)
frontend/content/modules/module-19/00-knowledge-graph-basics.mdx (added, +190/-0)
frontend/content/modules/module-19/01-entity-registry.mdx (added, +169/-0)
frontend/content/modules/module-19/02-graph-traversal.mdx (added, +213/-0)
frontend/content/modules/module-19/03-data-provenance.mdx (added, +196/-0)
frontend/content/modules/module-19/04-memory-versions.mdx (added, +220/-0)
frontend/content/modules/module-2/00-intro-llms.mdx (modified, +145/-20)
frontend/content/modules/module-2/01-anthropic-sdk.mdx (modified, +199/-60)
frontend/content/modules/module-2/02-prompt-engineering.mdx (modified, +173/-51)
frontend/content/modules/module-2/03-streaming.mdx (modified, +166/-68)
frontend/content/modules/module-2/04-integracion-completa.mdx (modified, +210/-91)
frontend/content/modules/module-20/00-rule-engine.mdx (added, +179/-0)
frontend/content/modules/module-20/01-conditions.mdx (added, +216/-0)
frontend/content/modules/module-20/02-actions.mdx (added, +221/-0)
frontend/content/modules/module-20/03-builtin-rules.mdx (added, +208/-0)
frontend/content/modules/module-20/04-custom-rules.mdx (added, +239/-0)
frontend/content/modules/module-21/00-profiling-critical-path.mdx (added, +157/-0)
frontend/content/modules/module-21/01-paralelizacion-asyncio.mdx (added, +204/-0)
frontend/content/modules/module-21/02-caching-estrategico.mdx (added, +182/-0)
frontend/content/modules/module-21/03-sqlite-performance.mdx (added, +194/-0)
frontend/content/modules/module-21/04-event-loop-hygiene.mdx (added, +219/-0)
frontend/content/modules/module-22/00-docker-para-ai.mdx (added, +203/-0)
frontend/content/modules/module-22/01-health-checks.mdx (added, +198/-0)
frontend/content/modules/module-22/02-ci-pipeline.mdx (added, +263/-0)
frontend/content/modules/module-22/03-eval-en-ci.mdx (added, +236/-0)
frontend/content/modules/module-22/04-monitoreo-produccion.mdx (added, +254/-0)
frontend/content/modules/module-3/00-sqlite-async.mdx (added, +172/-0)
frontend/content/modules/module-3/01-schema-design.mdx (added, +206/-0)
frontend/content/modules/module-3/02-repository-pattern.mdx (added, +219/-0)
frontend/content/modules/module-3/03-dedup-atomico.mdx (added, +170/-0)
frontend/content/modules/module-3/04-vector-storage.mdx (added, +0/-0)
frontend/content/modules/module-4/00-conversation-manager.mdx (added, +0/-0)
frontend/content/modules/module-4/01-historial-ventana.mdx (added, +0/-0)
frontend/content/modules/module-4/02-summarizer-background.mdx (added, +0/-0)
frontend/content/modules/module-4/03-conversation-state.mdx (added, +0/-0)
frontend/content/modules/module-4/04-compactacion-inteligente.mdx (added, +0/-0)
frontend/content/modules/module-5/00-memorias-semanticas.mdx (added, +0/-0)
frontend/content/modules/module-5/01-daily-logs.mdx (added, +0/-0)
frontend/content/modules/module-5/02-memory-md-sync.mdx (added, +0/-0)
frontend/content/modules/module-5/03-consolidacion-llm.mdx (added, +0/-0)

PR #193: feat: Ollama qwen3 upgrade + WhisperFlow voice pipeline

Repository: trendywink247-afk/GeekSpace2.0
Author: trendywink247-afk
State: closed | merged: True
Link: https://github.com/trendywink247-afk/GeekSpace2.0/pull/193

Description (problem / solution / changelog)

Summary

Ollama model swap: hermes3:8b → qwen3:8b (primary) + qwen3:14b (complex/planning), with intent-based /think vs /no_think mode selection
PicoClaw upgrade: qwen2.5-coder:1.5b → qwen2.5-coder:3b
Voice pipeline: 3 new Docker sidecars — Kokoro TTS (highest quality), Piper TTS (fast fallback), whisper.cpp STT (local transcription)
TTS fallback chain: Kokoro → Piper → edge-tts with health checks + 30s cache
STT fallback chain: whisper.cpp (local) → Groq Whisper (cloud)
Frontend: Server-side TTS hook (useTTS.ts) + VoiceChatPage with playback controls
Config: ollamaMaxTokens 512→4096, timeout 45s→90s, all new env vars documented

New Docker services

Service	Port	RAM limit	Purpose
kokoro-tts	5101	2GB	Neural TTS (ONNX CPU)
piper-tts	5100	512MB	Fast CPU TTS fallback
whisper-stt	5102	512MB	Local speech-to-text

Memory budget (32GB VPS)

Typical (qwen3:8b loaded): ~16.5GB used, ~15.5GB free Peak (qwen3:14b + voice sidecars): ~20GB used, ~12GB free

Test plan

ollama pull qwen3:8b && ollama pull qwen3:14b && ollama pull qwen2.5-coder:3b
docker compose build kokoro-tts piper-tts whisper-stt
docker compose up -d — verify all 3 sidecars start healthy
Health checks: curl localhost:5100/health && curl localhost:5101/health && curl localhost:5102/health
TTS test: curl -X POST localhost:5101/tts -d '{"text":"Hello"}' -H 'Content-Type: application/json' -o test.wav
STT test: curl -X POST localhost:5102/transcribe -F '[email protected]'
Send complex query → verify routing trace shows qwen3:14b with /think
Fallback test: stop Kokoro → Piper takes over → stop Piper → edge-tts takes over

https://claude.ai/code/session_017nK3obD6HhVkPkFuecwxPp

Summary by CodeRabbit

New Features
- Local voice pipeline with Kokoro/Piper/Whisper sidecars, automatic fallback, and UI labels showing which engine handled each message.
- Dual LLM model support (standard vs. complex) with optional "thinking" mode and intent-aware selection.
- Client prefers server-side TTS with browser fallback and exposes TTS/STT engine info.
Chores
- Updated default models, increased token limits, adjusted timeouts, and added env settings for voice services and models.
- Added containerized sidecar services and runner setup script; CI deploys moved to self-hosted.
Tests
- Updated tests to assert presence of engine-related fields.

Changed files

.env.example (modified, +17/-5)
.env.staging.example (modified, +12/-3)
.github/workflows/ci.yml (modified, +102/-121)
docker-compose.yml (modified, +122/-5)
kokoro-tts/Dockerfile (added, +25/-0)
kokoro-tts/download-model.sh (added, +14/-0)
kokoro-tts/server.py (added, +107/-0)
package-lock.json (modified, +9667/-19076)
picoclaw/index.js (modified, +1/-1)
piper-tts/Dockerfile (added, +19/-0)
piper-tts/download-voices.sh (added, +29/-0)
piper-tts/server.py (added, +124/-0)
scripts/setup-runner.sh (added, +25/-0)
server/src/config.ts (modified, +12/-4)
server/src/modules/agent/routes/streaming.ts (modified, +1/-1)
server/src/modules/agent/services/llm.ts (modified, +47/-15)
server/src/modules/media/index.ts (modified, +2/-0)
server/src/modules/media/routes/voice.ts (modified, +8/-13)
server/src/modules/media/services/voice.ts (modified, +255/-90)
server/src/routes/webhooks.ts (modified, +6/-4)
server/src/services/proactive-engine.ts (modified, +2/-2)
server/src/services/voice.ts (modified, +2/-0)
server/src/test/api/phase110.test.ts (modified, +2/-2)
server/src/test/api/phase80.test.ts (modified, +2/-2)
src/dashboard/pages/VoiceChatPage.tsx (modified, +22/-12)
src/hooks/useTTS.ts (modified, +169/-107)
whisper-stt/Dockerfile (added, +37/-0)
whisper-stt/download-model.sh (added, +16/-0)
whisper-stt/server.py (added, +126/-0)

Code Example

import ollama

# Test 1: generate API — think=false IGNORED
r1 = ollama.generate(
    model='qwen3.5:9b',
    prompt='Say hello in 2 words. Reply with ONLY the words.',
    options={'num_predict': 30, 'think': False}
)
print('generate think=false:')
print('  response:', repr(r1['response']))        # '' (empty!)
print('  thinking:', repr(r1['thinking'][:80]))    # still populated
print('  eval_count:', r1['eval_count'])           # 30 (all tokens burned on thinking)

# Test 2: chat API — think=False WORKS
r2 = ollama.chat(
    model='qwen3.5:9b',
    messages=[{'role': 'user', 'content': 'Say hello in 2 words. Reply with ONLY the words.'}],
    think=False,
    options={'num_predict': 30}
)
print('chat think=False:')
print('  content:', repr(r2['message']['content']))  # 'Hello there!' (actual output)
print('  eval_count:', r2['eval_count'])              # 3 (no wasted tokens)

---

generate think=false:
  response: ''
  thinking: 'Thinking Process:\n\n1.  **Analyze the Request:**\n    *   Task: Say hello.\n    *   Constraint 1:'
  eval_count: 30

chat think=False:
  content: 'Hello there!'
  eval_count: 3

RAW_BUFFERClick to expand / collapse

What is the issue?

The generate API completely ignores think: false passed in options for qwen3.5:9b. The model still produces thinking tokens that consume the entire num_predict budget, resulting in an empty response field. The chat API with think=False as a top-level parameter works correctly.

This asymmetry causes silent failures for applications using the generate endpoint with thinking models — they get empty responses with no error.

OS / Ollama version

OS: Linux 6.17.0-14-generic (Ubuntu)
Ollama: 0.17.7
Model: qwen3.5:9b

Steps to reproduce

import ollama

# Test 1: generate API — think=false IGNORED
r1 = ollama.generate(
    model='qwen3.5:9b',
    prompt='Say hello in 2 words. Reply with ONLY the words.',
    options={'num_predict': 30, 'think': False}
)
print('generate think=false:')
print('  response:', repr(r1['response']))        # '' (empty!)
print('  thinking:', repr(r1['thinking'][:80]))    # still populated
print('  eval_count:', r1['eval_count'])           # 30 (all tokens burned on thinking)

# Test 2: chat API — think=False WORKS
r2 = ollama.chat(
    model='qwen3.5:9b',
    messages=[{'role': 'user', 'content': 'Say hello in 2 words. Reply with ONLY the words.'}],
    think=False,
    options={'num_predict': 30}
)
print('chat think=False:')
print('  content:', repr(r2['message']['content']))  # 'Hello there!' (actual output)
print('  eval_count:', r2['eval_count'])              # 3 (no wasted tokens)

Actual output

generate think=false:
  response: ''
  thinking: 'Thinking Process:\n\n1.  **Analyze the Request:**\n    *   Task: Say hello.\n    *   Constraint 1:'
  eval_count: 30

chat think=False:
  content: 'Hello there!'
  eval_count: 3

Expected behavior

think: false should disable thinking on BOTH generate and chat endpoints. When thinking is disabled, all num_predict tokens should be available for the visible response.

Additional context

Non-thinking models (e.g. gemma2:9b) are unaffected — think: false is harmlessly ignored
The model's Modelfile uses RENDERER qwen3.5 / PARSER qwen3.5, so thinking is handled by Ollama's built-in renderer, not the template
Increasing num_predict to 200+ doesn't help on generate — the model just thinks longer, still empty response
Related: #14502, #14612, #14645

Workaround

Switch from generate to chat API and pass think=False as a top-level parameter (not inside options).

extent analysis

Fix Plan

To fix the issue with the generate API ignoring think: false in options for qwen3.5:9b, we need to modify the ollama.generate function to properly handle the think option.

Here are the steps:

Update the ollama.generate function to accept think as a top-level parameter.
Modify the function to pass think=False to the underlying model when generating text.

Example code:

import ollama

def generate(model, prompt, options, think=None):
    # ... existing code ...
    if think is not False:
        think = True  # default to True if not specified
    # Pass think to the underlying model
    response = ollama._generate(model, prompt, options, think=think)
    return response

# Usage
r1 = ollama.generate(
    model='qwen3.5:9b',
    prompt='Say hello in 2 words. Reply with ONLY the words.',
    options={'num_predict': 30},
    think=False
)

Verification

To verify that the fix worked, run the following test:

r1 = ollama.generate(
    model='qwen3.5:9b',
    prompt='Say hello in 2 words. Reply with ONLY the words.',
    options={'num_predict': 30},
    think=False
)
print('generate think=False:')
print('  response:', repr(r1['response']))        
print('  thinking:', repr(r1['thinking'][:80]))    
print('  eval_count:', r1['eval_count'])

The output should show a non-empty response and a lower eval_count.

Extra Tips

Make sure to update the ollama library to the latest version to ensure that the fix is included.
If you are using a custom model, ensure that it is compatible with the updated ollama.generate function.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

think: false should disable thinking on BOTH generate and chat endpoints. When thinking is disabled, all num_predict tokens should be available for the visible response.

#api #ssr #installation #tensor shape #autograd error #container setup #orchestration issue #cache issue #memory leak

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

ollama - ✅(Solved) Fix generate API ignores think=false for qwen3.5 (chat API works) [2 pull requests, 4 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Fix Action

Workaround

PR fix notes

PR #3: Extend syllabus

Description (problem / solution / changelog)

Summary by cubic

Changed files

PR #193: feat: Ollama qwen3 upgrade + WhisperFlow voice pipeline

Description (problem / solution / changelog)

Summary

New Docker services

Memory budget (32GB VPS)

Test plan

Summary by CodeRabbit

Changed files

Code Example

What is the issue?

OS / Ollama version

Steps to reproduce

Actual output

Expected behavior

Additional context

Workaround

extent analysis

Fix Plan

Verification

Extra Tips

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING