ollama - ✅(Solved) Fix qwen3.5 vision output routes to thinking field instead of content when using image inputs [1 pull request, 2 comments, 3 participants]

GitHub stats
ollama/ollama#14716 (fetched 2026-04-08 00:32:41)
Comments: 2 · Participants: 3 · Timeline events: 8 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×2, referenced ×2, closed ×1

Fix Action

Fixed

PR fix notes

PR #193: feat: Ollama qwen3 upgrade + WhisperFlow voice pipeline

Description (problem / solution / changelog)

Summary

  • Ollama model swap: hermes3:8b → qwen3:8b (primary) + qwen3:14b (complex/planning), with intent-based /think vs /no_think mode selection
  • PicoClaw upgrade: qwen2.5-coder:1.5b → qwen2.5-coder:3b
  • Voice pipeline: 3 new Docker sidecars — Kokoro TTS (highest quality), Piper TTS (fast fallback), whisper.cpp STT (local transcription)
  • TTS fallback chain: Kokoro → Piper → edge-tts with health checks + 30s cache
  • STT fallback chain: whisper.cpp (local) → Groq Whisper (cloud)
  • Frontend: Server-side TTS hook (useTTS.ts) + VoiceChatPage with playback controls
  • Config: ollamaMaxTokens 512→4096, timeout 45s→90s, all new env vars documented
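The TTS and STT fallback chains above follow the same pattern: try engines in priority order and skip any whose recent health check failed. Below is a minimal Python sketch of that pattern. It assumes the "30s cache" means health-check results are reused for 30 seconds. The FallbackChain class and the engine callables are hypothetical stand-ins, not the PR's implementation.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# (name, health_check, run) triples; all three are hypothetical stand-ins.
Engine = Tuple[str, Callable[[], bool], Callable[[Any], Any]]


class FallbackChain:
    """Try engines in priority order, caching each engine's health for cache_ttl seconds."""

    def __init__(self, engines: List[Engine], cache_ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.engines = engines
        self.cache_ttl = cache_ttl
        self.clock = clock
        self._health: Dict[str, Tuple[bool, float]] = {}  # name -> (healthy, checked_at)

    def _is_healthy(self, name: str, check: Callable[[], bool]) -> bool:
        now = self.clock()
        cached = self._health.get(name)
        if cached is not None and now - cached[1] < self.cache_ttl:
            return cached[0]  # reuse a result younger than the TTL
        try:
            healthy = bool(check())
        except Exception:
            healthy = False  # an exception from the probe counts as unhealthy
        self._health[name] = (healthy, now)
        return healthy

    def run(self, payload: Any) -> Tuple[str, Any]:
        """Return (engine_name, result) from the first healthy engine that succeeds."""
        errors = []
        for name, check, run in self.engines:
            if not self._is_healthy(name, check):
                errors.append(f"{name}: unhealthy")
                continue
            try:
                return name, run(payload)
            except Exception as exc:
                # Mark the engine down so later calls skip it until the TTL expires.
                self._health[name] = (False, self.clock())
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all engines failed: " + "; ".join(errors))
```

For TTS the engine list would be ordered Kokoro, Piper, edge-tts; for STT, whisper.cpp, then Groq. Returning the engine name with the result is one way to support the UI labels that show which engine handled each message.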

New Docker services

| Service | Port | RAM limit | Purpose |
|---|---|---|---|
| kokoro-tts | 5101 | 2GB | Neural TTS (ONNX CPU) |
| piper-tts | 5100 | 512MB | Fast CPU TTS fallback |
| whisper-stt | 5102 | 512MB | Local speech-to-text |

Memory budget (32GB VPS)

  • Typical (qwen3:8b loaded): ~16.5GB used, ~15.5GB free
  • Peak (qwen3:14b + voice sidecars): ~20GB used, ~12GB free
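The memory figures add up to the 32GB total, and the per-service RAM limits in the sidecar table cap the three voice services at 3GB combined. The following check reproduces that arithmetic. The figures are the PR's estimates, not measurements, and 512MB is treated as 0.5GB.

```python
# RAM limits from the sidecar table, in GB (512MB treated as 0.5GB).
SIDECAR_LIMITS_GB = {"kokoro-tts": 2.0, "piper-tts": 0.5, "whisper-stt": 0.5}
TOTAL_GB = 32.0  # VPS size stated in the PR


def free_gb(used_gb: float, total_gb: float = TOTAL_GB) -> float:
    """Memory left over once used_gb is allocated."""
    return total_gb - used_gb


sidecar_cap_gb = sum(SIDECAR_LIMITS_GB.values())
typical_free_gb = free_gb(16.5)  # qwen3:8b loaded
peak_free_gb = free_gb(20.0)     # qwen3:14b + voice sidecars

print(f"sidecar cap: {sidecar_cap_gb} GB")
print(f"typical free: {typical_free_gb} GB, peak free: {peak_free_gb} GB")
```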

Test plan

  • ollama pull qwen3:8b && ollama pull qwen3:14b && ollama pull qwen2.5-coder:3b
  • docker compose build kokoro-tts piper-tts whisper-stt
  • docker compose up -d — verify all 3 sidecars start healthy
  • Health checks: curl localhost:5100/health && curl localhost:5101/health && curl localhost:5102/health
  • TTS test: curl -X POST localhost:5101/tts -d '{"text":"Hello"}' -H 'Content-Type: application/json' -o test.wav
  • STT test: curl -X POST localhost:5102/transcribe -F '[email protected]'
  • Send complex query → verify routing trace shows qwen3:14b with /think
  • Fallback test: stop Kokoro → Piper takes over → stop Piper → edge-tts takes over
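The curl health checks in the test plan can also be scripted with the Python standard library. This minimal sketch assumes each sidecar answers GET /health with HTTP 200 when it is ready. The ports come from the sidecar table. check_health and check_all are hypothetical helper names.

```python
import http.client
import urllib.error
import urllib.request

# Ports from the sidecar table; /health is the path used in the test plan.
SIDECARS = {"piper-tts": 5100, "kokoro-tts": 5101, "whisper-stt": 5102}


def check_health(url: str, timeout: float = 2.0) -> bool:
    """True if the endpoint returns HTTP 200 within the timeout, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # URLError covers refused connections and non-2xx replies (HTTPError);
        # OSError and HTTPException cover read timeouts and malformed responses.
        return False


def check_all(host: str = "localhost", timeout: float = 2.0) -> dict:
    """Map each sidecar name to its health status."""
    return {
        name: check_health(f"http://{host}:{port}/health", timeout)
        for name, port in SIDECARS.items()
    }
```

After docker compose up -d, print(check_all()) should show True for all three services.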

https://claude.ai/code/session_017nK3obD6HhVkPkFuecwxPp


Summary by CodeRabbit

  • New Features

    • Local voice pipeline with Kokoro/Piper/Whisper sidecars, automatic fallback, and UI labels showing which engine handled each message.
    • Dual LLM model support (standard vs. complex) with optional "thinking" mode and intent-aware selection.
    • Client prefers server-side TTS with browser fallback and exposes TTS/STT engine info.
  • Chores

    • Updated default models, increased token limits, adjusted timeouts, and added env settings for voice services and models.
    • Added containerized sidecar services and runner setup script; CI deploys moved to self-hosted.
  • Tests

    • Updated tests to assert presence of engine-related fields.

Changed files

  • .env.example (modified, +17/-5)
  • .env.staging.example (modified, +12/-3)
  • .github/workflows/ci.yml (modified, +102/-121)
  • docker-compose.yml (modified, +122/-5)
  • kokoro-tts/Dockerfile (added, +25/-0)
  • kokoro-tts/download-model.sh (added, +14/-0)
  • kokoro-tts/server.py (added, +107/-0)
  • package-lock.json (modified, +9667/-19076)
  • picoclaw/index.js (modified, +1/-1)
  • piper-tts/Dockerfile (added, +19/-0)
  • piper-tts/download-voices.sh (added, +29/-0)
  • piper-tts/server.py (added, +124/-0)
  • scripts/setup-runner.sh (added, +25/-0)
  • server/src/config.ts (modified, +12/-4)
  • server/src/modules/agent/routes/streaming.ts (modified, +1/-1)
  • server/src/modules/agent/services/llm.ts (modified, +47/-15)
  • server/src/modules/media/index.ts (modified, +2/-0)
  • server/src/modules/media/routes/voice.ts (modified, +8/-13)
  • server/src/modules/media/services/voice.ts (modified, +255/-90)
  • server/src/routes/webhooks.ts (modified, +6/-4)
  • server/src/services/proactive-engine.ts (modified, +2/-2)
  • server/src/services/voice.ts (modified, +2/-0)
  • server/src/test/api/phase110.test.ts (modified, +2/-2)
  • server/src/test/api/phase80.test.ts (modified, +2/-2)
  • src/dashboard/pages/VoiceChatPage.tsx (modified, +22/-12)
  • src/hooks/useTTS.ts (modified, +169/-107)
  • whisper-stt/Dockerfile (added, +37/-0)
  • whisper-stt/download-model.sh (added, +16/-0)
  • whisper-stt/server.py (added, +126/-0)
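Qwen3 models accept /think and /no_think soft switches in the user message. The summary's intent-based selection between qwen3:8b and qwen3:14b can be sketched as shown below. This version uses a crude keyword heuristic as the intent classifier. COMPLEX_HINTS, Route, and route() are hypothetical and are not the PR's implementation.

```python
from dataclasses import dataclass

# Hypothetical keyword heuristic standing in for a real intent classifier.
COMPLEX_HINTS = ("plan", "design", "architecture", "step by step", "compare", "analyze")


@dataclass(frozen=True)
class Route:
    model: str
    prompt: str


def route(user_text: str) -> Route:
    """Send complex requests to qwen3:14b with /think, everything else to qwen3:8b with /no_think."""
    is_complex = any(hint in user_text.lower() for hint in COMPLEX_HINTS)
    model = "qwen3:14b" if is_complex else "qwen3:8b"
    switch = "/think" if is_complex else "/no_think"
    # Qwen3 reads the soft switch from the user turn; appending it is one common placement.
    return Route(model=model, prompt=f"{user_text} {switch}")


print(route("Plan the database migration"))
print(route("What's the weather like?"))
```

Substring matching is deliberately simple here and will misfire on words like "airplane"; a real classifier would replace it.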

Code Example

**To Reproduce**

```python
import ollama

with open("page.png", "rb") as f:
    img_bytes = f.read()

response = ollama.chat(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "Describe this image.", "images": [img_bytes]}],
    # Note: ollama-python takes think= as a top-level chat() argument;
    # "thinking" inside options is not a recognized model option.
    options={"temperature": 0.7, "top_p": 0.80, "top_k": 20, "thinking": False},
)

print("content:", repr(response.message.content))   # always empty
print("thinking:", repr(response.message.thinking)) # full output here
```


**Expected:** response.message.content contains the model's response
**Actual:** response.message.content is empty, full output in response.message.thinking

What is the issue?

When sending image inputs to qwen3.5 models via ollama.chat(), all output is routed to response.message.thinking and response.message.content is always empty. Text-only inputs work correctly. Setting thinking: False in options has no effect when images are involved.
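Until a fixed Ollama release is installed, client code can guard against this symptom. It reads content and falls back to thinking only when content is empty. This is a workaround sketch, not the fix, and answer_text is a hypothetical helper. For requests where the model genuinely reasons before answering, the fallback returns the reasoning trace rather than a final answer.

```python
from types import SimpleNamespace


def answer_text(message) -> str:
    """Prefer message.content; fall back to message.thinking when content is empty."""
    content = (getattr(message, "content", None) or "").strip()
    if content:
        return content
    # Symptom from this issue: the whole answer lands in .thinking with image inputs.
    return (getattr(message, "thinking", None) or "").strip()


# Stand-in for response.message, shaped like the output reported in the issue.
buggy = SimpleNamespace(content="", thinking="A scanned page containing a table.")
print(answer_text(buggy))
```

With a real response, call answer_text(response.message).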

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.17.4

extent analysis

Fix Plan

The problem appears only with image inputs, so one suspect is how ollama.chat() passes the image to the model. The first step is to rule out an image-encoding problem.

Code Changes

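Pass the image as a base64-encoded string instead of raw bytes. The Python client already base64-encodes raw bytes before sending them, so this change alone may not alter the result. Separately, "thinking" is not a recognized key in options; ollama-python exposes think= as a top-level argument of chat(). For example: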

```python
import base64

import ollama

with open("page.png", "rb") as f:
    img_bytes = f.read()
# Explicit base64 string; ollama-python would otherwise encode the raw bytes itself.
img_b64 = base64.b64encode(img_bytes).decode("utf-8")

response = ollama.chat(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "Describe this image.", "images": [img_b64]}],
    options={"temperature": 0.7, "top_p": 0.80, "top_k": 20},
    think=False,  # top-level argument, not an entry in options
)

print("content:", repr(response.message.content))
print("thinking:", repr(response.message.thinking))
```

Alternatively, upgrade Ollama itself (the report is against version 0.17.4) as well as the ollama Python library. The issue is marked solved, so a newer release likely contains the fix.
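The Ollama server returns content and thinking as separate fields, and the Python client only deserializes them. The server version is therefore the one to check first. This minimal sketch reports both versions. It uses the server's GET /api/version endpoint on the default port 11434 and importlib.metadata for the installed ollama package. library_version and server_version are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request
from importlib import metadata
from typing import Optional


def library_version(package: str = "ollama") -> Optional[str]:
    """Installed version of a Python package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def server_version(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> Optional[str]:
    """Version reported by the Ollama server's /api/version endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # ValueError covers a reply that is not valid JSON.
        return None


print("ollama python library:", library_version())
print("ollama server:", server_version())
```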

Verification

To verify the fix, check that response.message.content is no longer empty and contains the model's answer for the same image request.

Extra Tips

  • Handle errors when reading the image file (missing file, permission denied); base64-encoding bytes already in memory does not normally fail.
  • If the issue persists, check the Ollama documentation for any requirements on image inputs.
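The realistic failure point is reading the file, not encoding it. This minimal sketch handles that case and returns None instead of raising. load_image_b64 is a hypothetical helper. Its output is a base64 string, which the images list of ollama.chat() accepts.

```python
import base64
from pathlib import Path
from typing import Optional


def load_image_b64(path: str) -> Optional[str]:
    """Return the file's bytes as a base64 string, or None if it cannot be read or is empty."""
    try:
        data = Path(path).read_bytes()
    except OSError as exc:  # missing file, permission denied, path is a directory, ...
        print(f"could not read {path}: {exc}")
        return None
    if not data:
        print(f"{path} is empty")
        return None
    return base64.b64encode(data).decode("ascii")
```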
