ollama - ✅(Solved) Fix qwen3.5 vision output routes to thinking field instead of content when using image inputs [1 pull requests, 2 comments, 3 participants]

ollama • 2026-03-08 11:54:11

GitHub stats

ollama/ollama#14716•Fetched 2026-04-08 00:32:41

Timeline (top)

commented ×2, cross-referenced ×2, referenced ×2, closed ×1

Fix Action

Fixed

Fixed by PR: feat: Ollama qwen3 upgrade + WhisperFlow voice pipeline (https://github.com/trendywink247-afk/GeekSpace2.0/pull/193)

PR fix notes

PR #193: feat: Ollama qwen3 upgrade + WhisperFlow voice pipeline

Repository: trendywink247-afk/GeekSpace2.0
Author: trendywink247-afk
State: closed | merged: True
Link: https://github.com/trendywink247-afk/GeekSpace2.0/pull/193

Description (problem / solution / changelog)

Summary

Ollama model swap: hermes3:8b → qwen3:8b (primary) + qwen3:14b (complex/planning), with intent-based /think vs /no_think mode selection
PicoClaw upgrade: qwen2.5-coder:1.5b → qwen2.5-coder:3b
Voice pipeline: 3 new Docker sidecars — Kokoro TTS (highest quality), Piper TTS (fast fallback), whisper.cpp STT (local transcription)
TTS fallback chain: Kokoro → Piper → edge-tts with health checks + 30s cache
STT fallback chain: whisper.cpp (local) → Groq Whisper (cloud)
Frontend: Server-side TTS hook (useTTS.ts) + VoiceChatPage with playback controls
Config: ollamaMaxTokens 512→4096, timeout 45s→90s, all new env vars documented
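The TTS fallback chain with health checks and a 30s cache can be sketched roughly as follows. This is a minimal illustration, not the PR's code (which lives in TypeScript under server/src/modules/media/services/voice.ts): the `FallbackChain` class, its method names, and the injectable `clock` are hypothetical, and only the engine order and the 30-second cache window come from the summary above.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

HealthCheck = Callable[[], bool]


class FallbackChain:
    """Pick the first healthy engine, caching each health result for a TTL."""

    def __init__(
        self,
        engines: List[Tuple[str, HealthCheck]],
        ttl_seconds: float = 30.0,  # cache window from the PR summary
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._engines = engines
        self._ttl = ttl_seconds
        self._clock = clock
        # engine name -> (time of last check, result of last check)
        self._cache: Dict[str, Tuple[float, bool]] = {}

    def _is_healthy(self, name: str, check: HealthCheck) -> bool:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # reuse the recent result
        try:
            healthy = bool(check())
        except Exception:
            healthy = False  # a failing probe counts as unhealthy
        self._cache[name] = (now, healthy)
        return healthy

    def pick(self) -> Optional[str]:
        """Return the first healthy engine name in priority order, or None."""
        for name, check in self._engines:
            if self._is_healthy(name, check):
                return name
        return None
```

One consequence of caching: after an engine goes down, the chain may keep selecting it until its cached result expires, so callers would still need to handle a failed synthesis request.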

New Docker services

Service	Port	RAM limit	Purpose
kokoro-tts	5101	2GB	Neural TTS (ONNX CPU)
piper-tts	5100	512MB	Fast CPU TTS fallback
whisper-stt	5102	512MB	Local speech-to-text

Memory budget (32GB VPS)

Typical (qwen3:8b loaded): ~16.5GB used, ~15.5GB free
Peak (qwen3:14b + voice sidecars): ~20GB used, ~12GB free
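The free-memory figures follow directly from the 32GB total. A tiny check of that arithmetic, where the helper name `free_gb` is only illustrative:

```python
TOTAL_GB = 32.0  # VPS size stated in the heading


def free_gb(used_gb: float, total_gb: float = TOTAL_GB) -> float:
    """Free memory is the total minus what is in use."""
    return total_gb - used_gb


print(free_gb(16.5))  # typical: qwen3:8b loaded
print(free_gb(20.0))  # peak: qwen3:14b + voice sidecars
```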

Test plan

ollama pull qwen3:8b && ollama pull qwen3:14b && ollama pull qwen2.5-coder:3b
docker compose build kokoro-tts piper-tts whisper-stt
docker compose up -d — verify all 3 sidecars start healthy
Health checks: curl localhost:5100/health && curl localhost:5101/health && curl localhost:5102/health
TTS test: curl -X POST localhost:5101/tts -d '{"text":"Hello"}' -H 'Content-Type: application/json' -o test.wav
STT test: curl -X POST localhost:5102/transcribe -F '[email protected]'
Send complex query → verify routing trace shows qwen3:14b with /think
Fallback test: stop Kokoro → Piper takes over → stop Piper → edge-tts takes over
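The three health checks in the test plan could also be scripted. The sketch below assumes each sidecar answers `GET /health` with a 2xx status on the ports listed in the services table; the function names and the injectable `probe` parameter are hypothetical.

```python
import urllib.request
from typing import Callable, Dict

# Ports taken from the "New Docker services" table.
SIDECARS = {"piper-tts": 5100, "kokoro-tts": 5101, "whisper-stt": 5102}


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are OSErrors.
        return False


def check_sidecars(
    probe: Callable[[str], bool] = http_ok, host: str = "localhost"
) -> Dict[str, bool]:
    """Probe every sidecar's /health endpoint and report the results."""
    return {
        name: probe(f"http://{host}:{port}/health")
        for name, port in SIDECARS.items()
    }
```

Calling `check_sidecars()` after `docker compose up -d` returns a name-to-boolean mapping; passing a custom `probe` makes the function usable without a network.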

https://claude.ai/code/session_017nK3obD6HhVkPkFuecwxPp

Summary by CodeRabbit

New Features
- Local voice pipeline with Kokoro/Piper/Whisper sidecars, automatic fallback, and UI labels showing which engine handled each message.
- Dual LLM model support (standard vs. complex) with optional "thinking" mode and intent-aware selection.
- Client prefers server-side TTS with browser fallback and exposes TTS/STT engine info.
Chores
- Updated default models, increased token limits, adjusted timeouts, and added env settings for voice services and models.
- Added containerized sidecar services and runner setup script; CI deploys moved to self-hosted.
Tests
- Updated tests to assert presence of engine-related fields.
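The dual-model, intent-aware selection might look roughly like this. The PR's actual logic is in server/src/modules/agent/services/llm.ts and is not shown here, so the intent labels and the `route` function are assumptions; only the model names and the pairing of qwen3:14b with /think come from the PR notes. Pairing qwen3:8b with /no_think is a guess.

```python
from typing import Tuple

# Hypothetical intent labels that count as "complex/planning".
COMPLEX_INTENTS = {"planning", "analysis"}


def route(intent: str, prompt: str) -> Tuple[str, str]:
    """Choose a model and append a Qwen3 soft-switch tag to the prompt."""
    if intent in COMPLEX_INTENTS:
        # Complex queries go to the larger model with thinking enabled.
        return "qwen3:14b", f"{prompt} /think"
    # Everything else uses the primary model with thinking disabled (assumed).
    return "qwen3:8b", f"{prompt} /no_think"


model, tagged = route("planning", "Plan my week.")
print(model, "|", tagged)
```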

Changed files

.env.example (modified, +17/-5)
.env.staging.example (modified, +12/-3)
.github/workflows/ci.yml (modified, +102/-121)
docker-compose.yml (modified, +122/-5)
kokoro-tts/Dockerfile (added, +25/-0)
kokoro-tts/download-model.sh (added, +14/-0)
kokoro-tts/server.py (added, +107/-0)
package-lock.json (modified, +9667/-19076)
picoclaw/index.js (modified, +1/-1)
piper-tts/Dockerfile (added, +19/-0)
piper-tts/download-voices.sh (added, +29/-0)
piper-tts/server.py (added, +124/-0)
scripts/setup-runner.sh (added, +25/-0)
server/src/config.ts (modified, +12/-4)
server/src/modules/agent/routes/streaming.ts (modified, +1/-1)
server/src/modules/agent/services/llm.ts (modified, +47/-15)
server/src/modules/media/index.ts (modified, +2/-0)
server/src/modules/media/routes/voice.ts (modified, +8/-13)
server/src/modules/media/services/voice.ts (modified, +255/-90)
server/src/routes/webhooks.ts (modified, +6/-4)
server/src/services/proactive-engine.ts (modified, +2/-2)
server/src/services/voice.ts (modified, +2/-0)
server/src/test/api/phase110.test.ts (modified, +2/-2)
server/src/test/api/phase80.test.ts (modified, +2/-2)
src/dashboard/pages/VoiceChatPage.tsx (modified, +22/-12)
src/hooks/useTTS.ts (modified, +169/-107)
whisper-stt/Dockerfile (added, +37/-0)
whisper-stt/download-model.sh (added, +16/-0)
whisper-stt/server.py (added, +126/-0)

Code Example

**To Reproduce**

import ollama

with open("page.png", "rb") as f:
    img_bytes = f.read()

response = ollama.chat(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "Describe this image.", "images": [img_bytes]}],
    options={"temperature": 0.7, "top_p": 0.80, "top_k": 20, "thinking": False},
)

print("content:", repr(response.message.content))   # always empty
print("thinking:", repr(response.message.thinking)) # full output here


**Expected:** response.message.content contains the model's response
**Actual:** response.message.content is empty, full output in response.message.thinking

What is the issue?

When sending image inputs to qwen3.5 models via ollama.chat(), all output is routed to response.message.thinking and response.message.content is always empty. Text-only inputs work correctly. Setting thinking: False in options has no effect when images are involved.
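Until the routing is fixed, a caller could read the answer from whichever field is populated. This is a stopgap sketch, not a fix: `visible_text` is a hypothetical helper, and it assumes the message object exposes `content` and `thinking` attributes as in the report. In cases where `thinking` holds genuine reasoning rather than the answer, this fallback would surface that reasoning instead.

```python
from types import SimpleNamespace


def visible_text(message) -> str:
    """Return message.content, falling back to message.thinking if empty."""
    content = (getattr(message, "content", None) or "").strip()
    if content:
        return content
    return (getattr(message, "thinking", None) or "").strip()


# Stand-in for the message shape described in the report.
misrouted = SimpleNamespace(content="", thinking="A scanned invoice page.")
print(visible_text(misrouted))
```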

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.17.4

extent analysis

Fix Plan

The issue appears to involve how ollama.chat() handles requests that include image inputs: the model's answer lands in the thinking field instead of content. One thing to try is changing how the image is passed to the model.

Code Changes

One option is to pass the image as a base64-encoded string instead of raw bytes. This is something to try rather than a confirmed fix. Here's an example:

import ollama
import base64

with open("page.png", "rb") as f:
    img_bytes = f.read()
img_b64 = base64.b64encode(img_bytes).decode("utf-8")

response = ollama.chat(
    model="qwen3.5:9b",
    messages=[{"role": "user", "content": "Describe this image.", "images": [img_b64]}],
    options={"temperature": 0.7, "top_p": 0.80, "top_k": 20, "thinking": False},
)

print("content:", repr(response.message.content))
print("thinking:", repr(response.message.thinking))

Alternatively, upgrade the ollama library to the latest version, since the issue may already be fixed in a newer release.
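Another variation worth trying is how thinking is disabled. The examples on this page place "thinking": False inside options; the Python client's chat() also accepts a top-level `think` keyword in recent versions. The sketch below builds the request that way. It assumes a client version that supports `think`, and nothing here confirms that it resolves the image case. `build_chat_kwargs` is a hypothetical helper.

```python
from typing import Any, Dict


def build_chat_kwargs(model: str, prompt: str, image: bytes) -> Dict[str, Any]:
    """Build chat() arguments with thinking disabled at the top level."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt, "images": [image]}],
        "think": False,  # top-level keyword, not an entry in options
        "options": {"temperature": 0.7, "top_p": 0.80, "top_k": 20},
    }


# Usage (needs a running Ollama server and the ollama package):
#   import ollama
#   kwargs = build_chat_kwargs("qwen3.5:9b", "Describe this image.", img_bytes)
#   response = ollama.chat(**kwargs)
```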

Verification

To verify the fix, check that response.message.content is no longer empty and contains the model's response.
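That check can be made explicit by classifying the two fields of a response. A small sketch, where `classify_response` and its three labels are illustrative names rather than part of any library:

```python
from typing import Optional


def classify_response(content: Optional[str], thinking: Optional[str]) -> str:
    """Label a response as 'ok', 'misrouted', or 'empty'."""
    has_content = bool((content or "").strip())
    has_thinking = bool((thinking or "").strip())
    if has_content:
        return "ok"  # the answer is where callers expect it
    if has_thinking:
        return "misrouted"  # the behavior described in this issue
    return "empty"


# With a live response: classify_response(response.message.content,
#                                         response.message.thinking)
print(classify_response("", "full output here"))
```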

Extra Tips

Handle exceptions that may occur while reading or encoding the image file, such as a missing file.
If the issue persists, check the ollama library's documentation for requirements on passing image inputs.
