LlamaIndex - ✅ (Solved) Fix [Bug]: stream_chat / astream_chat in llama-index-llms-ollama drops thinking-only chunks [1 pull request, 1 comment, 2 participants]

GitHub stats

run-llama/llama_index#21232 (fetched 2026-04-08 01:58:03)

  • Comments: 1
  • Participants: 2
  • Timeline events: 5
  • Reactions: 0
  • Timeline (top): cross-referenced ×2, labeled ×2, commented ×1

Root Cause

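In llama_index/llms/ollama/base.py, both stream_chat and astream_chat skip every chunk whose message content is None, using the guard if r["message"]["content"] is None: continue. Chunks that carry only thinking text are dropped as a result.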

Fix Action

Fixed

PR fix notes

PR #21234: fix: preserve thinking-only chunks in Ollama streaming

Description (problem / solution / changelog)

Don't skip chunks that have thinking content but no message content in stream_chat/astream_chat.

Fixes #21232

Changed files

  • llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py (modified, +6/-2)


Bug Description

When streaming responses from Ollama models that support extended thinking (e.g., DeepSeek-R1, QwQ), chunks where content=None but thinking is present are silently dropped. This means all thinking content is lost during streaming, even though think=True is properly passed to the Ollama API.

The non-streaming chat() and achat() methods handle thinking blocks correctly — only the streaming paths are affected.

Root Cause

In llama_index/llms/ollama/base.py, both stream_chat and astream_chat have this guard:

if r["message"]["content"] is None:
    continue

During the thinking phase, Ollama sends chunks where content is None and thinking contains the reasoning text. This guard skips those chunks entirely before the ThinkingBlock accumulation logic can process them.
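The effect of that guard can be reproduced without Ollama or llama-index installed. The following is a minimal sketch, assuming chunks shaped like the r["message"] dictionaries described above; SIMULATED_STREAM and filter_with_original_guard are hypothetical names used only for illustration and are not part of the library.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Chunk = Dict[str, Dict[str, Optional[str]]]

# Hypothetical stand-in for a streamed response: a thinking phase
# (content is None, thinking holds text) followed by an answer phase.
SIMULATED_STREAM: List[Chunk] = [
    {"message": {"content": None, "thinking": "25 * 25 "}},
    {"message": {"content": None, "thinking": "= 625"}},
    {"message": {"content": "625", "thinking": None}},
]


def filter_with_original_guard(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield only the chunks that survive the original guard."""
    for r in chunks:
        if r["message"]["content"] is None:
            continue  # every thinking-only chunk is discarded here
        yield r


survivors = list(filter_with_original_guard(SIMULATED_STREAM))
thinking_seen = [
    r["message"]["thinking"] for r in survivors if r["message"]["thinking"]
]

print("Chunks kept:", len(survivors))
print("Thinking text kept:", thinking_seen)
```

Only the answer chunk survives, so nothing downstream of the guard ever sees the reasoning text.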

Suggested Fix

Change the guard in both stream_chat (~line 474) and astream_chat (~line 558) to:

if r["message"]["content"] is None and not r["message"].get("thinking"):
    continue

This allows thinking-only chunks through while still skipping truly empty chunks.
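To see which chunk shapes each version of the guard skips, the two conditions can be compared side by side. This is a sketch over hand-written message dictionaries; skip_original, skip_fixed, and CASES are hypothetical names, and the cases are assumptions about plausible chunk shapes rather than captured Ollama output.

```python
from typing import Any, Dict, Mapping, Tuple


def skip_original(message: Mapping[str, Any]) -> bool:
    """Original guard: skip whenever content is None."""
    return message["content"] is None


def skip_fixed(message: Mapping[str, Any]) -> bool:
    """Suggested guard: skip only if there is no content and no thinking text."""
    return message["content"] is None and not message.get("thinking")


# Hypothetical chunk shapes used only for this comparison.
CASES: Dict[str, Dict[str, Any]] = {
    "thinking only": {"content": None, "thinking": "step 1"},
    "answer only": {"content": "625", "thinking": None},
    "empty": {"content": None, "thinking": None},
    "empty, no thinking key": {"content": None},
    "empty thinking string": {"content": None, "thinking": ""},
}

# Maps each case name to (skipped by original guard, skipped by fixed guard).
results: Dict[str, Tuple[bool, bool]] = {
    name: (skip_original(msg), skip_fixed(msg)) for name, msg in CASES.items()
}

for name, (old, new) in results.items():
    print(f"{name:24s} original skips: {old!s:5s} fixed skips: {new}")
```

The only case where the two guards disagree is the thinking-only chunk, which is the case the issue reports as lost.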

Environment

  • llama-index-llms-ollama: 0.10.1
  • llama-index-core: 0.14.19
  • Ollama model: DeepSeek-R1 (any model with think=True support)

Version

0.14.19

Steps to Reproduce

1. Install packages:

pip install llama-index-llms-ollama==0.10.1 llama-index-core==0.14.19

2. Pull an Ollama model that supports thinking (e.g., DeepSeek-R1):

ollama pull deepseek-r1

3. Run this script:

from llama_index.llms.ollama import Ollama
from llama_index.core.llms import ChatMessage

llm = Ollama(model="deepseek-r1", request_timeout=120, thinking=True)
messages = [ChatMessage(role="user", content="What is 25 * 25?")]

# --- Non-streaming (works correctly) ---
response = llm.chat(messages)
print("Thinking:", response.additional_kwargs.get("thinking"))  # ✅ Has thinking content
print("Answer:", response.message.content)

# --- Streaming (thinking is lost) ---
thinking_chunks = []
text_chunks = []
for chunk in llm.stream_chat(messages):
    thinking = (chunk.additional_kwargs or {}).get("thinking_delta", "")
    if thinking:
        thinking_chunks.append(thinking)
    if chunk.delta:
        text_chunks.append(chunk.delta)

print("Thinking chunks received:", len(thinking_chunks))  # ❌ Always 0
print("Text chunks received:", len(text_chunks))

Expected Behavior

thinking_chunks should contain the model's reasoning tokens, just like the non-streaming chat() returns them in additional_kwargs["thinking"].

Actual Behavior

thinking_chunks is always empty. All chunks where content=None (which is every chunk during the thinking phase) are dropped by the continue guard before thinking_delta is ever set.

Relevant Logs/Tracebacks

extent analysis

TL;DR

Update the guard condition in stream_chat and astream_chat to allow thinking-only chunks to pass through.

Guidance

  • Identify the lines of code causing the issue: stream_chat (~line 474) and astream_chat (~line 558) in llama_index/llms/ollama/base.py.
  • Update the guard so that a chunk is skipped only when content is None and no thinking text is present: if r["message"]["content"] is None and not r["message"].get("thinking"): continue.
  • Verify the fix by running the provided script and checking if thinking_chunks is populated correctly.
  • Ensure the Ollama model supports thinking (e.g., DeepSeek-R1) and think=True is passed to the Ollama API.
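For the verification step, the collection loop from the reproduction script can be wrapped in a small helper. The sketch below assumes, as the script does, that each streamed chunk exposes delta and an additional_kwargs dictionary with a thinking_delta key; collect_deltas is a hypothetical helper, and the SimpleNamespace objects merely stand in for real chunks so the example runs without Ollama.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def collect_deltas(chunks: Iterable[Any]) -> Tuple[str, str]:
    """Return (thinking_text, answer_text) accumulated from a chunk stream."""
    thinking_parts: List[str] = []
    text_parts: List[str] = []
    for chunk in chunks:
        # additional_kwargs may be missing or None, so fall back to {}.
        extra = getattr(chunk, "additional_kwargs", None) or {}
        thinking = extra.get("thinking_delta", "")
        if thinking:
            thinking_parts.append(thinking)
        delta = getattr(chunk, "delta", None)
        if delta:
            text_parts.append(delta)
    return "".join(thinking_parts), "".join(text_parts)


# Stand-in chunks (assumed shape) that mimic a stream after the fix.
fake_stream = [
    SimpleNamespace(delta=None, additional_kwargs={"thinking_delta": "25 * 25 "}),
    SimpleNamespace(delta=None, additional_kwargs={"thinking_delta": "= 625"}),
    SimpleNamespace(delta="625", additional_kwargs={}),
]

thinking_text, answer_text = collect_deltas(fake_stream)
print("Thinking:", repr(thinking_text))
print("Answer:", repr(answer_text))
```

Against a real model, the same helper could be called as collect_deltas(llm.stream_chat(messages)); an empty thinking string would suggest that the installed version still drops thinking-only chunks.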

Example

# Updated guard condition
if r["message"]["content"] is None and not r["message"].get("thinking"):
    continue
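Because the same condition has to change in both the sync and the async path, one way to keep them aligned is to put the check in a single helper. This is a sketch of that design choice, not the code from the PR; is_empty_chunk, fake_async_stream, and filter_async are hypothetical names, and the async generator only simulates a chunk stream.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List

Chunk = Dict[str, Dict[str, Any]]


def is_empty_chunk(r: Chunk) -> bool:
    """True when a chunk has neither content nor thinking text."""
    message = r["message"]
    return message["content"] is None and not message.get("thinking")


async def fake_async_stream() -> AsyncIterator[Chunk]:
    """Hypothetical stand-in for an async stream of response chunks."""
    for r in (
        {"message": {"content": None, "thinking": "25 * 25 = 625"}},
        {"message": {"content": None, "thinking": None}},
        {"message": {"content": "625", "thinking": None}},
    ):
        yield r


async def filter_async(stream: AsyncIterator[Chunk]) -> AsyncIterator[Chunk]:
    """Async counterpart of the guarded loop, reusing the shared helper."""
    async for r in stream:
        if is_empty_chunk(r):
            continue
        yield r


async def gather_kept() -> List[Chunk]:
    return [r async for r in filter_async(fake_async_stream())]


kept = asyncio.run(gather_kept())
print("Chunks kept:", len(kept))
```

The thinking-only chunk and the answer chunk both pass through, while the chunk with neither field is still skipped.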

Notes

This fix assumes that the issue is solely caused by the guard condition in stream_chat and astream_chat. If the problem persists after applying the fix, further investigation may be necessary.

Recommendation

Apply the suggested fix by updating the guard condition in both stream_chat and astream_chat. It directly addresses the identified root cause and allows thinking-only chunks to be processed correctly.
