llamaIndex - ✅(Solved) Fix [Bug]: stream_chat / astream_chat in llama-index-llms-ollama drops thinking-only chunks [1 pull request, 1 comment, 2 participants]

llamaIndex, 2026-03-31 08:35:39

GitHub stats

run-llama/llama_index#21232 • Fetched 2026-04-08 01:58:03

Author

sadhikariSteep

Participants

dosubot[bot]

sadhikariSteep

Timeline (top)

cross-referenced ×2, labeled ×2, commented ×1

Root Cause

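In llama_index/llms/ollama/base.py, both stream_chat and astream_chat skip every chunk whose message content is None. During the thinking phase, Ollama sends chunks where content is None and thinking holds the reasoning text, so those chunks are dropped before the ThinkingBlock accumulation logic can process them.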

Fix Action

Fixed

Fixed by PR: fix: preserve thinking-only chunks in Ollama streaming (https://github.com/run-llama/llama_index/pull/21234)

PR fix notes

PR #21234: fix: preserve thinking-only chunks in Ollama streaming

Repository: run-llama/llama_index
Author: joaquinhuigomez
State: open | merged: False
Link: https://github.com/run-llama/llama_index/pull/21234

Description (problem / solution / changelog)

Don't skip chunks that have thinking content but no message content in stream_chat/astream_chat.

Fixes #21232

Changed files

llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py (modified, +6/-2)

Bug Description

When streaming responses from Ollama models that support extended thinking (e.g., DeepSeek-R1, QwQ), chunks where content=None but thinking is present are silently dropped. This means all thinking content is lost during streaming, even though think=True is properly passed to the Ollama API.

The non-streaming chat() and achat() methods handle thinking blocks correctly — only the streaming paths are affected.

Root Cause

In llama_index/llms/ollama/base.py, both stream_chat and astream_chat have this guard:

if r["message"]["content"] is None:
    continue

During the thinking phase, Ollama sends chunks where content is None and thinking contains the reasoning text. This guard skips those chunks entirely before the ThinkingBlock accumulation logic can process them.

Suggested Fix

Change the guard in both stream_chat (~line 474) and astream_chat (~line 558) to:

if r["message"]["content"] is None and not r["message"].get("thinking"):
    continue

This allows thinking-only chunks through while still skipping truly empty chunks.
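The effect of the guard change can be illustrated without a running Ollama server. The sketch below is a minimal, standalone comparison of the two conditions; the helper names (skip_before_fix, skip_after_fix, kept) and the sample chunks are hypothetical and are not taken from base.py, which applies the condition inline inside its streaming loop.

```python
from typing import Any, Callable, Dict, List

Chunk = Dict[str, Any]


def skip_before_fix(chunk: Chunk) -> bool:
    """Original guard: skip whenever content is None."""
    return chunk["message"]["content"] is None


def skip_after_fix(chunk: Chunk) -> bool:
    """Suggested guard: skip only when content is None AND no thinking text exists."""
    message = chunk["message"]
    return message["content"] is None and not message.get("thinking")


# Hypothetical chunks shaped like the ones described in the issue.
SAMPLE_CHUNKS: List[Chunk] = [
    {"message": {"content": None, "thinking": "25 * 25 is 25 squared."}},  # thinking-only
    {"message": {"content": None, "thinking": None}},                      # truly empty
    {"message": {"content": "625", "thinking": None}},                     # answer text
]


def kept(chunks: List[Chunk], skip: Callable[[Chunk], bool]) -> List[Chunk]:
    """Return the chunks that survive a given guard."""
    return [c for c in chunks if not skip(c)]


kept_before = kept(SAMPLE_CHUNKS, skip_before_fix)
kept_after = kept(SAMPLE_CHUNKS, skip_after_fix)
print("kept before fix:", len(kept_before))
print("kept after fix:", len(kept_after))
```

With these sample chunks, the original guard keeps only the answer chunk, while the suggested guard also keeps the thinking-only chunk and still discards the empty one.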

Environment

llama-index-llms-ollama: 0.10.1
llama-index-core: 0.14.19
Ollama model: DeepSeek-R1 (any model with think=True support)

Version

0.14.19

Steps to Reproduce

Install the packages:

pip install llama-index-llms-ollama==0.10.1 llama-index-core==0.14.19

Pull an Ollama model that supports thinking (e.g., DeepSeek-R1):

ollama pull deepseek-r1

Run this script:

from llama_index.llms.ollama import Ollama
from llama_index.core.llms import ChatMessage

llm = Ollama(model="deepseek-r1", request_timeout=120, thinking=True)
messages = [ChatMessage(role="user", content="What is 25 * 25?")]

# --- Non-streaming (works correctly) ---
response = llm.chat(messages)
print("Thinking:", response.additional_kwargs.get("thinking"))  # ✅ Has thinking content
print("Answer:", response.message.content)

# --- Streaming (thinking is lost) ---
thinking_chunks = []
text_chunks = []
for chunk in llm.stream_chat(messages):
    thinking = (chunk.additional_kwargs or {}).get("thinking_delta", "")
    if thinking:
        thinking_chunks.append(thinking)
    if chunk.delta:
        text_chunks.append(chunk.delta)

print("Thinking chunks received:", len(thinking_chunks))  # ❌ Always 0
print("Text chunks received:", len(text_chunks))

Expected Behavior

thinking_chunks should contain the model's reasoning tokens, just like the non-streaming chat() returns them in additional_kwargs["thinking"].

Actual Behavior

thinking_chunks is always empty. All chunks where content=None (which is every chunk during the thinking phase) are dropped by the continue guard before thinking_delta is ever set.

Relevant Logs/Tracebacks

extent analysis

TL;DR

Update the guard condition in stream_chat and astream_chat to allow thinking-only chunks to pass through.

Guidance

Identify the lines of code causing the issue: stream_chat (~line 474) and astream_chat (~line 558) in llama_index/llms/ollama/base.py.
Update the guard condition to if r["message"]["content"] is None and not r["message"].get("thinking"): continue to prevent skipping thinking-only chunks.
Verify the fix by running the provided script and checking if thinking_chunks is populated correctly.
Ensure the Ollama model supports thinking (e.g., DeepSeek-R1) and think=True is passed to the Ollama API.
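Because astream_chat carries the same guard, the async path is worth checking as well. The following is a rough sketch of such a check against a simulated stream, assuming chunks are dict-like with a "message" entry as described in the issue. fake_ollama_stream and collect are hypothetical names used only for illustration; they stand in for the Ollama client stream and for the accumulation loop in base.py, and do not reproduce the library's actual code.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Tuple

Chunk = Dict[str, Any]


async def fake_ollama_stream() -> AsyncIterator[Chunk]:
    """Hypothetical stand-in for an async Ollama stream (no server needed)."""
    for chunk in (
        {"message": {"content": None, "thinking": "25 * 25 "}},
        {"message": {"content": None, "thinking": "= 625"}},
        {"message": {"content": None}},   # truly empty chunk
        {"message": {"content": "625"}},  # answer text
    ):
        yield chunk


async def collect(stream: AsyncIterator[Chunk]) -> Tuple[str, str]:
    """Accumulate thinking and answer text using the suggested guard."""
    thinking_parts: List[str] = []
    text_parts: List[str] = []
    async for r in stream:
        message = r["message"]
        # Suggested guard: only skip chunks with neither content nor thinking.
        if message["content"] is None and not message.get("thinking"):
            continue
        thinking_parts.append(message.get("thinking") or "")
        text_parts.append(message["content"] or "")
    return "".join(thinking_parts), "".join(text_parts)


thinking, text = asyncio.run(collect(fake_ollama_stream()))
print("Thinking:", thinking)
print("Answer:", text)
```

Replacing the guard inside collect with the original condition (content is None alone) leaves the thinking string empty, which mirrors the behavior reported for the streaming methods.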

Example

# Updated guard condition
if r["message"]["content"] is None and not r["message"].get("thinking"):
    continue

Notes

This fix assumes that the issue is solely caused by the guard condition in stream_chat and astream_chat. If the problem persists after applying the fix, further investigation may be necessary.
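If thinking is still missing after the guard is changed, one way to narrow things down is to check whether the raw stream contains thinking-only chunks at all. The helper below is a hypothetical diagnostic sketch, not part of llama-index or the Ollama client: it assumes dict-like chunks with a "message" entry, as described in the issue, and counts how many chunks fall into each category.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def classify_chunk(chunk: Mapping[str, Any]) -> str:
    """Label a chunk as 'content', 'thinking_only', or 'empty'."""
    message = chunk["message"]
    if message.get("content"):
        return "content"
    if message.get("thinking"):
        return "thinking_only"
    return "empty"


def summarize_chunks(chunks: Iterable[Mapping[str, Any]]) -> Counter:
    """Count chunks per category for a whole stream."""
    return Counter(classify_chunk(c) for c in chunks)


# Hypothetical sample data; in practice the raw chunks from the client would be passed in.
sample = [
    {"message": {"content": None, "thinking": "Square 25."}},
    {"message": {"content": None, "thinking": "Result is 625."}},
    {"message": {"content": None}},
    {"message": {"content": "625"}},
]
summary = summarize_chunks(sample)
print(dict(summary))
```

A thinking_only count of zero on the raw stream would suggest that the model or server is not emitting thinking text in the first place, in which case the guard is not the cause.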

Recommendation

Apply the suggested fix by updating the guard condition in both stream_chat and astream_chat. It directly addresses the identified root cause and allows thinking-only chunks to be processed correctly.
