litellm - 💡(How to fix) Fix [Bug]: Ollama reasoning_content always null — /api/generate doesn't return thinking field

litellm · 2026-05-14 20:28:19

Root Cause

litellm/llms/ollama/completion/transformation.py always calls /api/generate:

# line ~488
url = f"{api_base}/api/generate"

transform_response then reads:

response_text = response_json.get("response", "")
reasoning_content, content = _parse_content_for_reasoning(response_text)

The /api/generate endpoint does not return a thinking field. Its response looks like:

{"response": "4.", "context": [...]}

In contrast, /api/chat does return thinking as a separate field:

{"message": {"content": "4.", "thinking": "The user asked what 2+2 is..."}}

_parse_content_for_reasoning only extracts reasoning that is wrapped in <think> tags inside the response string. Qwen3 served through /api/generate does not embed those tags in the response field, so reasoning_content always ends up None.
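To see why the reasoning is dropped, here is a minimal sketch of tag-based parsing. parse_think_tags is a hypothetical stand-in written for illustration; LiteLLM's actual _parse_content_for_reasoning may differ in details such as the exact pattern it matches. The sketch shows that text without <think> tags passes through unchanged, with no reasoning extracted.

```python
import re
from typing import Optional, Tuple

# Hypothetical approximation of tag-based reasoning extraction (not LiteLLM's code).
# It only recognises reasoning wrapped in <think>...</think> at the start of the text.
_THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*(.*)$", re.DOTALL)


def parse_think_tags(text: str) -> Tuple[Optional[str], str]:
    """Split '<think>reasoning</think>answer' into (reasoning, answer)."""
    match = _THINK_RE.match(text)
    if match is None:
        # No tags: the whole string is treated as content and reasoning is None.
        return None, text
    return match.group(1).strip(), match.group(2)


# The /api/generate "response" field from the report: no tags, so no reasoning.
print(parse_think_tags("4."))  # (None, '4.')
# A response that did embed tags would be split correctly.
print(parse_think_tags("<think>2+2 is basic arithmetic.</think>4."))  # ('2+2 is basic arithmetic.', '4.')
```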

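Fix / Workaround

The issue proposes two fixes (see Suggested fix below): Option A, a minimal patch that reads a native thinking field before falling back to <think>-tag parsing, and Option B, switching the completion path from /api/generate to /api/chat. Option B is architecturally cleaner; Option A is the minimal patch.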

Original issue report

Describe the bug

When using Ollama with thinking models (Qwen3, DeepSeek-R1 variants), reasoning_content is always null in the LiteLLM response, even though the model generates extensive internal reasoning. As a result, downstream observability tools (Langfuse, etc.) lose the reasoning chain entirely.

Environment

LiteLLM version: 1.83.10
Provider: Ollama (self-hosted)
Models affected: qwen3-vl:8b, qwen3.6:27b (and any Qwen3/DeepSeek-R1 variant via Ollama)

Root cause

litellm/llms/ollama/completion/transformation.py always calls /api/generate:

# line ~488
url = f"{api_base}/api/generate"

transform_response then reads:

response_text = response_json.get("response", "")
reasoning_content, content = _parse_content_for_reasoning(response_text)

The /api/generate endpoint does not return a thinking field. Its response looks like:

{"response": "4.", "context": [...]}

In contrast, /api/chat does return thinking as a separate field:

{"message": {"content": "4.", "thinking": "The user asked what 2+2 is..."}}

Verification

Calling Ollama's /api/chat directly returns the thinking field correctly:

curl -X POST http://localhost:11434/api/chat \
  -d '{"model": "qwen3-vl:8b", "messages": [{"role": "user", "content": "What is 2+2?"}], "stream": false}'

# response:
# {"message": {"role": "assistant", "content": "4.", "thinking": "The user asked..."}}

The thinking content is being generated — it just never reaches LiteLLM's response object.
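The same verification can be run from Python instead of curl. This is a minimal standard-library sketch; it assumes Ollama is listening on its default http://localhost:11434 and that /api/chat nests thinking under message, as in the curl output above. build_chat_request, query_ollama and extract_thinking are hypothetical helper names, not part of LiteLLM or Ollama.

```python
import json
import urllib.request

OLLAMA_BASE = "http://localhost:11434"  # assumed default local Ollama address


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the same non-streaming /api/chat call as the curl command."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{OLLAMA_BASE}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_ollama(model: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and return the parsed JSON body (needs a running Ollama)."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=timeout) as resp:
        return json.load(resp)


def extract_thinking(response_json: dict):
    """Return (thinking, content) from a /api/chat response body."""
    message = response_json.get("message") or {}
    return message.get("thinking"), message.get("content", "")
```

With a local Ollama running, extract_thinking(query_ollama("qwen3-vl:8b", "What is 2+2?")) should return a non-empty thinking string alongside the answer, which is the data LiteLLM currently discards.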

Expected behaviour

response.choices[0].message.reasoning_content should contain the model's thinking chain when the model produces one.

Suggested fix

Option A (minimal): In transform_response, before falling back to _parse_content_for_reasoning on the response field, check whether the Ollama response contains a thinking key at the message level:

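# In the non-streaming path of transform_response
# `or {}` also covers an explicit "message": null in the response body
ollama_message = response_json.get("message") or {}
thinking = ollama_message.get("thinking")
# /api/chat puts the answer in message.content; /api/generate uses "response"
content = ollama_message.get("content") or response_json.get("response", "")
if thinking:
    # Native thinking field present: use it directly
    reasoning_content = thinking
else:
    # Otherwise fall back to extracting <think>...</think> tags from the text
    reasoning_content, content = _parse_content_for_reasoning(content)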
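Option A's logic can be exercised on its own, without LiteLLM or a running server. In this sketch, parse_think_tags is a hypothetical stand-in for LiteLLM's _parse_content_for_reasoning (the real helper may behave differently), and extract_reasoning is an illustrative name, not a LiteLLM function.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for _parse_content_for_reasoning (tag-based fallback).
_THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)


def parse_think_tags(text: str) -> Tuple[Optional[str], str]:
    match = _THINK_RE.match(text or "")
    if match is None:
        return None, text
    return match.group(1), match.group(2)


def extract_reasoning(response_json: dict) -> Tuple[Optional[str], str]:
    """Option A: prefer a native thinking field, else fall back to tag parsing."""
    # `or {}` also guards against an explicit "message": null.
    ollama_message = response_json.get("message") or {}
    thinking = ollama_message.get("thinking")
    content = ollama_message.get("content") or response_json.get("response", "")
    if thinking:
        return thinking, content
    return parse_think_tags(content)
```

With the /api/generate shape shown in the root cause, the thinking branch never fires and the result is the same as today; the patch only pays off once a thinking field actually reaches transform_response, for example after switching to /api/chat.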

Option B (cleaner): Switch the Ollama completion path from /api/generate to /api/chat. The response structure is different (message.content instead of response) but /api/chat is the canonical multi-turn API and has been stable for a long time. This would also fix streaming reasoning for free, since /api/chat streaming chunks already include thinking per chunk.
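For Option B, the two pieces that change are the endpoint and how streamed chunks are folded into reasoning and content. This sketch assumes each streamed /api/chat line is a JSON object whose message may carry a thinking and/or content fragment, with "done": true on the final line, consistent with the note above that chunks include thinking per chunk. chat_endpoint and accumulate_chat_stream are hypothetical names, and mapping the result onto LiteLLM's message objects is not shown.

```python
import json
from typing import Iterable, Optional, Tuple


def chat_endpoint(api_base: str) -> str:
    """Option B: target /api/chat instead of /api/generate."""
    return f"{api_base.rstrip('/')}/api/chat"


def accumulate_chat_stream(lines: Iterable[str]) -> Tuple[Optional[str], str]:
    """Join streamed /api/chat NDJSON chunks into (reasoning, content)."""
    reasoning_parts, content_parts = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        message = chunk.get("message") or {}
        # Each chunk may carry a slice of thinking, a slice of content, or neither.
        reasoning_parts.append(message.get("thinking") or "")
        content_parts.append(message.get("content") or "")
        if chunk.get("done"):
            break
    reasoning = "".join(reasoning_parts)
    # Return None rather than "" so "no reasoning" matches a null reasoning_content.
    return reasoning or None, "".join(content_parts)
```

Keeping the two accumulators separate is what lets a streaming response populate reasoning_content incrementally instead of relying on <think> tags appearing in the visible text.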

Option B is architecturally cleaner; Option A is the minimal patch.
