litellm [Bug]: Ollama reasoning_content always null; /api/generate doesn't return thinking field


Describe the bug

When using Ollama with thinking models (Qwen3, DeepSeek-R1 variants), reasoning_content is always null in the LiteLLM response even though the model generates extensive internal reasoning. As a result, downstream observability tools (Langfuse, etc.) lose the reasoning chain entirely.

Environment

  • LiteLLM version: 1.83.10
  • Provider: Ollama (self-hosted)
  • Models affected: qwen3-vl:8b, qwen3.6:27b (and any Qwen3/DeepSeek-R1 variant via Ollama)

Root cause

litellm/llms/ollama/completion/transformation.py always calls /api/generate:

# line ~488
url = f"{api_base}/api/generate"

transform_response then reads:

response_text = response_json.get("response", "")
reasoning_content, content = _parse_content_for_reasoning(response_text)

The /api/generate endpoint does not return a thinking field. Its response looks like:

{"response": "4.", "context": [...]}

In contrast, /api/chat does return thinking as a separate field:

{"message": {"content": "4.", "thinking": "The user asked what 2+2 is..."}}

_parse_content_for_reasoning looks for <think> XML tags in the response string. Qwen3 via /api/generate doesn't embed those tags in the response field, so reasoning_content is always None.
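To make the failure mode concrete, here is a minimal sketch of tag-based parsing. split_think_tags is a hypothetical stand-in, not LiteLLM's actual _parse_content_for_reasoning, whose details may differ; the point is only that a parser of this kind has nothing to extract when the response string carries no <think> tags.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for a <think>-tag parser (not LiteLLM's real helper).
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think_tags(text: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, content); reasoning is None if no <think> block exists."""
    match = _THINK_RE.search(text)
    if match is None:
        return None, text
    reasoning = match.group(1).strip()
    # Remove the tagged block and keep whatever surrounds it as the answer.
    content = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, content


# Response shape described above for /api/generate: plain text, no tags.
generate_json = {"response": "4.", "context": []}
reasoning, content = split_think_tags(generate_json.get("response", ""))

# With inline tags, the same parser does recover the reasoning.
tagged_reasoning, tagged_content = split_think_tags("<think>2+2 is 4</think>4.")
```

In the first call reasoning is None because the string has no tags to match, which mirrors the behaviour reported here.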

Verification

Calling Ollama's /api/chat directly returns the thinking field correctly:

curl -X POST http://localhost:11434/api/chat \
  -d '{"model": "qwen3-vl:8b", "messages": [{"role": "user", "content": "What is 2+2?"}], "stream": false}'

# response:
# {"message": {"role": "assistant", "content": "4.", "thinking": "The user asked..."}}

The thinking content is being generated — it just never reaches LiteLLM's response object.
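The same check can be scripted. The sketch below uses only the Python standard library; build_chat_request and read_thinking are illustrative names, and the sample body simply reuses the response shape shown above rather than real server output.

```python
import json
import urllib.request
from typing import Optional, Tuple


def build_chat_request(api_base: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to Ollama's /api/chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{api_base}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_thinking(body: str) -> Tuple[Optional[str], str]:
    """Pull (thinking, content) out of an /api/chat response body."""
    message = json.loads(body).get("message", {})
    return message.get("thinking"), message.get("content", "")


req = build_chat_request("http://localhost:11434", "qwen3-vl:8b", "What is 2+2?")
# To send it against a running Ollama server:
#   with urllib.request.urlopen(req) as resp:
#       thinking, content = read_thinking(resp.read().decode("utf-8"))

# Parsing the sample response shape from the curl output above.
sample = '{"message": {"role": "assistant", "content": "4.", "thinking": "The user asked..."}}'
thinking, content = read_thinking(sample)
```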

Expected behaviour

response.choices[0].message.reasoning_content should contain the model's thinking chain when the model produces one.

Suggested fix

Option A (minimal): In transform_response, before falling back to _parse_content_for_reasoning on the response field, check whether the Ollama response contains a thinking key at the message level:

# In the non-streaming path
ollama_message = response_json.get("message", {})
thinking = ollama_message.get("thinking")
content = ollama_message.get("content") or response_json.get("response", "")
if thinking:
    reasoning_content = thinking
else:
    reasoning_content, content = _parse_content_for_reasoning(content)
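A self-contained version of Option A that can be exercised without a server might look like the following. extract_reasoning and _split_think_tags are hypothetical names; the tag parser is a simplified stand-in for _parse_content_for_reasoning, and the sample payloads are the response shapes quoted in this issue.

```python
import re
from typing import Any, Dict, Optional, Tuple

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def _split_think_tags(text: str) -> Tuple[Optional[str], str]:
    """Simplified stand-in for _parse_content_for_reasoning (an assumption)."""
    match = _THINK_RE.search(text)
    if match is None:
        return None, text
    content = (text[: match.start()] + text[match.end():]).strip()
    return match.group(1).strip(), content


def extract_reasoning(response_json: Dict[str, Any]) -> Tuple[Optional[str], str]:
    """Prefer message.thinking; otherwise fall back to <think>-tag parsing."""
    message = response_json.get("message") or {}
    thinking = message.get("thinking")
    # /api/chat puts text in message.content, /api/generate in response.
    content = message.get("content") or response_json.get("response", "")
    if thinking:
        return thinking, content
    return _split_think_tags(content)


chat_shaped = {"message": {"content": "4.", "thinking": "The user asked what 2+2 is..."}}
tagged_generate = {"response": "<think>add</think>4."}
plain_generate = {"response": "4.", "context": []}

chat_result = extract_reasoning(chat_shaped)
tagged_result = extract_reasoning(tagged_generate)
plain_result = extract_reasoning(plain_generate)
```

The fallback keeps existing behaviour for responses that embed tags inline, so the change is additive.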

Option B (cleaner): Switch the Ollama completion path from /api/generate to /api/chat. The response structure is different (message.content instead of response) but /api/chat is the canonical multi-turn API and has been stable for a long time. This would also fix streaming reasoning for free, since /api/chat streaming chunks already include thinking per chunk.
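For the streaming side of Option B, a rough sketch of how per-chunk thinking could be accumulated is shown below. It assumes each streamed line is a JSON object with an optional message.thinking, a message.content, and a done flag; accumulate_chat_stream is a hypothetical helper and the sample chunks are illustrative, not captured server output.

```python
import json
from typing import Iterable, List, Optional, Tuple


def accumulate_chat_stream(lines: Iterable[str]) -> Tuple[Optional[str], str]:
    """Join thinking and content fragments from newline-delimited JSON chunks."""
    thinking_parts: List[str] = []
    content_parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        message = chunk.get("message") or {}
        if message.get("thinking"):
            thinking_parts.append(message["thinking"])
        if message.get("content"):
            content_parts.append(message["content"])
        if chunk.get("done"):
            break
    # Report None rather than "" when the model produced no thinking at all.
    return ("".join(thinking_parts) or None), "".join(content_parts)


# Illustrative chunks only; real chunk boundaries will differ.
sample_stream = [
    '{"message": {"role": "assistant", "content": "", "thinking": "The user asked "}, "done": false}',
    '{"message": {"role": "assistant", "content": "", "thinking": "what 2+2 is."}, "done": false}',
    '{"message": {"role": "assistant", "content": "4."}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
reasoning, content = accumulate_chat_stream(sample_stream)
```

In a real integration the fragments would more likely be forwarded per chunk as reasoning deltas rather than joined at the end; joining them here just keeps the example easy to check.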

Option B is architecturally cleaner; Option A is the minimal patch.
