litellm: When reasoning_effort is set to none, thinking should be automatically disabled (for deepseek-v4)



Problem Description

When testing the official DeepSeek v4 API, I observed that if neither thinking nor reasoning_effort is set explicitly, the model defaults to thinking mode and returns reasoning_content. A user who wants to disable thinking completely (i.e., reasoning_effort = none) currently has to add "thinking": {"type": "disabled"} manually to get the expected behavior.

This is inconsistent and creates extra work for LiteLLM users, who expect reasoning_effort: none to disable reasoning fully without also having to set thinking: disabled. LiteLLM should automatically translate reasoning_effort: none into thinking: {"type": "disabled"} when calling the DeepSeek v4 API.

Reproduction Steps

Test 1: Explicit enable + high reasoning effort

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-v4-pro",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "thinking": {"type": "enabled"},
        "reasoning_effort": "high",
        "stream": false
      }'

Response 1 (includes reasoning_content):

{"id":"b25e727f-3f07-4923-9bb0-bc612d8f6f94","object":"chat.completion","created":1778226044,"model":"deepseek-v4-pro","choices":[{"index":0,"message":{"role":"assistant","content":"Hi there! How can I help you today?","reasoning_content":"We need to respond appropriately. The user said \"Hello!\" so it's a simple greeting. I'll respond with a friendly greeting and ask how I can assist."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":44,"total_tokens":56,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens_details":{"reasoning_tokens":33},"prompt_cache_hit_tokens":0,"prompt_cache_miss_tokens":12},"system_fingerprint":"fp_9954b31ca7_prod0820_fp8_kvcache_20260402"}

Test 2: No thinking / reasoning_effort set (default behavior)

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-v4-pro",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "stream": false
      }'

Response 2 (also includes reasoning_content – default thinking mode):

{"id":"b25e727f-3f07-4923-9bb0-bc612d8f6f94","object":"chat.completion","created":1778226044,"model":"deepseek-v4-pro","choices":[{"index":0,"message":{"role":"assistant","content":"Hi there! How can I help you today?","reasoning_content":"We need to respond appropriately. The user said \"Hello!\" so it's a simple greeting. I'll respond with a friendly greeting and ask how I can assist."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":44,"total_tokens":56,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens_details":{"reasoning_tokens":33},"prompt_cache_hit_tokens":0,"prompt_cache_miss_tokens":12},"system_fingerprint":"fp_9954b31ca7_prod0820_fp8_kvcache_20260402"}

Test 3: Explicitly disable thinking

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-v4-pro",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "thinking": {"type": "disabled"},
        "stream": false
      }'

Response 3 (no reasoning_content):

{"id":"07826860-e8db-4772-b59e-24ecf2f57249","object":"chat.completion","created":1778226222,"model":"deepseek-v4-pro","choices":[{"index":0,"message":{"role":"assistant","content":"Hi there! How can I help you today?"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":10,"total_tokens":22,"prompt_tokens_details":{"cached_tokens":0},"prompt_cache_hit_tokens":0,"prompt_cache_miss_tokens":12},"system_fingerprint":"fp_9954b31ca7_prod0820_fp8_kvcache_20260402"}

Expected Behavior

When a user sets reasoning_effort: "none" via LiteLLM, they intend no reasoning at all. The API call should automatically behave as if thinking: {"type": "disabled"} had been set. This would avoid an unnecessary reasoning_content field and the extra token consumption.

Actual Behavior

Currently, if LiteLLM passes only reasoning_effort: none to DeepSeek without also setting thinking: disabled, the model may still fall back to its default thinking mode (Test 2 shows that default when thinking is omitted). This contradicts the user's intent and wastes tokens.
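
To put a number on the token cost, here is a small Python sketch that compares the usage blocks reported in Response 2 (default thinking mode) and Response 3 (thinking disabled). The figures are copied from those responses and trimmed to the fields used; the helper name reasoning_tokens is only illustrative.

```python
# Usage blocks copied from Response 2 and Response 3 (trimmed to the fields used here).
usage_default = {
    "prompt_tokens": 12,
    "completion_tokens": 44,
    "total_tokens": 56,
    "completion_tokens_details": {"reasoning_tokens": 33},
}
usage_disabled = {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22,
}


def reasoning_tokens(usage: dict) -> int:
    """Return the reported reasoning token count, or 0 when the field is absent."""
    details = usage.get("completion_tokens_details") or {}
    return details.get("reasoning_tokens", 0)


# Difference in completion tokens between the two otherwise identical requests.
extra_completion = usage_default["completion_tokens"] - usage_disabled["completion_tokens"]

print(reasoning_tokens(usage_default))   # 33
print(reasoning_tokens(usage_disabled))  # 0
print(extra_completion)                  # 34
```

For this single "Hello!" prompt, the default mode reports 44 completion tokens against 10 with thinking disabled.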

Suggestion for LiteLLM

When LiteLLM translates reasoning_effort: "none" into a DeepSeek v4 API request, it should also add "thinking": {"type": "disabled"}. This would make the behavior consistent with the parameter's semantics and remove the extra configuration burden from LiteLLM users.
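
The proposed rule could look roughly like the following Python sketch. This is not LiteLLM's actual code: translate_reasoning_effort is a hypothetical helper, and two choices in it are assumptions rather than anything confirmed in this issue. First, reasoning_effort is dropped instead of forwarded, because the tests above do not show how the API treats the value "none". Second, an explicit thinking setting from the caller is left alone.

```python
from copy import deepcopy


def translate_reasoning_effort(request: dict) -> dict:
    """Hypothetical helper: map reasoning_effort "none" to the thinking switch.

    Illustrates the proposed rule only; it is not LiteLLM's implementation.
    """
    body = deepcopy(request)  # leave the caller's dict untouched
    if body.get("reasoning_effort") == "none":
        # Assumption: drop the effort value rather than forward it.
        body.pop("reasoning_effort")
        # Assumption: an explicit "thinking" value from the caller takes precedence.
        body.setdefault("thinking", {"type": "disabled"})
    return body


request = {
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Hello!"}],
    "reasoning_effort": "none",
    "stream": False,
}
translated = translate_reasoning_effort(request)

print(translated["thinking"])            # {'type': 'disabled'}
print("reasoning_effort" in translated)  # False
```

Requests with any other reasoning_effort value pass through unchanged, so the body from Test 1 would be sent as written.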
