litellm - ✅(Solved) Fix [Bug] AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking' when using LiteLLM proxy [1 pull request, 1 comment, 2 participants]

GitHub stats
BerriAI/litellm#25697 (fetched 2026-04-16 06:37:06)
Comments: 1 · Participants: 2 · Timeline: 8 · Reactions: 0
Timeline (top): labeled ×3, referenced ×3, commented ×1, cross-referenced ×1

PR fix notes

PR #25777: fix(utils): allowed_openai_params must not forward unset params as None

Description (problem / solution / changelog)

Summary

Fixes #25697.

_apply_openai_param_overrides in litellm/utils.py iterated allowed_openai_params and unconditionally wrote

optional_params[param] = non_default_params.pop(param, None)

for every entry. If the caller listed a param name but did not actually send that param in the request, the pop returned None and None was still written into optional_params. That None then reached the provider SDK as a top-level kwarg, and the openai client rejected it:

AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'
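The TypeError comes from Python's own argument binding, not from the backend: a function with a fixed keyword signature rejects an unknown keyword even when its value is None. A minimal sketch, using a hypothetical stand-in `fake_create` in place of the real `AsyncCompletions.create()` (which accepts many more parameters but, like the stand-in, has no catch-all `**kwargs`):

```python
import asyncio


async def fake_create(*, model, messages, temperature=None, max_tokens=None):
    # Hypothetical stand-in for a client method with a fixed keyword-only
    # signature; it is NOT the real openai AsyncCompletions.create().
    return {"model": model, "n_messages": len(messages)}


def call_with(kwargs):
    """Run fake_create(**kwargs); return the TypeError message, or None on success."""
    try:
        # Binding fails when fake_create(**kwargs) is evaluated, before the
        # coroutine is even created, so the TypeError surfaces right here.
        asyncio.run(fake_create(**kwargs))
    except TypeError as exc:
        return str(exc)
    return None


base = {"model": "GLM-5.1-FP8", "messages": [{"role": "user", "content": "hi"}]}

# Only known keywords: the call succeeds.
print(call_with(base))
# A leaked top-level None is still an unexpected keyword, so binding fails.
print(call_with({**base, "enable_thinking": None}))
```

The value being None does not help: Python checks the parameter name, not the value, so any key the signature does not declare is rejected.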

Reproducer (from #25697)

The user had this in their model config:

allowed_openai_params:
  - chat_template_kwargs
  - enable_thinking

and sent this request body:

{
  "model": "GLM-5.1-FP8",
  "messages": [...],
  "chat_template_kwargs": {
    "enable_thinking": false
  }
}

enable_thinking only existed nested inside chat_template_kwargs. The helper should have forwarded chat_template_kwargs and left enable_thinking alone. Instead, it wrote optional_params["enable_thinking"] = None, and the openai client call failed with the error above.
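The leak can be reproduced outside LiteLLM with a simplified model of the pre-fix loop. This is a sketch, not the litellm source: `old_apply_overrides` is a hypothetical name, and it models only the unconditional `pop(param, None)` write described above. It also shows why dropping "enable_thinking" from allowed_openai_params sidesteps the bug on unpatched versions.

```python
def old_apply_overrides(optional_params, non_default_params, allowed_openai_params):
    """Hypothetical model of the pre-fix behavior: every allowed name is written."""
    if allowed_openai_params:
        for param in allowed_openai_params:
            # pop(..., None) returns None for names the caller never sent,
            # and that None is still written into optional_params.
            optional_params[param] = non_default_params.pop(param, None)
    return optional_params


# Request body from #25697: enable_thinking exists only inside chat_template_kwargs.
request = {"chat_template_kwargs": {"enable_thinking": False}}

leaky = old_apply_overrides({}, dict(request), ["chat_template_kwargs", "enable_thinking"])
print(leaky)  # {'chat_template_kwargs': {'enable_thinking': False}, 'enable_thinking': None}

# Workaround on unpatched versions: allow only the parameter actually sent.
clean = old_apply_overrides({}, dict(request), ["chat_template_kwargs"])
print(clean)  # {'chat_template_kwargs': {'enable_thinking': False}}
```

The first print matches the "old buggy helper output" line reported in the Testing section.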

Fix

Only forward a param if it was actually present in non_default_params:

if allowed_openai_params:
    for param in allowed_openai_params:
        # Leave a value that is already in optional_params untouched.
        if param in optional_params:
            continue
        # Allowed but not sent by the caller: forward nothing (previously None).
        if param not in non_default_params:
            continue
        optional_params[param] = non_default_params.pop(param)

  • Happy path (param sent → still forwarded): unchanged.
  • Unset path (param not sent): no longer silently sets None.
  • Drops the None default on pop, so a future reader does not assume None is a legitimate forwarded value.
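The behaviors listed above can be checked against a standalone copy of the fixed loop. `fixed_apply_overrides` is a hypothetical name for this sketch; it mirrors the snippet in the Fix section but is not imported from litellm. The last case exercises the existing `if param in optional_params: continue` guard.

```python
def fixed_apply_overrides(optional_params, non_default_params, allowed_openai_params):
    """Standalone copy of the fixed loop (hypothetical name, not litellm's import)."""
    if allowed_openai_params:
        for param in allowed_openai_params:
            if param in optional_params:
                continue  # keep a value that is already present
            if param not in non_default_params:
                continue  # allowed but not sent: forward nothing
            optional_params[param] = non_default_params.pop(param)
    return optional_params


allowed = ["chat_template_kwargs", "enable_thinking"]

# Happy path: a sent param is forwarded and popped from non_default_params.
sent = {"chat_template_kwargs": {"enable_thinking": False}}
happy = fixed_apply_overrides({}, sent, allowed)

# Unset path: enable_thinking was allowed but never sent, so no None appears.
unset_ok = "enable_thinking" not in happy

# Existing value: an entry already in optional_params is not overwritten.
kept = fixed_apply_overrides({"enable_thinking": True}, {"enable_thinking": False}, allowed)
```

Because the param is popped, a forwarded value also disappears from non_default_params, so it is not handled a second time elsewhere.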

Testing

Local sanity check (reproduces the issue against the old helper and confirms both regression + happy path pass against the new one):

old buggy helper output: {'chat_template_kwargs': {'enable_thinking': False}, 'enable_thinking': None}
old helper: bug confirmed (enable_thinking=None leaks through)
fixed helper: regression test PASS
fixed helper: happy path PASS (param sent → forwarded)

Added a regression test in tests/llm_translation/test_optional_params.py that exercises _apply_openai_param_overrides in isolation so it does not depend on any provider-specific map_openai_params plumbing:

def test_allowed_openai_params_does_not_forward_unset_params():
    from litellm.utils import _apply_openai_param_overrides
    chat_template_kwargs = {"enable_thinking": False}
    optional_params: dict = {}
    non_default_params = {"chat_template_kwargs": chat_template_kwargs}
    result = _apply_openai_param_overrides(
        optional_params=optional_params,
        non_default_params=non_default_params,
        allowed_openai_params=["chat_template_kwargs", "enable_thinking"],
    )
    assert result["chat_template_kwargs"] == chat_template_kwargs
    assert "enable_thinking" not in result

Scope

Single-file change to the helper plus one regression test. No provider config touched, no public API changed.

Changed files

  • litellm/llms/bedrock/chat/converse_transformation.py (modified, +13/-1)
  • litellm/model_prices_and_context_window_backup.json (modified, +52/-0)
  • litellm/proxy/guardrails/guardrail_hooks/unified_guardrail/unified_guardrail.py (modified, +18/-0)
  • litellm/proxy/proxy_server.py (modified, +1/-0)
  • litellm/proxy/utils.py (modified, +16/-0)
  • litellm/types/integrations/prometheus.py (modified, +2/-0)
  • litellm/utils.py (modified, +12/-2)
  • model_prices_and_context_window.json (modified, +52/-0)
  • tests/llm_translation/test_optional_params.py (modified, +39/-0)
  • tests/proxy_unit_tests/test_proxy_utils.py (modified, +25/-1)
  • tests/test_litellm/integrations/test_prometheus_labels.py (modified, +15/-0)
  • tests/test_litellm/llms/bedrock/chat/test_converse_transformation.py (modified, +148/-5)
  • tests/test_litellm/proxy/guardrails/guardrail_hooks/test_grayswan.py (modified, +60/-2)
  • tests/test_litellm/test_cost_calculator.py (modified, +25/-1)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

This issue happens only when using LiteLLM Proxy.
Direct backend requests work correctly without any issue.


🔁 Steps to Reproduce

✅ Working (Direct backend request)

curl http://192.168.1.200:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "GLM-5.1-FP8",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize GLM-5 in one sentence."}
    ],
    "temperature": 1,
    "max_tokens": 4096,
    "chat_template_kwargs": {
      "enable_thinking": false
    }
  }'

❌ Failing (Via LiteLLM Proxy)
curl https://<litellm-proxy>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-XXX" \
  -d '{
    "model": "GLM-5.1-FP8",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Summarize GLM-5 in one sentence."}
    ],
    "temperature": 1,
    "max_tokens": 4096,
    "chat_template_kwargs": {
      "enable_thinking": false
    }
  }'

⚙️ Model / LiteLLM Configuration
{
  "model_name": "GLM-5.1-FP8",
  "provider": "custom_openai",
  "api_base": "http://host.docker.internal:8000/v1",
  "litellm_params": {
    "model": "GLM-5.1-FP8",
    "custom_llm_provider": "custom_openai",

    "api_base": "http://host.docker.internal:8000/v1",

    "use_in_pass_through": false,
    "use_litellm_proxy": false,

    "merge_reasoning_content_in_choices": false,

    "allowed_openai_params": [
      "chat_template_kwargs",
      "enable_thinking"
    ],

    "input_cost_per_token": "1e-06",
    "output_cost_per_token": "3.2e-06"
  },
  "model_info": {
    "mode": "chat",
    "db_model": true,
    "direct_access": true
  }
}

❌ Actual Behavior
AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'

### Steps to Reproduce

1. Start LiteLLM Proxy and configure it to route requests to a custom OpenAI-compatible backend (GLM-5.1-FP8).
2. Send a direct request to the backend (this works as expected).
3. Send the same request through LiteLLM Proxy.
4. Observe the error returned by LiteLLM Proxy (see Actual Behavior above).


What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.81.12

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The error is caused by listing 'enable_thinking' in allowed_openai_params while the request only sends it nested inside chat_template_kwargs. Before PR #25777, LiteLLM wrote every allowed name into the outgoing kwargs, so the unsent enable_thinking was forwarded as a top-level None and rejected by the OpenAI client. Removing 'enable_thinking' from allowed_openai_params (keeping 'chat_template_kwargs') avoids the error on affected versions.

Guidance

  • Check the allowed_openai_params list in the litellm_params section: 'enable_thinking' is listed there, but the request never sends it as a top-level field.
  • Confirm that the custom OpenAI-compatible backend (GLM-5.1-FP8) reads enable_thinking from chat_template_kwargs; the direct backend request, which succeeds, uses exactly that shape.
  • Remove 'enable_thinking' from allowed_openai_params and keep 'chat_template_kwargs', so only the nested object is forwarded.
  • Send the request through the proxy again to confirm the error is gone.

Example

No client-side code change is needed; the workaround is a small configuration change to allowed_openai_params.

Notes

The root cause is in LiteLLM's _apply_openai_param_overrides helper (litellm/utils.py), not in the backend. PR #25777 fixes it by forwarding only the allowed params that are actually present in the request.

Recommendation

Apply the workaround (remove 'enable_thinking' from allowed_openai_params) until you are on a LiteLLM release that includes PR #25777.
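Whether a model entry and request could trigger the pre-fix behavior can be checked mechanically. A minimal sketch, assuming the model entry and request body are available as Python dicts shaped like the JSON config and curl body in the issue (trimmed here). The helper names `allowed_params`, `unsent_allowed_params`, and `workaround_allowed_params` are hypothetical and are not part of LiteLLM.

```python
# Trimmed copies of the issue's model entry and request body.
MODEL_ENTRY = {
    "model_name": "GLM-5.1-FP8",
    "litellm_params": {
        "model": "GLM-5.1-FP8",
        "custom_llm_provider": "custom_openai",
        "allowed_openai_params": ["chat_template_kwargs", "enable_thinking"],
    },
}

REQUEST_BODY = {
    "model": "GLM-5.1-FP8",
    "messages": [{"role": "user", "content": "Summarize GLM-5 in one sentence."}],
    "chat_template_kwargs": {"enable_thinking": False},
}


def allowed_params(model_entry):
    """Return allowed_openai_params from a model entry, or [] if unset."""
    return model_entry.get("litellm_params", {}).get("allowed_openai_params") or []


def unsent_allowed_params(model_entry, request_body):
    """Allowed names that are not top-level keys of the request body.

    These are the names a pre-fix release could forward as None.
    """
    return [p for p in allowed_params(model_entry) if p not in request_body]


def workaround_allowed_params(model_entry, request_body):
    """Allowed list trimmed to names this request sends at the top level."""
    unsent = set(unsent_allowed_params(model_entry, request_body))
    return [p for p in allowed_params(model_entry) if p not in unsent]


print(unsent_allowed_params(MODEL_ENTRY, REQUEST_BODY))      # ['enable_thinking']
print(workaround_allowed_params(MODEL_ENTRY, REQUEST_BODY))  # ['chat_template_kwargs']
```

Trimming the list from a single request only makes sense if no client sends enable_thinking at the top level; if some do, keep the entry and move to a patched release instead.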
