litellm - ✅(Solved) Fix [Feature]: Auto-route Azure GPT-5.4 models to Responses API when reasoning_effort + tools are used [2 pull requests, 1 participants]

litellm2026-03-17 23:32:43

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#23914•Fetched 2026-04-08 00:53:53

View on GitHub

Comments

Participants

Timeline

Reactions

Author

yanndupis

Participants

yanndupis

Timeline (top)

cross-referenced ×2closed ×1labeled ×1referenced ×1

Root Cause

When using litellm.acompletion() with reasoning_effort and tools on an Azure OpenAI GPT-5.4 deployment, reasoning_effort is silently dropped (with drop_params=True) because Azure's Chat Completions API doesn't support combining the two.

Fix Action

Fixed

Fixed by PR: feat(openai): route gpt-5.4+ tools+reasoning to Responses API (https://github.com/BerriAI/litellm/pull/23577)
Fixed by PR: fix(azure): auto-route gpt-5.4+ tools+reasoning to Responses API (https://github.com/BerriAI/litellm/pull/23926)

PR fix notes

PR #23577: feat(openai): route gpt-5.4+ tools+reasoning to Responses API

Repository: BerriAI/litellm
Author: Sameerlite
State: closed | merged: True
Link: https://github.com/BerriAI/litellm/pull/23577

Description (problem / solution / changelog)

Summary

This PR changes how gpt-5.4+ requests with both tools and reasoning_effort are handled:

Route instead of drop: Instead of silently dropping reasoning_effort when tools are present, we route these requests to the Responses API, which supports both tools and reasoning.
Dict normalization: We normalize reasoning_effort dicts (e.g. {"effort": "high", "summary": "detailed"}) to strings for the Chat Completions API, and restore the full dict when routing to the Responses API.
Summary preservation: When routing to Responses, we restore the original reasoning_effort dict (including summary) before calling the Responses API bridge.

Response to Review Comments

1. Hardcoded model detection in main.py

The routing decision is driven by an API constraint, not a model capability. OpenAI's Chat Completions API rejects reasoning_effort when tools are present for gpt-5.4+. The Responses API supports both. This is an API boundary, not a model-specific flag.

is_model_gpt_5_4_plus_model() uses version parsing (gpt-5.4, gpt-5.5, gpt-5.6, etc.), so future models like gpt-5.5 and gpt-5.6 are handled without code changes. The only "hardcoded" part is the version threshold (4), which reflects when this API behavior was introduced.

model_prices_and_context_window.json is used for pricing and context windows. This routing rule is about which API endpoint to use, not pricing or capabilities. Adding a JSON flag would require updates for every new model; the version-based check keeps behavior consistent and self-documenting.

2. Provider-specific import in main.py

main.py already imports provider-specific modules for routing (e.g. OpenAIWebSearchOptions, VertexAIModelRoute). The routing logic needs to know whether the model is an OpenAI gpt-5.4+ model. That check is centralized in OpenAIGPT5Config, which is the same pattern used elsewhere for OpenAI-specific behavior.

3. Fragile summary-field restoration

Restoration uses reasoning_effort from the completion call's kwargs, which is the common path when users pass it directly. Deployment configs that inject reasoning_effort into optional_params before the call are a separate, less common path. The current change fixes the main case (direct argument) and aligns with how other params are handled. Improving the deployment-config path can be done in a follow-up if needed.

4. Silent summary drop for non-gpt-5.4+ models

The Chat Completions API expects reasoning_effort as a string ("none", "low", "medium", "high", "xhigh"), not a dict. Sending {"effort": "high", "summary": "detailed"} would be rejected by OpenAI. Normalizing to a string is required for chat completions. The summary field is only used by the Responses API. For chat-path models, we must send a string; the normalization is correct and not a "silent drop."

5. Pre-submission checklist / GitHub issue

Checklist items will be completed before merge. A related issue can be linked if one exists.

Changed files

tests/test_litellm/llms/openai/chat/test_openai_gpt_transformation.py (modified, +27/-31)

PR #23926: fix(azure): auto-route gpt-5.4+ tools+reasoning to Responses API

Repository: BerriAI/litellm
Author: Chesars
State: closed | merged: True
Link: https://github.com/BerriAI/litellm/pull/23926

Description (problem / solution / changelog)

Relevant issues

Fixes #23914

Pre-Submission checklist

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

Azure GPT-5.4+ models silently drop reasoning_effort when tools are also present in litellm.completion(). OpenAI already auto-routes these requests to the Responses API (which supports both params), but Azure was excluded from this routing.

Fix

litellm/main.py — Moved the gpt-5.4+ auto-routing check out of the try block (Azure models aren't in model_cost map) and extended it to include custom_llm_provider == "azure" alongside "openai".
litellm/llms/azure/chat/gpt_5_transformation.py — Removed the code that silently dropped reasoning_effort when tools were present for gpt-5.4+ models. This is no longer needed since requests are now routed to the Responses API bridge.
docs/my-website/docs/reasoning_content.md — Updated the docs tip to reflect that auto-routing now works for both OpenAI and Azure.

Testing

2 new unit tests for Azure routing in test_main.py
Updated existing test from asserting reasoning_effort is dropped to asserting it's preserved
Verified e2e with Azure gpt-5-nano deployment (same deployment serves both /chat/completions and /responses endpoints)

Changed files

docs/my-website/docs/reasoning_content.md (modified, +19/-2)
litellm/llms/azure/chat/gpt_5_transformation.py (modified, +3/-8)
litellm/main.py (modified, +13/-10)
tests/test_litellm/llms/azure/chat/test_azure_gpt5_transformation.py (modified, +5/-4)
tests/test_litellm/test_main.py (modified, +34/-0)

Code Example

litellm.aresponses(model='gpt-5.4-2026-03-05', ..., reasoning={'effort': 'medium'})
Creating HTTP client for responses API

---

litellm.acompletion(model='azure/gpt-5.4-2026-03-05', ..., reasoning_effort='medium')
POST .../deployments/gpt-5.4-2026-03-05/chat/completions
# reasoning_effort absent from json_data sent to Azure
# Response: reasoning_tokens: 0

RAW_BUFFERClick to expand / collapse

What happened?

For direct OpenAI, LiteLLM already auto-routes these calls to the Responses API via the ResponsesToCompletionBridgeHandler (the model config has mode: "responses"). This works perfectly — reasoning_effort is preserved and sent as reasoning.effort.

Azure models don't have mode: "responses" in their model config, so the bridge never kicks in. The call stays on /chat/completions and reasoning_effort is silently stripped from the request body.

Evidence

With LITELLM_LOG=DEBUG, same code, same reasoning_effort="medium" + tools:

OpenAI — auto-routed to Responses API:

litellm.aresponses(model='gpt-5.4-2026-03-05', ..., reasoning={'effort': 'medium'})
Creating HTTP client for responses API

Azure — stays on Chat Completions, reasoning_effort stripped:

litellm.acompletion(model='azure/gpt-5.4-2026-03-05', ..., reasoning_effort='medium')
POST .../deployments/gpt-5.4-2026-03-05/chat/completions
# reasoning_effort absent from json_data sent to Azure
# Response: reasoning_tokens: 0

Expected behavior

Azure GPT-5 models should get the same auto-routing treatment as direct OpenAI — either by setting mode: "responses" in the Azure model config, or by detecting the reasoning_effort + tools combination and routing through the existing bridge.

Related issues

#23577 — The PR that added this auto-routing for OpenAI (feat(openai): route gpt-5.4+ tools+reasoning to Responses API, merged Mar 13, 2026). Uses is_model_gpt_5_4_plus_model() and responses_api_bridge_check in main.py — but only for OpenAI, not Azure.
#14748 — Auto-route gpt-5 requests to /responses API (OpenAI, open/reopened)
#10116 — Azure Responses API support (merged, enables explicit azure/responses/ prefix)
#16766 — Make all gpt-5 models use responses by default (merged then reverted)

Environment

LiteLLM version: 1.82.3
Python: 3.10
Azure model: gpt-5.4-2026-03-05

extent analysis

Fix Plan

To fix the issue, we need to modify the main.py file to include Azure models in the auto-routing logic. We can achieve this by adding a check for Azure models in the responses_api_bridge_check function.

Code Changes

# main.py

def responses_api_bridge_check(model, params):
    # ... existing code ...
    if is_model_gpt_5_4_plus_model(model) and 'reasoning_effort' in params and 'tools' in params:
        # Add check for Azure models
        if 'azure' in model:
            return True
    # ... existing code ...

Alternatively, we can set mode: "responses" in the Azure model config to enable auto-routing.

Configuration Changes

# model_config.yml
models:
  - name: azure/gpt-5.4-2026-03-05
    mode: "responses"

Verification

To verify the fix, run the same code with LITELLM_LOG=DEBUG and check the logs for the auto-routing to the Responses API.

Extra Tips

Make sure to update the LiteLLM version to the latest release after applying the fix.
If you're using a custom model config, ensure that the mode field is set to "responses" for Azure models.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

#api #ssr #installation #tensor shape #autograd error #model save/load

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

litellm - ✅(Solved) Fix [Feature]: Auto-route Azure GPT-5.4 models to Responses API when reasoning_effort + tools are used [2 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

PR fix notes

PR #23577: feat(openai): route gpt-5.4+ tools+reasoning to Responses API

Description (problem / solution / changelog)

Summary

Response to Review Comments

Changed files

PR #23926: fix(azure): auto-route gpt-5.4+ tools+reasoning to Responses API

Description (problem / solution / changelog)

Relevant issues

Pre-Submission checklist

Type

Changes

Fix

Testing

Changed files

Code Example

What happened?

Evidence

Expected behavior

Related issues

Environment

extent analysis

Fix Plan

Code Changes

Configuration Changes

Verification

Extra Tips

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING