litellm - ✅ (Solved) [Feature]: Auto-route Azure GPT-5.4 models to Responses API when reasoning_effort + tools are used [2 pull requests, 1 participant]

GitHub stats
BerriAI/litellm#23914 (fetched 2026-04-08 00:53:53)
Comments: 0 · Participants: 1 · Timeline events: 6 · Reactions: 0
Timeline (top): cross-referenced ×2, closed ×1, labeled ×1, referenced ×1

Root Cause

When using litellm.acompletion() with reasoning_effort and tools on an Azure OpenAI GPT-5.4 deployment, reasoning_effort is silently dropped (with drop_params=True) because Azure's Chat Completions API doesn't support combining the two.

Fix Action

Fixed

PR fix notes

PR #23577: feat(openai): route gpt-5.4+ tools+reasoning to Responses API

Description (problem / solution / changelog)

Summary

This PR changes how gpt-5.4+ requests with both tools and reasoning_effort are handled:

  1. Route instead of drop: Instead of silently dropping reasoning_effort when tools are present, we route these requests to the Responses API, which supports both tools and reasoning.
  2. Dict normalization: We normalize reasoning_effort dicts (e.g. {"effort": "high", "summary": "detailed"}) to strings for the Chat Completions API, and restore the full dict when routing to the Responses API.
  3. Summary preservation: When routing to Responses, we restore the original reasoning_effort dict (including summary) before calling the Responses API bridge.
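
The normalize-then-restore behaviour described in items 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper names `normalize_reasoning_effort` and `reasoning_for_responses` are hypothetical, and the only shapes assumed are the ones quoted above (a string for Chat Completions, a dict with `effort` and optionally `summary` for the Responses API).

```python
from typing import Optional, Tuple, Union

ReasoningEffort = Union[str, dict, None]


def normalize_reasoning_effort(
    value: ReasoningEffort,
) -> Tuple[Optional[str], Optional[dict]]:
    """Split a reasoning_effort value into (chat_string, original_dict).

    chat_string is what a Chat Completions request could carry.
    original_dict is kept so the full value (including summary)
    can be restored if the request is routed to the Responses API.
    """
    if isinstance(value, dict):
        return value.get("effort"), dict(value)
    return value, None


def reasoning_for_responses(
    chat_string: Optional[str], original_dict: Optional[dict]
) -> Optional[dict]:
    """Rebuild a Responses-style reasoning payload, preferring the full dict."""
    if original_dict is not None:
        return original_dict
    if chat_string is None:
        return None
    return {"effort": chat_string}
```

Keeping the original dict next to the normalized string means the `summary` field survives whichever path the request ends up taking.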

Response to Review Comments

1. Hardcoded model detection in main.py

The routing decision is driven by an API constraint, not a model capability. OpenAI's Chat Completions API rejects reasoning_effort when tools are present for gpt-5.4+. The Responses API supports both. This is an API boundary, not a model-specific flag.

is_model_gpt_5_4_plus_model() uses version parsing (gpt-5.4, gpt-5.5, gpt-5.6, etc.), so future models like gpt-5.5 and gpt-5.6 are handled without code changes. The only "hardcoded" part is the version threshold (4), which reflects when this API behavior was introduced.
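
A version-threshold check of this kind might look like the sketch below. It is an assumption-laden illustration rather than the actual body of is_model_gpt_5_4_plus_model(): the function name `is_gpt_5_4_plus`, the regular expression, and the decision to handle only `gpt-5.<minor>` names are all hypothetical.

```python
import re

# Assumption: the API behaviour applies from minor version 4 onwards.
MIN_MINOR_VERSION = 4

# Matches "gpt-5.<minor>" anywhere in the name, so provider prefixes
# ("azure/") and date suffixes ("-2026-03-05") do not matter.
_GPT_5_MINOR = re.compile(r"gpt-5\.(\d+)")


def is_gpt_5_4_plus(model: str) -> bool:
    """Return True for gpt-5.<minor> names whose minor version is >= 4."""
    match = _GPT_5_MINOR.search(model)
    if match is None:
        # Names without an explicit minor version (e.g. "gpt-5-nano") do not qualify.
        return False
    return int(match.group(1)) >= MIN_MINOR_VERSION
```

Parsing the minor version as an integer, rather than comparing strings, is what lets later names such as gpt-5.5 pass without a code change.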

model_prices_and_context_window.json is used for pricing and context windows. This routing rule is about which API endpoint to use, not pricing or capabilities. Adding a JSON flag would require updates for every new model; the version-based check keeps behavior consistent and self-documenting.

2. Provider-specific import in main.py

main.py already imports provider-specific modules for routing (e.g. OpenAIWebSearchOptions, VertexAIModelRoute). The routing logic needs to know whether the model is an OpenAI gpt-5.4+ model. That check is centralized in OpenAIGPT5Config, which is the same pattern used elsewhere for OpenAI-specific behavior.

3. Fragile summary-field restoration

Restoration uses reasoning_effort from the completion call's kwargs, which is the common path when users pass it directly. Deployment configs that inject reasoning_effort into optional_params before the call are a separate, less common path. The current change fixes the main case (direct argument) and aligns with how other params are handled. Improving the deployment-config path can be done in a follow-up if needed.

4. Silent summary drop for non-gpt-5.4+ models

The Chat Completions API expects reasoning_effort as a string ("none", "low", "medium", "high", "xhigh"), not a dict. Sending {"effort": "high", "summary": "detailed"} would be rejected by OpenAI. Normalizing to a string is required for chat completions. The summary field is only used by the Responses API. For chat-path models, we must send a string; the normalization is correct and not a "silent drop."

5. Pre-submission checklist / GitHub issue

Checklist items will be completed before merge. A related issue can be linked if one exists.

Changed files

  • tests/test_litellm/llms/openai/chat/test_openai_gpt_transformation.py (modified, +27/-31)

PR #23926: fix(azure): auto-route gpt-5.4+ tools+reasoning to Responses API

Description (problem / solution / changelog)

Relevant issues

Fixes #23914

Pre-Submission checklist

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

Azure GPT-5.4+ models silently drop reasoning_effort when tools are also present in litellm.completion(). OpenAI already auto-routes these requests to the Responses API (which supports both params), but Azure was excluded from this routing.

Fix

  1. litellm/main.py — Moved the gpt-5.4+ auto-routing check out of the try block (Azure models aren't in model_cost map) and extended it to include custom_llm_provider == "azure" alongside "openai".

  2. litellm/llms/azure/chat/gpt_5_transformation.py — Removed the code that silently dropped reasoning_effort when tools were present for gpt-5.4+ models. This is no longer needed since requests are now routed to the Responses API bridge.

  3. docs/my-website/docs/reasoning_content.md — Updated the docs tip to reflect that auto-routing now works for both OpenAI and Azure.
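
Why moving the check out of the try block matters (item 1) can be shown with a simplified sketch. Everything here is hypothetical: `MODEL_INFO` stands in for a model map that has no Azure entries, and the function names are illustrative rather than taken from litellm/main.py.

```python
from typing import Any, Dict

# Hypothetical stand-in for a model map that lacks Azure deployment names.
MODEL_INFO: Dict[str, Dict[str, Any]] = {"gpt-5.4": {"mode": "chat"}}


def wants_bridge(provider: str, is_gpt_5_4_plus: bool, params: Dict[str, Any]) -> bool:
    """True when tools and reasoning_effort are combined on a supported provider."""
    return (
        provider in ("openai", "azure")
        and is_gpt_5_4_plus
        and params.get("reasoning_effort") is not None
        and bool(params.get("tools"))
    )


def route_check_inside_try(
    model: str, provider: str, is_gpt_5_4_plus: bool, params: Dict[str, Any]
) -> bool:
    """Earlier shape: a failed map lookup skips the routing check entirely."""
    try:
        MODEL_INFO[model]  # raises KeyError for unmapped models
        return wants_bridge(provider, is_gpt_5_4_plus, params)
    except KeyError:
        return False


def route_check_outside_try(
    model: str, provider: str, is_gpt_5_4_plus: bool, params: Dict[str, Any]
) -> bool:
    """Later shape: the routing check no longer depends on the lookup."""
    if wants_bridge(provider, is_gpt_5_4_plus, params):
        return True
    try:
        return MODEL_INFO[model].get("mode") == "responses"
    except KeyError:
        return False
```

With an unmapped Azure deployment name, the first variant never reaches the routing decision, while the second still routes the request when both parameters are set.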

Testing

  • 2 new unit tests for Azure routing in test_main.py
  • Updated existing test from asserting reasoning_effort is dropped to asserting it's preserved
  • Verified e2e with Azure gpt-5-nano deployment (same deployment serves both /chat/completions and /responses endpoints)

Changed files

  • docs/my-website/docs/reasoning_content.md (modified, +19/-2)
  • litellm/llms/azure/chat/gpt_5_transformation.py (modified, +3/-8)
  • litellm/main.py (modified, +13/-10)
  • tests/test_litellm/llms/azure/chat/test_azure_gpt5_transformation.py (modified, +5/-4)
  • tests/test_litellm/test_main.py (modified, +34/-0)

Code Example

litellm.aresponses(model='gpt-5.4-2026-03-05', ..., reasoning={'effort': 'medium'})
Creating HTTP client for responses API

---

litellm.acompletion(model='azure/gpt-5.4-2026-03-05', ..., reasoning_effort='medium')
POST .../deployments/gpt-5.4-2026-03-05/chat/completions
# reasoning_effort absent from json_data sent to Azure
# Response: reasoning_tokens: 0

What happened?

When using litellm.acompletion() with reasoning_effort and tools on an Azure OpenAI GPT-5.4 deployment, reasoning_effort is silently dropped (with drop_params=True) because Azure's Chat Completions API doesn't support combining the two.

For direct OpenAI, LiteLLM already auto-routes these calls to the Responses API via the ResponsesToCompletionBridgeHandler (the model config has mode: "responses"). This works perfectly — reasoning_effort is preserved and sent as reasoning.effort.

Azure models don't have mode: "responses" in their model config, so the bridge never kicks in. The call stays on /chat/completions and reasoning_effort is silently stripped from the request body.

Evidence

With LITELLM_LOG=DEBUG, same code, same reasoning_effort="medium" + tools:

OpenAI — auto-routed to Responses API:

litellm.aresponses(model='gpt-5.4-2026-03-05', ..., reasoning={'effort': 'medium'})
Creating HTTP client for responses API

Azure — stays on Chat Completions, reasoning_effort stripped:

litellm.acompletion(model='azure/gpt-5.4-2026-03-05', ..., reasoning_effort='medium')
POST .../deployments/gpt-5.4-2026-03-05/chat/completions
# reasoning_effort absent from json_data sent to Azure
# Response: reasoning_tokens: 0

Expected behavior

Azure GPT-5 models should get the same auto-routing treatment as direct OpenAI — either by setting mode: "responses" in the Azure model config, or by detecting the reasoning_effort + tools combination and routing through the existing bridge.

Related issues

  • #23577 — The PR that added this auto-routing for OpenAI (feat(openai): route gpt-5.4+ tools+reasoning to Responses API, merged Mar 13, 2026). Uses is_model_gpt_5_4_plus_model() and responses_api_bridge_check in main.py — but only for OpenAI, not Azure.
  • #14748 — Auto-route gpt-5 requests to /responses API (OpenAI, open/reopened)
  • #10116 — Azure Responses API support (merged, enables explicit azure/responses/ prefix)
  • #16766 — Make all gpt-5 models use responses by default (merged then reverted)

Environment

  • LiteLLM version: 1.82.3
  • Python: 3.10
  • Azure model: gpt-5.4-2026-03-05

extent analysis

Fix Plan

To fix the issue, we need to modify the main.py file to include Azure models in the auto-routing logic. We can achieve this by adding a check for Azure models in the responses_api_bridge_check function.

Code Changes

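# main.py (illustrative sketch; the real signature of responses_api_bridge_check may differ)

def responses_api_bridge_check(model, params, custom_llm_provider):
    # ... existing code ...
    # Both parameters must actually be set, not merely present as keys
    needs_bridge = (
        is_model_gpt_5_4_plus_model(model)
        and params.get("reasoning_effort") is not None
        and bool(params.get("tools"))
    )
    # Match on the provider rather than on an 'azure' substring in the model name
    if needs_bridge and custom_llm_provider in ("openai", "azure"):
        return True
    # ... existing code ...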

Alternatively, we can set mode: "responses" in the Azure model config to enable auto-routing.

Configuration Changes

# model_config.yml (illustrative; the file name and keys are not a verified LiteLLM schema)
models:
  - name: azure/gpt-5.4-2026-03-05
    mode: "responses"

Verification

To verify the fix, run the same code with LITELLM_LOG=DEBUG and check the logs for the auto-routing to the Responses API.
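
One way to automate that log check is a small helper like the one below. It is only a sketch: the function name `detect_route` is hypothetical, and the marker strings are taken from the debug excerpts quoted in the Evidence section, so they may not match the log format of other LiteLLM versions.

```python
def detect_route(debug_log: str) -> str:
    """Classify a debug log excerpt as 'responses', 'chat_completions', or 'unknown'."""
    # Markers observed in the issue's OpenAI excerpt (Responses API path).
    if "litellm.aresponses(" in debug_log or "responses API" in debug_log:
        return "responses"
    # Marker observed in the issue's Azure excerpt (Chat Completions path).
    if "/chat/completions" in debug_log:
        return "chat_completions"
    return "unknown"
```

After the fix, a captured Azure log for a request with reasoning_effort and tools would be expected to classify as 'responses' rather than 'chat_completions'.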

Extra Tips

  • Make sure to update the LiteLLM version to the latest release after applying the fix.
  • If you're using a custom model config, ensure that the mode field is set to "responses" for Azure models.
