litellm - ✅(Solved) Fix WebSocket /v1/responses requires ?model= query param, breaking OpenAI spec compatibility [1 pull requests, 1 participants]

Q: Expected behavior

The endpoint should accept `model` from the `response.create` WebSocket message payload (matching the OpenAI spec), not require it as a URL query parameter. The `model` query param should either be optional or removed entirely.

litellm · 2026-04-10 21:12:43

BerriAI/litellm#25532•Fetched 2026-04-11 06:13:33
Author

ashudeep

Participants

ashudeep

Root Cause

In litellm/proxy/response_api_endpoints/endpoints.py:

@router.websocket("/v1/responses")
@router.websocket("/responses")
async def responses_websocket_endpoint(
    websocket: WebSocket,
    model: str = fastapi.Query(
        ..., description="The model to use for the responses WebSocket session."
    ),
    user_api_key_dict=Depends(user_api_key_auth_websocket),
):

model: str = fastapi.Query(...) makes it a required URL query parameter. When a client connects without it, FastAPI returns 403 at the framework level before the handler or auth logic runs.

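Fix / Workaround

Passing the model as a ?model= query parameter in the connection URL works, although it is not what the OpenAI spec describes:

import asyncio
from websockets.asyncio.client import connect

async def main():
    # Non-spec workaround: pass the model in the URL query string
    async with connect(
        "ws://localhost:4000/v1/responses?model=gpt-5.4",
        additional_headers={"Authorization": "Bearer sk-1234"},
    ) as ws:
        await ws.send('{"type":"response.create","model":"gpt-5.4","input":"say hi"}')

asyncio.run(main())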

PR fix notes

PR #25702: fix(responses-ws): make model query param optional, extract from first message

Repository: BerriAI/litellm
Author: ashudeep
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25702

Description (problem / solution / changelog)

Summary

The WebSocket endpoint for /v1/responses requires model as a mandatory URL query parameter (fastapi.Query(...)). However, the OpenAI WebSocket spec sends model inside the response.create message payload, not as a URL query param.

This means any spec-compliant client (e.g. Codex CLI) connecting to ws://.../v1/responses without ?model= gets a 403 Forbidden, because FastAPI rejects the missing required query param before the handler or auth logic even runs.

Changes

_ReplayWebSocket wrapper class: thin wrapper that replays one pre-read message on the first receive_text() call, so downstream handlers (ManagedResponsesWebSocketHandler, ResponsesWebSocketStreaming) see the message as normal.
Make model query param optional (default None): clients can still pass ?model= for backward compatibility.
Extract model from first message: when model is None, the handler reads the first WebSocket message after accept(), extracts the model from the response.create payload (supporting both nested {"response": {"model": "..."}} and flat {"model": "..."} formats), wraps the websocket with _ReplayWebSocket so the message is not lost, then proceeds with normal pre-call processing and routing.
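The changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names ReplayWebSocket and extract_model are hypothetical, and the only assumption about the wrapped object is that it exposes an async receive_text() method, as Starlette's WebSocket does.

```python
import asyncio
import json
from typing import Any, Optional


class ReplayWebSocket:
    """Hypothetical wrapper: replays one pre-read text message, then delegates."""

    def __init__(self, websocket: Any, first_message: str) -> None:
        self._websocket = websocket
        self._pending: Optional[str] = first_message

    async def receive_text(self) -> str:
        # The first call returns the message that was already consumed.
        if self._pending is not None:
            message, self._pending = self._pending, None
            return message
        return await self._websocket.receive_text()

    def __getattr__(self, name: str) -> Any:
        # Everything else (send_text, close, ...) goes to the real socket.
        return getattr(self._websocket, name)


def extract_model(raw_message: str) -> Optional[str]:
    """Return the model from a nested or flat response.create payload."""
    try:
        payload = json.loads(raw_message)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    nested = payload.get("response")
    if isinstance(nested, dict) and isinstance(nested.get("model"), str):
        return nested["model"]
    model = payload.get("model")
    return model if isinstance(model, str) else None
```

A handler following this pattern would read the first message after accept(), call extract_model on it, and pass ReplayWebSocket(websocket, first_message) to the downstream handlers so they still see that message on their first receive_text() call.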

Reproduction

import asyncio
from websockets.asyncio.client import connect

async def main():
    # Before fix: 403 Forbidden
    # After fix: connects and streams normally
    async with connect(
        "ws://localhost:4000/v1/responses",
        additional_headers={"Authorization": "Bearer sk-1234"},
    ) as ws:
        await ws.send('{"type":"response.create","response":{"model":"gpt-4o","input":"say hi"}}')
        async for msg in ws:
            print(msg)

asyncio.run(main())

Testing

Verified end-to-end on a live GKE cluster:

Deployed patched LiteLLM v1.82.3 pod using init-container overlay
Pointed Codex CLI 0.118.0 at the patched proxy
Codex connected via WebSocket (no 403), model extracted from first response.create message
LLM call routed successfully, full response received (input_tokens: 31496, output_tokens: 178)
?model= query param still works (backward compatible)
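The backward-compatible behavior described above (use ?model= when it is present, otherwise read the first message) can be illustrated with a small standard-library sketch. The function names resolve_model and model_from_payload are hypothetical and do not come from the PR; raising ValueError when no model is found is likewise an assumption made for this example.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def model_from_payload(raw_message: str) -> Optional[str]:
    """Return the model from a response.create payload (nested or flat), if any."""
    payload = json.loads(raw_message)
    nested = payload.get("response")
    if isinstance(nested, dict) and nested.get("model"):
        return nested["model"]
    return payload.get("model")


def resolve_model(url: str, first_message: str) -> str:
    """Prefer ?model= from the URL; otherwise fall back to the first message."""
    query = parse_qs(urlsplit(url).query)
    if query.get("model"):
        return query["model"][0]
    model = model_from_payload(first_message)
    if not model:
        raise ValueError("no model in query string or response.create payload")
    return model
```

With this ordering, existing clients that pass ?model= keep their current behavior, and the payload is only consulted when the query parameter is absent.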

Fixes #25532
Related to #22051 (WebSocket mode support)

Changed files

litellm/proxy/response_api_endpoints/endpoints.py (modified, +75/-2)

Bug

The WebSocket endpoint for /v1/responses requires model as a URL query parameter (fastapi.Query(...)), but the OpenAI WebSocket spec sends model inside the response.create message payload, not as a query param.

This means any spec-compliant client (e.g. Codex CLI) connecting to ws://.../v1/responses without ?model= gets a 403 Forbidden — FastAPI rejects the missing required query param before auth even runs.

Reproduction

import asyncio
from websockets.asyncio.client import connect

async def main():
    # This FAILS with 403 (how Codex CLI and spec-compliant clients connect)
    async with connect(
        "ws://localhost:4000/v1/responses",
        additional_headers={"Authorization": "Bearer sk-1234"},
    ) as ws:
        await ws.send('{"type":"response.create","model":"gpt-5.4","input":"say hi"}')

    # This WORKS (non-spec workaround)
    async with connect(
        "ws://localhost:4000/v1/responses?model=gpt-5.4",
        additional_headers={"Authorization": "Bearer sk-1234"},
    ) as ws:
        await ws.send('{"type":"response.create","model":"gpt-5.4","input":"say hi"}')

asyncio.run(main())

Impact

Codex CLI always connects to bare ws://.../v1/responses per the OpenAI spec → gets 403 → retries 5 times (~6-8 seconds) → falls back to HTTP. This adds ~6-8s latency per turn.
Any client following the OpenAI WebSocket spec will hit the same issue.

Environment

LiteLLM version: tested on both v1.82.3-stable and v1.83.5-nightly (same behavior)
Codex CLI: @openai/codex
Running LiteLLM as proxy in router mode with models configured via config.yaml

Related issues

#22051 — WebSocket mode support for Responses API (covers streaming bugs after connection, not the connection failure itself)
#18456 — WebSocket auth header issue for /v1/realtime (different endpoint)

extent analysis

TL;DR

The WebSocket endpoint for /v1/responses should be modified to accept the model from the response.create message payload instead of requiring it as a URL query parameter.

Guidance

Modify the responses_websocket_endpoint function to remove the model query parameter requirement, allowing the endpoint to accept the model from the WebSocket message payload.
Update the endpoint to parse the model from the response.create message payload and use it for the WebSocket session.
Consider adding validation to ensure the model is provided in the message payload to maintain security and functionality.
Review related issues, such as #22051 and #18456, to ensure that the changes do not introduce new bugs or conflicts with existing functionality.

Example

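import json
from typing import Optional

import fastapi
from fastapi import Depends, WebSocket

# router and user_api_key_auth_websocket come from the surrounding litellm module.
@router.websocket("/v1/responses")
@router.websocket("/responses")
async def responses_websocket_endpoint(
    websocket: WebSocket,
    model: Optional[str] = fastapi.Query(default=None),
    user_api_key_dict=Depends(user_api_key_auth_websocket),
):
    await websocket.accept()
    if model is None:
        # Read the first message and take the model from the response.create payload
        payload = json.loads(await websocket.receive_text())
        if payload.get("type") == "response.create":
            model = (payload.get("response") or {}).get("model") or payload.get("model")
    # Use the model for the WebSocket session; the message read above must be
    # replayed to downstream handlers so it is not lost
    # ...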

Notes

The provided solution assumes that the model is always provided in the response.create message payload. Additional validation and error handling may be necessary to ensure the endpoint behaves correctly in all scenarios.

Recommendation

Modify the responses_websocket_endpoint function to accept the model from the message payload. This change aligns with the OpenAI WebSocket spec and resolves the connection issue for spec-compliant clients; keeping ?model= as an optional query parameter preserves backward compatibility.
