litellm - 💡(How to fix) Fix [Bug]: "Response with id '{response_id}' not found" in /ui/chat with vllm [2 comments, 2 participants]

Error Message

Error occurred while generating model response. Please try again. Error: Error: 404 litellm.BadRequestError: Hosted_vllmException - {"error":{"message":"Response with id 'resp_998bca5d44e22037' not found.","type":"invalid_request_error","param":"response_id","code":404}}. Received Model Group=Qwen/Qwen3.6-35B-A3B-FP8 Available Model Group Fallbacks=None

Code Example

Error occurred while generating model response. Please try again.
Error: Error: 404 litellm.BadRequestError: Hosted_vllmException -
{"error":{"message":"Response with id 'resp_998bca5d44e22037' not found.","type":"invalid_request_error","param":"response_id","code":404}}.
Received Model Group=Qwen/Qwen3.6-35B-A3B-FP8
Available Model Group Fallbacks=None

---

Qwen/Qwen3.6-35B-A3B-FP8 \
  --quantization fp8 \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --gpu-memory-utilization 0.95 \
  --max-model-len 262144

---

vllm serve Qwen/Qwen3.6-35B-A3B-FP8 \
  --quantization fp8 \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --gpu-memory-utilization 0.95 \
  --max-model-len 262144

---

Error occurred while generating model response. Please try again.
Error: Error: 404 litellm.BadRequestError: Hosted_vllmException -
{"error":{"message":"Response with id 'resp_998bca5d44e22037' not found.","type":"invalid_request_error","param":"response_id","code":404}}.
Received Model Group=Qwen/Qwen3.6-35B-A3B-FP8
Available Model Group Fallbacks=None

---

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Issue summary

When using a hosted_vllm model backed by vLLM, continuous chat works correctly in the LiteLLM Playground, but fails in /ui/chat starting from the second user message.

The first message succeeds, but on the second turn /ui/chat returns this error:

Error occurred while generating model response. Please try again.
Error: Error: 404 litellm.BadRequestError: Hosted_vllmException -
{"error":{"message":"Response with id 'resp_998bca5d44e22037' not found.","type":"invalid_request_error","param":"response_id","code":404}}.
Received Model Group=Qwen/Qwen3.6-35B-A3B-FP8
Available Model Group Fallbacks=None

Observed behavior

LiteLLM Playground: multi-turn chat works
/ui/chat: first message works, second and later messages fail with response_id not found

vLLM startup parameters

Qwen/Qwen3.6-35B-A3B-FP8 \
  --quantization fp8 \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --gpu-memory-utilization 0.95 \
  --max-model-len 262144

Suspected issue It looks like /ui/chat may be using the Responses API flow and attempting to reuse a response_id, but the backend vLLM response cannot be found on the next turn. This only seems to happen in /ui/chat, not in the Playground.

litellm Playground (without problem) <img width="1670" height="903" alt="Image" src="https://github.com/user-attachments/assets/dd0a5466-0991-445e-9f4a-5e4acfb65d93" />

litellm ui chat

Steps to Reproduce

Start a vLLM server with the following model and parameters:

vllm serve Qwen/Qwen3.6-35B-A3B-FP8 \
  --quantization fp8 \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype auto \
  --gpu-memory-utilization 0.95 \
  --max-model-len 262144

Configure this model in LiteLLM as a hosted_vllm backend.
Open LiteLLM /ui/chat
Select the Qwen/Qwen3.6-35B-A3B-FP8 model.
Send the first user message. Result: the first response is generated successfully.
Send a second follow-up message in the same chat session.
Observe that the request fails with:

Error occurred while generating model response. Please try again.
Error: Error: 404 litellm.BadRequestError: Hosted_vllmException -
{"error":{"message":"Response with id 'resp_998bca5d44e22037' not found.","type":"invalid_request_error","param":"response_id","code":404}}.
Received Model Group=Qwen/Qwen3.6-35B-A3B-FP8
Available Model Group Fallbacks=None

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.6

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The issue can likely be resolved by modifying the /ui/chat implementation to not reuse response_ids or by ensuring that the backend vLLM stores and retrieves responses correctly.

Guidance

Investigate the Responses API flow in /ui/chat to determine why it's attempting to reuse a response_id that cannot be found by the backend vLLM.
Verify that the vLLM backend is correctly storing and retrieving responses for multi-turn conversations.
Check the LiteLLM configuration to ensure that the hosted_vllm backend is properly set up to handle continuous chat sessions.
Consider adding logging or debugging statements to the /ui/chat code to track the response_id generation and usage.

Example

No code snippet is provided as the issue does not contain sufficient information to create a specific example.

Notes

The issue seems to be specific to the /ui/chat implementation and the interaction with the hosted_vllm backend. The fact that the LiteLLM Playground works correctly suggests that the issue is not with the vLLM model itself, but rather with how /ui/chat is using it.

Recommendation

Apply a workaround to modify the /ui/chat implementation to not reuse response_ids or ensure correct response storage and retrieval by the backend vLLM, as the root cause of the issue appears to be related to this aspect of the code.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug]: "Response with id '{response_id}' not found" in /ui/chat with vllm [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

Check for existing issues

What happened?

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

litellm - 💡(How to fix) Fix [Bug]: "Response with id '{response_id}' not found" in /ui/chat with vllm [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

Check for existing issues

What happened?

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING