litellm - ✅(Solved) Fix [Bug]: when upstream return BadRequestError,litellm handle the stream normally and raise the 503 MidStreamFallbackError [1 pull requests, 1 comments, 2 participants]

litellm · 2026-04-10 13:32:05

GitHub stats

BerriAI/litellm#25492 • Fetched 2026-04-11 06:13:54

Author

enternal111

Participants

enternal111

SamSi0322

Timeline (top)

labeled ×3, commented ×1, cross-referenced ×1, referenced ×1

Error Message

Request Failed
Error Code: 503
Message: litellm.ServiceUnavailableError: litellm.MidStreamFallbackError: litellm.APIConnectionError: 'id'
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/streaming_handler.py", line 1915, in __anext__
    async for chunk in self.completion_stream:
    ...<71 lines>...
        return processed_chunk
  File "/usr/lib/python3.13/site-packages/litellm/llms/base_llm/base_model_iterator.py", line 172, in __anext__
    chunk = self._handle_string_chunk(str_line=str_line)
  File "/usr/lib/python3.13/site-packages/litellm/llms/base_llm/base_model_iterator.py", line 116, in _handle_string_chunk
    return self.chunk_parser(chunk=stripped_json_chunk)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai/chat/gpt_transformation.py", line 809, in chunk_parser
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai/chat/gpt_transformation.py", line 799, in chunk_parser
    "id": chunk["id"],
          ~~~~~^^^^^^
KeyError: 'id'

Fix Action

Fix proposed (the PR was open and not merged when this page was fetched)

Proposed fix: PR "Handle error responses embedded in SSE streams" (https://github.com/BerriAI/litellm/pull/25505)

PR fix notes

PR #25505: Handle error responses embedded in SSE streams

Repository: BerriAI/litellm
Author: SamSi0322
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25505

Description (problem / solution / changelog)

Summary

When an upstream provider (e.g. sglang) returns an error inside an HTTP 200 SSE stream as {"error": {"type": "BadRequestError", "code": 400, ...}}, the chunk_parser tried to construct a ModelResponseStream from the error dict. This failed because the error chunk lacks standard completion fields (id, choices, etc.), causing a misleading 503 MidStreamFallbackError instead of the actual 400 BadRequestError.

Fix

Added an early "error" in chunk guard at the top of chunk_parser(). When an error chunk is detected, it raises litellm.BadRequestError with the upstream error message, giving callers the correct error type and status code.
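The guard described above can be sketched in isolation. This is a minimal illustration, not the code from PR #25505: `parse_chunk` and `UpstreamBadRequestError` are hypothetical names standing in for `chunk_parser()` and `litellm.BadRequestError`, and the returned dict is a simplified placeholder for the real response object.

```python
from typing import Any, Dict


class UpstreamBadRequestError(Exception):
    """Hypothetical stand-in for litellm.BadRequestError (not the real class)."""

    def __init__(self, message: str, status_code: int = 400) -> None:
        super().__init__(message)
        self.message = message
        self.status_code = status_code


def parse_chunk(chunk: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative chunk parser with an early guard for embedded errors."""
    # Guard first: an error payload has no "id" or "choices", so reading
    # those fields below would raise KeyError and hide the real cause.
    if "error" in chunk:
        error = chunk["error"]
        if isinstance(error, dict):
            message = str(error.get("message", "Unknown upstream error"))
        else:
            message = str(error)
        raise UpstreamBadRequestError(message)

    # Normal path: only reached for well-formed completion chunks.
    return {
        "id": chunk["id"],
        "choices": chunk.get("choices", []),
    }
```

Placing the check before any field access is the point of the design: without it, the first `chunk["id"]` lookup fails with the `KeyError: 'id'` seen in the traceback, and the caller never sees the upstream message.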

Test plan

Error chunks in SSE streams now raise BadRequestError with the upstream message
Normal completion chunks are unaffected (the guard only triggers on "error" key)
The 503 MidStreamFallbackError cascade no longer occurs for upstream 400 errors

Closes #25492

Changed files

litellm/llms/openai/chat/gpt_transformation.py (modified, +16/-0)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

A bug happened! When the input token count is below the model's context length but the input token count plus max_tokens exceeds it, sglang returns:

response Headers: [:status=200, content-type=text/event-stream; charset=utf-8, , transfer-encoding=chunked]
response Body:
data: {"error": {"object": "error", "message": "Requested token count exceeds the model's maximum context length of 196608 tokens. You requested a total of 204830 tokens: 172830 tokens from the input messages and 32000 tokens for the completion. Please reduce the number of tokens in the input messages or the completion to fit within the limit.", "type": "BadRequestError", "param": null, "code": 400}}
data: [DONE]

LiteLLM does not recognize the error, treats the stream as a normal completion stream, and raises MidStreamFallbackError.
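To make the failure mode concrete, here is a small sketch of how a client could spot an error object inside an HTTP 200 SSE body like the one above. It uses only the standard library; `iter_sse_payloads` and `find_embedded_error` are hypothetical helper names, not LiteLLM functions, and the sketch assumes one JSON payload per `data:` line.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded JSON payloads from 'data:' lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def find_embedded_error(lines: Iterable[str]) -> Optional[Dict[str, Any]]:
    """Return the first error object found in the stream, or None."""
    for payload in iter_sse_payloads(lines):
        if isinstance(payload, dict) and "error" in payload:
            return payload["error"]
    return None


# Usage with an abbreviated version of the body reported in this issue.
body = [
    'data: {"error": {"object": "error", "message": "Requested token count '
    'exceeds the model\'s maximum context length", "type": "BadRequestError", '
    '"param": null, "code": 400}}',
    "",
    "data: [DONE]",
]
error = find_embedded_error(body)
if error is not None:
    print(error["type"], error["code"])
```

The HTTP status is 200 here, so the status line alone cannot be used to detect the failure; the payload itself has to be inspected.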

Steps to Reproduce

Use an sglang backend.
Send a streaming request whose input token count is below the maximum context length but whose input token count plus max_tokens exceeds it.
LiteLLM returns a 503 MidStreamFallbackError.

Relevant log output

Request Failed
Error Code: 503
Message: litellm.ServiceUnavailableError: litellm.MidStreamFallbackError: litellm.APIConnectionError: 'id'
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/litellm/litellm_core_utils/streaming_handler.py", line 1915, in __anext__
    async for chunk in self.completion_stream:
    ...<71 lines>...
        return processed_chunk
  File "/usr/lib/python3.13/site-packages/litellm/llms/base_llm/base_model_iterator.py", line 172, in __anext__
    chunk = self._handle_string_chunk(str_line=str_line)
  File "/usr/lib/python3.13/site-packages/litellm/llms/base_llm/base_model_iterator.py", line 116, in _handle_string_chunk
    return self.chunk_parser(chunk=stripped_json_chunk)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai/chat/gpt_transformation.py", line 809, in chunk_parser
    raise e
  File "/usr/lib/python3.13/site-packages/litellm/llms/openai/chat/gpt_transformation.py", line 799, in chunk_parser
    "id": chunk["id"],
          ~~~~~^^^^^^
KeyError: 'id'

What part of LiteLLM is this about?

UI Dashboard

What LiteLLM version are you on ?

v1.81.14-stable

Twitter / LinkedIn details

No response

extent analysis

TL;DR

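LiteLLM's chunk_parser does not recognize the BadRequestError payload that the sglang backend embeds in an HTTP 200 SSE stream, so the real 400 error surfaces as a 503 MidStreamFallbackError. PR #25505 proposes a fix; in the meantime, the upstream error can be avoided by keeping the input tokens plus max_tokens within the model's maximum context length.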

Guidance

Check the response from the sglang backend for a BadRequestError and handle it accordingly to prevent the MidStreamFallbackError.
Reduce the number of tokens in the input messages or the completion to ensure the total token count does not exceed the model's maximum context length of 196608 tokens.
Verify that the litellm library is correctly parsing the error response from the sglang backend and handling it as expected.
Consider updating the input validation to prevent requests that exceed the model's maximum context length from being sent to the sglang backend.
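The input-validation suggestion above amounts to a simple budget check before the request is sent. The following is a rough sketch under stated assumptions: `clamp_max_tokens` is a hypothetical helper, and it assumes the caller already has an accurate input token count from the same tokenizer the backend uses.

```python
def clamp_max_tokens(input_tokens: int, max_tokens: int, context_length: int) -> int:
    """Return a max_tokens value that keeps the request within the context length.

    Raises ValueError when the input alone already fills the context window,
    because no completion budget is left in that case.
    """
    remaining = context_length - input_tokens
    if remaining <= 0:
        raise ValueError(
            f"Input uses {input_tokens} tokens, which leaves no room in a "
            f"{context_length}-token context window."
        )
    return min(max_tokens, remaining)


# Figures from this issue: 172830 input tokens + 32000 requested completion
# tokens = 204830, which exceeds the 196608-token context length.
safe_max_tokens = clamp_max_tokens(
    input_tokens=172830, max_tokens=32000, context_length=196608
)
print(safe_max_tokens)  # 23778
```

Clamping silently changes the completion budget, so a caller that needs the full 32000 tokens may prefer to reject the request or trim the input instead.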

Example

No code snippet was provided in the issue itself. The proposed fix in PR #25505 is a code change: an early "error" in chunk guard at the top of chunk_parser() in litellm/llms/openai/chat/gpt_transformation.py.

Notes

The issue was reported on LiteLLM v1.81.14-stable; other versions may also be affected. The MidStreamFallbackError is raised because the error payload from the sglang backend is parsed as a normal completion chunk, and the missing id field triggers a KeyError.

Recommendation

Until PR #25505 is merged and released (it was open and unmerged when this page was fetched), apply a workaround: detect the BadRequestError payload where possible, and reduce the number of tokens in the input messages or the completion so that the total fits within the model's maximum context length.
