litellm - ✅(Solved) Fix [Bug]: `MidStreamFallbackError` crashes with `ValueError: invalid literal for int() with base 10: 'litellm_error'` [1 pull request, 1 participant]

GitHub stats
BerriAI/litellm#26913 (fetched 2026-05-01 05:34:26)
Comments: 0
Participants: 1
Timeline: 4
Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1

When a CustomLLM handler raises RateLimitError during a streaming request, LiteLLM's MidStreamFallbackError.__init__ crashes because it receives the string 'litellm_error' as original_status and tries to cast it to int.

The crash is in litellm/exceptions.py:957:

self.status_code = int(original_status) if original_status is not None else 503

As a result, the proxy returns HTTP 500 to the client instead of triggering the fallback chain.

Error Message

File "litellm/litellm_core_utils/streaming_handler.py", line 2181, in __anext__
    self._handle_stream_fallback_error(e)
File "litellm/litellm_core_utils/streaming_handler.py", line 2249, in _handle_stream_fallback_error
    raise MidStreamFallbackError(
        message=str(mapped_exception),
        ...
        is_pre_first_chunk=not self.sent_first_chunk,
    )
File "litellm/exceptions.py", line 957, in __init__
    self.status_code = int(original_status) if original_status is not None else 503
ValueError: invalid literal for int() with base 10: 'litellm_error'

Root Cause

MidStreamFallbackError.__init__ casts original_status to int without validating it. When a CustomLLM handler raises RateLimitError during a streaming request, the constructor receives the string 'litellm_error' as original_status, so the cast raises ValueError before the fallback chain can run.
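The failure can be reproduced without LiteLLM. The sketch below isolates the expression quoted from litellm/exceptions.py in a hypothetical helper named cast_status (the name is an assumption for illustration, not part of LiteLLM) and feeds it the three kinds of input discussed in this issue.

```python
def cast_status(original_status):
    """Isolated copy of the expression from the traceback (helper name is hypothetical)."""
    return int(original_status) if original_status is not None else 503


# None takes the default branch, and numeric strings convert cleanly.
print(cast_status(None))
print(cast_status("429"))

# A non-numeric string such as 'litellm_error' makes int() raise ValueError.
try:
    cast_status("litellm_error")
except ValueError as exc:
    error_text = str(exc)
    print(error_text)
```

Only the non-numeric string fails, which is why the crash appears for this particular exception path and not for ordinary integer status codes.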

Fix Action

Workaround

In the custom handler's streaming/astreaming methods, catch RateLimitError and re-raise it as litellm.APIError(status_code=429, ...), which carries a proper integer status code.
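A minimal sketch of the catch-and-re-raise pattern follows. To keep it runnable without LiteLLM installed, RateLimitError and APIError here are local stand-in classes (an assumption); in a real handler they would be the litellm exceptions, whose constructors take additional arguments. The upstream_stream function is likewise hypothetical.

```python
import asyncio


# Stand-ins so the sketch runs on its own (assumption). In a real handler
# these would be litellm.RateLimitError and litellm.APIError.
class RateLimitError(Exception):
    pass


class APIError(Exception):
    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code


async def upstream_stream():
    """Hypothetical provider call that fails before the first chunk."""
    raise RateLimitError("Key exhausted")
    yield  # never reached; its presence makes this an async generator


async def astreaming():
    """Wrap the stream so callers only ever see an integer status code."""
    try:
        async for chunk in upstream_stream():
            yield chunk
    except RateLimitError as exc:
        raise APIError(status_code=429, message=str(exc)) from exc


async def collect():
    """Drain the stream and report the status code of the error, if any."""
    try:
        async for _ in astreaming():
            pass
    except APIError as exc:
        return exc.status_code
    return None


result = asyncio.run(collect())
print(result)
```

Chaining with "from exc" keeps the original rate-limit error available for logging while the outer exception exposes the integer status.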

PR fix notes

PR #26925: Fix mid-stream fallback status handling

Description (problem / solution / changelog)

Summary

  • make MidStreamFallbackError tolerate malformed exception status_code values such as "litellm_error"
  • fall back to original_exception.response.status_code when it is valid
  • otherwise default to 503 without raising during fallback construction
  • add regression coverage for malformed status values and response-status fallback
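The resolution order described in the bullets above could look roughly like the following. This is an illustrative sketch, not the code from the PR: resolve_status_code and _valid_http_status are hypothetical names, and treating only 100 to 599 as valid is an assumption about what "valid" means here.

```python
from types import SimpleNamespace


def _valid_http_status(value):
    """Return value as an int if it looks like an HTTP status code, else None."""
    try:
        code = int(value)
    except (ValueError, TypeError):
        return None
    # Assumption: anything outside 100-599 is treated as malformed.
    return code if 100 <= code <= 599 else None


def resolve_status_code(original_exception, default=503):
    """Prefer the exception's status_code, then its response's, then the default."""
    status = _valid_http_status(getattr(original_exception, "status_code", None))
    if status is not None:
        return status
    response = getattr(original_exception, "response", None)
    status = _valid_http_status(getattr(response, "status_code", None))
    return status if status is not None else default


# Malformed status on the exception, valid status on the attached response.
with_response = SimpleNamespace(
    status_code="litellm_error",
    response=SimpleNamespace(status_code=429),
)
# Malformed status and no response at all.
without_response = SimpleNamespace(status_code="litellm_error", response=None)

print(resolve_status_code(with_response))
print(resolve_status_code(without_response))
```

Because the helper never raises, constructing the fallback error cannot itself fail on a malformed status value.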

Why

Streaming fallback handling can receive mapped exceptions whose status_code is not an HTTP integer. MidStreamFallbackError cast that value directly in two places, which could turn a retriable mid-stream provider error into a client-facing 500 before the router fallback chain gets a chance to run.

Fixes #26913.

Validation

  • /tmp/litellm-26913-venv/bin/python -m pytest tests/test_litellm/test_exception_header_preservation.py -q
    • 17 passed, 2 warnings
  • UV_NO_PROJECT=1 uvx black litellm/exceptions.py tests/test_litellm/test_exception_header_preservation.py --check --diff
    • 2 files would be left unchanged

Note: project uv run is blocked locally because the installed uv is 0.10.4 and this branch requires >=0.10.9, so I used a Python 3.13 venv for the targeted test.

Changed files

  • litellm/exceptions.py (modified, +21/-3)
  • litellm/proxy/_lazy_openapi_snapshot.py (modified, +30/-2)
  • tests/test_litellm/proxy/test_lazy_openapi_snapshot.py (added, +53/-0)
  • tests/test_litellm/test_exception_header_preservation.py (modified, +45/-0)

RAW_BUFFER

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Description

When a CustomLLM handler raises RateLimitError during a streaming request, LiteLLM's MidStreamFallbackError.__init__ crashes because it receives the string 'litellm_error' as original_status and tries to cast it to int.

The crash is in litellm/exceptions.py:957:

self.status_code = int(original_status) if original_status is not None else 503

This returns HTTP 500 to the client instead of triggering the fallback chain.

Steps to Reproduce

  1. Register a custom provider with a CustomLLM handler
  2. In the handler's astreaming method, raise RateLimitError
  3. Configure fallbacks for the model group
  4. Send a streaming request

Minimal custom handler:

from litellm import CustomLLM, RateLimitError

class MyHandler(CustomLLM):
    async def acompletion(self, *args, **kwargs):
        raise RateLimitError(
            message="Key exhausted",
            model="my-model",
            llm_provider="openai",
        )

    async def astreaming(self, *args, **kwargs):
        # This triggers the bug: RateLimitError propagates through
        # the streaming pipeline and hits MidStreamFallbackError
        await self.acompletion(*args, **kwargs)
        yield  # never reached

config.yaml:

model_list:
  - model_name: my-model
    litellm_params:
      model: my-provider/my-model
  - model_name: fallback-model
    litellm_params:
      model: openai/gpt-4

litellm_settings:
  custom_provider_map:
    - {"provider": "my-provider", "custom_handler": my_handler}
  fallbacks:
    - my-model:
        - fallback-model

Expected Behavior

RateLimitError from a custom handler during streaming should trigger the fallback chain (same as non-streaming).

Actual Behavior

File "litellm/exceptions.py", line 957, in __init__
    self.status_code = int(original_status) if original_status is not None else 503
ValueError: invalid literal for int() with base 10: 'litellm_error'

The client receives HTTP 500 Internal Server Error.

Stack Trace

File "litellm/litellm_core_utils/streaming_handler.py", line 2181, in __anext__
    self._handle_stream_fallback_error(e)
File "litellm/litellm_core_utils/streaming_handler.py", line 2249, in _handle_stream_fallback_error
    raise MidStreamFallbackError(
        message=str(mapped_exception),
        ...
        is_pre_first_chunk=not self.sent_first_chunk,
    )
File "litellm/exceptions.py", line 957, in __init__
    self.status_code = int(original_status) if original_status is not None else 503
ValueError: invalid literal for int() with base 10: 'litellm_error'

Suggested Fix

In litellm/exceptions.py, line 957, handle non-integer original_status gracefully:

# Before (crashes):
self.status_code = int(original_status) if original_status is not None else 503

# After (safe):
try:
    self.status_code = int(original_status) if original_status is not None else 503
except (ValueError, TypeError):
    self.status_code = 503

Workaround

In the custom handler's streaming/astreaming methods, catch RateLimitError and re-raise as litellm.APIError(status_code=429, ...) which has a proper integer status code.

Environment

  • LiteLLM version: 1.82.3
  • Python: 3.13
  • Docker image: ghcr.io/berriai/litellm:main-stable

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

TL;DR

Modify the MidStreamFallbackError.__init__ method in litellm/exceptions.py to handle non-integer original_status values.

Guidance

  • Identify the source of the non-integer original_status value, which in this case is the string 'litellm_error'.
  • Apply the suggested fix by wrapping the int(original_status) conversion in a try-except block to catch ValueError and TypeError exceptions.
  • As a workaround, consider catching RateLimitError in the custom handler's astreaming method and re-raising it as litellm.APIError with a proper integer status code.
  • Verify the fix by reproducing the steps to trigger the RateLimitError and checking that the fallback chain is triggered correctly.

Example

try:
    self.status_code = int(original_status) if original_status is not None else 503
except (ValueError, TypeError):
    self.status_code = 503

Notes

The provided fix assumes that a non-integer original_status value should result in a default status code of 503. Depending on the specific requirements of the application, this default value may need to be adjusted.

Recommendation

Apply the suggested fix to handle non-integer original_status values, as it provides a more robust solution than the workaround.
