litellm - 💡(How to fix) Fix [Bug]: No fallbacks configured + 429 mid-stream causes 100% CPU hang (process unresponsive) [1 participants]

liefzyj · 2026-04-18T17:21:01Z

## Description

When a streaming request receives a 429 (RESOURCE_EXHAUSTED) error mid-stream and **no fallbacks are configured**, the litellm proxy process enters an infinite async generator loop, pinning CPU at 100% indefinitely. The process becomes completely unresponsive to new HTTP requests and must be killed manually.

This is a regression/edge case introduced by PR #22375, which correctly added the `!= 429` exemption to allow the router to handle 429 via `MidStreamFallbackError`. However, when `fallbacks=None`, there is no exit path, so the error loops silently.

## Environment

- litellm version: **1.83.9**
- Provider: **Vertex AI** (`vertex_ai/gemini-3.1-pro-preview`, `vertex_location: global`)
- Deployment: litellm proxy via LaunchDaemon, `num_retries: 0`, **no fallbacks configured**

## Steps to Reproduce

1. Configure litellm proxy with a Vertex AI model and **no fallbacks**:

   ```yaml
   model_list:
     - model_name: gemini-pro
       litellm_params:
         model: vertex_ai/gemini-3.1-pro-preview
         vertex_project: my-project
         vertex_location: global

   router_settings:
     num_retries: 0
     allowed_fails: 0
     cooldown_time: 0
     # fallbacks: NOT configured
   ```

2. Send a long streaming request that triggers a mid-stream 429 from Vertex AI (e.g. hit QPM limit during active generation)
3. Observe: process CPU goes to 100% and stays there; all subsequent HTTP requests time out

## Root Cause

In `streaming_handler.py`, `_handle_stream_fallback_error` wraps 429 in `MidStreamFallbackError` (intentionally, per PR #22375):

```python
# streaming_handler.py ~L2303
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    and mapped_status_code != 429   # ← 429 is exempted from direct raise
):
    raise mapped_exception

raise MidStreamFallbackError(...)  # 429 ends up here
```

In `router.py`, `stream_with_fallbacks` catches `MidStreamFallbackError` and calls `async_function_with_fallbacks_common_utils`. With no fallbacks configured, this raises the error again as `fallback_error`:

```python
# router.py ~L1875
except Exception as fallback_error:
    verbose_router_logger.error(f"Fallback also failed: {fallback_error}")
    raise fallback_error  # re-raises MidStreamFallbackError
```

This re-raised `MidStreamFallbackError` propagates back through the nested `_wrap_streaming_iterator_with_enrichment` / `async_post_call_streaming_iterator_hook` chain in `proxy/utils.py`. Each layer catches and re-raises, but the async generator is never properly exhausted or closed, causing the event loop to spin at 100% CPU.

## Fix Applied (local workaround)

Removing the `!= 429` exemption so 429 is raised directly like other 4xx errors:

```python
# streaming_handler.py
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    # removed: and mapped_status_code != 429
):
    raise mapped_exception
```

With this change, the 429 is raised immediately as a `RateLimitError`, the downstream consumer (OpenClaw in our case) receives the error correctly, applies its own cooldown logic, and retries, with no CPU hang.

**Trade-off**: this disables mid-stream fallback on 429 for users who have fallbacks configured. A better fix might be: if `fallbacks=None` (or fallback also returns 429), raise directly instead of re-raising `MidStreamFallbackError`.

## Related Issues

- PR #22375: introduced the `!= 429` exemption (the root cause of this edge case)
- Issue #23707: Vertex AI 429 silent failure during streaming (different symptom)
- Issue #20870: Fallbacks on streaming with Gemini 429 (different scenario)
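The root cause above comes down to a stream wrapper that never closes its async generator once the error arrives. As a minimal sketch of the exit path that is missing (not litellm's actual code), the following uses hypothetical stand-ins: `RateLimitError`, `failing_stream`, and `relay` are illustrative names only. It shows a wrapper that closes the generator in a `finally` block and lets a single terminal error reach the consumer.

```python
import asyncio
from typing import AsyncGenerator, List, Optional, Tuple


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception a mapped 429 becomes."""


async def failing_stream() -> AsyncGenerator[str, None]:
    """Yield two chunks, then fail the way a mid-stream 429 would."""
    yield "chunk-1"
    yield "chunk-2"
    raise RateLimitError("429 RESOURCE_EXHAUSTED")


async def relay(stream: AsyncGenerator[str, None], received: List[str]) -> None:
    """Forward chunks; always close the generator, and let any error escape once."""
    try:
        async for chunk in stream:
            received.append(chunk)
    finally:
        # Runs whether the loop ended normally or with an exception.
        await stream.aclose()


async def main() -> Tuple[List[str], Optional[str]]:
    received: List[str] = []
    try:
        await relay(failing_stream(), received)
    except RateLimitError as exc:
        # The consumer sees one terminal error and can apply its own retry logic.
        return received, str(exc)
    return received, None
```

Running `asyncio.run(main())` returns the chunks received before the failure together with the error message, and the coroutine finishes instead of spinning.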

BerriAI/litellm#26015•Fetched 2026-04-19 15:06:16

extent analysis

TL;DR

Removing the != 429 exemption in streaming_handler.py to raise 429 errors directly can prevent the infinite async generator loop and CPU pinning issue.

Guidance

Confirm that the deployment has no fallbacks configured and that streaming_handler.py still carries the != 429 exemption.
Verify that the problem reproduces when fallbacks=None and Vertex AI returns a 429 mid-stream.
Consider applying the local workaround (removing the != 429 exemption so 429 is raised directly), noting that it disables mid-stream fallback on 429 for users who have fallbacks configured.
Investigate a narrower fix that raises the error directly when fallbacks=None or when the fallback also returns 429, instead of re-raising MidStreamFallbackError.
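To check whether a given deployment matches the affected configuration, a small helper along these lines may be useful. It is a sketch under assumptions: it takes the proxy config as an already-parsed dictionary, and it assumes a `fallbacks` entry would appear under `router_settings` or `litellm_settings`. The helper name `has_fallbacks` and the model name `gemini-flash` are hypothetical.

```python
from typing import Any, Mapping


def has_fallbacks(config: Mapping[str, Any]) -> bool:
    """Return True if either settings section carries a non-empty `fallbacks` entry."""
    for section in ("router_settings", "litellm_settings"):
        settings = config.get(section) or {}
        if settings.get("fallbacks"):
            return True
    return False


# Mirrors the configuration from the report: no fallbacks anywhere.
affected = {
    "model_list": [{"model_name": "gemini-pro"}],
    "router_settings": {"num_retries": 0, "allowed_fails": 0, "cooldown_time": 0},
}

# Hypothetical configuration with one fallback model declared.
protected = {
    "router_settings": {"fallbacks": [{"gemini-pro": ["gemini-flash"]}]},
}

print(has_fallbacks(affected), has_fallbacks(protected))
```

A deployment for which the helper returns `False` is the one the report describes as exposed to the hang.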

Example

# streaming_handler.py
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    # removed: and mapped_status_code != 429
):
    raise mapped_exception

Notes

The provided fix may have trade-offs, such as disabling mid-stream fallback on 429 for users with fallbacks configured. A more comprehensive solution should consider handling the case when fallbacks=None or the fallback also returns a 429 error.
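The more comprehensive handling mentioned here could look roughly like the sketch below. It is not a patch against litellm: the exception classes are hypothetical stand-ins, `choose_stream_error` is an illustrative name, and it assumes the code deciding which error to raise can see whether fallbacks are configured. The function returns the exception rather than raising it so the decision is easy to inspect.

```python
from typing import Any, Optional, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for the mapped 429 exception."""


class MidStreamFallbackError(Exception):
    """Hypothetical stand-in for the wrapper the router uses to trigger fallbacks."""

    def __init__(self, original: Exception) -> None:
        super().__init__(str(original))
        self.original = original


def choose_stream_error(
    mapped_exception: Exception,
    mapped_status_code: Optional[int],
    fallbacks: Optional[Sequence[Any]],
) -> Exception:
    """Pick the exception to raise for a mid-stream failure."""
    is_4xx = mapped_status_code is not None and 400 <= mapped_status_code < 500
    if is_4xx and mapped_status_code != 429:
        # Other client errors: raise directly, as in the existing check.
        return mapped_exception
    if mapped_status_code == 429 and not fallbacks:
        # No fallback can handle the 429, so surface it to the caller directly.
        return mapped_exception
    # 429 with fallbacks configured, 5xx, or unknown status: let the router try a fallback.
    return MidStreamFallbackError(mapped_exception)
```

With this shape, a 429 keeps its mid-stream fallback when fallbacks exist, and is raised directly when none are configured. The second case from the report (the fallback itself returning 429) would still need separate handling in the router.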

Recommendation

Apply the workaround by removing the != 429 exemption if you run without fallbacks: it prevents the CPU hang. Be aware that it also disables mid-stream fallback on 429 for deployments that do configure fallbacks, so a fix that raises directly only when fallbacks=None (or when the fallback also returns 429) is still needed.
