litellm - 💡(How to fix) Fix [Bug]: No fallbacks configured + 429 mid-stream causes 100% CPU hang (process unresponsive) [1 participants]

GitHub stats
BerriAI/litellm#26015, fetched 2026-04-19 15:06:16
Comments
0
Participants
1
Timeline
1
Reactions
0
Author
Participants
Timeline (top)
labeled ×1


Description

When a streaming request receives a 429 (RESOURCE_EXHAUSTED) error mid-stream and no fallbacks are configured, the litellm proxy process enters an infinite async generator loop, pinning CPU at 100% indefinitely. The process becomes completely unresponsive to new HTTP requests and must be killed manually.

This is a regression/edge case introduced by PR #22375, which correctly added the != 429 exemption so that the router can handle 429 via MidStreamFallbackError. However, when fallbacks=None there is no exit path, and the error loops silently.

Environment

  • litellm version: 1.83.9
  • Provider: Vertex AI (vertex_ai/gemini-3.1-pro-preview, vertex_location: global)
  • Deployment: litellm proxy via LaunchDaemon, num_retries: 0, no fallbacks configured

Steps to Reproduce

  1. Configure litellm proxy with a Vertex AI model and no fallbacks:
    model_list:
      - model_name: gemini-pro
        litellm_params:
          model: vertex_ai/gemini-3.1-pro-preview
          vertex_project: my-project
          vertex_location: global
    
    router_settings:
      num_retries: 0
      allowed_fails: 0
      cooldown_time: 0
      # fallbacks: NOT configured
  2. Send a long streaming request that triggers a mid-stream 429 from Vertex AI (e.g. hit QPM limit during active generation)
  3. Observe: process CPU goes to 100% and stays there; all subsequent HTTP requests time out

Root Cause

In streaming_handler.py, _handle_stream_fallback_error wraps 429 in MidStreamFallbackError (intentionally, per PR #22375):

# streaming_handler.py ~L2303
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    and mapped_status_code != 429   # ← 429 is exempted from direct raise
):
    raise mapped_exception

raise MidStreamFallbackError(...)  # 429 ends up here

In router.py, stream_with_fallbacks catches MidStreamFallbackError and calls async_function_with_fallbacks_common_utils. With no fallbacks configured, this raises the error again as fallback_error:

# router.py ~L1875
except Exception as fallback_error:
    verbose_router_logger.error(f"Fallback also failed: {fallback_error}")
    raise fallback_error  # re-raises MidStreamFallbackError

This re-raised MidStreamFallbackError propagates back through the nested _wrap_streaming_iterator_with_enrichment / async_post_call_streaming_iterator_hook chain in proxy/utils.py. Each layer catches and re-raises, but the async generator is never properly exhausted or closed, causing the event loop to spin at 100% CPU.
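To illustrate the failure mode described above, here is a minimal sketch of nested async generators in which every layer closes the generator it wraps when an error passes through. It is not litellm's code: MidStreamError, upstream, wrapper, and consume are hypothetical names used only to show the pattern, under the assumption that the hang comes from a layer that does not close or exhaust its inner generator.

```python
import asyncio


class MidStreamError(Exception):
    """Hypothetical stand-in for an error raised partway through a stream."""


closed = []  # records which layers ran their cleanup, in order


async def upstream():
    """Innermost stream: yields two chunks, then fails mid-stream."""
    try:
        yield "chunk-1"
        yield "chunk-2"
        raise MidStreamError("429 received mid-stream")
    finally:
        closed.append("upstream")


async def wrapper(inner):
    """Middle layer: forwards chunks and always closes the inner generator."""
    try:
        async for chunk in inner:
            yield chunk
    finally:
        # aclose() is a no-op if the inner generator has already finished.
        await inner.aclose()
        closed.append("wrapper")


async def consume():
    """Outer consumer: collects chunks and reports the error exactly once."""
    received, error = [], None
    stream = wrapper(upstream())
    try:
        async for chunk in stream:
            received.append(chunk)
    except MidStreamError as exc:
        error = str(exc)
    finally:
        await stream.aclose()
    return received, error


received, error = asyncio.run(consume())
print(received, error, closed)
```

In this sketch the error surfaces once at the consumer and each layer runs its cleanup, so nothing is left for the event loop to keep polling.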

Fix Applied (local workaround)

Removing the != 429 exemption so 429 is raised directly like other 4xx errors:

# streaming_handler.py
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    # removed: and mapped_status_code != 429
):
    raise mapped_exception

With this change, the 429 is raised immediately as a RateLimitError. The downstream consumer (OpenClaw in our case) receives the error correctly, applies its own cooldown logic, and retries, with no CPU hang.
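As an illustration of the consumer-side behavior described above, the following is a rough sketch of a cooldown-and-retry loop. It does not reflect how OpenClaw or litellm implement this: RateLimitError here is a local stand-in class, and call_with_cooldown, max_attempts, cooldown_seconds, and the doubling delay are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for a 429 error surfaced to the consumer."""


def call_with_cooldown(
    call: Callable[[], T],
    max_attempts: int = 3,
    cooldown_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the cooldown each attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(cooldown_seconds * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the loop can be exercised in tests without actually waiting.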

Trade-off: this disables mid-stream fallback on 429 for users who have fallbacks configured. A better fix might be: if fallbacks=None (or fallback also returns 429), raise directly instead of re-raising MidStreamFallbackError.
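The alternative suggested above could look roughly like the sketch below. It is only an illustration of the decision logic, not a patch against litellm: pick_stream_error is a hypothetical helper, both exception classes are local stand-ins, and how the handler would learn whether fallbacks are configured is an assumption. It does not cover the case where the fallback itself returns 429.

```python
from typing import Optional, Sequence


class RateLimitError(Exception):
    """Local stand-in for the mapped 429 exception."""


class MidStreamFallbackError(Exception):
    """Local stand-in for the wrapper error handed to the router."""


def pick_stream_error(
    status_code: Optional[int],
    mapped_exception: Exception,
    fallbacks: Optional[Sequence[object]],
) -> Exception:
    """Decide which exception a mid-stream failure should raise."""
    is_client_error = status_code is not None and 400 <= status_code < 500

    # Non-429 client errors are raised directly, as in the original check.
    if is_client_error and status_code != 429:
        return mapped_exception

    # Assumed new branch: with no fallbacks there is nothing for the router
    # to try, so the 429 is raised directly as well.
    if status_code == 429 and not fallbacks:
        return mapped_exception

    # Otherwise keep the existing behavior and let the router attempt a fallback.
    return MidStreamFallbackError(str(mapped_exception))
```

With this shape, users who have fallbacks configured keep mid-stream fallback on 429, while a proxy with fallbacks=None gets the direct raise that the local workaround provides.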

Related Issues

  • PR #22375 — introduced the != 429 exemption (the root cause of this edge case)
  • Issue #23707 — Vertex AI 429 silent failure during streaming (different symptom)
  • Issue #20870 — Fallbacks on streaming with Gemini 429 (different scenario)

extent analysis

TL;DR

Removing the != 429 exemption in streaming_handler.py to raise 429 errors directly can prevent the infinite async generator loop and CPU pinning issue.

Guidance

  • Confirm that the proxy runs with no fallbacks configured (fallbacks=None) and that streaming_handler.py contains the != 429 exemption.
  • Reproduce by triggering a mid-stream 429 from Vertex AI and checking whether CPU stays at 100% and new HTTP requests time out.
  • As a local workaround, remove the != 429 exemption so that 429 is raised directly. Note that this disables mid-stream fallback on 429 for users who do have fallbacks configured.
  • For a longer-term fix, investigate raising the error directly when fallbacks=None or when the fallback also returns 429, instead of re-raising MidStreamFallbackError.

Notes

The provided fix may have trade-offs, such as disabling mid-stream fallback on 429 for users with fallbacks configured. A more comprehensive solution should consider handling the case when fallbacks=None or the fallback also returns a 429 error.

Recommendation

If you run without fallbacks, apply the workaround by removing the != 429 exemption: it directly addresses the CPU hang. Be aware of the trade-off noted above, and that a more comprehensive fix covering all scenarios is still needed.
