litellm - 💡(How to fix) Fix [Bug] Streaming requests bypass content_policy_fallbacks: 4xx filter in streaming_handler raises ContentPolicyViolationError directly

litellm2026-05-22 13:34:03

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

In streaming requests, litellm.exceptions.ContentPolicyViolationError is currently treated as a non-retriable client error and raised directly, bypassing both fallbacks and content_policy_fallbacks. The non-streaming path handles CPV correctly; only the streaming code path has the bug.

Error Message

Raise non-retriable client errors directly (skip fallback).

Exception: 429 (rate-limit) IS retriable/transient — allow it

through so the Router can switch to a different model group.

if ( mapped_status_code is not None and 400 <= mapped_status_code < 500 and mapped_status_code != 429 ): raise mapped_exception

Root Cause

content_policy_fallbacks is documented as the canonical way to route around provider content-moderation rejections (DashScope/Qwen "inappropriate content", Azure OpenAI content filter, etc.). It works correctly when the underlying call is non-streaming — async_function_with_fallbacks_common_utils has a dedicated elif isinstance(e, litellm.ContentPolicyViolationError): branch (router.py:5350) that consults content_policy_fallbacks and dispatches.

For streaming requests the same provider 400 turns into a hard error. The user-visible symptom is identical to "no fallback configured" even when both fallbacks and content_policy_fallbacks are correctly wired:

litellm.BadRequestError: DashscopeException - Input data may contain inappropriate content.
Received Model Group=qwen3.7-max-thinking
Available Model Group Fallbacks=None

The Fallbacks=None is misleading — fallbacks ARE configured and the lookup function returns them when called directly. They just never get consulted because the exception was already raised past the fallback machinery.

Fix Action

Fix / Workaround

Configure a model that's known to occasionally hit content-moderation 400s on the provider side (DashScope/Qwen-thinking is a reliable trigger; some Azure OpenAI deployments work too).
Configure content_policy_fallbacks (and/or fallbacks) for that model group, pointing at a different model.
Send a streaming chat-completion (stream: true) with a payload that trips the provider's content moderation. DashScope examples: certain Chinese-political adjacent phrasings, or prompts including specific blocked terms — non-trivial to share publicly.
Observe: response is a single 400 with the misleading Fallbacks=None tail. No fallback dispatch happens.
Re-run the same payload with stream: false. Observe: fallback fires correctly, the conversation completes via the tier-2 model.

Workaround currently in production

Code Example

# Raise non-retriable client errors directly (skip fallback).
# Exception: 429 (rate-limit) IS retriable/transient — allow it
# through so the Router can switch to a different model group.
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    and mapped_status_code != 429
):
    raise mapped_exception

---

litellm.BadRequestError: DashscopeException - Input data may contain inappropriate content.
Received Model Group=qwen3.7-max-thinking
Available Model Group Fallbacks=None

---

from litellm.exceptions import ContentPolicyViolationError
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    and mapped_status_code != 429
    and not isinstance(mapped_exception, ContentPolicyViolationError)  # ← add
):
    raise mapped_exception
if (
    original_status_code is not None
    and 400 <= original_status_code < 500
    and original_status_code != 429
    and not isinstance(mapped_exception, ContentPolicyViolationError)  # ← add
):
    raise mapped_exception

RAW_BUFFERClick to expand / collapse

Summary

Where

litellm/litellm_core_utils/streaming_handler.py, CustomStreamWrapper._handle_stream_fallback_error:

# Raise non-retriable client errors directly (skip fallback).
# Exception: 429 (rate-limit) IS retriable/transient — allow it
# through so the Router can switch to a different model group.
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    and mapped_status_code != 429
):
    raise mapped_exception

ContentPolicyViolationError has status_code = 400, so this branch fires and raises directly. The raise MidStreamFallbackError(...) further down is never reached, the router's fallback chain never engages, and the conversation 500s.

Why this matters

litellm.BadRequestError: DashscopeException - Input data may contain inappropriate content.
Received Model Group=qwen3.7-max-thinking
Available Model Group Fallbacks=None

Reproduction

Configure a model that's known to occasionally hit content-moderation 400s on the provider side (DashScope/Qwen-thinking is a reliable trigger; some Azure OpenAI deployments work too).
Configure content_policy_fallbacks (and/or fallbacks) for that model group, pointing at a different model.
Send a streaming chat-completion (stream: true) with a payload that trips the provider's content moderation. DashScope examples: certain Chinese-political adjacent phrasings, or prompts including specific blocked terms — non-trivial to share publicly.
Observe: response is a single 400 with the misleading Fallbacks=None tail. No fallback dispatch happens.
Re-run the same payload with stream: false. Observe: fallback fires correctly, the conversation completes via the tier-2 model.

Suggested fix

The 4xx filter has a deliberate exemption for 429 (rate-limit is "transient client error, should fall back"). ContentPolicyViolationError belongs in that same category — it's the one specific 4xx that the project has invested in handling via dedicated fallback config. Treat it the same way:

from litellm.exceptions import ContentPolicyViolationError
if (
    mapped_status_code is not None
    and 400 <= mapped_status_code < 500
    and mapped_status_code != 429
    and not isinstance(mapped_exception, ContentPolicyViolationError)  # ← add
):
    raise mapped_exception
if (
    original_status_code is not None
    and 400 <= original_status_code < 500
    and original_status_code != 429
    and not isinstance(mapped_exception, ContentPolicyViolationError)  # ← add
):
    raise mapped_exception

That's it — three lines, two locations. With this in place, a CPV in the streaming path falls through to raise MidStreamFallbackError(...), the router's stream_with_fallbacks generator catches it, calls async_function_with_fallbacks_common_utils which has the existing CPV branch, and content_policy_fallbacks is consulted as intended.

ContextWindowExceededError (also a 4xx, also has dedicated context_window_fallbacks) deserves the same exemption by the same logic, though I haven't hit it personally — including it would close the family.

Workaround currently in production

I'm running this exact patch as a bind-mounted vendored file on a private dorx LibreChat deployment. Streaming DashScope content-policy 400s on qwen3.7-max-thinking now correctly fall through to a tier-2 model. Happy to PR if the shape above looks right — wanted to file the issue first so we can agree on whether (a) the exemption should be specific to ContentPolicyViolationError, (b) it should extend to the other dedicated-fallback errors (context window), or (c) you'd prefer a different structural fix.

Environment

litellm[proxy] from the ghcr.io/berriai/litellm:main-latest image
LibreChat as the streaming consumer (defaults to stream: true)
Provider: DashScope (Alibaba Cloud) via OpenAI-compat surface
Model: qwen/qwen3-max-thinking (and similar Qwen thinking variants)

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug] Streaming requests bypass content_policy_fallbacks: 4xx filter in streaming_handler raises ContentPolicyViolationError directly

Recommended Tools

GitHub issue graph ai analysis

Error Message

Raise non-retriable client errors directly (skip fallback).

Exception: 429 (rate-limit) IS retriable/transient — allow it

through so the Router can switch to a different model group.

Root Cause

Fix Action

Fix / Workaround

Workaround currently in production

Code Example

Summary

Where

Why this matters

Reproduction

Suggested fix

Workaround currently in production

Environment

Still need to ship something?

TRENDING