litellm [Bug]: Router.aresponses streaming bypasses mid-stream fallback (MidStreamFallbackError not handled)

BerriAI/litellm#28216 (fetched 2026-05-20 03:40:55)

Error Message

litellm.MidStreamFallbackError: ... — propagated to caller; Router fallback chain was not invoked

Root Cause

The chat completions path wraps its CustomStreamWrapper via Router._acompletion_streaming_iterator, which catches MidStreamFallbackError and re-enters async_function_with_fallbacks_common_utils.

The Responses API path doesn't have an equivalent:

  • Router.aresponses is bound via factory_function(litellm.aresponses, call_type="aresponses").
  • It dispatches into _ageneric_api_call_with_fallbacks and then _ageneric_api_call_with_fallbacks_helper, which awaits litellm.aresponses(**response_kwargs) and returns the streaming iterator unwrapped.
  • Any MidStreamFallbackError raised during iteration of the returned BaseResponsesAPIStreamingIterator propagates past the Router.

For the Anthropic / Vertex Claude bridge case, the returned iterator is LiteLLMCompletionStreamingIterator, which wraps a CustomStreamWrapper — that's where MidStreamFallbackError originates.
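
The structural gap described above can be illustrated with a minimal sketch in plain asyncio. It does not use litellm: MidStreamError, failing_stream, and call_with_fallbacks are hypothetical stand-ins for MidStreamFallbackError, the provider stream, and a router helper that guards only the call that returns the iterator. Because the stream body runs lazily, the error is raised while the caller iterates, after the helper's try block has already exited.

```python
import asyncio


class MidStreamError(Exception):
    """Hypothetical stand-in for litellm.MidStreamFallbackError."""


async def failing_stream():
    # Async generator: nothing in this body runs until the caller iterates.
    yield "chunk-1"
    raise MidStreamError("provider dropped the connection")


async def call_with_fallbacks():
    # Models a helper that guards the call but returns the iterator unwrapped.
    try:
        return failing_stream()  # returns immediately, no chunk produced yet
    except MidStreamError:
        return None  # never reached: no error is raised at call time


async def main():
    stream = await call_with_fallbacks()
    received = []
    try:
        async for chunk in stream:
            received.append(chunk)
    except MidStreamError:
        # The error surfaces here, outside the helper's try/except.
        received.append("error reached caller")
    return received


result = asyncio.run(main())
print(result)
```

Any fallback logic therefore has to live in an object that sits between the caller and the underlying iterator, which is what the chat completions path does.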

Code Example

import asyncio

import litellm

router = litellm.Router(
    model_list=[
        {
            "model_name": "anthropic/claude-sonnet-4-6",
            "litellm_params": {"model": "anthropic/claude-sonnet-4-6", "api_key": "..."},
        },
        {
            "model_name": "vertex_ai/claude-sonnet-4-6",
            "litellm_params": {"model": "vertex_ai/claude-sonnet-4-6", "api_key": "..."},
        },
    ],
    fallbacks=[{"anthropic/claude-sonnet-4-6": ["vertex_ai/claude-sonnet-4-6"]}],
)


async def main():
    # Force a mid-stream error from Anthropic (e.g. socket timeout pre-first-chunk).
    # The equivalent setup with acompletion + stream=True correctly triggers
    # the vertex_ai fallback; aresponses + stream=True does NOT.
    stream = await router.aresponses(
        model="anthropic/claude-sonnet-4-6",
        input="Hello",
        stream=True,
    )
    async for event in stream:
        ...  # MidStreamFallbackError propagates here instead of falling back.


asyncio.run(main())

What happened?

MidStreamFallbackError raised mid-stream during Router.aresponses(stream=True) bypasses the Router's fallback chain. Configured cross-provider fallbacks (e.g. anthropic → vertex_ai) never fire when the primary provider's stream fails mid-flight.

Observed in production: an Anthropic socket timed out before the first chunk on Router.aresponses(stream=True). The underlying CustomStreamWrapper raised MidStreamFallbackError from _handle_stream_fallback_error, which is the intended behavior and exactly what is supposed to drive fallback. However, the Router's configured anthropic → vertex_ai fallback was not invoked, and the error surfaced unhandled to the caller.

Empirical reproduction

Live cross-provider test (anthropic primary → openai fallback, primary's stream intercepted to raise MidStreamFallbackError(generated_content="from-primary ")):

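Wrapper state                     | Error propagates     | Final text
----------------------------------|----------------------|-------------------------------
Bypassed (current main behavior)  | ✅ (bug reproduced)  | ''
Wrapped (proposed fix in #28215)  | ❌ (fallback fires)  | 'from-primary <openai output>'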

Suggested fix

Mirror _acompletion_streaming_iterator for the Responses API. Wrap the returned BaseResponsesAPIStreamingIterator in a handler that:

  • catches MidStreamFallbackError during iteration
  • re-enters async_function_with_fallbacks_common_utils with original_function=_ageneric_api_call_with_fallbacks_helper and original_generic_function=litellm.aresponses preserved so the helper invokes the right underlying API on each fallback attempt
  • before the first chunk: retries with the original input
  • after partial content: injects continuation messages into the Responses API input
  • combines partial usage with fallback usage
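
The behavior in the list above can be sketched in plain asyncio. This is an illustration under stated assumptions, not the code in #28215: MidStreamError, FallbackStream, and the stream factories are hypothetical, the continuation prompt format is invented for the example, only one fallback attempt is made, and usage accounting is left out.

```python
import asyncio
from typing import AsyncIterator, Callable


class MidStreamError(Exception):
    """Hypothetical stand-in for litellm.MidStreamFallbackError."""

    def __init__(self, generated_content: str = ""):
        super().__init__("stream failed mid-flight")
        self.generated_content = generated_content


# A stream factory takes the request input and returns an async iterator of text.
StreamFactory = Callable[[str], AsyncIterator[str]]


class FallbackStream:
    """Yields primary chunks, switching to the fallback on MidStreamError."""

    def __init__(self, original_input: str, primary: StreamFactory, fallback: StreamFactory):
        self._input = original_input
        self._fallback = fallback
        self._stream = primary(original_input)
        self._fell_back = False

    def __aiter__(self):
        return self

    async def __anext__(self) -> str:
        try:
            return await self._stream.__anext__()
        except MidStreamError as err:
            if self._fell_back:
                raise  # this sketch allows a single fallback attempt
            self._fell_back = True
            if err.generated_content:
                # Partial content: ask the fallback to continue (format is illustrative).
                next_input = f"{self._input}\n[continue from]: {err.generated_content}"
            else:
                # Failure before the first chunk: retry with the original input.
                next_input = self._input
            self._stream = self._fallback(next_input)
            return await self._stream.__anext__()


seen_inputs = []


async def primary(text: str):
    yield "from-primary "
    raise MidStreamError(generated_content="from-primary ")


async def fallback(text: str):
    seen_inputs.append(text)
    yield "from-fallback"


async def collect() -> str:
    stream = FallbackStream("Hello", primary, fallback)
    return "".join([chunk async for chunk in stream])


final_text = asyncio.run(collect())
print(final_text)
```

The caller keeps the chunks it already received from the primary, so the joined text is the partial output followed by the fallback's continuation.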

The wrapper should subclass BaseResponsesAPIStreamingIterator (without calling the parent constructor) to preserve isinstance compatibility with downstream consumers (the proxy cursor endpoint and litellm/interactions/litellm_responses_transformation/handler.py). It should also mirror every attribute the parent constructor sets so that inherited methods (e.g. _check_max_streaming_duration, _handle_failure) stay safe to call.
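
A minimal sketch of that subclassing pattern, using hypothetical classes rather than the real litellm ones: BaseIterator stands in for BaseResponsesAPIStreamingIterator, and mark_failed stands in for an inherited method that reads attributes set by the parent constructor.

```python
class BaseIterator:
    """Hypothetical stand-in for BaseResponsesAPIStreamingIterator."""

    def __init__(self, response, model: str):
        # Assume this constructor needs a live response the wrapper does not own.
        self.response = response
        self.model = model
        self.finished = False

    def mark_failed(self) -> str:
        # Inherited helper that relies on attributes set in __init__.
        self.finished = True
        return f"{self.model}: failed"


class FallbackWrapper(BaseIterator):
    def __init__(self, inner: BaseIterator):
        # Deliberately no super().__init__(): mirror the parent's attributes instead.
        self.inner = inner
        self.response = inner.response
        self.model = inner.model
        self.finished = False


inner = BaseIterator(response=object(), model="primary")
wrapper = FallbackWrapper(inner)

is_compatible = isinstance(wrapper, BaseIterator)  # downstream isinstance checks pass
status = wrapper.mark_failed()  # inherited method works on mirrored attributes
print(is_compatible, status)
```

If the wrapper skipped one of the mirrored attributes, the inherited method would raise AttributeError, which is why every attribute the parent sets needs a counterpart.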

The implementation also needs to pass metadata_variable_name="litellm_metadata" (not the default "metadata") to _update_kwargs_before_fallbacks, matching the convention that _ageneric_api_call_with_fallbacks uses, so that observability metadata lands in the right key.
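
The effect of the key name can be shown with a small stand-in. update_kwargs_before_fallback below is a hypothetical helper written for this example, not litellm's _update_kwargs_before_fallbacks, and the model_group field it writes is an assumption chosen for illustration.

```python
def update_kwargs_before_fallback(
    kwargs: dict,
    model_group: str,
    metadata_variable_name: str = "metadata",
) -> dict:
    """Hypothetical helper: record the fallback model group under a configurable key."""
    updated = dict(kwargs)
    metadata = dict(updated.get(metadata_variable_name) or {})
    metadata["model_group"] = model_group
    updated[metadata_variable_name] = metadata
    return updated


request = {"input": "Hello", "litellm_metadata": {"trace_id": "abc"}}

# Default key: the update lands in a new "metadata" dict and misses the existing one.
with_default = update_kwargs_before_fallback(request, "vertex_ai/claude-sonnet-4-6")

# Explicit key: the update is merged into the dict the Responses path already uses.
with_explicit = update_kwargs_before_fallback(
    request, "vertex_ai/claude-sonnet-4-6", metadata_variable_name="litellm_metadata"
)
print(with_default)
print(with_explicit)
```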

PR open against shin_agent_oss_staging_05_19_2026: #28215.

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

v1.85.0 (also reproduces against current main — the gap is structural, not a regression)
