litellm - 💡(How to fix) Fix [Bug]: Router.aresponses streaming bypasses mid-stream fallback (MidStreamFallbackError not handled) [1 participants]

Root Cause

The chat completions path wraps its CustomStreamWrapper via Router._acompletion_streaming_iterator, which catches MidStreamFallbackError and re-enters async_function_with_fallbacks_common_utils.

The Responses API path doesn't have an equivalent:

Router.aresponses is bound via factory_function(litellm.aresponses, call_type="aresponses").
It dispatches into _ageneric_api_call_with_fallbacks → _ageneric_api_call_with_fallbacks_helper, which awaits litellm.aresponses(**response_kwargs) and returns the streaming iterator unwrapped.
Any MidStreamFallbackError raised during iteration of the returned BaseResponsesAPIStreamingIterator propagates past the Router.

For the Anthropic / Vertex Claude bridge case, the returned iterator is LiteLLMCompletionStreamingIterator, which wraps a CustomStreamWrapper — that's where MidStreamFallbackError originates.

Fix Action

Fix / Workaround

The Responses API path doesn't have an equivalent:

Router.aresponses is bound via factory_function(litellm.aresponses, call_type="aresponses").
It dispatches into _ageneric_api_call_with_fallbacks → _ageneric_api_call_with_fallbacks_helper, which awaits litellm.aresponses(**response_kwargs) and returns the streaming iterator unwrapped.
Any MidStreamFallbackError raised during iteration of the returned BaseResponsesAPIStreamingIterator propagates past the Router.

Code Example

import litellm

router = litellm.Router(
    model_list=[
        {
            "model_name": "anthropic/claude-sonnet-4-6",
            "litellm_params": {"model": "anthropic/claude-sonnet-4-6", "api_key": "..."},
        },
        {
            "model_name": "vertex_ai/claude-sonnet-4-6",
            "litellm_params": {"model": "vertex_ai/claude-sonnet-4-6", "api_key": "..."},
        },
    ],
    fallbacks=[{"anthropic/claude-sonnet-4-6": ["vertex_ai/claude-sonnet-4-6"]}],
)

# Force a mid-stream error from Anthropic (e.g. socket timeout pre-first-chunk).
# The equivalent setup with acompletion + stream=True correctly triggers
# the vertex_ai fallback; aresponses + stream=True does NOT.
stream = await router.aresponses(
    model="anthropic/claude-sonnet-4-6",
    input="Hello",
    stream=True,
)
async for event in stream:
    ...  # MidStreamFallbackError propagates here instead of falling back.

---

litellm.MidStreamFallbackError: ... — propagated to caller; Router fallback chain was not invoked

What happened?

MidStreamFallbackError raised mid-stream during Router.aresponses(stream=True) bypasses the Router's fallback chain. Configured cross-provider fallbacks (e.g. anthropic → vertex_ai) never fire when the primary provider's stream fails mid-flight.

Observed in production: an Anthropic socket timed out before the first chunk on Router.aresponses(stream=True). The underlying CustomStreamWrapper raised MidStreamFallbackError from _handle_stream_fallback_error — intended behavior, that's exactly what's supposed to drive fallback. The Router's configured anthropic → vertex_ai fallback was not invoked; the error surfaced unhandled to the caller.

Root cause

The Responses API path doesn't have an equivalent:

Router.aresponses is bound via factory_function(litellm.aresponses, call_type="aresponses").
It dispatches into _ageneric_api_call_with_fallbacks → _ageneric_api_call_with_fallbacks_helper, which awaits litellm.aresponses(**response_kwargs) and returns the streaming iterator unwrapped.
Any MidStreamFallbackError raised during iteration of the returned BaseResponsesAPIStreamingIterator propagates past the Router.

For the Anthropic / Vertex Claude bridge case, the returned iterator is LiteLLMCompletionStreamingIterator, which wraps a CustomStreamWrapper — that's where MidStreamFallbackError originates.

Steps to Reproduce

import litellm

router = litellm.Router(
    model_list=[
        {
            "model_name": "anthropic/claude-sonnet-4-6",
            "litellm_params": {"model": "anthropic/claude-sonnet-4-6", "api_key": "..."},
        },
        {
            "model_name": "vertex_ai/claude-sonnet-4-6",
            "litellm_params": {"model": "vertex_ai/claude-sonnet-4-6", "api_key": "..."},
        },
    ],
    fallbacks=[{"anthropic/claude-sonnet-4-6": ["vertex_ai/claude-sonnet-4-6"]}],
)

# Force a mid-stream error from Anthropic (e.g. socket timeout pre-first-chunk).
# The equivalent setup with acompletion + stream=True correctly triggers
# the vertex_ai fallback; aresponses + stream=True does NOT.
stream = await router.aresponses(
    model="anthropic/claude-sonnet-4-6",
    input="Hello",
    stream=True,
)
async for event in stream:
    ...  # MidStreamFallbackError propagates here instead of falling back.

Empirical reproduction

Live cross-provider test (anthropic primary → openai fallback, primary's stream intercepted to raise MidStreamFallbackError(generated_content="from-primary ")):

Wrapper state	Error propagates	Final text
Bypassed (current `main` behavior)	✅ (bug reproduced)	`''`
Wrapped (proposed fix in #28215)	❌ (fallback fires)	`'from-primary <openai output>'`

Suggested fix

Mirror _acompletion_streaming_iterator for the Responses API. Wrap the returned BaseResponsesAPIStreamingIterator in a handler that:

catches MidStreamFallbackError during iteration
re-enters async_function_with_fallbacks_common_utils with original_function=_ageneric_api_call_with_fallbacks_helper and original_generic_function=litellm.aresponses preserved so the helper invokes the right underlying API on each fallback attempt
pre-first-chunk: retry with original input
partial content: inject continuation messages into Responses-API input
combines partial usage with fallback usage

The wrapper should subclass BaseResponsesAPIStreamingIterator (without calling the parent constructor) to preserve isinstance compatibility with downstream consumers (proxy cursor endpoint and litellm/interactions/litellm_responses_transformation/handler.py) — and mirror every attribute the parent constructor sets so inherited methods (e.g. _check_max_streaming_duration, _handle_failure) stay safe to call.

Implementation also needs to use metadata_variable_name="litellm_metadata" (not the default "metadata") on _update_kwargs_before_fallbacks, matching the convention _ageneric_api_call_with_fallbacks uses, so observability metadata lands in the right key.

PR open against shin_agent_oss_staging_05_19_2026: #28215.

Relevant log output

litellm.MidStreamFallbackError: ... — propagated to caller; Router fallback chain was not invoked

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

v1.85.0 (also reproduces against current main — the gap is structural, not a regression)

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug]: Router.aresponses streaming bypasses mid-stream fallback (MidStreamFallbackError not handled) [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Code Example

What happened?

Root cause

Steps to Reproduce

Empirical reproduction

Suggested fix

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on?

Still need to ship something?

TRENDING

litellm - 💡(How to fix) Fix [Bug]: Router.aresponses streaming bypasses mid-stream fallback (MidStreamFallbackError not handled) [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Code Example

What happened?

Root cause

Steps to Reproduce

Empirical reproduction

Suggested fix

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on?

Still need to ship something?

RELATED_DISCOVERY

TRENDING