litellm: Streaming fallback inconsistent with non-stream fallback on key-level router_settings

litellm · 2026-04-16 06:09:35

BerriAI/litellm#25843•Fetched 2026-04-17 08:28:44

Author

Sunsilkk

Summary

We found an inconsistency between non-streaming and streaming fallback behavior for the same key-level fallback config.

For the same API key and same fallback mapping:

non-streaming requests correctly switch to the fallback model
streaming requests do not switch and continue using the primary model
real production failures on streaming requests also show MidStreamFallbackError and Available Model Group Fallbacks=None

This suggests the streaming fallback path is not reading or applying the same fallback config that the non-stream path uses.

Environment

LiteLLM Proxy: private hosted instance
Tested on: 2026-04-16
Key under test: key-level fallback config attached to Chatbot-Prod
Model under test: vertex_ai/gemini-3.1-flash-lite-preview

Key-level fallback config in use

{
  "fallbacks": [
    {
      "vertex_ai/gemini-3.1-flash-lite-preview": [
        "vertex_ai/gemini-2.5-flash",
        "openrouter/google/gemini-3.1-flash-lite-preview",
        "openrouter/google/gemini-2.5-flash",
        "deepinfra/google/gemini-2.5-flash"
      ]
    },
    {
      "vertex_ai/gemini-2.5-flash": [
        "openrouter/google/gemini-2.5-flash",
        "deepinfra/google/gemini-2.5-flash"
      ]
    }
  ]
}

All fallback target models are allowed for the team.
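The mapping above is a list of single-key objects, each keyed by a model group. As a minimal sketch of how such a structure can be looked up, assuming nothing about LiteLLM's internal router code, the hypothetical helper `fallbacks_for` below returns the configured targets for a model group, or `None` when no entry matches:

```python
from typing import Optional

# Key-level fallback config copied from the issue.
KEY_FALLBACKS = {
    "fallbacks": [
        {
            "vertex_ai/gemini-3.1-flash-lite-preview": [
                "vertex_ai/gemini-2.5-flash",
                "openrouter/google/gemini-3.1-flash-lite-preview",
                "openrouter/google/gemini-2.5-flash",
                "deepinfra/google/gemini-2.5-flash",
            ]
        },
        {
            "vertex_ai/gemini-2.5-flash": [
                "openrouter/google/gemini-2.5-flash",
                "deepinfra/google/gemini-2.5-flash",
            ]
        },
    ]
}


def fallbacks_for(config: dict, model_group: str) -> Optional[list]:
    """Return the fallback targets for model_group, or None if none are configured.

    Hypothetical helper for illustration; it is not LiteLLM's lookup logic.
    """
    for entry in config.get("fallbacks", []):
        if model_group in entry:
            return list(entry[model_group])
    return None


if __name__ == "__main__":
    print(fallbacks_for(KEY_FALLBACKS, "vertex_ai/gemini-3.1-flash-lite-preview"))
```

With this config, a lookup for the model under test yields four targets, so a `None` result for that group would mean the lookup ran against a different or empty mapping.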

Reproduction

Case 1: Non-stream request with mock fallback

curl -X POST "https://<proxy>/v1/chat/completions" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "messages": [
      { "role": "user", "content": "say ping" }
    ],
    "mock_testing_fallbacks": true,
    "stream": false,
    "max_tokens": 32
  }'

Result

HTTP 200
returned model was fallback model: gemini-2.5-flash

Observed response excerpt:

{
  "model": "gemini-2.5-flash"
}

So non-stream fallback works.

Case 2: Streaming request with mock fallback

curl -N -X POST "https://<proxy>/v1/chat/completions" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "messages": [
      { "role": "user", "content": "say ping" }
    ],
    "mock_testing_fallbacks": true,
    "stream": true,
    "max_tokens": 32
  }'

Result

HTTP 200
streamed chunks still show primary model: vertex_ai/gemini-3.1-flash-lite-preview
no visible switch to fallback model

Observed streamed response excerpt:

data: {"model":"vertex_ai/gemini-3.1-flash-lite-preview", ...}

So the stream path does not behave like the non-stream path.
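To check the streamed output mechanically rather than by eye, the rough sketch below collects the distinct `model` values from the `data:` lines of a saved stream. It assumes OpenAI-style server-sent events terminated by `data: [DONE]`; the function name `models_in_stream` and the sample lines are hypothetical, not captured output from the issue.

```python
import json


def models_in_stream(lines):
    """Collect distinct `model` values from SSE `data:` lines, in first-seen order."""
    seen = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed OpenAI-style end-of-stream sentinel
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed chunks
        if not isinstance(chunk, dict):
            continue
        model = chunk.get("model")
        if model and model not in seen:
            seen.append(model)
    return seen


# Hypothetical sample shaped like the excerpt above (fields other than `model` invented).
SAMPLE_LINES = [
    'data: {"model": "vertex_ai/gemini-3.1-flash-lite-preview", "choices": []}',
    "",
    'data: {"model": "vertex_ai/gemini-3.1-flash-lite-preview", "choices": []}',
    "data: [DONE]",
]

if __name__ == "__main__":
    print(models_in_stream(SAMPLE_LINES))
```

If the stream had switched models partway through, the returned list would contain more than one entry; a single entry equal to the primary model matches the behavior reported in Case 2.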

Real production evidence

A real failing streaming request logged:

error_class: MidStreamFallbackError
error_code: 429
model_group: vertex_ai/gemini-3.1-flash-lite-preview
message included:

Available Model Group Fallbacks=None

This is surprising because the key currently has fallback config for that exact model group, and non-stream testing confirms fallback works.
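One way to find further occurrences in the logs is to flag records where the message reports no fallbacks even though the key's config has an entry for that model group. The sketch below assumes log records are available as dictionaries with the field names listed above; the helper `contradicts_key_config` and the sample record are hypothetical, and the real log message is longer than the marker string used here.

```python
MARKER = "Available Model Group Fallbacks=None"

# Entries for the key under test, copied from the config in this issue.
KEY_FALLBACK_ENTRIES = [
    {
        "vertex_ai/gemini-3.1-flash-lite-preview": [
            "vertex_ai/gemini-2.5-flash",
            "openrouter/google/gemini-3.1-flash-lite-preview",
            "openrouter/google/gemini-2.5-flash",
            "deepinfra/google/gemini-2.5-flash",
        ]
    },
    {
        "vertex_ai/gemini-2.5-flash": [
            "openrouter/google/gemini-2.5-flash",
            "deepinfra/google/gemini-2.5-flash",
        ]
    },
]


def contradicts_key_config(record: dict, entries: list) -> bool:
    """True when the log reports no fallbacks for a group that the key config covers."""
    configured = any(record.get("model_group") in entry for entry in entries)
    reported_none = MARKER in record.get("message", "")
    return configured and reported_none


# Sample record built from the fields quoted in the issue (message shortened).
SAMPLE_RECORD = {
    "error_class": "MidStreamFallbackError",
    "error_code": 429,
    "model_group": "vertex_ai/gemini-3.1-flash-lite-preview",
    "message": "Available Model Group Fallbacks=None",
}

if __name__ == "__main__":
    print(contradicts_key_config(SAMPLE_RECORD, KEY_FALLBACK_ENTRIES))
```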

Expected behavior

Streaming requests should use the same fallback mapping as non-stream requests for the same key/model group.

If mock_testing_fallbacks=true proves fallback is configured for non-stream, the stream path should not ignore it.

Actual behavior

non-stream path reads/applies fallback correctly
stream path does not appear to apply the same fallback config
real streaming failures may end in MidStreamFallbackError without using configured key-level fallback

Questions

Is mock_testing_fallbacks expected to behave differently for stream=true vs stream=false?
Is key-level router_settings.fallbacks fully supported for streaming requests?
Is there a known issue in the mid-stream fallback path where fallback config becomes unavailable / logs as Available Model Group Fallbacks=None?
Does the streaming router path use a different config snapshot from non-stream routing?

extent analysis

TL;DR

The streaming fallback behavior for the same key-level fallback config is inconsistent with non-streaming requests, suggesting a potential issue with how the streaming path reads or applies the fallback configuration.

Guidance

Verify that the mock_testing_fallbacks parameter is expected to work identically for both streaming and non-streaming requests, and check if there are any known differences in behavior.
Investigate if the key-level router_settings.fallbacks configuration is fully supported for streaming requests, and if there are any limitations or known issues.
Check the logging and error handling for the mid-stream fallback path to understand why the Available Model Group Fallbacks=None message is being logged, despite the presence of a valid fallback config.
Compare the configuration snapshots used by the streaming and non-streaming router paths to ensure they are using the same fallback configuration.
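When running these checks, it helps to make sure the two requests differ only in `stream`, and to classify the returned model consistently. The sketch below is one way to do that under stated assumptions: `build_pair`, `differing_keys`, and `verdict` are hypothetical helpers, and the provider-prefix handling is a guess based on the non-stream response reporting `gemini-2.5-flash` without a prefix.

```python
import copy

# Request body shared by both reproduction cases in the issue, minus `stream`.
BASE_PAYLOAD = {
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "messages": [{"role": "user", "content": "say ping"}],
    "mock_testing_fallbacks": True,
    "max_tokens": 32,
}


def build_pair(base: dict):
    """Return (non_stream, stream) payloads that differ only in `stream`."""
    non_stream = copy.deepcopy(base)
    non_stream["stream"] = False
    stream = copy.deepcopy(base)
    stream["stream"] = True
    return non_stream, stream


def differing_keys(a: dict, b: dict) -> list:
    """List the top-level keys whose values differ between two payloads."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


def _short(name: str) -> str:
    # Assumption: responses may omit the provider prefix.
    return name.rsplit("/", 1)[-1]


def verdict(requested: str, returned: str, fallbacks: list) -> str:
    """Classify the returned model as 'primary', 'fallback', or 'unknown'."""
    if returned == requested:
        return "primary"
    if returned in fallbacks:
        return "fallback"
    matches_primary = _short(returned) == _short(requested)
    matches_fallback = _short(returned) in {_short(f) for f in fallbacks}
    if matches_fallback and not matches_primary:
        return "fallback"
    if matches_primary and not matches_fallback:
        return "primary"
    return "unknown"  # ambiguous short name, or no match at all
```

Exact matches are checked first because one configured fallback target shares its short name with the primary model, so comparing short names alone would misclassify the primary as a fallback.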

Notes

It is unclear whether this inconsistency between streaming and non-streaming fallback behavior is a configuration issue, a known limitation, or a bug. Further investigation is needed to determine the root cause.

Recommendation

As an interim step, confirm whether the fallback configuration is actually applied to both streaming and non-streaming requests for the same key and model group. If the two paths diverge, adjust the configuration so that both resolve the same fallback mapping, since streaming requests are expected to use the same mapping as non-stream requests.
