litellm: Streaming fallback inconsistent with non-stream fallback on key-level router_settings

GitHub stats
BerriAI/litellm#25843 · Fetched 2026-04-17 08:28:44
Comments: 0 · Participants: 1 · Timeline events: 1 (labeled ×1) · Reactions: 0


Summary

We found an inconsistency between non-streaming and streaming fallback behavior for the same key-level fallback config.

For the same API key and same fallback mapping:

  • non-streaming requests correctly switch to the fallback model
  • streaming requests do not switch and continue using the primary model
  • real production failures on streaming requests also show MidStreamFallbackError and Available Model Group Fallbacks=None

This suggests the streaming fallback path is not reading or applying the same fallback config that the non-stream path uses.

Environment

  • LiteLLM Proxy: private hosted instance
  • Tested on: 2026-04-16
  • Key under test: key-level fallback config attached to Chatbot-Prod
  • Model under test: vertex_ai/gemini-3.1-flash-lite-preview

Key-level fallback config in use

{
  "fallbacks": [
    {
      "vertex_ai/gemini-3.1-flash-lite-preview": [
        "vertex_ai/gemini-2.5-flash",
        "openrouter/google/gemini-3.1-flash-lite-preview",
        "openrouter/google/gemini-2.5-flash",
        "deepinfra/google/gemini-2.5-flash"
      ]
    },
    {
      "vertex_ai/gemini-2.5-flash": [
        "openrouter/google/gemini-2.5-flash",
        "deepinfra/google/gemini-2.5-flash"
      ]
    }
  ]
}

All fallback target models are allowed for the team.
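
To make the mismatch concrete, here is a minimal Python sketch of looking up the fallback list for a model group in the config above. `fallbacks_for` is a hypothetical helper written for illustration and is not LiteLLM's router code; it only shows that a plain lookup for the model under test yields four targets rather than None.

```python
from typing import Dict, List, Optional

# Key-level fallback config copied from the issue.
FALLBACK_CONFIG: Dict[str, List[Dict[str, List[str]]]] = {
    "fallbacks": [
        {
            "vertex_ai/gemini-3.1-flash-lite-preview": [
                "vertex_ai/gemini-2.5-flash",
                "openrouter/google/gemini-3.1-flash-lite-preview",
                "openrouter/google/gemini-2.5-flash",
                "deepinfra/google/gemini-2.5-flash",
            ]
        },
        {
            "vertex_ai/gemini-2.5-flash": [
                "openrouter/google/gemini-2.5-flash",
                "deepinfra/google/gemini-2.5-flash",
            ]
        },
    ]
}


def fallbacks_for(config: dict, model_group: str) -> Optional[List[str]]:
    """Hypothetical helper: return the fallback list for model_group, or None."""
    for entry in config.get("fallbacks", []):
        if model_group in entry:
            return entry[model_group]
    return None


if __name__ == "__main__":
    print(fallbacks_for(FALLBACK_CONFIG, "vertex_ai/gemini-3.1-flash-lite-preview"))
```

A lookup that returns None for this model group would mean the config was never passed in, which is what the logged message in the production evidence suggests for the streaming path.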

Reproduction

Case 1: Non-stream request with mock fallback

curl -X POST "https://<proxy>/v1/chat/completions" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "messages": [
      { "role": "user", "content": "say ping" }
    ],
    "mock_testing_fallbacks": true,
    "stream": false,
    "max_tokens": 32
  }'

Result

  • HTTP 200
  • returned model was fallback model: gemini-2.5-flash

Observed response excerpt:

{
  "model": "gemini-2.5-flash"
}

So non-stream fallback works.


Case 2: Streaming request with mock fallback

curl -N -X POST "https://<proxy>/v1/chat/completions" \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex_ai/gemini-3.1-flash-lite-preview",
    "messages": [
      { "role": "user", "content": "say ping" }
    ],
    "mock_testing_fallbacks": true,
    "stream": true,
    "max_tokens": 32
  }'

Result

  • HTTP 200
  • streamed chunks still show primary model: vertex_ai/gemini-3.1-flash-lite-preview
  • no visible switch to fallback model

Observed streamed response excerpt:

data: {"model":"vertex_ai/gemini-3.1-flash-lite-preview", ...}

So the stream path does not behave like the non-stream path.
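
Checking every chunk by eye is error-prone, so the following sketch collects the distinct `model` values from a captured stream. It assumes the proxy emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`); `models_in_stream` is a hypothetical helper, not part of LiteLLM.

```python
import json
from typing import Iterable, Set


def models_in_stream(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct "model" values seen across SSE data lines."""
    models: Set[str] = set()
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            continue  # assumed OpenAI-style end-of-stream marker
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore truncated or non-JSON payloads
        if isinstance(chunk, dict) and isinstance(chunk.get("model"), str):
            models.add(chunk["model"])
    return models


if __name__ == "__main__":
    sample = [
        'data: {"model":"vertex_ai/gemini-3.1-flash-lite-preview","choices":[]}',
        "",
        "data: [DONE]",
    ]
    print(models_in_stream(sample))
```

One way to use it is to redirect the `curl -N` output from Case 2 to a file and pass the open file object to `models_in_stream`. A result containing only the primary model matches the behavior reported here.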


Real production evidence

A real failing streaming request logged:

  • error_class: MidStreamFallbackError
  • error_code: 429
  • model_group: vertex_ai/gemini-3.1-flash-lite-preview
  • message included:
Available Model Group Fallbacks=None

This is surprising because the key currently has fallback config for that exact model group, and non-stream testing confirms fallback works.

Expected behavior

Streaming requests should use the same fallback mapping as non-stream requests for the same key/model group.

If mock_testing_fallbacks=true proves fallback is configured for non-stream, the stream path should not ignore it.
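
The expectation can be written as a small check over the two observed responses. This is a sketch under one assumption: the reported `model` may or may not carry a provider prefix (Case 1 returned `gemini-2.5-flash`, Case 2 returned `vertex_ai/gemini-3.1-flash-lite-preview`), so the hypothetical helpers below compare only the final path segment.

```python
PRIMARY = "vertex_ai/gemini-3.1-flash-lite-preview"


def base_name(model: str) -> str:
    """Drop any provider prefix, e.g. "vertex_ai/x" -> "x"."""
    return model.rsplit("/", 1)[-1]


def used_fallback(primary: str, reported: str) -> bool:
    """Heuristic: the response came from a fallback if the base names differ."""
    return base_name(reported) != base_name(primary)


if __name__ == "__main__":
    non_stream_model = "gemini-2.5-flash"  # from Case 1
    stream_model = "vertex_ai/gemini-3.1-flash-lite-preview"  # from Case 2
    print("non-stream used fallback:", used_fallback(PRIMARY, non_stream_model))
    print("stream used fallback:", used_fallback(PRIMARY, stream_model))
```

The heuristic has a known blind spot: `openrouter/google/gemini-3.1-flash-lite-preview` shares its base name with the primary, so a switch to that particular fallback would not be detected this way.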

Actual behavior

  • non-stream path reads/applies fallback correctly
  • stream path does not appear to apply the same fallback config
  • real streaming failures may end in MidStreamFallbackError without using configured key-level fallback

Questions

  1. Is mock_testing_fallbacks expected to behave differently for stream=true vs stream=false?
  2. Is key-level router_settings.fallbacks fully supported for streaming requests?
  3. Is there a known issue in the mid-stream fallback path where fallback config becomes unavailable / logs as Available Model Group Fallbacks=None?
  4. Does the streaming router path use a different config snapshot from non-stream routing?

extent analysis

TL;DR

The streaming fallback behavior for the same key-level fallback config is inconsistent with non-streaming requests, suggesting a potential issue with how the streaming path reads or applies the fallback configuration.

Guidance

  • Verify that the mock_testing_fallbacks parameter is expected to work identically for both streaming and non-streaming requests, and check if there are any known differences in behavior.
  • Investigate if the key-level router_settings.fallbacks configuration is fully supported for streaming requests, and if there are any limitations or known issues.
  • Check the logging and error handling for the mid-stream fallback path to understand why the Available Model Group Fallbacks=None message is being logged, despite the presence of a valid fallback config.
  • Compare the configuration snapshots used by the streaming and non-streaming router paths to ensure they are using the same fallback configuration.
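
The last comparison could be sketched as follows, assuming the fallback list each path sees can be captured somehow (for example from debug logging; the issue does not say how). `flatten` and `diff_fallbacks` are hypothetical helpers, and a missing list is treated as "no fallbacks", mirroring the logged Available Model Group Fallbacks=None.

```python
from typing import Dict, List, Optional, Tuple

FallbackList = Optional[List[Dict[str, List[str]]]]


def flatten(fallbacks: FallbackList) -> Dict[str, List[str]]:
    """Merge the list-of-dicts fallback format into one mapping."""
    merged: Dict[str, List[str]] = {}
    for entry in fallbacks or []:
        merged.update(entry)
    return merged


def diff_fallbacks(
    non_stream: FallbackList, stream: FallbackList
) -> Dict[str, Tuple[Optional[List[str]], Optional[List[str]]]]:
    """Return model groups whose fallback lists differ between the two paths."""
    a, b = flatten(non_stream), flatten(stream)
    return {
        group: (a.get(group), b.get(group))
        for group in sorted(set(a) | set(b))
        if a.get(group) != b.get(group)
    }


if __name__ == "__main__":
    seen_by_non_stream = [
        {"vertex_ai/gemini-2.5-flash": ["openrouter/google/gemini-2.5-flash"]}
    ]
    seen_by_stream = None  # what the production log message suggests
    print(diff_fallbacks(seen_by_non_stream, seen_by_stream))
```

An empty result would mean both paths see the same mapping, pointing the investigation away from config loading and toward how the streaming path applies it.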

Example

No code snippet is provided as the issue is more related to configuration and behavior rather than a specific code problem.

Notes

The issue seems to be related to the inconsistency between streaming and non-streaming fallback behavior, and it's unclear if this is a configuration issue, a known limitation, or a bug. Further investigation is needed to determine the root cause.

Recommendation

Until the root cause is confirmed, verify that the fallback configuration is actually applied to both streaming and non-streaming requests and, if necessary, adjust the configuration so the two paths are consistent. The expected behavior is for streaming requests to use the same fallback mapping as non-stream requests for the same key/model group.


