litellm - ✅(Solved) Fix [Feature]: Expose actual served model in response when router falls back to a different deployment [1 pull requests, 1 participants]

litellm · 2026-04-10 14:45:40


GitHub stats

BerriAI/litellm#25503 • Fetched 2026-04-11 06:13:49

Author: VANDRANKI

Participants: VANDRANKI

Timeline (top): labeled ×2

PR fix notes

PR #25712: feat(router): expose x-litellm-fallback-model-used header in responses

Repository: BerriAI/litellm
Author: ajhalaria-godaddy
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25712

Description (problem / solution / changelog)

When a fallback model is used instead of the primary model, stamp x-litellm-fallback-model-used on the response's additional_headers so callers can tell which model actually served the request.

add fallback_model param to add_fallback_headers_to_response()
capture effective fallback model name in run_async_fallback() before the recursive call and pass it through to the header helper
add unit tests covering header presence/absence and edge cases
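The mechanism listed above (capture the effective fallback model name before the fallback call, then hand it to the header helper) can be sketched in plain, synchronous Python. This is an illustration under stated assumptions, not the PR's code: `FakeResponse`, `stamp_fallback_header`, and `run_with_fallbacks` are hypothetical names, and keeping extra headers under `_hidden_params["additional_headers"]` is an assumption modeled on the "additional_headers" wording in the PR description.

```python
from typing import Callable, Optional

FALLBACK_HEADER = "x-litellm-fallback-model-used"


class FakeResponse:
    """Hypothetical stand-in for a LiteLLM response object."""

    def __init__(self, model: str) -> None:
        self.model = model
        # Assumption: extra response headers live under _hidden_params["additional_headers"].
        self._hidden_params: dict = {"additional_headers": {}}


def stamp_fallback_header(response: FakeResponse, fallback_model: Optional[str]) -> FakeResponse:
    """Add the fallback header only when a fallback model served the request."""
    if fallback_model:
        headers = response._hidden_params.setdefault("additional_headers", {})
        headers[FALLBACK_HEADER] = fallback_model
    return response


def run_with_fallbacks(
    model_group: str,
    fallbacks: list[str],
    call: Callable[[str], FakeResponse],
) -> FakeResponse:
    """Try the primary group, then each fallback group in order (hypothetical)."""
    try:
        return stamp_fallback_header(call(model_group), None)
    except Exception as primary_error:
        last_error: Exception = primary_error
    for fallback_model in fallbacks:
        # The effective fallback name is bound here, before the call,
        # so it is still available when the response comes back.
        try:
            response = call(fallback_model)
        except Exception as err:
            last_error = err
            continue
        return stamp_fallback_header(response, fallback_model)
    raise last_error


if __name__ == "__main__":
    def flaky_call(model: str) -> FakeResponse:
        if model == "gpt-4o":
            raise RuntimeError("primary deployment unavailable")
        # Mimics the reported behaviour: `model` still names the requested group.
        return FakeResponse(model="gpt-4o")

    resp = run_with_fallbacks("gpt-4o", ["gpt-4-turbo"], flaky_call)
    print(resp.model, resp._hidden_params["additional_headers"])
    # gpt-4o {'x-litellm-fallback-model-used': 'gpt-4-turbo'}
```

The header is absent on the happy path, which matches the "header presence/absence" cases the PR's tests are described as covering.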

Fixes https://github.com/BerriAI/litellm/issues/25503

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review


Changes

Changed files

litellm/router_utils/add_retry_fallback_headers.py (modified, +6/-1)
litellm/router_utils/fallback_event_handlers.py (modified, +8/-0)
tests/test_litellm/router/__init__.py (added, +0/-0)
tests/test_litellm/router/test_fallback_headers.py (added, +213/-0)
uv.lock (modified, +2/-2)


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

When the LiteLLM router falls back from one deployment to another (e.g., gpt-4o on Azure East → gpt-4o on Azure West, or gpt-4o → gpt-4-turbo), the model field in the response and the x-litellm-model response header still reflect the originally requested model group name, not the deployment that actually served the request.

Proposal: add a field to the response (or response header) that identifies the actual deployment/model that served the request after fallback resolution.

Suggested additions:

x-litellm-actual-model response header: the litellm_params.model of the deployment that served the call
x-litellm-model-group response header: the requested model group name (existing x-litellm-model behavior, renamed for clarity)
Alternatively, populate usage.model or a _hidden_params.model_used field with the resolved model
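A sketch of how the two suggested header values could be derived, assuming the serving deployment is a dict shaped like a LiteLLM `model_list` entry (`model_name` plus `litellm_params.model`). The helper `build_model_headers` and the deployment name `azure/gpt-4o-west` are hypothetical, and the header names are the ones proposed in this issue, not an existing LiteLLM API.

```python
from typing import Any


def build_model_headers(requested_group: str, served: dict[str, Any]) -> dict[str, str]:
    """Build the two headers proposed in the issue (hypothetical helper)."""
    return {
        # Proposed: litellm_params.model of the deployment that served the call.
        "x-litellm-actual-model": served["litellm_params"]["model"],
        # Proposed rename of today's x-litellm-model: the requested model group.
        "x-litellm-model-group": requested_group,
    }


# Hypothetical deployment that served a request after falling back from another region.
served_deployment = {
    "model_name": "gpt-4o",
    "litellm_params": {"model": "azure/gpt-4o-west"},
}
print(build_model_headers("gpt-4o", served_deployment))
# {'x-litellm-actual-model': 'azure/gpt-4o-west', 'x-litellm-model-group': 'gpt-4o'}
```

The requested group is passed separately rather than read from the deployment because, in a cross-group fallback (gpt-4o → gpt-4-turbo), the serving deployment's own model_name is no longer the group the caller asked for.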

Motivation, pitch

Without knowing which deployment actually served a request:

Cost attribution breaks: if the fallback lands on a more expensive model (e.g., GPT-4 instead of GPT-4o-mini), the spend is logged against the wrong model.
Debugging is hard: when latency spikes or quality degrades after a fallback, there is no signal in the response to tell operators which deployment served the traffic.
Alerting on specific deployments is impossible: monitoring systems that watch for errors or latency per deployment cannot correlate without the actual model name.

This is especially important in multi-region or multi-provider router configs where fallback behavior is frequent and expected.

What part of LiteLLM is this about?

Proxy / Router


extent analysis

TL;DR

Add a new response header or field to identify the actual deployment/model that served the request after fallback resolution.

Guidance

Introduce a new x-litellm-actual-model response header to return the litellm_params.model of the deployment that served the call.
Consider renaming the existing x-litellm-model header to x-litellm-model-group for clarity on the requested model group name.
Alternatively, explore populating the usage.model or a _hidden_params.model_used field with the resolved model for easier debugging and cost attribution.
Verify the effectiveness of the chosen solution by testing fallback scenarios and checking the response headers or fields for the correct actual deployment/model information.
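For the verification step, a client or test can read the response headers to see which model served a request. A minimal sketch, assuming headers arrive as a plain mapping with arbitrary casing. It prefers the header from PR #25712 (`x-litellm-fallback-model-used`), then the issue's proposed `x-litellm-actual-model`, then the existing `x-litellm-model`; the function name `resolve_served_model` is hypothetical.

```python
from collections.abc import Mapping
from typing import Optional

# Most specific signal first.
HEADER_PRIORITY = (
    "x-litellm-fallback-model-used",  # from PR #25712, set when a fallback served the call
    "x-litellm-actual-model",  # proposed in the issue, not an existing header
    "x-litellm-model",  # existing header: the requested model group
)


def resolve_served_model(headers: Mapping[str, str]) -> tuple[Optional[str], bool]:
    """Return (model, fell_back), matching header names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    fell_back = "x-litellm-fallback-model-used" in lowered
    for name in HEADER_PRIORITY:
        if name in lowered:
            return lowered[name], fell_back
    return None, False


# Example checks for a primary-served and a fallback-served response.
assert resolve_served_model({"X-LiteLLM-Model": "gpt-4o"}) == ("gpt-4o", False)
assert resolve_served_model(
    {"x-litellm-model": "gpt-4o", "x-litellm-fallback-model-used": "gpt-4-turbo"}
) == ("gpt-4-turbo", True)
```

Because the PR is listed as open and unmerged, a given LiteLLM release may not emit the fallback header at all; the priority order lets the same check keep working in that case.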

Example

The issue itself contains no code; PR #25712, listed above, is the concrete implementation of the header approach.
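A toy end-to-end illustration of the in-group case from the issue (Azure East → Azure West), using the `_hidden_params.model_used` alternative to attribute spend to the deployment that actually served the call. `ToyRouter`, `ToyResponse`, and the `azure/gpt-4o-east` / `azure/gpt-4o-west` deployment names are hypothetical; only the `model_name` / `litellm_params.model` shape follows LiteLLM's `model_list` convention, and real routing, retries, and cooldowns are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyResponse:
    model: str  # mirrors the reported behaviour: the requested group name
    total_tokens: int
    _hidden_params: dict[str, Any] = field(default_factory=dict)


class ToyRouter:
    """Hypothetical router that tries a group's deployments in list order."""

    def __init__(self, model_list: list[dict[str, Any]], down: set[str]) -> None:
        self.model_list = model_list
        self.down = down  # deployments that simulate a failure

    def completion(self, model_group: str, total_tokens: int) -> ToyResponse:
        for deployment in self.model_list:
            if deployment["model_name"] != model_group:
                continue
            actual = deployment["litellm_params"]["model"]
            if actual in self.down:
                continue  # simulated failure: fall back to the next deployment
            response = ToyResponse(model=model_group, total_tokens=total_tokens)
            # Hypothetical field suggested in the issue.
            response._hidden_params["model_used"] = actual
            response._hidden_params["additional_headers"] = {
                "x-litellm-model-group": model_group,
                "x-litellm-actual-model": actual,
            }
            return response
        raise RuntimeError(f"no healthy deployment for {model_group}")


model_list = [
    {"model_name": "gpt-4o", "litellm_params": {"model": "azure/gpt-4o-east"}},
    {"model_name": "gpt-4o", "litellm_params": {"model": "azure/gpt-4o-west"}},
]
router = ToyRouter(model_list, down={"azure/gpt-4o-east"})

# Attribute spend to the deployment that served the call, not the group name.
spend_by_model: defaultdict[str, int] = defaultdict(int)
resp = router.completion("gpt-4o", total_tokens=120)
spend_by_model[resp._hidden_params["model_used"]] += resp.total_tokens
print(resp.model, dict(spend_by_model))
# gpt-4o {'azure/gpt-4o-west': 120}
```

The `model` field still reads `gpt-4o`, which is exactly the ambiguity the issue describes; the spend log stays correct only because it keys on the resolved deployment.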

Notes

The proposed solution assumes the router can track which deployment served a request after fallback resolution. PR #25712 shows one way to do this: it captures the effective fallback model name in run_async_fallback() before the recursive call and passes it to add_fallback_headers_to_response().

Recommendation

Introduce the x-litellm-actual-model response header. It directly identifies the deployment/model that served the request after fallback resolution, which enables accurate cost attribution, debugging, and alerting. PR #25712 takes a closely related approach with an x-litellm-fallback-model-used header that is stamped on the response when a fallback model serves the request.

