litellm - ✅(Solved) Fix [Bug]: `_set_model_group_info` fallback ignores DB `model_info` cost [1 pull requests, 1 participants]

GitHub stats
BerriAI/litellm#25874 · Fetched 2026-04-17 08:28:25
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 0
Timeline (top): labeled ×2, cross-referenced ×1, renamed ×1

Root Cause

In litellm/router.py, _set_model_group_info() (line ~7650):

  1. get_deployment_model_info(model_id, model_name) looks up the model in LiteLLM's built-in cost map (model_prices_and_context_window.json).
  2. For custom/self-hosted models (e.g., hosted_vllm/Qwen/Qwen3-8B), it's not in the built-in map → returns None.
  3. The fallback (line ~7684) creates ModelMapInfo with hardcoded input_cost_per_token=0, output_cost_per_token=0 — ignoring the DB model_info values that are already loaded in model_info_dict (line ~7629).
  4. _is_model_cost_zero() in auth_checks.py sees cost=0 → returns True → skip_budget_checks = True.
  5. _virtual_key_max_budget_check is never called.

The DB model_info cost is used for spend calculation (after the request) but not for the zero-cost check (before the request). This creates an inconsistency where spend accumulates correctly but budget enforcement never triggers.

# router.py line ~7629
model_info_dict = model.get("model_info", {})  # DB values available here (e.g., 0.5)

# router.py line ~7684 — fallback ignores model_info_dict
if model_info is None:
    model_info = ModelMapInfo(
        input_cost_per_token=0,   # Should be: model_info_dict.get("input_cost_per_token", 0)
        output_cost_per_token=0,  # Should be: model_info_dict.get("output_cost_per_token", 0)
        ...
    )
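
The chain of steps above can be sketched in isolation. This is a simplified model of the control flow, not LiteLLM's actual code: `FallbackInfo`, `build_fallback_info`, and `is_model_cost_zero` are hypothetical stand-ins for `ModelMapInfo`, the fallback branch, and `_is_model_cost_zero()`, and the zero-cost rule shown is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackInfo:
    """Hypothetical stand-in for the cost fields of ModelMapInfo."""
    input_cost_per_token: Optional[float]
    output_cost_per_token: Optional[float]


def build_fallback_info(db_model_info: dict, use_db_costs: bool) -> FallbackInfo:
    """Mimic the fallback taken when the built-in cost map has no entry."""
    if use_db_costs:
        # Behaviour proposed in the issue: read costs from the DB model_info.
        return FallbackInfo(
            db_model_info.get("input_cost_per_token", 0),
            db_model_info.get("output_cost_per_token", 0),
        )
    # Behaviour described in the issue: costs hardcoded to zero.
    return FallbackInfo(0, 0)


def is_model_cost_zero(info: FallbackInfo) -> bool:
    """Assumed zero-cost rule: both per-token costs are exactly zero."""
    return info.input_cost_per_token == 0 and info.output_cost_per_token == 0


db_model_info = {"input_cost_per_token": 0.5, "output_cost_per_token": 0.5}

buggy = build_fallback_info(db_model_info, use_db_costs=False)
fixed = build_fallback_info(db_model_info, use_db_costs=True)

skip_budget_checks_buggy = is_model_cost_zero(buggy)  # True: budget check skipped
skip_budget_checks_fixed = is_model_cost_zero(fixed)  # False: budget check runs
```

With the hardcoded zeros, the configured price of 0.5 never reaches the zero-cost check, so the pre-request budget check is bypassed even though the same price is used to record spend afterwards.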

Fix Action

Fixed

PR fix notes

PR #25888: fix(router): propagate custom cost_per_token from db model_info in fallback path

Description (problem / solution / changelog)

Relevant issues

Fixes #25874

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Screenshots / Proof of Fix

Before: screenshot "Before fix - cost_per_token is None" (https://github.com/user-attachments/assets/162771f0-f124-4f71-8797-c7e1fa4ec64f)
After: screenshot "After fix - cost_per_token propagated from db model_info" (https://github.com/user-attachments/assets/68f6cf44-90c9-42f1-a923-5c7efef49799)

Unit tests — 2 new tests added and passing:

  • test_model_group_info_cost_from_db_model_info — verifies input_cost_per_token / output_cost_per_token are read from db_model_info when get_deployment_model_info fails
  • test_model_group_info_cost_none_when_db_model_info_has_no_cost — verifies values remain None when db_model_info has no cost fields
tests/test_litellm/test_router.py::test_model_group_info_cost_from_db_model_info PASSED
tests/test_litellm/test_router.py::test_model_group_info_cost_none_when_db_model_info_has_no_cost PASSED

Type

🐛 Bug Fix

Changes

When a model is not found in LiteLLM's built-in model cost map (model_info is None in get_model_group_info()), the fallback ModelMapInfo was hardcoding input_cost_per_token and output_cost_per_token to None, ignoring any custom cost values the user had configured in the database's model_info.

This fix reads input_cost_per_token and output_cost_per_token from db_model_info (the user's database/config model_info), consistent with how mode is already read from the same source. If the keys are not present, .get() returns None, preserving backward compatibility.

Before

model_info = ModelMapInfo(
    ...
    input_cost_per_token=None,
    output_cost_per_token=None,
    ...
)

After

input_cost_per_token = db_model_info.get("input_cost_per_token")
output_cost_per_token = db_model_info.get("output_cost_per_token")

model_info = ModelMapInfo(
    ...
    input_cost_per_token=input_cost_per_token,
    output_cost_per_token=output_cost_per_token,
    ...
)
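
The two cases that the PR's unit tests describe (cost keys present, cost keys absent) can be illustrated with plain dictionaries. This is a minimal sketch: `costs_from_db_model_info` is a hypothetical helper written for illustration, not a function in LiteLLM.

```python
from typing import Optional, Tuple


def costs_from_db_model_info(
    db_model_info: dict,
) -> Tuple[Optional[float], Optional[float]]:
    """Read the per-token costs; dict.get() yields None for a missing key."""
    return (
        db_model_info.get("input_cost_per_token"),
        db_model_info.get("output_cost_per_token"),
    )


# Custom pricing configured by the user.
with_costs = costs_from_db_model_info(
    {"input_cost_per_token": 0.5, "output_cost_per_token": 0.5}
)

# No cost fields configured: both values stay None, as before the change.
without_costs = costs_from_db_model_info({"mode": "chat"})
```

Because `.get()` is called without a default, a model with no configured pricing keeps `None` for both fields, which is the backward-compatible behaviour the PR description mentions.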

Changed files

  • litellm/router.py (modified, +4/-2)
  • tests/test_litellm/test_router.py (modified, +63/-0)

RAW_BUFFER

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using custom-hosted models (e.g., hosted_vllm) with custom pricing set via model_info.input_cost_per_token / output_cost_per_token (either in config.yaml or via /model/new API), the budget check (_virtual_key_max_budget_check) is completely skipped because _is_model_cost_zero incorrectly returns True.

Expected behavior: Budget enforcement should work for custom-priced models. A virtual key with max_budget=5 and spend=10 should be blocked.

Actual behavior: Requests always succeed (200) regardless of budget, because the model is treated as "zero cost".
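
The gap between expected and actual behavior can be shown with a small sketch. Everything here is hypothetical: `BudgetExceeded` and `check_key_budget` are illustrative names, and the `spend >= max_budget` comparison is an assumption about how enforcement is decided.

```python
class BudgetExceeded(Exception):
    """Hypothetical error type; LiteLLM's own exception class may differ."""


def check_key_budget(spend: float, max_budget: float, skip_budget_checks: bool) -> None:
    """Raise when the key is over budget, unless checks are skipped."""
    if skip_budget_checks:
        return  # models treated as zero cost never reach the comparison
    if spend >= max_budget:
        raise BudgetExceeded(f"spend {spend} >= max_budget {max_budget}")


def is_blocked(spend: float, max_budget: float, skip_budget_checks: bool) -> bool:
    """Return True when the request would be rejected."""
    try:
        check_key_budget(spend, max_budget, skip_budget_checks)
    except BudgetExceeded:
        return True
    return False


# Expected: spend=10 against max_budget=5 is rejected.
blocked_when_enforced = is_blocked(10.0, 5.0, skip_budget_checks=False)

# Actual, per the report: the check is skipped, so the request succeeds.
blocked_when_skipped = is_blocked(10.0, 5.0, skip_budget_checks=True)
```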

Root Cause

In litellm/router.py, _set_model_group_info() (line ~7650):

  1. get_deployment_model_info(model_id, model_name) looks up the model in LiteLLM's built-in cost map (model_prices_and_context_window.json).
  2. For custom/self-hosted models (e.g., hosted_vllm/Qwen/Qwen3-8B), it's not in the built-in map → returns None.
  3. The fallback (line ~7684) creates ModelMapInfo with hardcoded input_cost_per_token=0, output_cost_per_token=0 — ignoring the DB model_info values that are already loaded in model_info_dict (line ~7629).
  4. _is_model_cost_zero() in auth_checks.py sees cost=0 → returns True → skip_budget_checks = True.
  5. _virtual_key_max_budget_check is never called.

The DB model_info cost is used for spend calculation (after the request) but not for the zero-cost check (before the request). This creates an inconsistency where spend accumulates correctly but budget enforcement never triggers.

# router.py line ~7629
model_info_dict = model.get("model_info", {})  # DB values available here (e.g., 0.5)

# router.py line ~7684 — fallback ignores model_info_dict
if model_info is None:
    model_info = ModelMapInfo(
        input_cost_per_token=0,   # Should be: model_info_dict.get("input_cost_per_token", 0)
        output_cost_per_token=0,  # Should be: model_info_dict.get("output_cost_per_token", 0)
        ...
    )

Suggested Fix

-                input_cost_per_token=0,
-                output_cost_per_token=0,
+                input_cost_per_token=db_model_info.get("input_cost_per_token", 0),
+                output_cost_per_token=db_model_info.get("output_cost_per_token", 0),

db_model_info is already defined at line ~7681 as model.get("model_info", {}).

Steps to Reproduce

  1. Register a custom model with explicit pricing:
curl -X POST "${LITELLM_URL}/model/new" \
  -H "Authorization: Bearer ${MASTER_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "my-custom-model",
    "litellm_params": {
      "model": "hosted_vllm/Qwen/Qwen3-8B",
      "api_base": "http://my-vllm-server/v1",
      "api_key": "dummy"
    },
    "model_info": {
      "input_cost_per_token": 0.5,
      "output_cost_per_token": 0.5
    }
  }'
  2. Create a virtual key with a small budget:
curl -X POST "${LITELLM_URL}/key/generate" \
  -H "Authorization: Bearer ${MASTER_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"max_budget": 0.001}'
  3. Send requests using the key; spend accumulates correctly (e.g., $10 per request).
  4. Verify budget is exceeded:
curl "${LITELLM_URL}/key/info?key=sk-..." -H "Authorization: Bearer ${MASTER_KEY}"
# Shows: spend=10.0, max_budget=0.001
  5. Send another request: expected 400 BudgetExceededError, got 200.
  6. Check logs: _is_model_cost_zero=True is logged, confirming the budget check was skipped.
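
The spend figure in the steps above follows from simple per-token arithmetic. The sketch below assumes a request with 10 prompt tokens and 10 completion tokens, chosen only to match the "$10 per request" example; `request_cost` is a hypothetical helper, not LiteLLM's cost calculator.

```python
def request_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Cost of one request as tokens multiplied by the per-token prices."""
    return (
        prompt_tokens * input_cost_per_token
        + completion_tokens * output_cost_per_token
    )


# Prices from the /model/new payload; token counts are assumed.
cost = request_cost(10, 10, 0.5, 0.5)

# Budget from the /key/generate payload.
max_budget = 0.001
over_budget = cost > max_budget
```

A single request at these prices already exceeds the 0.001 budget, so every later request should be rejected once enforcement works.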

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

latest

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The most likely fix is to update the _set_model_group_info() function in litellm/router.py to use the model_info values from the database instead of hardcoded zeros for custom-hosted models.

Guidance

  • The issue arises from the inconsistency in using database model_info values for spend calculation but not for the zero-cost check.
  • To fix this, update the fallback in _set_model_group_info() to use model_info_dict values for input_cost_per_token and output_cost_per_token.
  • Verify the fix by checking if budget enforcement works correctly for custom-priced models.
  • Test the fix using the provided steps to reproduce the issue.

Example

# Updated fallback in router.py
if model_info is None:
    model_info = ModelMapInfo(
        input_cost_per_token=model_info_dict.get("input_cost_per_token", 0),
        output_cost_per_token=model_info_dict.get("output_cost_per_token", 0),
        ...
    )

Notes

This fix assumes that the model_info values in the database are correctly set for custom-hosted models. If these values are not set or are incorrect, the fix may not work as expected.
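
Since the fix depends on the stored values being present and sensible, a quick sanity check of a `model_info` dictionary may help before relying on budget enforcement. This is a sketch under stated assumptions: `find_cost_problems` is a hypothetical helper, and the rules it applies (numeric, non-negative, present) are illustrative rather than LiteLLM's own validation.

```python
from numbers import Real
from typing import List

COST_KEYS = ("input_cost_per_token", "output_cost_per_token")


def find_cost_problems(model_info: dict) -> List[str]:
    """Return a human-readable list of problems with the cost fields."""
    problems = []
    for key in COST_KEYS:
        value = model_info.get(key)
        if value is None:
            problems.append(f"{key} is missing")
        elif isinstance(value, bool) or not isinstance(value, Real):
            problems.append(f"{key} is not a number: {value!r}")
        elif value < 0:
            problems.append(f"{key} is negative: {value}")
    return problems


# Well-formed pricing produces no problems.
ok = find_cost_problems({"input_cost_per_token": 0.5, "output_cost_per_token": 0.5})

# A string price and a missing key are both reported.
bad = find_cost_problems({"input_cost_per_token": "0.5"})
```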

Recommendation

Apply the suggested fix to update the _set_model_group_info() function, as it directly addresses the root cause of the issue and ensures consistency in using database model_info values for both spend calculation and zero-cost checks.
