litellm - 💡(How to fix) Fix [Bug]: Prevent budget enforcement from blocking model discovery endpoints (allow accessing free models)

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Summary

When a team/org/user budget is exhausted in LiteLLM, the proxy returns HTTP 429 Budget Exceeded for model discovery endpoints such as GET /v1/models and GET /models.

This completely breaks OpenAI-compatible clients, because they rely on these endpoints to populate the list of available models before users can interact with the UI. As a result, all models become invisible, including free/self-hosted models that would not incur additional cost.

What happened?

When a team-level budget (or organization/user budget) is fully exhausted, the LiteLLM proxy returns HTTP 429 Budget Exceeded on the GET /v1/models and GET /models endpoints. This completely blocks model discovery for any client using that team's API key.

This means all models become invisible to the client — including free models (zero-cost) that would not incur any additional spend. Clients like Open WebUI, Continue.dev, Cursor, Aider, TypingMind, BoltAI, LibreChat, and any other OpenAI-compatible client that calls GET /v1/models for model discovery will show zero models or display a connection error, rendering the entire interface unusable.

Expected behavior

The proxy administrator should be able to control what happens to model listing endpoints when a budget is exceeded. The current behavior (complete block) should not be the only option. Specifically, the admin should be able to choose between:

  1. Return all models — model listing always works; budget is only enforced on actual inference calls
  2. Return only free models — only models with zero input/output cost are returned, so users can still use free models
  3. Block entirely — current behavior (default, for backward compatibility)

Affected Endpoints

| Endpoint | Called by | Blocked by team/org/user budget? |
|---|---|---|
| GET /v1/models | All OpenAI-compatible clients (Open WebUI, Aider, Cursor, Continue.dev, etc.) | Yes |
| GET /models | Same as above (alias) | Yes |
| GET /v1/models/{model_id} | OpenAI SDK models.retrieve() | Yes |
| GET /models/{model_id} | Same as above (alias) | Yes |
| GET /model/info | LiteLLM admin dashboard (proprietary, not called by external clients) | Yes |
| GET /v1/model/info | LiteLLM admin dashboard (proprietary) | Yes |
| GET /v2/model/info | LiteLLM admin dashboard (proprietary, beta) | Yes |
| GET /model_group/info | LiteLLM admin dashboard (proprietary) | Yes |

Primary concern: /v1/models and /models — these are part of the OpenAI API spec and are the standard model discovery mechanism for all compatible clients.

Secondary concern: /model/info, /v1/model/info, /v2/model/info, /model_group/info. These are LiteLLM-proprietary endpoints used by the admin dashboard UI; they are not called by external OpenAI clients, but they are still unnecessarily blocked.
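Applying the same exclusion consistently means every budget check has to recognize the routes in this table, including the {model_id} variants. A minimal sketch with a hypothetical helper is_model_discovery_route(); LiteLLM's real classification is RouteChecks.is_info_route() together with the route lists in litellm/proxy/_types.py, which may not match this set exactly:

```python
import re

# Exact-match discovery routes taken from the "Affected Endpoints" table.
# Hypothetical constants; LiteLLM's own route lists live in litellm/proxy/_types.py.
PRIMARY_ROUTES = {"/v1/models", "/models"}
SECONDARY_ROUTES = {"/model/info", "/v1/model/info", "/v2/model/info", "/model_group/info"}

# /v1/models/{model_id} and /models/{model_id}. Model ids may contain slashes
# (e.g. "openai/gpt-4o"), so everything after the prefix is accepted.
_MODEL_ID_ROUTE = re.compile(r"^(/v1)?/models/.+$")


def is_model_discovery_route(route: str, include_secondary: bool = True) -> bool:
    """Return True if `route` only lists or describes models (no inference)."""
    # Drop any query string and trailing slash before matching.
    path = route.split("?", 1)[0].rstrip("/") or "/"
    if path in PRIMARY_ROUTES or _MODEL_ID_ROUTE.match(path):
        return True
    return include_secondary and path in SECONDARY_ROUTES
```

Keeping the secondary (dashboard) routes behind a flag reflects the issue's split between the primary and secondary concerns.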

Impact on OpenAI-Compatible Clients

This is not specific to any single client. Every client that follows the OpenAI API spec calls GET /v1/models for model discovery before allowing users to interact. When this endpoint returns 429, the entire client becomes unusable — not just for paid models, but for free models too.

Examples of affected clients:

| Client | Impact |
|---|---|
| Open WebUI | Model dropdown shows no models. Users cannot select any model, including free ones. |
| Continue.dev (VS Code) | Model discovery fails silently. No available models shown in configuration. |
| Aider | Cannot start a session; fails to resolve which models are available. |
| Cursor | Model picker shows nothing or errors out. |
| TypingMind / BoltAI | Connection error or empty model list. |
| LibreChat | Cannot populate model selector. |
| Any OpenAI SDK client | openai.models.list() raises an API error instead of returning the model list. |

The core problem: instead of letting users see their models and getting a clear "budget exceeded" error when they attempt to chat, users see a broken interface with no models and no actionable information.
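Why one 429 leaves the whole interface empty can be seen in a simplified model of client behavior. This is an illustrative sketch, not code taken from any of the clients above; model_ids_for_dropdown() is a hypothetical name, and the 200 body follows the OpenAI list-models shape ({"object": "list", "data": [...]}):

```python
import json
from typing import List


def model_ids_for_dropdown(status: int, body: str) -> List[str]:
    """Mimic a typical OpenAI-compatible client populating its model picker.

    A 200 response is expected to look like
    {"object": "list", "data": [{"id": "...", "object": "model", ...}]}.
    Any other status (such as the 429 budget error) is treated as "no models".
    """
    if status != 200:
        return []
    payload = json.loads(body)
    return [item["id"] for item in payload.get("data", [])]


ok = json.dumps({"object": "list", "data": [
    {"id": "gpt-4o", "object": "model"},
    {"id": "free-local-model", "object": "model"},
]})
blocked = json.dumps({"error": {"type": "budget_exceeded", "code": "429"}})

print(model_ids_for_dropdown(200, ok))       # ['gpt-4o', 'free-local-model']
print(model_ids_for_dropdown(429, blocked))  # []
```

The free model never reaches the picker in the 429 case, even though using it would cost nothing.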

Root Cause Analysis

The issue is an inconsistency in route-level budget check gating within common_checks() in litellm/proxy/auth/auth_checks.py.

Budget checks that already correctly skip /models and /v1/models:

| Check | Location | How it skips |
|---|---|---|
| Global proxy budget | auth_checks.py:320-334 | Explicitly checks route != "/v1/models" and route != "/models" |
| End-user budget | auth_checks.py:1035-1036 | Checks RouteChecks.is_info_route(route) and returns early |

Budget checks that block /models and /v1/models (inconsistent with above):

| Check | Location | Issue |
|---|---|---|
| Team max budget | auth_checks.py:3632-3676 (_team_max_budget_check) | No route check; blocks all routes including /models and /v1/models |
| Team multi-window budget | auth_checks.py:3679-3709 (_team_multi_budget_check) | No route check |
| Team member budget | auth_checks.py:3507-3576 (_check_team_member_budget) | No route check |
| Organization budget | auth_checks.py (_organization_max_budget_check) | No route check |
| User budget (personal key) | auth_checks.py:636-653 (inline in common_checks) | No route check |
| Key multi-window budget | auth_checks.py (_virtual_key_multi_budget_check) | No route check |

The inconsistency in code

Global proxy budget — correctly excludes /models and /v1/models:

# auth_checks.py:320-334
def _global_proxy_budget_check(global_proxy_spend, skip_budget_checks, route):
    if (
        litellm.max_budget > 0
        and not skip_budget_checks
        and global_proxy_spend is not None
        and RouteChecks.is_llm_api_route(route=route)
        and route != "/v1/models"      # <-- explicit exclusion
        and route != "/models"          # <-- explicit exclusion
    ):
        ...

End-user budget — correctly excludes info routes (which includes /models and /v1/models):

# auth_checks.py:1035-1036
async def _check_end_user_budget(end_user_obj, route):
    if RouteChecks.is_info_route(route):   # <-- skips /models and /v1/models
        return
    ...

Team budget — no exclusion (the bug):

# auth_checks.py:3632-3676
async def _team_max_budget_check(team_object, valid_token, proxy_logging_obj):
    if team_object is not None and team_object.max_budget is not None:
        spend = await get_current_spend(...)
        if spend > team_object.max_budget:
            raise litellm.BudgetExceededError(...)  # <-- blocks /models AND /v1/models

Workarounds (and why they don't work)

| Workaround | Problem |
|---|---|
| Add /models to general_settings.public_routes | Requires a premium/enterprise license. Also removes all authentication from /models and /v1/models, bypassing model access controls entirely. |
| Set a very high team budget | Defeats the purpose of budget controls. |
| Use the master key in clients | Bypasses all auth and budget checks, which is a security risk. Not viable in multi-team setups. |

There is no viable workaround today.

Relevant Code References

  • /models and /v1/models endpoint: litellm/proxy/proxy_server.py:7654-7791
  • /v1/models/{model_id} endpoint: litellm/proxy/proxy_server.py:7794-7864
  • /model/info and /v1/model/info endpoint: litellm/proxy/proxy_server.py:11644-11785
  • Auth entry point: litellm/proxy/auth/user_api_key_auth.py:2082-2153
  • common_checks(): litellm/proxy/auth/auth_checks.py:460-689
  • _team_max_budget_check(): litellm/proxy/auth/auth_checks.py:3632-3676
  • _team_multi_budget_check(): litellm/proxy/auth/auth_checks.py:3679-3709
  • _check_team_member_budget(): litellm/proxy/auth/auth_checks.py:3507-3576
  • _global_proxy_budget_check() (correct pattern): litellm/proxy/auth/auth_checks.py:320-334
  • _check_end_user_budget() (correct pattern): litellm/proxy/auth/auth_checks.py:1021-1056
  • _is_model_cost_zero(): litellm/proxy/auth/auth_checks.py:122-225
  • RouteChecks.is_info_route(): litellm/proxy/auth/route_checks.py:411-416
  • Route classifications (info_routes, openai_routes): litellm/proxy/_types.py

Proposed Solution: Configurable budget_exceeded_models_policy

Rather than hard-coding one behavior, introduce a new general_settings configuration option that lets the proxy administrator decide what happens to the /models and /v1/models endpoints when a budget is exceeded.

Configuration

# config.yaml
general_settings:
  master_key: sk-1234
  budget_exceeded_models_policy: "all"  # options: "blocked" | "all" | "free_only"

| Value | Behavior | Use case |
|---|---|---|
| "blocked" | Default. Current behavior: /models and /v1/models return 429 when the budget is exceeded. Fully backward compatible. | Strict environments where over-budget users should not interact with the proxy at all. |
| "all" | /models and /v1/models always return the full model list regardless of budget status. Budget is only enforced on actual inference calls (/chat/completions, /completions, /embeddings, etc.). | Most common use case: clients remain functional, and users see a clear "budget exceeded" error only when they attempt to use a model. |
| "free_only" | /models and /v1/models return only models with zero input and output cost (using the existing _is_model_cost_zero() logic). Paid models are hidden from the list. | Organizations that offer free models (e.g., self-hosted Ollama/vLLM) alongside paid ones and want users to fall back to free models when the budget runs out. |

Why this fits naturally into the codebase

  1. Follows the established general_settings pattern: the value is read at runtime with general_settings.get("budget_exceeded_models_policy", "blocked"), the same way as enforce_user_param, allow_requests_on_db_unavailable, ui_access_mode, etc.

  2. Reuses existing infrastructure:

    • The _is_model_cost_zero() function in auth_checks.py:122-225 already handles all edge cases for determining if a model is free (checks both input/output cost, verifies explicit configuration via _is_cost_explicitly_configured(), handles model groups). This is the same function used today to skip budget checks for free models on inference calls.
    • The RouteChecks.is_info_route() check is already the pattern used by _check_end_user_budget to skip info routes (which includes /models and /v1/models).
  3. Consistent with existing code patterns:

    • _global_proxy_budget_check already explicitly excludes /models and /v1/models
    • _check_end_user_budget already skips info routes
    • This proposal extends that same pattern to team/org/user/team-member budget checks, but makes it configurable instead of hard-coded
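Reading and validating the proposed setting could follow the general_settings.get(...) pattern mentioned in point 1. A minimal sketch; the enum and get_models_policy() are hypothetical names, and rejecting unknown values (rather than silently falling back to "blocked") is a design choice of this sketch, not something the proposal specifies:

```python
from enum import Enum
from typing import Any, Mapping


class BudgetExceededModelsPolicy(str, Enum):
    """Proposed values for general_settings.budget_exceeded_models_policy."""
    BLOCKED = "blocked"      # default: current behavior, /models returns 429
    ALL = "all"              # always list every model; enforce budget on inference only
    FREE_ONLY = "free_only"  # list only zero-cost models once the budget is exhausted


def get_models_policy(general_settings: Mapping[str, Any]) -> BudgetExceededModelsPolicy:
    """Read the policy from general_settings.

    A missing value falls back to "blocked" for backward compatibility;
    an unknown value raises instead of silently weakening enforcement.
    """
    raw = general_settings.get("budget_exceeded_models_policy", "blocked")
    try:
        return BudgetExceededModelsPolicy(str(raw).strip().lower())
    except ValueError:
        allowed = ", ".join(repr(p.value) for p in BudgetExceededModelsPolicy)
        raise ValueError(
            f"invalid budget_exceeded_models_policy {raw!r}; expected one of {allowed}"
        ) from None
```

Because the enum subclasses str, a parsed value still compares equal to the plain strings used in config.yaml (for example, BudgetExceededModelsPolicy.ALL == "all").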

Implementation sketch

In common_checks() (auth_checks.py around line 584):

When budget_exceeded_models_policy is "all" or "free_only", skip budget checks for info routes (the same pattern as _check_end_user_budget). Pass a signal through UserAPIKeyAuth (e.g., a budget_exceeded: bool flag) so the /models and /v1/models endpoints know whether to filter.
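One way to get both the skip and the budget_exceeded signal described above is to keep evaluating each budget check but, for model-list routes under a lenient policy, record the failure instead of raising. A minimal sketch of that idea; BudgetExceededError, AuthResult (standing in for UserAPIKeyAuth) and run_budget_checks() are hypothetical stand-ins rather than LiteLLM's actual code, and only /models and /v1/models are treated as model-list routes here:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


class BudgetExceededError(Exception):
    """Stand-in for litellm.BudgetExceededError in this sketch."""


@dataclass
class AuthResult:
    """Hypothetical stand-in for UserAPIKeyAuth carrying the proposed flag."""
    budget_exceeded: bool = False


MODEL_LIST_ROUTES = {"/models", "/v1/models"}


def run_budget_checks(
    route: str,
    policy: str,
    checks: Iterable[Callable[[], None]],
    auth: AuthResult,
) -> AuthResult:
    """Run every budget check; soften failures only for model-list routes.

    - policy "blocked" (default), or any non-model-list route: re-raise, as today.
    - policy "all" / "free_only" on /models or /v1/models: record the failure
      on the auth object so the model-list endpoint can decide what to return.
    """
    lenient = policy in ("all", "free_only") and route in MODEL_LIST_ROUTES
    for check in checks:
        try:
            check()  # e.g. team, org, user, team-member, key multi-window checks
        except BudgetExceededError:
            if not lenient:
                raise
            auth.budget_exceeded = True
    return auth
```

Evaluating the checks rather than skipping them outright is what makes the flag possible; inference routes keep raising exactly as they do now.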

In model_list() (proxy_server.py around line 7776):

When the budget_exceeded flag is set and policy is "free_only", filter the model list using _is_model_cost_zero() for each model before building the response — only include models where both input_cost_per_token and output_cost_per_token are explicitly configured as 0.

When policy is "all", no filtering needed — return the full list as normal.

When policy is "blocked" (default), no change — current behavior is preserved.
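The free_only filtering step could look roughly like this. A sketch under stated assumptions: is_explicitly_free() is a simplified, hypothetical stand-in for _is_model_cost_zero() that only reads costs from litellm_params (the real function also handles model groups and other edge cases), models_to_list() is a hypothetical helper, and the entries mirror the model_list from the reproduction config:

```python
from typing import Any, Dict, List

ModelEntry = Dict[str, Any]  # one entry of config.yaml's model_list


def is_explicitly_free(entry: ModelEntry) -> bool:
    """Simplified stand-in for _is_model_cost_zero().

    A model counts as free only when BOTH costs are explicitly set to 0;
    a missing cost is treated as "unknown", not "free".
    """
    params = entry.get("litellm_params", {})
    costs = (params.get("input_cost_per_token"), params.get("output_cost_per_token"))
    return all(c is not None and float(c) == 0.0 for c in costs)


def models_to_list(model_list: List[ModelEntry], budget_exceeded: bool, policy: str) -> List[str]:
    """Return the model names /v1/models should expose for this request."""
    if budget_exceeded and policy == "free_only":
        model_list = [m for m in model_list if is_explicitly_free(m)]
    # Otherwise return everything. With policy "blocked", an over-budget
    # request never reaches this point because auth already returned 429.
    return [m["model_name"] for m in model_list]


model_list = [
    {"model_name": "gpt-4o", "litellm_params": {"model": "openai/gpt-4o"}},
    {"model_name": "free-local-model",
     "litellm_params": {"model": "ollama/llama3",
                        "input_cost_per_token": 0, "output_cost_per_token": 0}},
]

print(models_to_list(model_list, budget_exceeded=True, policy="free_only"))  # ['free-local-model']
print(models_to_list(model_list, budget_exceeded=True, policy="all"))        # ['gpt-4o', 'free-local-model']
```

Treating a missing cost as "not free" matches the issue's requirement that both costs be explicitly configured as 0, so an unpriced paid model is never exposed by accident.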

Example configurations

"I want clients to always work, enforce budget only on inference":

general_settings:
  budget_exceeded_models_policy: "all"

"I want users to fall back to free self-hosted models when budget runs out":

general_settings:
  budget_exceeded_models_policy: "free_only"

"I want strict lockout when budget is exceeded" (current default behavior):

general_settings:
  budget_exceeded_models_policy: "blocked"

Or simply omit the setting — "blocked" is the default for backward compatibility.

Additional Context

  • LiteLLM version: latest (main branch)
  • /models and /v1/models are classified as both openai_routes and info_routes in LiteLLMRoutes (litellm/proxy/_types.py)
  • This issue affects any deployment using team/org/user budgets behind an OpenAI-compatible client
  • This is a single global setting that applies uniformly to all budget types (team, org, user, team member, key multi-window)
  • The /model/info and /v1/model/info endpoints (LiteLLM-proprietary, used by admin dashboard) are also blocked, but this is secondary since external clients don't call them

Steps to Reproduce

How to reproduce

Setup

# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-xxx
  - model_name: free-local-model
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434
      input_cost_per_token: 0
      output_cost_per_token: 0

general_settings:
  master_key: sk-master-1234

Steps

  1. Start LiteLLM proxy:

    litellm --config config.yaml --port 4000
  2. Create a team with a small budget:

    curl -X POST http://localhost:4000/team/new \
      -H "Authorization: Bearer sk-master-1234" \
      -H "Content-Type: application/json" \
      -d '{"team_alias": "test-team", "max_budget": 0.01}'
  3. Create a key for the team:

    curl -X POST http://localhost:4000/key/generate \
      -H "Authorization: Bearer sk-master-1234" \
      -H "Content-Type: application/json" \
      -d '{"team_id": "<team_id_from_step_2>"}'
  4. Exhaust the budget (or set max_budget: 0.0 for immediate reproduction):

    curl -X POST http://localhost:4000/team/update \
      -H "Authorization: Bearer sk-master-1234" \
      -H "Content-Type: application/json" \
      -d '{"team_id": "<team_id>", "max_budget": 0.0}'
  5. Try to list models with the team key:

    curl http://localhost:4000/v1/models \
      -H "Authorization: Bearer <team_key>"
  6. Actual result: HTTP 429:

    {
      "error": {
        "message": "Budget has been exceeded! Team=<team_id> Current cost: 0.0, Max budget: 0.0",
        "type": "budget_exceeded",
        "param": null,
        "code": "429"
      }
    }
  7. Expected result: HTTP 200 with the model list (all models, or at minimum the free ones).
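The manual steps above can be scripted for repeated testing. A minimal sketch using only the Python standard library, assuming the proxy from step 1 is running at http://localhost:4000 with the master key from the setup config; the response field names it reads (team_id from /team/new, key from /key/generate) are assumptions about the management API rather than something stated in this issue:

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:4000"
MASTER_KEY = "sk-master-1234"


def build_request(path, key, payload=None):
    """Build a proxy request: POST with a JSON body if payload is given, else GET."""
    data = json.dumps(payload).encode() if payload is not None else None
    headers = {"Authorization": f"Bearer {key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers,
        method="POST" if data is not None else "GET",
    )


def call(path, key, payload=None):
    """Return (status, parsed JSON body); HTTP errors are returned, not raised."""
    try:
        with urllib.request.urlopen(build_request(path, key, payload), timeout=30) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read() or b"{}")


def reproduce():
    # Steps 2-4: create team, create team key, drop budget to zero.
    # Field names "team_id" and "key" are assumed from the management API.
    _, team = call("/team/new", MASTER_KEY, {"team_alias": "test-team", "max_budget": 0.01})
    _, key_resp = call("/key/generate", MASTER_KEY, {"team_id": team["team_id"]})
    call("/team/update", MASTER_KEY, {"team_id": team["team_id"], "max_budget": 0.0})
    # Step 5: list models with the team key. Expected 200; the bug yields 429.
    status, body = call("/v1/models", key_resp["key"])
    print(status, json.dumps(body, indent=2))
```

Run reproduce() against a live proxy; it prints the status code and body from step 5 so the 429 can be compared against the expected 200.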

Relevant log output

What part of LiteLLM is this about?

No response

What LiteLLM version are you on ?

v1.83.3

Twitter / LinkedIn details

https://www.linkedin.com/in/keval-mahajan/
