
litellm: [Bug]: Prevent budget enforcement from blocking model discovery endpoints (allow accessing free models)

StepCodex · 2026-05-14T09:16:12Z



When a team/org/user budget is exhausted in LiteLLM, the proxy returns HTTP 429 Budget Exceeded for model discovery endpoints such as GET /v1/models and GET /models.

This completely breaks OpenAI-compatible clients because they rely on these endpoints to populate available models before users can interact with the UI. As a result, all models become invisible - including free/self-hosted models that would not incur additional cost.




Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Summary

When a team/org/user budget is exhausted in LiteLLM, the proxy returns HTTP 429 Budget Exceeded for model discovery endpoints such as GET /v1/models and GET /models.

What happened?

When a team-level budget (or organization/user budget) is fully exhausted, the LiteLLM proxy returns HTTP 429 Budget Exceeded on the GET /v1/models and GET /models endpoints. This completely blocks model discovery for any client using that team's API key.

Expected behavior

The proxy administrator should be able to control what happens to model listing endpoints when a budget is exceeded. The current behavior (complete block) should not be the only option. Specifically, the admin should be able to choose between:

1. Return all models: model listing always works; budget is only enforced on actual inference calls
2. Return only free models: only models with zero input/output cost are returned, so users can still use free models
3. Block entirely: current behavior (default, for backward compatibility)

Affected Endpoints

| Endpoint | Called by | Blocked by team/org/user budget? |
|---|---|---|
| `GET /v1/models` | All OpenAI-compatible clients (Open WebUI, Aider, Cursor, Continue.dev, etc.) | Yes (blocked) |
| `GET /models` | Same as above (alias) | Yes (blocked) |
| `GET /v1/models/{model_id}` | OpenAI SDK `models.retrieve()` | Yes (blocked) |
| `GET /models/{model_id}` | Same as above (alias) | Yes (blocked) |
| `GET /model/info` | LiteLLM admin dashboard (proprietary, not called by external clients) | Yes (blocked) |
| `GET /v1/model/info` | LiteLLM admin dashboard (proprietary) | Yes (blocked) |
| `GET /v2/model/info` | LiteLLM admin dashboard (proprietary, beta) | Yes (blocked) |
| `GET /model_group/info` | LiteLLM admin dashboard (proprietary) | Yes (blocked) |

Primary concern: /v1/models and /models — these are part of the OpenAI API spec and are the standard model discovery mechanism for all compatible clients.

Secondary concern: /model/info, /v1/model/info, /v2/model/info, /model_group/info — these are LiteLLM-proprietary endpoints used by the admin dashboard UI. Not called by external OpenAI clients, but still unnecessarily blocked.
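
To make the primary/secondary split concrete, here is a minimal Python sketch that labels a route according to the table above. The helper name `classify_discovery_route` and the constant names are hypothetical and are not part of LiteLLM; the regular expression also assumes a model ID without slashes.

```python
import re

# Route sets copied from the "Affected Endpoints" table.
PRIMARY_DISCOVERY_ROUTES = {"/v1/models", "/models"}
SECONDARY_INFO_ROUTES = {
    "/model/info",
    "/v1/model/info",
    "/v2/model/info",
    "/model_group/info",
}
# Matches /models/{model_id} and /v1/models/{model_id}
# (assumption: the model ID contains no "/").
MODEL_RETRIEVE_PATTERN = re.compile(r"^(/v1)?/models/[^/]+$")


def classify_discovery_route(route: str) -> str:
    """Hypothetical helper: return 'primary', 'retrieve', 'secondary', or 'other'."""
    if route in PRIMARY_DISCOVERY_ROUTES:
        return "primary"
    if MODEL_RETRIEVE_PATTERN.match(route):
        return "retrieve"
    if route in SECONDARY_INFO_ROUTES:
        return "secondary"
    return "other"
```

Inference routes such as `/chat/completions` fall into `"other"`, which is where budget enforcement is expected to stay in place.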

Impact on OpenAI-Compatible Clients

This is not specific to any single client. Every client that follows the OpenAI API spec calls GET /v1/models for model discovery before allowing users to interact. When this endpoint returns 429, the entire client becomes unusable — not just for paid models, but for free models too.

Examples of affected clients:

| Client | Impact |
|---|---|
| Open WebUI | Model dropdown shows no models. Users cannot select any model, including free ones. |
| Continue.dev (VS Code) | Model discovery fails silently. No available models shown in configuration. |
| Aider | Cannot start a session; fails to resolve which models are available. |
| Cursor | Model picker shows nothing or errors out. |
| TypingMind / BoltAI | Connection error or empty model list. |
| LibreChat | Cannot populate model selector. |
| Any OpenAI SDK client | `openai.models.list()` raises an API error instead of returning the model list. |

The core problem: instead of letting users see their models and getting a clear "budget exceeded" error when they attempt to chat, users see a broken interface with no models and no actionable information.

Root Cause Analysis

The issue is an inconsistency in route-level budget check gating within common_checks() in litellm/proxy/auth/auth_checks.py.

Budget checks that already correctly skip `/models` and `/v1/models`:

| Check | Location | How it skips |
|---|---|---|
| Global proxy budget | `auth_checks.py:320-334` | Explicitly checks `route != "/v1/models" and route != "/models"` |
| End-user budget | `auth_checks.py:1035-1036` | Checks `RouteChecks.is_info_route(route)` and returns early |

Budget checks that block `/models` and `/v1/models` (inconsistent with above):

| Check | Location | Issue |
|---|---|---|
| Team max budget | `auth_checks.py:3632-3676` (`_team_max_budget_check`) | No route check; blocks all routes including `/models` and `/v1/models` |
| Team multi-window budget | `auth_checks.py:3679-3709` (`_team_multi_budget_check`) | No route check |
| Team member budget | `auth_checks.py:3507-3576` (`_check_team_member_budget`) | No route check |
| Organization budget | `auth_checks.py` (`_organization_max_budget_check`) | No route check |
| User budget (personal key) | `auth_checks.py:636-653` (inline in `common_checks`) | No route check |
| Key multi-window budget | `auth_checks.py` (`_virtual_key_multi_budget_check`) | No route check |

The inconsistency in code

Global proxy budget — correctly excludes /models and /v1/models:

# auth_checks.py:320-334
def _global_proxy_budget_check(global_proxy_spend, skip_budget_checks, route):
    if (
        litellm.max_budget > 0
        and not skip_budget_checks
        and global_proxy_spend is not None
        and RouteChecks.is_llm_api_route(route=route)
        and route != "/v1/models"      # <-- explicit exclusion
        and route != "/models"          # <-- explicit exclusion
    ):
        ...

End-user budget — correctly excludes info routes (which includes /models and /v1/models):

# auth_checks.py:1035-1036
async def _check_end_user_budget(end_user_obj, route):
    if RouteChecks.is_info_route(route):   # <-- skips /models and /v1/models
        return
    ...

Team budget — no exclusion (the bug):

# auth_checks.py:3632-3676
async def _team_max_budget_check(team_object, valid_token, proxy_logging_obj):
    if team_object is not None and team_object.max_budget is not None:
        spend = await get_current_spend(...)
        if spend > team_object.max_budget:
            raise litellm.BudgetExceededError(...)  # <-- blocks /models AND /v1/models
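
For comparison, the following is a minimal, self-contained sketch of what a route-aware team check could look like if it borrowed the early return used by the end-user check. It is not the LiteLLM implementation: `Team`, `INFO_ROUTES`, and the local `BudgetExceededError` are stand-ins so the example runs without LiteLLM, the function is synchronous for brevity, and the extra `route` parameter is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: stand-in for the routes RouteChecks.is_info_route() accepts.
INFO_ROUTES = {"/models", "/v1/models"}


class BudgetExceededError(Exception):
    """Local stand-in for litellm.BudgetExceededError."""


@dataclass
class Team:
    """Hypothetical, reduced view of a team object."""
    team_id: str
    spend: float
    max_budget: Optional[float] = None


def team_max_budget_check(team: Optional[Team], route: str) -> None:
    """Raise BudgetExceededError only for non-info routes when over budget."""
    if route in INFO_ROUTES:
        return  # same early return as the end-user budget check
    if team is None or team.max_budget is None:
        return
    if team.spend > team.max_budget:
        raise BudgetExceededError(
            f"Budget has been exceeded! Team={team.team_id} "
            f"Current cost: {team.spend}, Max budget: {team.max_budget}"
        )
```

With this shape, an over-budget team can still call `/v1/models`, while `/chat/completions` continues to raise.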

Workarounds (and why they don't work)

| Workaround | Problem |
|---|---|
| Add `/models` to `general_settings.public_routes` | Requires premium/enterprise license. Also removes all authentication from `/models` and `/v1/models`, bypassing model access controls entirely. |
| Set very high team budget | Defeats the purpose of budget controls. |
| Use master key in clients | Bypasses all auth/budget checks (security risk). Not viable in multi-team setups. |

There is no viable workaround today.

Relevant Code References

/models and /v1/models endpoint: litellm/proxy/proxy_server.py:7654-7791
/v1/models/{model_id} endpoint: litellm/proxy/proxy_server.py:7794-7864
/model/info and /v1/model/info endpoint: litellm/proxy/proxy_server.py:11644-11785
Auth entry point: litellm/proxy/auth/user_api_key_auth.py:2082-2153
common_checks(): litellm/proxy/auth/auth_checks.py:460-689
_team_max_budget_check(): litellm/proxy/auth/auth_checks.py:3632-3676
_team_multi_budget_check(): litellm/proxy/auth/auth_checks.py:3679-3709
_check_team_member_budget(): litellm/proxy/auth/auth_checks.py:3507-3576
_global_proxy_budget_check() (correct pattern): litellm/proxy/auth/auth_checks.py:320-334
_check_end_user_budget() (correct pattern): litellm/proxy/auth/auth_checks.py:1021-1056
_is_model_cost_zero(): litellm/proxy/auth/auth_checks.py:122-225
RouteChecks.is_info_route(): litellm/proxy/auth/route_checks.py:411-416
Route classifications (info_routes, openai_routes): litellm/proxy/_types.py

Proposed Solution: Configurable `budget_exceeded_models_policy`

Rather than hard-coding one behavior, introduce a new general_settings configuration option that lets the proxy administrator decide what happens to the /models and /v1/models endpoints when a budget is exceeded.

Configuration

# config.yaml
general_settings:
  master_key: sk-1234
  budget_exceeded_models_policy: "all"  # options: "blocked" | "all" | "free_only"

| Value | Behavior | Use case |
|---|---|---|
| `"blocked"` | Default. Current behavior: `/models` and `/v1/models` return 429 when budget is exceeded. Fully backward compatible. | Strict environments where over-budget users should not interact with the proxy at all. |
| `"all"` | `/models` and `/v1/models` always return the full model list regardless of budget status. Budget is only enforced on actual inference calls (`/chat/completions`, `/completions`, `/embeddings`, etc.). | Most common use case: clients remain functional, and users see a clear "budget exceeded" error only when they attempt to use a model. |
| `"free_only"` | `/models` and `/v1/models` return only models with zero input and output cost (using the existing `_is_model_cost_zero()` logic). Paid models are hidden from the list. | Organizations that offer free models (e.g., self-hosted Ollama/vLLM) alongside paid ones and want users to fall back to free models when budget runs out. |

Why this fits naturally into the codebase

- Follows the established `general_settings` pattern: read at runtime with `general_settings.get("budget_exceeded_models_policy", "blocked")`, the same way as `enforce_user_param`, `allow_requests_on_db_unavailable`, `ui_access_mode`, etc.
- Reuses existing infrastructure:
  - The `_is_model_cost_zero()` function in `auth_checks.py:122-225` already handles all edge cases for determining if a model is free (checks both input/output cost, verifies explicit configuration via `_is_cost_explicitly_configured()`, handles model groups). This is the same function used today to skip budget checks for free models on inference calls.
  - The `RouteChecks.is_info_route()` check is already the pattern used by `_check_end_user_budget` to skip info routes (which include `/models` and `/v1/models`).
- Consistent with existing code patterns:
  - `_global_proxy_budget_check` already explicitly excludes `/models` and `/v1/models`
  - `_check_end_user_budget` already skips info routes
  - This proposal extends that same pattern to team/org/user/team-member budget checks, but makes it configurable instead of hard-coded
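
As a rough illustration of the runtime lookup described above, the sketch below reads the proposed setting from a plain dictionary and falls back to `"blocked"`. The function `resolve_models_policy` and the choice to raise `ValueError` on an unknown value are assumptions made for this example; the proposal itself only specifies the key name, the three values, and the default.

```python
from typing import Any, Dict

# The three values and the default come from the proposal's configuration table.
VALID_POLICIES = ("blocked", "all", "free_only")
DEFAULT_POLICY = "blocked"


def resolve_models_policy(general_settings: Dict[str, Any]) -> str:
    """Hypothetical helper: return the configured policy or the default."""
    policy = general_settings.get("budget_exceeded_models_policy", DEFAULT_POLICY)
    if policy not in VALID_POLICIES:
        # Assumption: fail loudly on typos rather than silently falling back.
        raise ValueError(
            f"budget_exceeded_models_policy must be one of {VALID_POLICIES}, "
            f"got {policy!r}"
        )
    return policy
```

Omitting the key yields `"blocked"`, which keeps existing deployments unchanged.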

Implementation sketch

In common_checks() (auth_checks.py around line 584):

When budget_exceeded_models_policy is "all" or "free_only", skip budget checks for info routes (same pattern as _check_end_user_budget). Pass a signal through UserAPIKeyAuth (e.g., a budget_exceeded: bool flag) so the /models and /v1/models endpoint knows whether to filter.
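
The decision described in the paragraph above can be sketched in isolation as follows. This is a simplified model, not the actual `common_checks()` code: `AuthResult` stands in for the fields that would travel on `UserAPIKeyAuth`, `INFO_ROUTES` stands in for `RouteChecks.is_info_route()`, and the function name is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: stand-in for the routes RouteChecks.is_info_route() accepts.
INFO_ROUTES = {"/models", "/v1/models"}


@dataclass
class AuthResult:
    """Hypothetical stand-in for what UserAPIKeyAuth would carry."""
    allowed: bool
    budget_exceeded: bool = False


def check_budget_for_route(
    route: str, spend: float, max_budget: Optional[float], policy: str
) -> AuthResult:
    """Decide whether a request passes and whether to flag it as over budget."""
    over_budget = max_budget is not None and spend > max_budget
    if not over_budget:
        return AuthResult(allowed=True)
    if route in INFO_ROUTES and policy in ("all", "free_only"):
        # Let the listing through, but tell the endpoint the budget is exhausted.
        return AuthResult(allowed=True, budget_exceeded=True)
    # "blocked" policy, or any non-info route: current behavior (HTTP 429).
    return AuthResult(allowed=False, budget_exceeded=True)
```

The flag is only set when the request is over budget, so the listing endpoint can leave the normal path untouched.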

In model_list() (proxy_server.py around line 7776):

When the budget_exceeded flag is set and policy is "free_only", filter the model list using _is_model_cost_zero() for each model before building the response — only include models where both input_cost_per_token and output_cost_per_token are explicitly configured as 0.

When policy is "all", no filtering needed — return the full list as normal.

When policy is "blocked" (default), no change — current behavior is preserved.
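
A minimal sketch of the filtering step, under stated assumptions: models are represented as plain dictionaries shaped like `model_list` entries in `config.yaml`, and `is_explicitly_free` is a deliberately simplified stand-in for `_is_model_cost_zero()` (it does not handle model groups or the other edge cases the real function covers).

```python
from typing import Any, Dict, List


def is_explicitly_free(model: Dict[str, Any]) -> bool:
    """Simplified stand-in: both costs must be explicitly set and equal to 0."""
    params = model.get("litellm_params", {})
    input_cost = params.get("input_cost_per_token")
    output_cost = params.get("output_cost_per_token")
    return (
        input_cost is not None
        and output_cost is not None
        and input_cost == 0
        and output_cost == 0
    )


def filter_models_for_policy(
    models: List[Dict[str, Any]], policy: str, budget_exceeded: bool
) -> List[Dict[str, Any]]:
    """Hypothetical helper: apply the policy to the list returned by /models."""
    if budget_exceeded and policy == "free_only":
        return [m for m in models if is_explicitly_free(m)]
    # "all", or budget not exceeded: return the full list unchanged.
    # With "blocked", an over-budget request is rejected before reaching this point.
    return list(models)
```

Models without explicit cost fields are treated as paid here, which matches the "explicitly configured as 0" wording above.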

Example configurations

"I want clients to always work, enforce budget only on inference":

general_settings:
  budget_exceeded_models_policy: "all"

"I want users to fall back to free self-hosted models when budget runs out":

general_settings:
  budget_exceeded_models_policy: "free_only"

"I want strict lockout when budget is exceeded" (current default behavior):

general_settings:
  budget_exceeded_models_policy: "blocked"

Or simply omit the setting — "blocked" is the default for backward compatibility.

Additional Context

LiteLLM version: latest (main branch)
/models and /v1/models are classified as both openai_routes and info_routes in LiteLLMRoutes (litellm/proxy/_types.py)
This issue affects any deployment using team/org/user budgets behind an OpenAI-compatible client
This is a single global setting that applies uniformly to all budget types (team, org, user, team member, key multi-window)
The /model/info and /v1/model/info endpoints (LiteLLM-proprietary, used by admin dashboard) are also blocked, but this is secondary since external clients don't call them

Steps to Reproduce

How to reproduce

Setup

# config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-xxx
  - model_name: free-local-model
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434
      input_cost_per_token: 0
      output_cost_per_token: 0

general_settings:
  master_key: sk-master-1234

Steps

1. Start LiteLLM proxy:

litellm --config config.yaml --port 4000

2. Create a team with a small budget:

curl -X POST http://localhost:4000/team/new \
  -H "Authorization: Bearer sk-master-1234" \
  -H "Content-Type: application/json" \
  -d '{"team_alias": "test-team", "max_budget": 0.01}'

3. Create a key for the team:

curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-master-1234" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "<team_id_from_step_2>"}'

4. Exhaust the budget (or set max_budget: 0.0 for immediate reproduction):

curl -X POST http://localhost:4000/team/update \
  -H "Authorization: Bearer sk-master-1234" \
  -H "Content-Type: application/json" \
  -d '{"team_id": "<team_id>", "max_budget": 0.0}'

5. Try to list models with the team key:

curl http://localhost:4000/v1/models \
  -H "Authorization: Bearer <team_key>"

Actual result — HTTP 429:

{
  "error": {
    "message": "Budget has been exceeded! Team=<team_id> Current cost: 0.0, Max budget: 0.0",
    "type": "budget_exceeded",
    "param": null,
    "code": "429"
  }
}

Expected result — HTTP 200 with model list (all models, or at minimum the free ones).
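
The same check can be scripted. The sketch below uses only the Python standard library; `fetch_models` and `summarize_models_response` are hypothetical helper names, and the base URL and key are the placeholders from the steps above. It assumes a successful response follows the OpenAI list format, with model entries under `data`.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def fetch_models(base_url: str, api_key: str) -> Tuple[int, str]:
    """GET {base_url}/v1/models and return (status_code, raw_body)."""
    request = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error still carries status and body.
        return err.code, err.read().decode("utf-8")


def summarize_models_response(status: int, body: str) -> str:
    """Turn a /v1/models response into a one-line verdict."""
    payload = json.loads(body)
    if status == 429:
        return "blocked: " + payload["error"]["message"]
    if status == 200:
        model_ids = [m["id"] for m in payload.get("data", [])]
        return "ok: " + ", ".join(model_ids)
    return f"unexpected status {status}"
```

With the proxy from the steps above running, `print(summarize_models_response(*fetch_models("http://localhost:4000", "<team_key>")))` prints a `blocked:` line while the bug is present.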


What LiteLLM version are you on?

v1.83.3

Twitter / LinkedIn details

https://www.linkedin.com/in/keval-mahajan/
