litellm: [Bug] Virtual keys with tpm/rpm limits leak _litellm_* params into provider API calls, breaking fallback chains

When a virtual key has tpm_limit or rpm_limit configured, the parallel_request_limiter_v3 pre-call hook injects internal LiteLLM parameters (_litellm_rate_limit_descriptors, _litellm_tpm_reserved_model, _litellm_tpm_reserved_scopes, _litellm_tpm_reserved_tokens) into the request data dictionary. These parameters are NOT filtered out before the request is forwarded to the upstream provider API, causing all strict providers (OpenAI, Anthropic) to reject the request with HTTP 400.

This effectively breaks the router_settings.fallbacks feature when virtual keys have rate limits configured: fallback tiers cannot serve any request because LiteLLM's own internal params poison the payload.

Fix Action

Workaround

Create virtual keys WITHOUT rate limits:

curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-master-xxx" \
  -H "Content-Type: application/json" \
  -d '{"models": ["..."], "key_alias": "...", "max_budget": 100}'
  # NO tpm_limit, NO rpm_limit

Or update existing keys:

curl -X POST http://localhost:4000/key/update \
  -H "Authorization: Bearer sk-master-xxx" \
  -H "Content-Type: application/json" \
  -d '{"key": "<key>", "tpm_limit": null, "rpm_limit": null}'

Trade-off: you lose per-key rate limit enforcement at the LiteLLM layer. USD budget enforcement (via a separate hook) still works correctly.
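
If many existing keys need their limits cleared, the update call can be scripted. The following is a minimal sketch using only the Python standard library. It assumes the proxy URL and master key placeholder used elsewhere in this report and that /key/update accepts the master key as a Bearer token. The helper names build_clear_limits_request and clear_limits are hypothetical, not part of LiteLLM.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # proxy address used in this report
MASTER_KEY = "sk-master-xxx"         # placeholder, replace with your own


def build_clear_limits_request(virtual_key: str) -> urllib.request.Request:
    """Build a POST /key/update request that nulls tpm_limit and rpm_limit."""
    body = {"key": virtual_key, "tpm_limit": None, "rpm_limit": None}
    return urllib.request.Request(
        url=f"{PROXY_URL}/key/update",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def clear_limits(virtual_keys):
    """Send one update per key and return {key: HTTP status}."""
    statuses = {}
    for key in virtual_keys:
        with urllib.request.urlopen(build_clear_limits_request(key)) as resp:
            statuses[key] = resp.status
    return statuses
```

Calling clear_limits(["<key>"]) against a running proxy sends the same request as the second curl command above. Building the request separately from sending it makes it easy to inspect the payload first.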

Description

Repro steps

1. Set up the proxy with a multi-provider config:

model_list:
  - model_name: my-model
    litellm_params:
      model: gemini/gemini-2.5-flash-lite
      api_key: os.environ/GEMINI_API_KEY
  - model_name: my-model-fallback
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks:
    - my-model: ["my-model-fallback"]

general_settings:
  master_key: sk-master-xxx
  database_url: postgres://...

2. Create a virtual key WITH rate limits:

curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-master-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["my-model", "my-model-fallback"],
    "key_alias": "test-key",
    "max_budget": 100,
    "tpm_limit": 100000,
    "rpm_limit": 1000
  }'

3. Invalidate the primary provider to force a fallback: set GEMINI_API_KEY to an invalid value and restart the proxy.

4. Send a request:

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer <virtual_key_from_step_2>" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "hi"}]}'

Expected behavior

The LiteLLM router catches the Gemini failure (AuthenticationError), cascades to the OpenAI fallback, and returns a successful response.

Actual behavior

Fallback to OpenAI also fails with:

litellm.RateLimitError: OpenAIException - Unrecognized request arguments
supplied: _litellm_rate_limit_descriptors, _litellm_tpm_reserved_model,
_litellm_tpm_reserved_scopes, _litellm_tpm_reserved_tokens
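
The provider errors in this section name the leaked parameters directly, so the bug can be spotted in logs by searching for the _litellm_ prefix. The sketch below shows one way to do that. The helper name leaked_internal_params is hypothetical, and the regular expression simply assumes that internal parameter names consist of letters, digits, and underscores.

```python
import re

# Matches names such as _litellm_rate_limit_descriptors inside an error string.
_INTERNAL_PARAM = re.compile(r"_litellm_[A-Za-z0-9_]+")


def leaked_internal_params(error_message: str) -> list[str]:
    """Return the distinct _litellm_* names found in an error, in first-seen order."""
    seen: list[str] = []
    for name in _INTERNAL_PARAM.findall(error_message):
        if name not in seen:
            seen.append(name)
    return seen
```

A non-empty result on a failed fallback request suggests the request was rejected because of the leak described here rather than because of anything the caller sent.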

If the chain has a third tier (e.g., Anthropic):

litellm.BadRequestError: AnthropicException -
{"type":"invalid_request_error",
"message":"_litellm_rate_limit_descriptors: Extra inputs are not permitted"}

Root cause (located in source)

litellm/proxy/hooks/parallel_request_limiter_v3.py line 2027:

data["_litellm_rate_limit_descriptors"] = descriptors

This hook runs as a pre-call hook for rate limit enforcement and adds the descriptors to the request data dict. The hook is correctly scoped (it only runs when tpm/rpm limits are set), but the descriptors leak through to the final API call because they are not stripped before the provider HTTP request is constructed.

drop_params: true in litellm_settings does NOT remove these params, because that flag only handles user-provided params that the provider does not recognize, not internal _litellm_* params injected by hooks.

Why only "strict" providers are affected

Google Gemini API: silently ignores unknown params → bug invisible when primary uses Gemini
OpenAI API: rejects unknown params with HTTP 400
Anthropic API: rejects unknown params with HTTP 400

This explains why the bug only manifests in the fallback chain (where non-primary providers are invoked).
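
The interaction between lenient and strict providers can be illustrated with a toy model. This is not LiteLLM code: the providers, the ProviderError exception, and call_with_fallbacks are all hypothetical stand-ins, written only to show why the same leaked key is harmless on the primary and fatal on the fallback.

```python
class ProviderError(Exception):
    """Stands in for any provider-side failure in this toy model."""


KNOWN_PARAMS = {"model", "messages"}


def lenient_provider(payload: dict) -> str:
    # Ignores unknown keys, as the report describes for Gemini.
    return "ok from lenient provider"


def strict_provider(payload: dict) -> str:
    # Rejects unknown keys, as the report describes for OpenAI and Anthropic.
    unknown = sorted(set(payload) - KNOWN_PARAMS)
    if unknown:
        raise ProviderError("Unrecognized request arguments supplied: " + ", ".join(unknown))
    return "ok from strict provider"


def failing_primary(payload: dict) -> str:
    # Simulates the primary provider being unavailable.
    raise ProviderError("primary unavailable")


def call_with_fallbacks(payload: dict, providers) -> str:
    """Try each provider in order and return the first successful response."""
    errors = []
    for provider in providers:
        try:
            return provider(payload)
        except ProviderError as exc:
            errors.append(str(exc))
    raise RuntimeError("all providers failed: " + "; ".join(errors))


leaky = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "hi"}],
    "_litellm_rate_limit_descriptors": [],  # injected by the pre-call hook
}
```

With a healthy lenient primary, call_with_fallbacks(leaky, [lenient_provider, strict_provider]) succeeds and the leak goes unnoticed. With [failing_primary, strict_provider] the same payload raises RuntimeError, because the fallback rejects the extra key. Removing the _litellm_ key from the payload makes the second chain succeed.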

Suggested fix

Strip all _litellm_* keys from data before it is passed to the LLM API call. This could be done in parallel_request_limiter_v3.py as a post-hook, or directly in the router's _acompletion method:

# Before final provider call
data = {k: v for k, v in data.items() if not k.startswith('_litellm_')}
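
The one-liner above can be wrapped in a small helper so that it is testable on its own. The following is a sketch, not the project's actual fix: the function name strip_internal_params and the INTERNAL_PREFIX constant are hypothetical, and where such a helper would be called inside LiteLLM is left open.

```python
from typing import Any, Mapping

INTERNAL_PREFIX = "_litellm_"  # prefix shared by the four leaked keys in this report


def strip_internal_params(data: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of data without keys that start with the internal prefix."""
    return {k: v for k, v in data.items() if not k.startswith(INTERNAL_PREFIX)}
```

Returning a copy rather than deleting keys in place leaves the original dict untouched, which may matter if later hooks still need to read the rate limit descriptors or reserved-token fields after the provider call.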

Environment

LiteLLM Proxy: v1.85.0
Deployment: Docker ghcr.io/berriai/litellm:main-stable
DB: PostgreSQL 16
Affected providers: OpenAI (gpt-4o-mini), Anthropic (claude-haiku-4-5-20251001)
Unaffected providers: Google (gemini-2.5-flash-lite)

Impact

Anyone using LiteLLM as a multi-provider router with both:

Virtual keys for per-tenant rate limiting (TPM/RPM)
Fallback chains for high availability

...has a fallback chain that is effectively non-functional. Failures only surface when the primary provider has downtime, which makes this bug particularly insidious: it passes all happy-path tests.
