litellm: [Bug] Virtual keys with tpm/rpm limits leak _litellm_* params into provider API calls, breaking fallback chains



Description

When a virtual key has tpm_limit or rpm_limit configured, the parallel_request_limiter_v3 pre-call hook injects internal LiteLLM parameters (_litellm_rate_limit_descriptors, _litellm_tpm_reserved_model, _litellm_tpm_reserved_scopes, _litellm_tpm_reserved_tokens) into the request data dictionary. These parameters are NOT filtered out before the request is forwarded to the upstream provider API, causing all strict providers (OpenAI, Anthropic) to reject the request with HTTP 400.

This effectively breaks the router_settings.fallbacks feature when virtual keys have rate limits configured: fallback tiers cannot serve any request because LiteLLM's own internal params poison the payload.

Repro steps

  1. Set up the proxy with a multi-provider config:
model_list:
  - model_name: my-model
    litellm_params:
      model: gemini/gemini-2.5-flash-lite
      api_key: os.environ/GEMINI_API_KEY
  - model_name: my-model-fallback
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks:
    - my-model: ["my-model-fallback"]

general_settings:
  master_key: sk-master-xxx
  database_url: postgres://...
  2. Create a virtual key WITH rate limits:
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-master-xxx" \
  -d '{
    "models": ["my-model", "my-model-fallback"],
    "key_alias": "test-key",
    "max_budget": 100,
    "tpm_limit": 100000,
    "rpm_limit": 1000
  }'
  3. Invalidate the primary provider (force fallback): set GEMINI_API_KEY to an invalid value and restart the proxy.

  4. Send a request:

curl -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer <virtual_key_from_step_2>" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "hi"}]}'

Expected behavior

The LiteLLM router catches the Gemini failure (AuthenticationError), cascades to the OpenAI fallback, and returns a successful response.

Actual behavior

Fallback to OpenAI also fails with:

litellm.RateLimitError: OpenAIException - Unrecognized request arguments
supplied: _litellm_rate_limit_descriptors, _litellm_tpm_reserved_model,
_litellm_tpm_reserved_scopes, _litellm_tpm_reserved_tokens

If the chain has a third tier (e.g., Anthropic):

litellm.BadRequestError: AnthropicException -
{"type":"invalid_request_error",
"message":"_litellm_rate_limit_descriptors: Extra inputs are not permitted"}

Root cause (located in source)

litellm/proxy/hooks/parallel_request_limiter_v3.py line 2027:

data["_litellm_rate_limit_descriptors"] = descriptors

This hook runs as a pre-call hook for rate limit enforcement and adds descriptors to the request data dict. The hook is correctly scoped (it only runs when tpm/rpm limits are set), but the descriptors leak through to the final API call because they are not stripped before the provider HTTP request is constructed.

drop_params: true in litellm_settings does NOT remove these params because that flag only handles user-provided params unknown to the provider, not internal _litellm_* params injected by hooks.
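The leak described above can be illustrated with a minimal, self-contained sketch. It does not use LiteLLM's actual code: rate_limit_pre_call_hook and build_provider_payload are hypothetical stand-ins, and the descriptor contents are invented for illustration. It only shows how keys added to a shared request dict survive into the outgoing payload when nothing filters them.

```python
from typing import Any

INTERNAL_PREFIX = "_litellm_"


def rate_limit_pre_call_hook(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the limiter hook: it annotates the request in place."""
    data["_litellm_rate_limit_descriptors"] = [{"key": "api_key", "value": "hashed-key"}]
    data["_litellm_tpm_reserved_tokens"] = 42
    return data


def build_provider_payload(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical request builder that forwards every key without filtering."""
    return dict(data)


request = {"model": "my-model", "messages": [{"role": "user", "content": "hi"}]}
payload = build_provider_payload(rate_limit_pre_call_hook(request))

# Bookkeeping keys that would reach the upstream provider unchanged.
leaked = sorted(key for key in payload if key.startswith(INTERNAL_PREFIX))
print(leaked)
```

Any key the hook writes into the dict is indistinguishable from a user-supplied parameter by the time the payload is built, which is consistent with the report's observation that the internal params reach the provider.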

Why only "strict" providers are affected

  • Google Gemini API: silently ignores unknown params → bug invisible when primary uses Gemini
  • OpenAI API: rejects unknown params with HTTP 400
  • Anthropic API: rejects unknown params with HTTP 400

This explains why the bug only manifests in the fallback chain (where non-primary providers are invoked).
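The difference between lenient and strict providers, and its effect on a fallback chain, can be modelled with a small simulation. Everything here is an assumption made for illustration: the provider functions, the KNOWN_PARAMS set, and call_with_fallbacks are hypothetical and do not reflect the real provider schemas or LiteLLM's router.

```python
from typing import Any, Callable

Payload = dict[str, Any]
KNOWN_PARAMS = {"model", "messages"}  # illustrative subset, not a real provider schema


def failing_primary(payload: Payload) -> str:
    """Models the repro's primary provider, which fails because its API key is invalid."""
    raise PermissionError("invalid API key")


def lenient_provider(payload: Payload) -> str:
    """Ignores unknown parameters, as the report describes for the Gemini API."""
    return "ok"


def strict_provider(payload: Payload) -> str:
    """Rejects unknown parameters, as the report describes for OpenAI and Anthropic."""
    unknown = sorted(set(payload) - KNOWN_PARAMS)
    if unknown:
        raise ValueError("Unrecognized request arguments supplied: " + ", ".join(unknown))
    return "ok"


def call_with_fallbacks(payload: Payload, providers: list[Callable[[Payload], str]]) -> str:
    """Tries each provider in order and returns the first successful response."""
    errors = []
    for provider in providers:
        try:
            return provider(payload)
        except Exception as exc:
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


clean = {"model": "my-model", "messages": [{"role": "user", "content": "hi"}]}
polluted = {**clean, "_litellm_rate_limit_descriptors": []}
chain = [failing_primary, strict_provider]

clean_result = call_with_fallbacks(clean, chain)
try:
    polluted_result = call_with_fallbacks(polluted, chain)
except RuntimeError as exc:
    polluted_result = str(exc)
```

With the clean payload the strict fallback answers; with the polluted payload every tier fails, while lenient_provider(polluted) would still succeed. That matches the pattern in the report, where the problem stays hidden until a strict provider is actually invoked.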

Workaround

Create virtual keys WITHOUT rate limits:

curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-master-xxx" \
  -d '{"models": ["..."], "key_alias": "...", "max_budget": 100}'
  # NO tpm_limit, NO rpm_limit

Or update existing keys:

curl -X POST http://localhost:4000/key/update \
  -H "Authorization: Bearer sk-master-xxx" \
  -d '{"key": "<key>", "tpm_limit": null, "rpm_limit": null}'

Trade-off: you lose per-key rate limit enforcement at the LiteLLM layer. USD budget enforcement (via a separate hook) still works correctly.
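For scripting the key update across many keys, the same request can be built in Python with the standard library. This is a sketch under assumptions: build_clear_limits_request is a hypothetical helper, the Authorization and Content-Type headers are assumed to be what the proxy expects, and the endpoint and body are taken from the curl example above.

```python
import json
import urllib.request


def build_clear_limits_request(
    base_url: str, master_key: str, virtual_key: str
) -> urllib.request.Request:
    """Builds (but does not send) a POST to /key/update that nulls tpm/rpm limits."""
    body = json.dumps(
        {"key": virtual_key, "tpm_limit": None, "rpm_limit": None}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/key/update",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {master_key}",  # assumed auth scheme
            "Content-Type": "application/json",  # assumed to be required
        },
    )


req = build_clear_limits_request("http://localhost:4000", "sk-master-xxx", "<key>")
# To send it against a running proxy: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the payload before touching a live proxy.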

Suggested fix

In parallel_request_limiter_v3.py, strip all _litellm_* keys from data before it is passed to the LLM API call. This could be added as a post-hook or directly in the router's _acompletion method:

# Before final provider call
data = {k: v for k, v in data.items() if not k.startswith('_litellm_')}
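A slightly more defensive variant of the one-liner above separates the internal keys instead of discarding them, in case later bookkeeping still needs the reserved values. This is a sketch only: split_internal_params is a hypothetical helper, and whether LiteLLM reads these keys after the provider call is an assumption, not something the report confirms.

```python
from typing import Any

INTERNAL_PREFIX = "_litellm_"


def split_internal_params(
    data: dict[str, Any], prefix: str = INTERNAL_PREFIX
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Returns (provider_payload, internal_params) without mutating the input dict."""
    provider_payload = {k: v for k, v in data.items() if not k.startswith(prefix)}
    internal_params = {k: v for k, v in data.items() if k.startswith(prefix)}
    return provider_payload, internal_params


data = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "hi"}],
    "_litellm_rate_limit_descriptors": [],
    "_litellm_tpm_reserved_tokens": 42,
}
provider_payload, internal_params = split_internal_params(data)
```

Only provider_payload would be forwarded upstream. internal_params stays available to whatever runs after the call, and the original data dict is left untouched for each fallback attempt.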

Environment

  • LiteLLM Proxy: v1.85.0
  • Deployment: Docker ghcr.io/berriai/litellm:main-stable
  • DB: PostgreSQL 16
  • Affected providers: OpenAI (gpt-4o-mini), Anthropic (claude-haiku-4-5-20251001)
  • Unaffected providers: Google (gemini-2.5-flash-lite)

Impact

Anyone using LiteLLM as a multi-provider router with both:

  1. Virtual keys for per-tenant rate limiting (TPM/RPM)
  2. Fallback chains for high availability

...has a fallback chain that is effectively non-functional. Failures only surface when the primary provider has downtime, which makes this bug particularly insidious (it passes all happy-path tests).

