litellm - ✅(Solved) Fix [Bug] custom_llm_provider not propagated to budget_limiter.async_log_success_event for /v1/messages + /v1/embeddings [1 pull requests, 1 participants]

nucocloud · 2026-04-28T13:53:23Z



GitHub stats

BerriAI/litellm#26701 • Fetched 2026-04-29 06:12:38

Author: nucocloud
Participants: nucocloud
Timeline (top): cross-referenced ×1, labeled ×1

Error Message

When provider_budget_config is enabled (e.g., anthropic: 5.0/24h), every call to /v1/messages (Anthropic format) and /v1/embeddings triggers a ValueError in the budget-limiter callback. Calls succeed (200 OK), but stderr floods with traceback from budget_limiter.async_log_success_event.

Fix Action

Workaround

Disable provider_budget_config entirely. This loses the hard-cap protection but stops the spam. We replaced it with a Prometheus alert on litellm_spend_metric_total as a soft-warning fallback.

PR fix notes

PR #553: Add LiteLLM section to Other group (3 alerting rules)

Repository: samber/awesome-prometheus-alerts
Author: nucocloud
State: open | merged: False
Link: https://github.com/samber/awesome-prometheus-alerts/pull/553

Description (problem / solution / changelog)

Context

LiteLLM is a widely-used LLM-gateway/proxy that exposes Prometheus metrics via its built-in callback. Currently there's no LiteLLM section in this repo, despite its adoption as an OpenAI/Anthropic-compatible proxy in many production stacks.

What this PR adds

3 alerting rules under a new LiteLLM service in the Other group:

LiteLLM provider spend over budget — soft-warning on cumulative 24h spend per model-name regex. Useful when LiteLLM's native provider_budget_config hard-cap is unavailable, disabled, or buggy (we hit such a bug, see BerriAI/litellm#26701).
LiteLLM proxy failed requests rate high — error-rate ratio alert for downstream LLM provider availability/auth issues.
LiteLLM request latency p95 high — histogram-quantile alert for downstream provider response-time degradation.
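
The three rule ideas above can be sketched as PromQL expressions. The PR's actual `_data/rules.yml` entries are not reproduced in this report, so the block below is only an illustration: the metric names come from the PR description, while the `model` label, the time windows, and the thresholds are assumptions.

```python
# Sketch: PromQL expressions for the three alert ideas listed above.
# Metric names are taken from the PR description; the `model` label,
# the windows, and the thresholds are assumptions for illustration.

SPEND_METRIC = "litellm_spend_metric_total"
FAILED_METRIC = "litellm_proxy_failed_requests_metric_total"
TOTAL_METRIC = "litellm_proxy_total_requests_metric_total"
LATENCY_BUCKETS = "litellm_request_total_latency_metric_bucket"


def spend_over_budget(model_regex: str, budget_usd: float, window: str = "24h") -> str:
    """Cumulative spend over `window` for models matching `model_regex`."""
    selector = f'{SPEND_METRIC}{{model=~"{model_regex}"}}'
    return f"sum(increase({selector}[{window}])) > {budget_usd}"


def failed_request_ratio(max_ratio: float, window: str = "5m") -> str:
    """Share of proxy requests that failed over `window`."""
    failed = f"sum(rate({FAILED_METRIC}[{window}]))"
    total = f"sum(rate({TOTAL_METRIC}[{window}]))"
    return f"{failed} / {total} > {max_ratio}"


def latency_p95(max_seconds: float, window: str = "5m") -> str:
    """95th percentile of total request latency from histogram buckets."""
    buckets = f"sum by (le) (rate({LATENCY_BUCKETS}[{window}]))"
    return f"histogram_quantile(0.95, {buckets}) > {max_seconds}"


print(spend_over_budget("(claude-|anthropic/).*", 5.0))
print(failed_request_ratio(0.05))
print(latency_p95(10.0))
```

Each generated string can be pasted into the `expr:` field of a Prometheus rule and checked with `promtool check rules`, as the PR does for its own rules.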

Validation

All 3 rules: promtool check rules returns SUCCESS: 3 rules found.
Validated on a real LiteLLM v1.83.7 production deployment.
The spend rule (AnthropicSpend24hOverBudget in our deployment) was end-to-end tested via real haiku-call → alert fires → Telegram-routed → resolved post-revert.

Notes for reviewers

The (claude-|anthropic/).* regex in the spend-rule example is just one provider-pattern; users will customize for their own providers (openai-, gpt-, gemini-, etc.). The description explicitly notes this.
The spend-counter has a known first-value-problem on brand-new series (PromQL increase() needs ≥2 datapoints with growth-difference). Documented in the rule's comments: field.
All 3 metrics (litellm_spend_metric_total, litellm_proxy_failed_requests_metric_total, litellm_proxy_total_requests_metric_total, litellm_request_total_latency_metric_bucket) are exposed by LiteLLM's built-in prometheus callback (no separate exporter needed).
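
The first-value problem mentioned above can be illustrated with a deliberately simplified model of a counter increase. This is not Prometheus's implementation (it omits extrapolation to the window boundaries); `simple_increase` is a hypothetical helper that only shows why a series with a single sample yields no result.

```python
from typing import Optional, Sequence


def simple_increase(samples: Sequence[float]) -> Optional[float]:
    """Simplified counter increase over a window, without extrapolation."""
    if len(samples) < 2:
        # A single sample gives no difference to measure, so there is no result.
        return None
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop is treated as a counter reset: count the new value from zero.
        total += cur - prev if cur >= prev else cur
    return total


print(simple_increase([0.42]))        # brand-new series, one sample
print(simple_increase([0.42, 0.42]))  # two samples, no growth
print(simple_increase([1.0, 3.0]))    # two samples with growth
```

Under this model, the spend recorded in the very first sample of a new series is never counted, which matches the caveat documented in the rule's comments field.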

Reference

LiteLLM Prometheus docs: https://docs.litellm.ai/docs/proxy/prometheus

Changed files

_data/rules.yml (modified, +25/-0)


Bug

Reproducer

LiteLLM v1.83.7, config:

litellm_settings:
  callbacks: ["prometheus"]

provider_budget_config:
  anthropic:
    budget_limit: 5.0
    time_period: "24h"

model_list:
  - model_name: claude-haiku-4-5-direct-anthropic
    litellm_params:
      model: anthropic/claude-haiku-4-5-20251001
      api_key: os.environ/ANTHROPIC_API_KEY

Then call:

curl -X POST http://litellm:4000/v1/messages \
  -H "x-api-key: $LITELLM_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-haiku-4-5-direct-anthropic","max_tokens":10,"messages":[{"role":"user","content":"hi"}]}'

Returns 200 + valid response, BUT stderr emits ValueError from router_strategy/budget_limiter.py complaining about missing custom_llm_provider in the kwargs/data dict.

Same behavior for /v1/embeddings calls.
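
For scripted reproduction, the curl call above can be mirrored with Python's standard library. This is a minimal sketch: `build_messages_request` is a hypothetical helper, and the base URL and key are the placeholders from the report. No equivalent is shown for /v1/embeddings because the report does not give the embedding model name or payload.

```python
import json
import urllib.request


def build_messages_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build the same POST /v1/messages request as the curl reproducer."""
    payload = {
        "model": "claude-haiku-4-5-direct-anthropic",
        "max_tokens": 10,
        "messages": [{"role": "user", "content": "hi"}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage against a running proxy (not executed here):
#   req = build_messages_request("http://litellm:4000", os.environ["LITELLM_KEY"])
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status, json.load(resp))
```

According to the report, the response should be a 200 while the traceback appears only in the proxy's stderr, so the client output alone will not show the bug.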

Frequency

In our production deployment: 306 ValueError tracebacks / 2 hours during normal operation (call volume ~100 req/h split between /v1/messages and /v1/embeddings).
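
As a quick check of the figures above, the rates can be worked out directly. The call volume is given only as "~100 req/h", so the per-request ratio is approximate; `traceback_rates` is a hypothetical helper.

```python
def traceback_rates(tracebacks: int, window_hours: float, requests_per_hour: float):
    """Return (tracebacks per hour, tracebacks per request) for the window."""
    per_hour = tracebacks / window_hours
    per_request = tracebacks / (requests_per_hour * window_hours)
    return per_hour, per_request


per_hour, per_request = traceback_rates(306, 2, 100)
print(f"{per_hour:.0f} tracebacks/h, about {per_request:.2f} per request")
```

A ratio above one per request would mean either that some calls log more than one traceback or that the real call volume was higher than the estimate; the report does not say which.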

Workaround

Disable provider_budget_config entirely. This loses the hard-cap protection but stops the spam. We replaced it with a Prometheus alert on litellm_spend_metric_total as a soft-warning fallback.

Root-cause hypothesis

The provider_budget_config callback async_log_success_event reads custom_llm_provider from data (or kwargs), but the request-routing layer for /v1/messages and /v1/embeddings does NOT inject custom_llm_provider into the kwargs the way /v1/chat/completions does. We tried adding custom_llm_provider: under litellm_params: in YAML, but it had no effect (LiteLLM reads data.get at deployment-top-level, not from litellm_params).
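
The hypothesis can be pictured with a toy model. Nothing below is LiteLLM code: `ToyBudgetCallback` and `handle_request` are hypothetical stand-ins that only show how a success callback can raise after the response has already succeeded, depending on which route built the kwargs.

```python
import asyncio
import logging

logger = logging.getLogger("budget_sketch")


class ToyBudgetCallback:
    """Toy stand-in for the budget callback; not LiteLLM's implementation."""

    async def async_log_success_event(self, kwargs: dict) -> str:
        provider = kwargs.get("custom_llm_provider")
        if provider is None:
            raise ValueError("custom_llm_provider is required")
        return provider


async def handle_request(route: str, callback: ToyBudgetCallback) -> int:
    kwargs = {"model": "anthropic/claude-haiku-4-5-20251001"}
    # Hypothesis from the report: only the chat-completions path injects the provider.
    if route == "/v1/chat/completions":
        kwargs["custom_llm_provider"] = "anthropic"
    try:
        await callback.async_log_success_event(kwargs)
    except ValueError:
        # The callback failure is logged with a traceback; the response is unaffected.
        logger.exception("budget callback failed for %s", route)
    return 200


for route in ("/v1/chat/completions", "/v1/messages", "/v1/embeddings"):
    print(route, asyncio.run(handle_request(route, ToyBudgetCallback())))
```

In this model every route returns 200, and the two routes that skip the injection also write a ValueError traceback to stderr, which is the symptom described in the report.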

Distinct from existing issues

#24770 (UI lets model-names without provider/-prefix → budget tracking fails) — our model-config has the anthropic/-prefix correctly, bug appears on every call regardless of UI involvement.
#4849 (counter resets on restart) — different problem.
#19929 (counter +2 instead of +1) — different problem.
#17415 (Bedrock metrics not updating) — different problem.

Environment

LiteLLM proxy v1.83.7 (latest stable)
Deployment via systemd on Ubuntu 24.04
Python 3.12
Anthropic provider via anthropic/ prefix routing

extent analysis

TL;DR

The most likely fix is to modify the budget_limiter.async_log_success_event callback to handle cases where custom_llm_provider is missing from the kwargs.

Guidance

Verify that the custom_llm_provider key is indeed missing from the kwargs passed to budget_limiter.async_log_success_event by adding a debug log statement before the line that raises the ValueError.
Check the request-routing layer for /v1/messages and /v1/embeddings to see why custom_llm_provider is not being injected into the kwargs, and modify it to include this key if necessary.
Consider adding a default value or a fallback mechanism in budget_limiter.async_log_success_event to handle cases where custom_llm_provider is missing.
Review the litellm_params configuration to ensure that custom_llm_provider is not being overridden or ignored.
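
One possible shape for the fallback mentioned in the guidance is to derive the provider from the model string when the key is absent. This is a sketch under assumptions: `resolve_provider` is a hypothetical helper, and where LiteLLM actually stores `custom_llm_provider` and `model` in the callback kwargs has not been verified here.

```python
from typing import Any, Mapping, Optional


def resolve_provider(kwargs: Mapping[str, Any]) -> Optional[str]:
    """Best-effort provider lookup; the key locations are assumptions."""
    provider = kwargs.get("custom_llm_provider")
    if provider:
        return provider
    litellm_params = kwargs.get("litellm_params") or {}
    provider = litellm_params.get("custom_llm_provider")
    if provider:
        return provider
    # Last resort: take the "provider/" prefix of the model string, if present.
    model = litellm_params.get("model") or kwargs.get("model") or ""
    prefix, sep, _rest = model.partition("/")
    return prefix if sep and prefix else None


print(resolve_provider({"model": "anthropic/claude-haiku-4-5-20251001"}))
print(resolve_provider({"model": "claude-haiku-4-5-direct-anthropic"}))
```

A model name without a provider prefix still resolves to nothing, so the callback would need to skip budget tracking for that call rather than raise.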

Example

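# In budget_limiter.py (sketch only; the real method body is not shown in this report)
import logging

logger = logging.getLogger(__name__)

# Method of the budget-limiter class; signature follows LiteLLM's CustomLogger hooks
async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
    # Assumption: the provider is looked up at the top level of kwargs
    custom_llm_provider = kwargs.get("custom_llm_provider")
    if custom_llm_provider is None:
        # Skip the budget update instead of raising, so successful calls stay quiet
        logger.warning("custom_llm_provider is missing from kwargs; skipping budget update")
        return
    # Rest of the function remains the same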

Notes

The provided workaround of disabling provider_budget_config entirely may not be desirable as it loses the hard-cap protection. The suggested modifications to budget_limiter.async_log_success_event should be tested thoroughly to ensure they do not introduce any new issues.

Recommendation

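Patch the budget_limiter.async_log_success_event callback to tolerate a missing custom_llm_provider key. This is more targeted than disabling provider_budget_config, but it only suppresses the symptom: if the root-cause hypothesis holds, the underlying fix belongs in the routing layer for /v1/messages and /v1/embeddings, and spend on those routes will not be tracked until the key is propagated.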
