litellm - [Bug]: Rate limit error message body leaks full SHA-256 token hash on 429 responses

BerriAI/litellm#27884 (fetched 2026-05-14 03:29:59)
What happened?

Summary

When the parallel request limiter returns a 429 response, the JSON error body includes the full 64-character SHA-256 hash of the offending virtual key in the error.message field. This identifier is then visible to any HTTP client that hits the rate limit, including end users / customers of the proxy.

Setting redact_user_api_key_info: True in litellm_settings does not affect this code path; it covers only Langfuse callback metadata and a few other surfaces.

Source

litellm/proxy/hooks/parallel_request_limiter_v3.py, around line 1261 (in litellm latest as of 2026-05-13 via ghcr.io/berriai/litellm:main-stable):

detail = (
    f"Rate limit exceeded for {descriptor_key}: {descriptor_value}. "
    f"Limit type: {rate_limit_type}. "
    f"Current limit: {current_limit}, Remaining: {remaining_display}. "
    f"Limit resets at: {reset_time_formatted}"
)

raise HTTPException(
    status_code=429,
    detail=detail,
    headers={
        "retry-after": str(self.window_size),
        ...
    },
)

When descriptor_key == "api_key", the descriptor_value is the full token hash (the token field returned by /key/generate, i.e. SHA-256 of the raw sk-... key).
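
To make that relationship concrete, here is a minimal sketch, assuming the stored token is the plain hex SHA-256 digest of the raw key string as described above. The helper name `hash_virtual_key` and the sample key are hypothetical and are not taken from LiteLLM's code.

```python
import hashlib


def hash_virtual_key(raw_key: str) -> str:
    """Return the hex SHA-256 digest of a raw key (assumed hashing scheme)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Hypothetical key used only for illustration.
token = hash_virtual_key("sk-example-not-a-real-key")
print(len(token))  # a SHA-256 hex digest is always 64 characters long
```

The same raw key always yields the same digest, which is why the value in the error body works as a stable identifier for the key.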

Reproduction

  1. Generate a virtual key with a low rate limit:
   curl -s -X POST -H "Authorization: Bearer $MASTER_KEY" -H "Content-Type: application/json" \
     "http://localhost:4000/key/generate" \
     -d '{"models":["my-model"],"rpm_limit":5,"tpm_limit":1000,"key_alias":"ratelimit-test"}'
  2. Exceed the rate limit with parallel requests:
   VKEY="sk-..."
   for i in $(seq 1 15); do
     curl -s -o /tmp/r_$i.json -w "%{http_code} " \
       -X POST http://localhost:4000/v1/chat/completions \
       -H "Authorization: Bearer $VKEY" -H "Content-Type: application/json" \
       -d '{"model":"my-model","messages":[{"role":"user","content":"hi"}],"max_tokens":5}' &
   done
   wait
   cat /tmp/r_1.json  # or whichever request returned 429
  3. Observe the 429 response body:
   {
     "error": {
       "message": "Rate limit exceeded for api_key: 523544f141d47ff188ff366337ddd3c9b44968b565d83a1c9b6fa56c543d3042. Limit type: requests. Current limit: 5, Remaining: 0. Limit resets at: 2026-05-13 18:25:44 UTC",
       "type": "None",
       "param": "None",
       "code": "429"
     }
   }

The 64-char hex string is the SHA-256 hash of the virtual key.
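
The manual inspection in the last step can be automated. The sketch below assumes the /tmp/r_*.json files written by the loop above and flags any body whose error.message contains a 64-character hex run. The function name `find_hex_digests` is hypothetical.

```python
import glob
import json
import re

# 64 consecutive lowercase hex characters that are not part of a longer hex run.
HEX64 = re.compile(r"(?<![0-9a-f])[0-9a-f]{64}(?![0-9a-f])")


def find_hex_digests(body: str) -> list[str]:
    """Return SHA-256-shaped strings found in the error message of a response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    error = payload.get("error")
    message = error.get("message", "") if isinstance(error, dict) else ""
    return HEX64.findall(str(message))


if __name__ == "__main__":
    # Scan the response files saved by the reproduction loop.
    for path in sorted(glob.glob("/tmp/r_*.json")):
        with open(path, encoding="utf-8") as fh:
            leaks = find_hex_digests(fh.read())
        if leaks:
            print(f"{path}: leaked {leaks[0]}")
```

A check like this could also serve as a regression test once the message is sanitized: it should then report nothing for any 429 body.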

Why this matters

While the hash cannot be reversed to obtain the original key, exposing it in a customer-facing response body has real downsides:

  • Cross-request correlation. A third party that intercepts or aggregates 429s across customers can fingerprint which key is hitting limits.
  • Information disclosure about internal structure. Customers and integrators learn that LiteLLM stores keys as SHA-256 hashes — useful recon for an attacker.
  • Surprise vs redact_user_api_key_info: True. Users who set that flag reasonably expect "api_key info" to be redacted in user-visible surfaces, but the flag does not cover this path. The behavior is silently inconsistent.
  • Existing infrastructure. Internal logs of this exception already display the value as REDACTED when redact_user_api_key_info is set, suggesting the redaction logic exists and just isn't applied here.
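
The correlation risk in the first bullet is easy to illustrate. This sketch assumes an observer has collected 429 message strings in the format shown in the reproduction; the digests used here are made up, and `count_throttled_keys` is a hypothetical name.

```python
import re
from collections import Counter

KEY_PATTERN = re.compile(r"Rate limit exceeded for api_key: ([0-9a-f]{64})\.")


def count_throttled_keys(messages: list[str]) -> Counter:
    """Count 429 messages per leaked key hash."""
    counts: Counter = Counter()
    for message in messages:
        match = KEY_PATTERN.search(message)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up digests standing in for two different virtual keys.
key_a, key_b = "a" * 64, "b" * 64
observed = [
    f"Rate limit exceeded for api_key: {key_a}. Limit type: requests.",
    f"Rate limit exceeded for api_key: {key_b}. Limit type: requests.",
    f"Rate limit exceeded for api_key: {key_a}. Limit type: tokens.",
]
print(count_throttled_keys(observed).most_common(1))
```

No knowledge of the raw key is needed for this grouping; the stable hash alone is enough to tell which key is being throttled most often.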

I considered using a CustomLogger.async_post_call_failure_hook to rewrite the body, but the rate limiter raises HTTPException directly from the pre-call path, so failure hooks never see it.
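
Because the failure hook is not an option, one possible stopgap is to scrub the body in a layer that sits in front of the proxy. The sketch below is only the pure rewriting step, assuming the error body shape shown in the reproduction. `scrub_429_body` is a hypothetical helper, not part of LiteLLM, and how it is wired into a reverse proxy or middleware is left open.

```python
import json
import re

# 64 consecutive lowercase hex characters that are not part of a longer hex run.
HEX64 = re.compile(r"(?<![0-9a-f])[0-9a-f]{64}(?![0-9a-f])")


def scrub_429_body(raw: bytes) -> bytes:
    """Replace SHA-256-shaped strings in error.message; return other bodies unchanged."""
    try:
        payload = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return raw
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or not isinstance(error.get("message"), str):
        return raw
    error["message"] = HEX64.sub("REDACTED", error["message"])
    return json.dumps(payload).encode("utf-8")
```

This only hides the symptom at the edge; the message is still built with the full hash inside the limiter, so fixing it at the source remains the cleaner option.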

Suggested fix

When constructing the detail string at the cited line, sanitize the descriptor value for sensitive descriptor keys. Minimum:

def _safe_descriptor_value(key: str, value: str) -> str:
    if key == "api_key" and len(value) >= 16:
        return f"{value[:8]}…"  # first 8 chars only, enough for support debugging
    return value

Or, more aggressively, respect litellm.redact_user_api_key_info:

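import litellm

# Read the flag at call time. `from litellm import redact_user_api_key_info`
# would copy its value once at import and miss later changes to the setting.
descriptor_display = (
    "REDACTED"
    if (descriptor_key == "api_key" and litellm.redact_user_api_key_info)
    else descriptor_value
)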
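
The two options can also be combined. The following is a self-contained sketch under stated assumptions: `display_descriptor_value` and `build_detail` are hypothetical names, the message is a shortened version of the real one, and the `redact` argument stands in for the value of litellm.redact_user_api_key_info so the example runs without LiteLLM installed.

```python
def display_descriptor_value(key: str, value: str, redact: bool) -> str:
    """Pick the string to show for a descriptor in the 429 detail message."""
    if key != "api_key":
        return value  # non-sensitive descriptors pass through unchanged
    if redact:
        return "REDACTED"
    if len(value) >= 16:
        return f"{value[:8]}…"  # short prefix only
    return value


def build_detail(key: str, value: str, limit_type: str, redact: bool) -> str:
    """Build a shortened rate-limit message with a sanitized descriptor value."""
    shown = display_descriptor_value(key, value, redact)
    return f"Rate limit exceeded for {key}: {shown}. Limit type: {limit_type}."


# Hash taken from the reproduction output above.
token_hash = "523544f141d47ff188ff366337ddd3c9b44968b565d83a1c9b6fa56c543d3042"
print(build_detail("api_key", token_hash, "requests", redact=False))
print(build_detail("api_key", token_hash, "requests", redact=True))
```

With this shape the full hash never reaches the response body: the default is the 8-character prefix, and the flag upgrades it to full redaction.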

Happy to send a PR if a maintainer agrees on the preferred form.

Environment

  • LiteLLM image: ghcr.io/berriai/litellm:main-stable (pulled 2026-05-13)
  • Python 3.13
  • Deployment: Docker, Postgres backend, Langfuse v3 callbacks
  • Config has redact_user_api_key_info: True set in litellm_settings

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

1.83.10
