litellm - [Bug]: Rate limit error message body leaks full SHA-256 token hash on 429 responses

litellm, 2026-05-14 00:02:29


BerriAI/litellm#27884•Fetched 2026-05-14 03:29:59


Author

jamiemardis


When the parallel request limiter returns a 429 response, the JSON error body includes the full 64-character SHA-256 hash of the offending virtual key in the error.message field. This identifier is then visible to any HTTP client that hits the rate limit, including end users / customers of the proxy.

Setting redact_user_api_key_info: True in litellm_settings does not affect this code path; it only covers Langfuse callback metadata and a few other surfaces.





Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?


Source

litellm/proxy/hooks/parallel_request_limiter_v3.py, around line 1261 (in litellm latest as of 2026-05-13 via ghcr.io/berriai/litellm:main-stable):

detail = (
    f"Rate limit exceeded for {descriptor_key}: {descriptor_value}. "
    f"Limit type: {rate_limit_type}. "
    f"Current limit: {current_limit}, Remaining: {remaining_display}. "
    f"Limit resets at: {reset_time_formatted}"
)

raise HTTPException(
    status_code=429,
    detail=detail,
    headers={
        "retry-after": str(self.window_size),
        ...
    },
)

When descriptor_key == "api_key", the descriptor_value is the full token hash (the token field returned by /key/generate, i.e. SHA-256 of the raw sk-... key).
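To make the relationship concrete, here is a minimal sketch of how such an identifier could be derived, assuming (as this report states) that the stored token is the plain SHA-256 hex digest of the raw key. The helper name `hash_virtual_key` and the sample key are hypothetical and are not taken from LiteLLM's code.

```python
import hashlib


def hash_virtual_key(raw_key: str) -> str:
    """Return the SHA-256 hex digest of a raw key (hypothetical helper)."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Made-up key used only for illustration.
token = hash_virtual_key("sk-example-not-a-real-key")

print(len(token))  # 64 hex characters, the same shape as the value in the 429 body
print(token == hash_virtual_key("sk-example-not-a-real-key"))  # same key, same hash
```

Because the digest is deterministic, the same key always maps to the same 64-character string, which is what makes it usable as a stable identifier by anyone who sees it.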

Reproduction

1. Generate a virtual key with a low rate limit:

   curl -s -X POST -H "Authorization: Bearer $MASTER_KEY" -H "Content-Type: application/json" \
     "http://localhost:4000/key/generate" \
     -d '{"models":["my-model"],"rpm_limit":5,"tpm_limit":1000,"key_alias":"ratelimit-test"}'

2. Exceed the rate limit with parallel requests:

   VKEY="sk-..."
   for i in $(seq 1 15); do
     curl -s -o /tmp/r_$i.json -w "%{http_code} " \
       -X POST http://localhost:4000/v1/chat/completions \
       -H "Authorization: Bearer $VKEY" -H "Content-Type: application/json" \
       -d '{"model":"my-model","messages":[{"role":"user","content":"hi"}],"max_tokens":5}' &
   done
   wait
   cat /tmp/r_1.json  # or whichever 429'd

3. Observe the 429 response body:

   {
     "error": {
       "message": "Rate limit exceeded for api_key: 523544f141d47ff188ff366337ddd3c9b44968b565d83a1c9b6fa56c543d3042. Limit type: requests. Current limit: 5, Remaining: 0. Limit resets at: 2026-05-13 18:25:44 UTC",
       "type": "None",
       "param": "None",
       "code": "429"
     }
   }

The 64-char hex string is the SHA-256 hash of the virtual key.
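A small script can confirm the leak without reading each response file by hand. This is a sketch that assumes the response shape shown above (`error.message`) and treats any standalone run of 64 lowercase hex characters as a suspected token hash. `find_token_hash` is a hypothetical helper name.

```python
import json
import re
from typing import Optional

# 64 lowercase hex characters, the length of a SHA-256 hex digest.
HASH_RE = re.compile(r"\b[0-9a-f]{64}\b")


def find_token_hash(body: str) -> Optional[str]:
    """Return the first 64-hex run found in error.message, or None."""
    try:
        message = json.loads(body)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        return None  # not valid JSON, or not the error shape from this report
    match = HASH_RE.search(str(message))
    return match.group(0) if match else None


# The 429 body from the reproduction, rebuilt as a JSON string.
sample = json.dumps({
    "error": {
        "message": (
            "Rate limit exceeded for api_key: "
            "523544f141d47ff188ff366337ddd3c9b44968b565d83a1c9b6fa56c543d3042. "
            "Limit type: requests. Current limit: 5, Remaining: 0. "
            "Limit resets at: 2026-05-13 18:25:44 UTC"
        ),
        "type": "None",
        "param": "None",
        "code": "429",
    }
})

leaked = find_token_hash(sample)
print("leak detected" if leaked else "no hash in body")
```

To check the files written by the loop above, the same function could be applied to the contents of each `/tmp/r_*.json` file.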

Why this matters

While the hash cannot be reversed to obtain the original key, exposing it in a customer-facing response body has real downsides:

- Cross-request correlation. A third party that intercepts or aggregates 429s across customers can fingerprint which key is hitting limits.
- Information disclosure about internal structure. Customers and integrators learn that LiteLLM stores keys as SHA-256 hashes, which is useful recon for an attacker.
- Surprise vs redact_user_api_key_info: True. Users who set that flag reasonably expect "api_key info" to be redacted in user-visible surfaces, but the flag does not cover this path. The behavior is silently inconsistent.
- Existing infrastructure. Internal logs of this exception already display the value as REDACTED when redact_user_api_key_info is set, suggesting the redaction logic exists and just isn't applied here.

I considered using a CustomLogger.async_post_call_failure_hook to rewrite the body, but the rate limiter raises HTTPException directly from the pre-call path, so failure hooks never see it.
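Because failure hooks do not run for this exception, one possible stopgap until a fix lands is to scrub the body at the HTTP layer instead. The sketch below is a generic pure-ASGI middleware, not a LiteLLM API. The class name `Scrub429Middleware` is hypothetical, and it assumes you are able to wrap the proxy's ASGI app (an equivalent filter in a reverse proxy would serve the same purpose). It buffers only 429 responses and replaces any 64-hex run with REDACTED.

```python
import asyncio
import json
import re

# 64 lowercase hex characters as bytes, the shape of a SHA-256 hex digest.
HASH_RE = re.compile(rb"\b[0-9a-f]{64}\b")


class Scrub429Middleware:
    """Hypothetical pure-ASGI middleware that redacts hashes in 429 bodies."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        held_start = None  # the 429 response.start message, held back
        chunks = []        # buffered body chunks of the 429 response

        async def scrubbing_send(message):
            nonlocal held_start
            if message["type"] == "http.response.start" and message["status"] == 429:
                held_start = message  # wait for the body so content-length can be fixed
                return
            if message["type"] == "http.response.body" and held_start is not None:
                chunks.append(message.get("body", b""))
                if message.get("more_body", False):
                    return
                body = HASH_RE.sub(b"REDACTED", b"".join(chunks))
                headers = [
                    (k, v) for k, v in held_start.get("headers", [])
                    if k.lower() != b"content-length"
                ]
                headers.append((b"content-length", str(len(body)).encode("ascii")))
                await send({**held_start, "headers": headers})
                await send({"type": "http.response.body", "body": body})
                held_start = None
                return
            await send(message)  # everything that is not a 429 passes through untouched

        await self.app(scope, receive, scrubbing_send)


async def _demo():
    """Run the middleware against a fake app that emits a leaking 429."""
    fake_hash = "ab12" * 16  # 64 hex chars standing in for a token hash
    body = json.dumps(
        {"error": {"message": f"Rate limit exceeded for api_key: {fake_hash}."}}
    ).encode("utf-8")

    async def fake_app(scope, receive, send):
        await send({
            "type": "http.response.start",
            "status": 429,
            "headers": [
                (b"content-length", str(len(body)).encode("ascii")),
                (b"retry-after", b"60"),
            ],
        })
        await send({"type": "http.response.body", "body": body})

    sent = []

    async def capture(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    await Scrub429Middleware(fake_app)({"type": "http"}, receive, capture)
    return sent


sent = asyncio.run(_demo())
print(sent[1]["body"].decode("utf-8"))
```

The middleware recomputes content-length because the redacted body is shorter than the original, and it leaves other headers such as retry-after unchanged. This hides the symptom only; the limiter still builds the message with the full hash.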

Suggested fix

When constructing the detail string at the cited line, sanitize the descriptor value for sensitive descriptor keys. Minimum:

def _safe_descriptor_value(key: str, value: str) -> str:
    if key == "api_key" and len(value) >= 16:
        return f"{value[:8]}…"  # first 8 chars only, enough for support debugging
    return value

Or, more aggressively, respect litellm.redact_user_api_key_info:

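import litellm

# Read the flag at call time. `from litellm import redact_user_api_key_info`
# would copy the value once at import time and miss later changes to the setting.
descriptor_display = (
    "REDACTED"
    if (descriptor_key == "api_key" and litellm.redact_user_api_key_info)
    else descriptor_value
)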
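The two options can also be combined: redact fully when the flag is on, and fall back to the truncated prefix otherwise. This is only a sketch of that policy. `display_descriptor` and its `redact` parameter are hypothetical names, and the flag is passed in explicitly so the example runs without LiteLLM installed.

```python
import re

# 64 lowercase hex characters, the shape of a SHA-256 hex digest.
HASH_RE = re.compile(r"\b[0-9a-f]{64}\b")


def display_descriptor(key: str, value: str, redact: bool) -> str:
    """Hypothetical policy: REDACTED if the flag is set, else an 8-char prefix."""
    if key != "api_key":
        return value
    if redact:
        return "REDACTED"
    return f"{value[:8]}…" if len(value) >= 16 else value


full_hash = "523544f141d47ff188ff366337ddd3c9b44968b565d83a1c9b6fa56c543d3042"

for redact in (True, False):
    shown = display_descriptor("api_key", full_hash, redact)
    message = f"Rate limit exceeded for api_key: {shown}."
    # Regression check: the full hash must never reach the message.
    assert HASH_RE.search(message) is None
    print(redact, "->", shown)
```

In the limiter, the caller would pass the current value of the redaction setting as `redact`. The assertion inside the loop is the kind of check a regression test for this issue could make.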

Happy to send a PR if a maintainer agrees on the preferred form.

Environment

- LiteLLM image: ghcr.io/berriai/litellm:main-stable (pulled 2026-05-13)
- Python 3.13
- Deployment: Docker, Postgres backend, Langfuse v3 callbacks
- Config has redact_user_api_key_info: True set in litellm_settings



What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

1.83.10

