litellm: Variant analysis around recent guardrail/prompt advisories (DoS-by-admin via untimed module exec + SSRF design note)

BerriAI/litellm#28259

Variant analysis following GHSA-wxxx-gvqv-xp7p (custom-code sandbox escape, fixed in 1.83.11) and GHSA-xqmj-j6mv-4862 (SSTI in /prompts/test, fixed in 1.83.7). This is based on a public source-code review at HEAD only: no proof of concept was built and nothing was tested against a deployed instance.

Filing as a regular issue rather than a security advisory because Findings 1 and 2 are admin-gated (DoS-by-admin, not reachable by an external attacker) and Finding 3 is a design note rather than a bug. If you'd prefer any of them to be tracked privately, let me know and I'll re-route.


Finding 1 — Module-level exec() in POST /guardrails/test_custom_code runs without timeout protection (Low-Medium, DoS-by-admin)

File: litellm/proxy/guardrails/guardrail_endpoints.py, around line 2097

The endpoint correctly requires PROXY_ADMIN (line 2084) and correctly applies a 5-second concurrent.futures.ThreadPoolExecutor timeout to the apply_guardrail function call (lines 2149-2158). However, the module-level exec() at line 2097 runs in the FastAPI handler thread with no timeout:

```python
EXECUTION_TIMEOUT_SECONDS = 5

try:
    exec_globals = build_sandbox_globals()

    try:
        compiled = compile_sandboxed(request.custom_code)
        exec(compiled, exec_globals)  # noqa: S102   <-- no timeout protection
    except SyntaxError as e:
        ...
    ...

    # Step 4: Execute the function with timeout protection
    def execute_guardrail():
        return apply_fn(test_inputs, safe_request_data, request.input_type)

    try:
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(execute_guardrail)
            try:
                result = future.result(timeout=EXECUTION_TIMEOUT_SECONDS)
                ...  # (excerpt truncated)
```

RestrictedPython blocks imports, exec, eval, compile, dunder access, and class definitions, so a malicious admin can't escalate to RCE through this path. But RestrictedPython does not prevent infinite loops in otherwise-valid Python. A trivial payload like:

```python
while True:
    pass

def apply_guardrail(inputs, request_data, input_type):
    return allow()
```

will hang the request handler indefinitely on the exec() call. The 5-second timeout never engages because the function call that's wrapped in the timeout (apply_fn(...)) is never reached.

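Impact: an admin can DoS a single worker thread per request, and with a few concurrent requests can exhaust the proxy's worker pool. If the handler is an async def coroutine, the exec() call runs on the event-loop thread itself, so even one such request would stall everything served by that worker process. Admin-only, so not exploitable by external attackers, but a defense-in-depth gap worth closing.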

Suggested fix: move the exec() call inside the same timeout-protected scope as apply_fn(). One sketch:

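```python
import concurrent.futures

def compile_and_execute_guardrail():
    # Compile, run the module body, and call apply_guardrail inside the worker,
    # so a module-level infinite loop falls under the same timeout as apply_fn().
    compiled = compile_sandboxed(request.custom_code)
    exec_globals_local = build_sandbox_globals()
    exec(compiled, exec_globals_local)  # noqa: S102
    apply_fn = exec_globals_local.get("apply_guardrail")
    if not callable(apply_fn):
        raise RuntimeError("Custom code must define an 'apply_guardrail' function")
    return apply_fn(test_inputs, safe_request_data, request.input_type), exec_globals_local

# Don't use "with ThreadPoolExecutor(...)" here: its __exit__ calls shutdown(wait=True),
# which would block on a runaway worker and silently defeat the timeout.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
try:
    future = executor.submit(compile_and_execute_guardrail)
    result, exec_globals = future.result(timeout=EXECUTION_TIMEOUT_SECONDS)
except concurrent.futures.TimeoutError:
    ...  # return a timeout error response to the caller
finally:
    executor.shutdown(wait=False)  # stop waiting; a stuck thread is abandoned, not killed
```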

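This puts both the module-level exec and the function call under a single timeout. Error reporting needs a small refactor (you lose the precise compilation-vs-execution distinction unless you catch and re-raise typed exceptions inside the worker), but the overall structure is straightforward. One caveat: a thread-based timeout only bounds how long the handler waits. Python cannot kill a thread, so a runaway loop keeps its thread (and a CPU core) busy until the process restarts; a hard bound would require running the custom code in a child process that can be terminated.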
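A standalone sketch of the same shape, runnable outside LiteLLM, may help when reviewing a patch. It is an assumption-laden illustration, not LiteLLM code: plain compile() stands in for compile_sandboxed() (so nothing here is sandboxed), a hypothetical sleep helper is injected into the globals to simulate a slow module body, and run_with_timeout / _compile_and_call are hypothetical names. It deliberately avoids the "with ThreadPoolExecutor(...)" form, because that form's implicit shutdown(wait=True) would wait for the runaway worker.

```python
import concurrent.futures
import time

def _compile_and_call(source, inputs, base_globals):
    """Compile, run the module body, and call apply_guardrail, all on the worker thread."""
    compiled = compile(source, "<custom_guardrail>", "exec")  # stand-in for compile_sandboxed(); NOT sandboxed
    exec_globals = dict(base_globals)  # fresh namespace per run
    exec(compiled, exec_globals)  # module-level code now runs under the timeout
    apply_fn = exec_globals.get("apply_guardrail")
    if not callable(apply_fn):
        raise RuntimeError("Custom code must define an 'apply_guardrail' function")
    return apply_fn(inputs)

def run_with_timeout(source, inputs, base_globals, timeout=5.0):
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(_compile_and_call, source, inputs, base_globals)
        return {"status": "ok", "result": future.result(timeout=timeout)}
    except concurrent.futures.TimeoutError:
        return {"status": "timeout"}
    except Exception as exc:  # SyntaxError, missing apply_guardrail, errors raised by the guardrail
        return {"status": "error", "error": f"{type(exc).__name__}: {exc}"}
    finally:
        # Return immediately; a still-running worker is abandoned, not killed.
        executor.shutdown(wait=False)

# Usage: sleep() at module level stands in for "while True: pass".
SLOW_MODULE = """
sleep(1.0)

def apply_guardrail(inputs):
    return "allow"
"""
GOOD_MODULE = """
def apply_guardrail(inputs):
    return "allow"
"""
print(run_with_timeout(SLOW_MODULE, [], {"sleep": time.sleep}, timeout=0.2))  # {'status': 'timeout'}
print(run_with_timeout(GOOD_MODULE, [], {}))  # {'status': 'ok', 'result': 'allow'}
```

The slow case now returns after roughly the timeout instead of hanging, while the abandoned worker finishes (or spins) in the background, which is exactly the thread-level limitation noted in the caveat.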


Finding 2 — Same pattern in persistent custom-code guardrail at startup (Low, defense-in-depth)

File: litellm/proxy/guardrails/guardrail_hooks/custom_code/custom_code_guardrail.py, around lines 146-150

```python
def _do_compile(self) -> None:
    """Internal compilation method without lock. Expected to run inside _compile_lock."""
    exec_globals = build_sandbox_globals()
    compiled = compile_sandboxed(self.custom_code)
    exec(compiled, exec_globals)  # noqa: S102
```

_do_compile() is called from _compile_custom_code(), which is in turn called from __init__(). When admin-configured custom code is registered as a guardrail, the proxy therefore executes it at instantiation time. Same issue: a module-level infinite loop in admin-configured code hangs proxy startup.

Lower severity than Finding 1 because:

  • Affects startup, not arbitrary requests
  • Admin is the only one who can supply the code
  • Easier to diagnose (proxy "doesn't start" vs "hangs intermittently")

But the same fix shape applies: bound the compilation/execution in a timeout-protected worker. Even a generous 30-second startup compile timeout would prevent permanent hangs.
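For the startup path, one possible sketch uses a plain daemon thread rather than a ThreadPoolExecutor. The reasoning: executor worker threads are joined when the interpreter exits, so a runaway compile would also hang proxy shutdown, whereas a daemon thread is not waited for. compile_with_deadline and GuardrailCompileTimeout are hypothetical names; inside the guardrail class it might be called as compile_with_deadline(self._do_compile). An abandoned thread can still mutate state later, so a timed-out guardrail should probably be marked unusable rather than retried in place.

```python
import threading

STARTUP_COMPILE_TIMEOUT_SECONDS = 30.0  # the "generous" 30-second budget suggested in the text

class GuardrailCompileTimeout(RuntimeError):
    """Raised when admin-supplied code does not finish its module-level execution in time."""

def compile_with_deadline(compile_fn, timeout=STARTUP_COMPILE_TIMEOUT_SECONDS):
    outcome = {}

    def worker():
        try:
            outcome["value"] = compile_fn()
        except Exception as exc:  # surface compile/exec errors to the caller
            outcome["error"] = exc

    # daemon=True: a stuck worker will not keep the process alive at shutdown.
    thread = threading.Thread(target=worker, name="guardrail-compile", daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise GuardrailCompileTimeout(f"custom code did not finish within {timeout}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")

# Usage: a compile step that never returns fails fast instead of hanging startup.
try:
    compile_with_deadline(lambda: threading.Event().wait(), timeout=0.1)
except GuardrailCompileTimeout as exc:
    print(exc)  # custom code did not finish within 0.1s
```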


Finding 3 — HTTP primitives expose unrestricted outbound URL surface (design note, not a bug)

File: litellm/proxy/guardrails/guardrail_hooks/custom_code/primitives.py, http_request / http_get / http_post

The HTTP primitives passed to sandboxed guardrail code allow arbitrary outbound URLs. The validation logic is:

```python
# Validate URL
if not is_valid_url(url):
    return _http_error_response(f"Invalid URL: {url}")

# Validate and normalize method
method = method.upper()
allowed_methods = {"GET", "POST", "PUT", "DELETE", "PATCH"}
if method not in allowed_methods:
    return _http_error_response(...)
```

There is no host/domain allowlist, no blocklist for RFC1918 / 169.254.169.254 / localhost. An admin who configures a custom code guardrail that takes user input and forwards any of it through http_post(...) creates a request-driven SSRF surface. The proxy's network identity (cluster network access, possible cloud-metadata token access) is what makes the outbound request.

This isn't a LiteLLM vulnerability per se — admins can already do worse things with custom code — but it's a footgun. A guardrail framework that exposes HTTP without an allowlist makes the "admin writes a small bug" → "user-driven SSRF" path very short.

Suggested mitigation: add an optional allowlist (config-driven) for HTTP destinations from custom guardrail code. Default-deny private/loopback/link-local would be a reasonable starting posture; users who genuinely need to call internal services would opt-in.
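A minimal sketch of what a default-deny destination check could look like, using only the standard library. The function name is_destination_allowed, the allowed_hosts parameter, and the injectable resolver are hypothetical, not existing LiteLLM configuration. It rejects a destination if any resolved address is not globally routable (which covers RFC1918, loopback, link-local including 169.254.169.254, and shared/reserved ranges) or is multicast. A check like this is only sound if the HTTP client then connects to the same vetted address and re-applies the check on every redirect; otherwise DNS rebinding or a 30x to an internal host can bypass it.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

def _resolve(host, port):
    """Return every address the host resolves to (numeric hosts come back as-is)."""
    return {info[4][0] for info in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)}

def is_destination_allowed(url, allowed_hosts=frozenset(), resolver=_resolve):
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower()
    if host in allowed_hosts:  # explicit opt-in for internal services
        return True
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
        addresses = resolver(host, port)
    except (ValueError, OSError, UnicodeError):  # bad port, DNS failure, bad IDNA name
        return False
    if not addresses:
        return False
    for raw in addresses:
        ip = ipaddress.ip_address(raw.split("%", 1)[0])  # drop any IPv6 zone id
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped:
            ip = ip.ipv4_mapped  # judge ::ffff:a.b.c.d by its IPv4 address
        if not ip.is_global or ip.is_multicast:
            return False  # private, loopback, link-local, reserved, shared, ...
    return True
```

Presumably http_request would run this next to the existing is_valid_url() check and return _http_error_response(...) on failure; rejecting when any resolved address is internal (rather than only the first) avoids a hostname with mixed public and private records slipping through.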


Method note + non-findings (for transparency)

Approach: I cloned HEAD (commit cff3e0b), grepped for all exec/eval/compile sites in litellm/proxy/, all query_raw/execute_raw callsites, and all @router.post/@router.put decorators on test/preview/update-shaped endpoints. Then read the relevant files in detail.
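The exec/eval/compile part of that sweep could also be done with Python's ast module instead of grep, which skips matches inside comments and string literals. A sketch under that assumption: find_dynamic_exec_sites and scan_tree are hypothetical helpers, and they only flag direct calls to the builtins by name, so aliased or attribute-style calls (for example builtins.exec) would still need a manual grep.

```python
import ast
from pathlib import Path

DYNAMIC_EXEC_BUILTINS = {"exec", "eval", "compile"}

def find_dynamic_exec_sites(source, filename="<unknown>"):
    """Return (filename, lineno, name) for each direct call to exec/eval/compile."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DYNAMIC_EXEC_BUILTINS
        ):
            hits.append((filename, node.lineno, node.func.id))
    return sorted(hits, key=lambda hit: hit[1])

def scan_tree(root):
    """Scan every .py file under root (e.g. litellm/proxy/); skip files that don't parse."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            hits.extend(find_dynamic_exec_sites(source, str(path)))
        except (SyntaxError, UnicodeDecodeError):
            continue
    return hits

# Usage (from a checkout):
# for filename, lineno, name in scan_tree("litellm/proxy"):
#     print(f"{filename}:{lineno}: {name}()")
```

Attribute calls such as re.compile(...) and helpers such as compile_sandboxed(...) are intentionally not flagged, since neither is a direct call to the builtin.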

Things I checked and found correctly handled:

  • The SSTI fix in /prompts/test (GHSA-xqmj-j6mv-4862): PromptManager.jinja_env uses ImmutableSandboxedEnvironment from jinja2.sandbox. Other prompt-manager integrations (gitlab, arize, bitbucket) all use the same sandbox class.
  • The MCP test endpoint authorization fix (GHSA-v4p8-mg3p-g94g): both POST /test/connection and POST /test/tools/list correctly check LitellmUserRoles.PROXY_ADMIN early in the handler.
  • The SQL injection in CVE-2026-42208: the auth path is now via hash_token(api_key) and Prisma parameterized lookups. The raw SQL in key_aliases() (key_management_endpoints.py) interpolates only $N placeholders into the f-string; user values flow through *query_params to query_raw. I spot-checked several other query_raw callsites in spend_management_endpoints.py and proxy_server.py; all the ones I looked at used proper parameterization.
  • The RestrictedPython sandbox setup in guardrail_hooks/custom_code/sandbox.py is well-constructed (uses compile_restricted, safer_getattr, full_write_guard; the AsyncAwareTransformer extension is minimal and delegates to safe parent methods).
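The safe pattern described for key_aliases() (only placeholder tokens are interpolated into the SQL text, while values are passed separately) can be illustrated with the standard library's sqlite3, whose ? placeholders play the role of Prisma's $N parameters. The table, columns, and helper names below are hypothetical; this is not LiteLLM's query.

```python
import sqlite3

def placeholders(n, style="qmark"):
    """Build n placeholder tokens; only these, never user values, enter the SQL text."""
    if style == "dollar":
        return ", ".join(f"${i}" for i in range(1, n + 1))  # Postgres/Prisma-style $1, $2, ...
    return ", ".join("?" for _ in range(n))  # sqlite3 qmark style

def tokens_for_aliases(conn, aliases):
    if not aliases:
        return []
    sql = f"SELECT token FROM keys WHERE key_alias IN ({placeholders(len(aliases))}) ORDER BY token"
    return [row[0] for row in conn.execute(sql, list(aliases))]  # values bound separately

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keys (token TEXT, key_alias TEXT)")
conn.executemany("INSERT INTO keys VALUES (?, ?)", [("t1", "prod"), ("t2", "dev")])
print(tokens_for_aliases(conn, ["prod", "x' OR '1'='1"]))  # ['t1']
```

The injection-shaped alias is compared as a literal string and matches nothing, which is the property the review checked for in the query_raw callsites.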

Things I didn't check thoroughly:

  • The OIDC cache-key collision path (GHSA-jjhc-v7c2-5hh6) — didn't have time to trace the cache key derivation logic comprehensively.
  • The pass-through endpoint registration path (referenced in GHSA-53mr-6c8q-9789) — only verified /config/update has the admin gate.
  • The full Jinja sandbox-escape surface — ImmutableSandboxedEnvironment is the right base class but specific bypasses have been published in the past; a thorough audit would test against the published bypass corpus.

These would be reasonable next steps for a follow-up review if you want a more complete sweep.

Thank you for the rapid-disclosure cadence on the recent advisories — that's what made variant analysis on this codebase tractable from public data alone.
