litellm - 💡(How to fix) Fix [Bug]: OTel span hierarchy broken on /v1/messages and guardrail spans never emitted

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

Root cause: _is_proxy_only_llm_api_error() in proxy/utils.py:1904 only returns True for isinstance(original_exception, HTTPException). GuardrailRaisedException extends Exception, not HTTPException, so it returns False. This means _handle_logging_proxy_only_error() is never called → Logging.async_failure_handler() never fires → OTel's _handle_failure()_create_guardrail_span() never executes.

Root Cause

Root cause: _get_span_context() in opentelemetry.py:2313-2314 only reads litellm_parent_otel_span from litellm_params["metadata"], but for /v1/messages the span is stored in litellm_params["litellm_metadata"].

Code Example

Received Proxy Server Request    (trace_id: A, span_id: 1)
├── auth                         (trace_id: A, parent_id: 1)
├── proxy_pre_call               (trace_id: A, parent_id: 1)
├── litellm_request              (trace_id: A, parent_id: 1)
│   └── guardrail                (trace_id: A, parent_id: litellm_request)

---

auth                             (trace_id: A, parent_id: X)
proxy_pre_call                   (trace_id: A, parent_id: X)
litellm_request                  (trace_id: B, parent_id: null)  ← different trace, orphan root
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Two OTel tracing bugs affect the /v1/messages (Anthropic Messages API) endpoint when using USE_OTEL_LITELLM_REQUEST_SPAN=true with the Generic Guardrail API (LiteLLM v1.83.0+).

Bug 1: Broken span hierarchy on /v1/messages

Expected (per LiteLLM docs):

Received Proxy Server Request    (trace_id: A, span_id: 1)
├── auth                         (trace_id: A, parent_id: 1)
├── proxy_pre_call               (trace_id: A, parent_id: 1)
├── litellm_request              (trace_id: A, parent_id: 1)
│   └── guardrail                (trace_id: A, parent_id: litellm_request)

Observed:

auth                             (trace_id: A, parent_id: X)
proxy_pre_call                   (trace_id: A, parent_id: X)
litellm_request                  (trace_id: B, parent_id: null)  ← different trace, orphan root

Root cause: _get_span_context() in opentelemetry.py:2313-2314 only reads litellm_parent_otel_span from litellm_params["metadata"], but for /v1/messages the span is stored in litellm_params["litellm_metadata"].

The chain:

  1. /v1/messages is in LITELLM_METADATA_ROUTES (litellm_pre_call_utils.py:109), so _get_metadata_variable_name() returns "litellm_metadata"
  2. litellm_parent_otel_span is stored in data["litellm_metadata"] (litellm_pre_call_utils.py:1657-1659)
  3. In function_setup() (utils.py:1148-1161), if the Anthropic request body contains a native metadata field (e.g. {"user_id": "..."}), litellm_params["metadata"] gets set to that native metadata, and the fallback copy from litellm_metadata at line 1160 is skipped (because metadata is already truthy)
  4. _get_span_context() reads litellm_params["metadata"] — finds Anthropic's native metadata, no litellm_parent_otel_span — falls through to create an orphan root span

The same issue affects _end_proxy_span_from_kwargs() at line 868-869 — the proxy span is never closed because it can't find it via litellm_params["metadata"].

Why OpenAI /chat/completions works: _get_metadata_variable_name() returns "metadata", so litellm_parent_otel_span is stored directly in data["metadata"] and flows through function_setup() into litellm_params["metadata"] where _get_span_context() finds it.

Related: #24945 (same litellm_metadata vs metadata confusion on /v1/messages, different symptom)

Bug 2: Guardrail spans never emitted on BLOCKED

When a guardrail blocks a request (GuardrailRaisedException), no guardrail span is emitted.

Root cause: _is_proxy_only_llm_api_error() in proxy/utils.py:1904 only returns True for isinstance(original_exception, HTTPException). GuardrailRaisedException extends Exception, not HTTPException, so it returns False. This means _handle_logging_proxy_only_error() is never called → Logging.async_failure_handler() never fires → OTel's _handle_failure()_create_guardrail_span() never executes.

The async_post_call_failure_hook at opentelemetry.py:645 does fire (via the callback loop), but it only creates a "Failed Proxy Server Request" child span — it never calls _create_guardrail_span().

Steps to Reproduce

  1. Configure a Generic Guardrail API guardrail with mode: [pre_call] and default_on: true
  2. Set USE_OTEL_LITELLM_REQUEST_SPAN=true
  3. Send a request via POST /v1/messages

For Bug 1: Observe litellm_request has parent_id: null and a different trace_id from auth/proxy_pre_call

For Bug 2: Send a request that triggers guardrail blocking. No guardrail span appears in the trace.

Suggested Fix

Bug 1: In _get_span_context() and _end_proxy_span_from_kwargs(), add a fallback to check litellm_params["litellm_metadata"] when litellm_parent_otel_span is not found in litellm_params["metadata"].

Bug 2: Either make _is_proxy_only_llm_api_error() recognize GuardrailRaisedException, or emit guardrail spans directly in async_post_call_failure_hook by reading guardrail information from request_data["metadata"].

I'm happy to submit a PR with fixes for both.

Are you a ML Ops team?

Yes

Twitter / LinkedIn details

n/a

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

litellm - 💡(How to fix) Fix [Bug]: OTel span hierarchy broken on /v1/messages and guardrail spans never emitted