litellm - [Bug]: OTel gen_ai.output.messages is never set for Responses API (/v1/responses)

BerriAI/litellm#25840 (fetched 2026-04-17 08:28:46)

What happened?

When using the /v1/responses (Responses API) endpoint, the OTel span is missing gen_ai.output.messages entirely. The LLM response content is lost from traces. This affects both proxy mode and the SDK (litellm.responses() / litellm.aresponses()).

Token counts (gen_ai.usage.*), costs (gen_ai.cost.*), and input messages (gen_ai.input.messages) all work correctly — only the output content is missing.

Related issues

  • #25240 / PR #25309: fixes the llm.None.* attribute keys and the empty gen_ai.system by adding custom_llm_provider to litellm_params. Once merged, that PR resolves the provider-identification side of the problem, but gen_ai.output.messages will still be missing.
  • #24057 — Separate OTel attribute type issue for gen_ai.prompt (not directly related but in the same integration file).

Environment

  • LiteLLM version: 1.74.4 (also verified against current source)
  • Python: 3.12
  • Proxy mode with callbacks: ["otel"]
  • Tested with gpt-4o-mini via OpenAI

Steps to Reproduce

  1. Start the LiteLLM proxy with OTel enabled:

    litellm_settings:
      callbacks: ["otel"]

  2. Send a request to /v1/responses:

    curl -X POST http://localhost:4000/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer sk-1234" \
      -d '{"model": "gpt-4o-mini", "input": "What is 2+2?"}'

  3. Inspect the OTel span attributes on the parent span.

Actual behavior

The parent span has no gen_ai.output.messages attribute. For comparison, /chat/completions correctly produces:

gen_ai.output.messages: [{"role":"assistant","parts":[{"type":"text","content":"..."}]}]

The raw_gen_ai_request child span (for non-streaming) does contain the response text under llm.<provider>.output, confirming the response exists — it just never gets written to gen_ai.output.messages on the parent span.

Expected behavior

gen_ai.output.messages should be populated with the response content for Responses API calls, the same way it is for /chat/completions.

Root Cause

File: litellm/integrations/opentelemetry.py, method set_attributes(), line 1705:

if response_obj.get("choices"):
    transformed_choices = self._transform_choices_to_otel_semantic_conventions(
        response_obj.get("choices")
    )
    self.safe_set_attribute(
        span=span,
        key=SpanAttributes.GEN_AI_OUTPUT_MESSAGES.value,
        value=safe_dumps(transformed_choices),
    )

ResponsesAPIResponse (defined in litellm/types/llms/openai.py:1247) has an output field, not choices. So response_obj.get("choices") returns None and the entire block is skipped — including gen_ai.output.messages, gen_ai.response.finish_reasons, and tool call extraction.

Token counts work because they use a separate path at line 1626 via response_obj.get("usage"), which ResponsesAPIResponse does have (its usage is already normalized to chat-completion format with prompt_tokens/completion_tokens by _transform_usage_objects() in litellm_logging.py:1778 before the callback fires).
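
The asymmetry described above can be reproduced without LiteLLM. The following is a minimal sketch that uses plain dicts shaped like the two response types; the payloads and the `branches_taken` helper are hypothetical stand-ins, not LiteLLM objects or functions.

```python
# Hypothetical payloads shaped like the two response types (not LiteLLM objects).
chat_completion = {
    "choices": [
        {"message": {"role": "assistant", "content": "4"}, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 1},
}
responses_api = {
    "status": "completed",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "4"}],
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 1},
}


def branches_taken(response_obj: dict) -> dict:
    """Report which of the two guards described in the root cause would pass."""
    return {
        "output_messages": bool(response_obj.get("choices")),
        "usage": bool(response_obj.get("usage")),
    }


chat_result = branches_taken(chat_completion)
responses_result = branches_taken(responses_api)
print(chat_result)
print(responses_result)
```

The Responses-shaped payload passes the usage guard but not the choices guard, which matches the symptom: token counts appear on the span while the output content does not.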

Suggested Fix

Add an elif branch after the existing choices check to handle the Responses API output structure:

if response_obj.get("choices"):
    ...  # existing chat/completions output extraction (unchanged)
elif response_obj.get("output"):
    # Responses API: extract text from output items
    output_messages = []
    for item in response_obj.get("output", []):
        if isinstance(item, dict) and item.get("type") == "message":
            for content in item.get("content", []):
                if isinstance(content, dict) and content.get("type") == "output_text":
                    output_messages.append({
                        "role": "assistant",
                        "parts": [{"type": "text", "content": content.get("text", "")}]
                    })
    if output_messages:
        self.safe_set_attribute(
            span=span,
            key=SpanAttributes.GEN_AI_OUTPUT_MESSAGES.value,
            value=safe_dumps(output_messages),
        )
    # Also extract finish reason from ResponsesAPIResponse.status
    status = response_obj.get("status")
    if status:
        self.safe_set_attribute(
            span=span,
            key=SpanAttributes.GEN_AI_RESPONSE_FINISH_REASONS.value,
            value=safe_dumps([status]),
        )
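
To make the extraction testable outside the integration, the loop in the suggested branch can be pulled into a pure function. This is a sketch under assumptions: `responses_output_to_messages` is a hypothetical name, and, unlike the snippet above, it groups all text parts of one message item into a single message rather than emitting one message per part.

```python
import json
from typing import Any, Dict, List


def responses_output_to_messages(output: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert Responses API output items into role/parts messages.

    Only "message" items with "output_text" content are handled;
    every other item type is skipped.
    """
    messages = []
    for item in output:
        if not isinstance(item, dict) or item.get("type") != "message":
            continue
        parts = [
            {"type": "text", "content": content.get("text", "")}
            for content in item.get("content") or []
            if isinstance(content, dict) and content.get("type") == "output_text"
        ]
        if parts:
            messages.append({"role": item.get("role", "assistant"), "parts": parts})
    return messages


# Hypothetical sample: one reasoning item (skipped) and one message item.
sample_output = [
    {"type": "reasoning", "summary": []},
    {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "2+2 is 4."}],
    },
]
messages = responses_output_to_messages(sample_output)
print(json.dumps(messages))
```

Inside set_attributes(), the elif branch would then only need to call this helper and pass the serialized result to safe_set_attribute().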

Impact

Any downstream OTel consumer (Datadog, Honeycomb, Fiddler, custom pipelines, etc.) that relies on gen_ai.output.messages for response content will receive empty output for all /v1/responses calls. This makes the Responses API appear to produce no output in observability dashboards, even though the LLM call succeeds and tokens are counted correctly.

What part of LiteLLM is this about?

Proxy, OpenTelemetry integration

What LiteLLM version are you on?

1.74.4+ (current source as of April 2026)

Extent analysis

TL;DR

The issue can be fixed by adding an elif branch to handle the Responses API output structure in the set_attributes() method of opentelemetry.py.

Guidance

  • The root cause is identified as the response_obj.get("choices") check in opentelemetry.py, which returns None for Responses API calls because they have an output field instead of choices.
  • To fix this, add an elif branch to handle the output structure, as suggested in the issue.
  • The new branch should extract the text from the output items and set the gen_ai.output.messages attribute accordingly.
  • Additionally, extract the finish reason from the status field of the ResponsesAPIResponse object.
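
One caveat on the last point: Responses API status values such as "completed" use a different vocabulary from chat-completion finish reasons such as "stop". If consumers expect the chat-style values, a mapping along these lines may help. It is a sketch, not part of the issue's proposal: `finish_reasons_from_status` and the mapping table are assumptions, and the chosen target values are one possible convention.

```python
from typing import Any, Dict, List, Optional

# Assumed mapping from Responses API incomplete_details.reason
# to chat-completion style finish reasons.
_INCOMPLETE_REASON_TO_FINISH = {
    "max_output_tokens": "length",
    "content_filter": "content_filter",
}


def finish_reasons_from_status(response: Dict[str, Any]) -> List[str]:
    """Derive a finish-reasons list from a Responses-shaped dict."""
    status: Optional[str] = response.get("status")
    if not status:
        return []
    if status == "completed":
        return ["stop"]
    if status == "incomplete":
        details = response.get("incomplete_details") or {}
        # Fall back to the raw status when the reason is missing or unknown.
        return [_INCOMPLETE_REASON_TO_FINISH.get(details.get("reason"), status)]
    # Other statuses (for example "failed") are passed through unchanged.
    return [status]


completed = finish_reasons_from_status({"status": "completed"})
truncated = finish_reasons_from_status(
    {"status": "incomplete", "incomplete_details": {"reason": "max_output_tokens"}}
)
print(completed, truncated)
```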

Example

The branch proposed in the issue, to be placed directly after the existing choices check:

elif response_obj.get("output"):
    # Responses API: extract text from output items
    output_messages = []
    for item in response_obj.get("output", []):
        if isinstance(item, dict) and item.get("type") == "message":
            for content in item.get("content", []):
                if isinstance(content, dict) and content.get("type") == "output_text":
                    output_messages.append({
                        "role": "assistant",
                        "parts": [{"type": "text", "content": content.get("text", "")}]
                    })
    if output_messages:
        self.safe_set_attribute(
            span=span,
            key=SpanAttributes.GEN_AI_OUTPUT_MESSAGES.value,
            value=safe_dumps(output_messages),
        )

Notes

The fix assumes that every output item is a dict of type "message" whose content entries are of type "output_text". Items that are not dicts, and other item types, are skipped silently, so additional handling may be necessary if the format varies.
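
If the output items can arrive as attribute-style objects rather than dicts, or include tool calls, a more tolerant extraction might look like the sketch below. The `field` and `output_items_to_messages` names are hypothetical, and the "tool_call" part shape (id, name, arguments) is an assumption about what the consumer expects; verify it against the semantic conventions in use before relying on it.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def field(obj: Any, name: str, default: Any = None) -> Any:
    """Read a field from either a dict or an attribute-style object."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def output_items_to_messages(output: List[Any]) -> List[Dict[str, Any]]:
    """Convert mixed dict/object output items into role/parts messages."""
    messages = []
    for item in output or []:
        item_type = field(item, "type")
        if item_type == "message":
            parts = [
                {"type": "text", "content": field(content, "text", "")}
                for content in field(item, "content") or []
                if field(content, "type") == "output_text"
            ]
        elif item_type == "function_call":
            # Assumed part shape for tool calls; adjust to the consumer's schema.
            parts = [
                {
                    "type": "tool_call",
                    "id": field(item, "call_id"),
                    "name": field(item, "name"),
                    "arguments": field(item, "arguments"),
                }
            ]
        else:
            parts = []  # other item types (for example reasoning) are skipped
        if parts:
            messages.append({"role": "assistant", "parts": parts})
    return messages


# Hypothetical mixed input: an object-style message, a dict tool call, a reasoning item.
mixed_output = [
    SimpleNamespace(
        type="message",
        content=[SimpleNamespace(type="output_text", text="Checking the weather.")],
    ),
    {
        "type": "function_call",
        "call_id": "call_1",
        "name": "get_weather",
        "arguments": '{"city": "Paris"}',
    },
    {"type": "reasoning"},
]
mixed_messages = output_items_to_messages(mixed_output)
print(mixed_messages)
```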

Recommendation

Apply the suggested fix by adding the elif branch to set_attributes() so that gen_ai.output.messages is populated for Responses API calls.
