litellm - 💡(How to fix) Fix [Bug]: langfuse_otel callback drops OTEL spans under asyncio.gather() concurrency [1 participants]

GitHub stats
BerriAI/litellm#26376 (fetched 2026-04-24 10:36:40)
Comments: 0 · Participants: 1 · Timeline: 1 (cross-referenced ×1) · Reactions: 0


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When multiple litellm.acompletion() calls run concurrently via asyncio.gather(), the langfuse_otel callback consistently drops OTEL spans (one or two of every three in testing). The spans are lost silently, with no errors or warnings. As a result, Langfuse traces are incomplete and cost tracking is unreliable for batch LLM operations.

This is related to #25978 (which covers the teardown/drain variant), but this is the concurrent execution variant — spans are dropped while the event loop is still alive, not during shutdown.

Root cause

LiteLLM dispatches success callbacks via asyncio.create_task() through GLOBAL_LOGGING_WORKER. When multiple acompletion() calls complete near-simultaneously under asyncio.gather(), the callback tasks that create OTEL spans (_handle_success_start_primary_span) interfere with each other. The result: some spans are never created or are lost before reaching the BatchSpanProcessor.
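The create_task() dispatch described above has one general hazard worth ruling out when the reproduction wraps the batch in asyncio.run(): callback tasks that are still pending when the main coroutine returns are cancelled during loop shutdown, so their work never runs. The standard-library sketch below shows that effect. fake_acompletion and fake_span_callback are hypothetical stand-ins, not LiteLLM functions, and the sketch illustrates the mechanism rather than confirming the cause of this issue.

```python
import asyncio

exported: list[str] = []


async def fake_span_callback(name: str) -> None:
    # Hypothetical stand-in for an async success callback that exports a span.
    await asyncio.sleep(0.05)
    exported.append(name)


async def fake_acompletion(name: str, background: set[asyncio.Task]) -> str:
    # Simulate the model call, then dispatch the callback fire-and-forget.
    await asyncio.sleep(0.01)
    task = asyncio.create_task(fake_span_callback(name))
    # The event loop keeps only weak references to tasks, so hold a strong one.
    background.add(task)
    task.add_done_callback(background.discard)
    return name


async def main() -> list[str]:
    background: set[asyncio.Task] = set()
    results = await asyncio.gather(
        *(fake_acompletion(n, background) for n in ("pos", "neg", "ok"))
    )
    # Returning now leaves the callback tasks pending; asyncio.run() cancels them.
    return results


results = asyncio.run(main())
print(results, exported)  # ['pos', 'neg', 'ok'] []
```

All three results come back while none of the callbacks finishes. If the real spans disappear the same way, awaiting outstanding callback tasks before asyncio.run() returns should bring them back; if spans still go missing, the interference lies elsewhere.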

Steps to Reproduce

import asyncio
import os

import litellm
from langfuse import observe, get_client
from opentelemetry import trace

# Assumes Langfuse and Anthropic credentials are already set as environment variables.
# Enable the per-request OTEL span before configuring the callback.
os.environ["USE_OTEL_LITELLM_REQUEST_SPAN"] = "true"
litellm.success_callback = ["langfuse_otel"]

async def classify(text):
    # One short classification call; three of these run concurrently below
    return await litellm.acompletion(
        model="claude-haiku-4-5-20251001",
        messages=[
            {"role": "system", "content": "Classify as: positive/negative. One word."},
            {"role": "user", "content": text},
        ],
        max_tokens=10,
    )

@observe(name="batch_operation")
def run_batch():
    async def _run():
        tasks = [
            classify("I love this product"),
            classify("Terrible experience"),
            classify("It was okay"),
        ]
        # All three requests share one event loop and complete near-simultaneously
        return await asyncio.gather(*tasks)
    return asyncio.run(_run())

run_batch()
get_client().flush()

# Force flush the OTEL exporter so buffered spans are sent before exit
trace.get_tracer_provider().force_flush(timeout_millis=5000)

Expected: 3 litellm_request GENERATION spans nested under batch_operation.
Actual: only 1 or 2 litellm_request spans appear, and which span(s) go missing changes between runs, consistent with a race condition.

Verification that it's concurrency-specific

Spans created per scenario:

  • Sequential (for loop + await): 3/3 ✓
  • asyncio.gather() (no explicit parent): 1/3
  • asyncio.gather() + explicit litellm_parent_otel_span: 2/3
  • asyncio.gather() + inline force_flush(): 2/3
  • asyncio.gather() + no per-call flush: 2/3
  • ThreadPoolExecutor + sync completion(): 3/3 ✓

Threading likely works because each thread runs with its own context. The issue is specific to asyncio task concurrency.
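OpenTelemetry's Python API tracks the active span in a contextvars.Context, and the two concurrency models propagate that context differently. In this standard-library sketch (current_span is a hypothetical variable, not OpenTelemetry's own context key), a task started with asyncio.create_task() runs in a copy of the caller's context, while a plain threading.Thread on a standard CPython build starts with an empty one.

```python
import asyncio
import contextvars
import threading

# Hypothetical context variable standing in for "the current parent span".
current_span: contextvars.ContextVar = contextvars.ContextVar("current_span", default=None)


async def read_in_task() -> object:
    # A task runs in a copy of the context captured when it was created.
    return current_span.get()


def read_in_thread(out: list) -> None:
    # On standard CPython builds, a new thread starts with an empty context.
    out.append(current_span.get())


async def main() -> tuple:
    current_span.set("batch_operation")
    task_value = await asyncio.create_task(read_in_task())

    thread_out: list = []
    worker = threading.Thread(target=read_in_thread, args=(thread_out,))
    worker.start()
    worker.join()
    return task_value, thread_out[0]


task_value, thread_value = asyncio.run(main())
print(task_value, thread_value)  # batch_operation None (standard CPython 3.11)
```

Both models give each unit of work its own context, so context isolation alone may not explain why the threaded variant keeps every span.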

Impact

  • Cost tracking: Langfuse cannot compute cost for dropped spans → batch operation costs are underreported
  • Observability: Incomplete traces make debugging LLM pipelines unreliable
  • Workaround required: Users must switch from asyncio to threading for concurrent LLM calls, losing asyncio's efficiency benefits

Environment

  • LiteLLM version: latest (installed via pip)
  • Langfuse SDK: 4.5.0
  • Python: 3.11
  • OpenTelemetry SDK: 1.41.0
  • Runtime: AWS Lambda (512MB) and local

Suggested fix

The callback dispatch in _client_async_logging_helper uses asyncio.create_task(), which does not appear to be safe for concurrent span creation. Consider:

  1. Creating OTEL spans synchronously within the callback (before dispatching to background)
  2. Or providing a mechanism to ensure all callback tasks complete before the caller proceeds
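Item 1 can be sketched with the standard library alone: the span record is created synchronously in the caller's own task, and only the close/export step is handed to a background task. SpanRecord, export_span and fake_acompletion are hypothetical names for illustration, not LiteLLM or OpenTelemetry APIs.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpanRecord:
    # Hypothetical minimal span; not an OpenTelemetry or LiteLLM type.
    name: str
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None


created: list[SpanRecord] = []


async def export_span(span: SpanRecord) -> None:
    # Background step: only closes and "exports" a span that already exists.
    await asyncio.sleep(0)
    span.end = time.monotonic()


async def fake_acompletion(name: str, background: set) -> str:
    await asyncio.sleep(0.01)  # simulated model call
    # Item 1: create the span synchronously, in the caller's own task,
    # before anything is handed to the background.
    span = SpanRecord(name)
    created.append(span)
    task = asyncio.create_task(export_span(span))
    background.add(task)  # keep a strong reference until the task finishes
    task.add_done_callback(background.discard)
    return name


async def main() -> None:
    background: set = set()
    await asyncio.gather(*(fake_acompletion(n, background) for n in ("pos", "neg", "ok")))
    # Every span record exists at this point, whether or not its export has run.


asyncio.run(main())
print(sorted(s.name for s in created))
```

Creating the span up front guarantees it exists once the call returns; the background export can still be lost at shutdown, which is what item 2 addresses.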

Are you a ML Ops Team?

No


extent analysis

TL;DR

To fix the issue of dropped OTEL spans when using asyncio.gather() with litellm.acompletion(), consider creating OTEL spans synchronously within the callback or ensuring all callback tasks complete before the caller proceeds.

Guidance

  • Review the _client_async_logging_helper function to identify where asyncio.create_task() is used to dispatch callbacks and consider modifying it to create OTEL spans synchronously.
  • Investigate using asyncio.wait() or asyncio.as_completed() to ensure all callback tasks complete before the caller proceeds.
  • Verify that the fix works by running the provided example code and checking that all expected litellm_request spans are created.
  • Consider using a Lock or Semaphore to synchronize access to the OTEL span creation code and prevent concurrent interference.
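For the asyncio.wait() suggestion, a minimal sketch of a bounded drain: the caller keeps a set of outstanding callback tasks and waits for them, up to a timeout, before the coroutine passed to asyncio.run() returns. The task set, drain() and the fake functions are hypothetical; inside LiteLLM the equivalent would have to hook into GLOBAL_LOGGING_WORKER, whose internals are not shown here.

```python
import asyncio

exported: list[str] = []


async def fake_span_callback(name: str) -> None:
    # Hypothetical stand-in for a background callback that exports a span.
    await asyncio.sleep(0.02)
    exported.append(name)


async def fake_acompletion(name: str, pending: set) -> str:
    await asyncio.sleep(0.01)  # simulated model call
    task = asyncio.create_task(fake_span_callback(name))
    pending.add(task)
    task.add_done_callback(pending.discard)
    return name


async def drain(pending: set, timeout: float) -> int:
    """Wait up to `timeout` seconds for outstanding tasks; return how many are still pending."""
    if not pending:
        return 0
    _done, still_pending = await asyncio.wait(set(pending), timeout=timeout)
    return len(still_pending)


async def main() -> int:
    pending: set = set()
    await asyncio.gather(*(fake_acompletion(n, pending) for n in ("pos", "neg", "ok")))
    # Drain before returning so asyncio.run() has nothing left to cancel.
    return await drain(pending, timeout=5.0)


left_over = asyncio.run(main())
print(sorted(exported), left_over)  # ['neg', 'ok', 'pos'] 0
```

The timeout keeps a slow exporter from blocking shutdown indefinitely, which matters in short-lived runtimes such as the AWS Lambda environment listed above; a non-zero return value means some callbacks were still pending.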

Example

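import asyncio

# Lock that serializes OTEL span creation across concurrent callback tasks
otel_span_lock = asyncio.Lock()

def _start_primary_span():
    # Hypothetical placeholder for LiteLLM's internal span-creation logic
    pass

async def _handle_success():
    async with otel_span_lock:
        # No await inside, so tasks could not interleave here even without the lock;
        # the lock only changes behavior if span creation itself awaits
        _start_primary_span()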

Notes

The provided example code and suggested fix are specific to the litellm and langfuse libraries, and may not be applicable to other libraries or frameworks. Additionally, the fix may require modifications to the underlying library code, which could have unintended consequences.

Recommendation

Apply a workaround by creating OTEL spans synchronously within the callback or ensuring all callback tasks complete before the caller proceeds, as this is a more targeted solution to the specific issue described.
