litellm - 💡(How to fix) Fix [Bug]: langfuse_otel callback drops OTEL spans under asyncio.gather() concurrency [1 participants]

shuttlesworthNEO · 2026-04-24T01:17:16Z




GitHub: BerriAI/litellm#26376 (fetched 2026-04-24 10:36:40)
Author: shuttlesworthNEO
Participants: shuttlesworthNEO
Timeline: cross-referenced ×1

Root Cause

LiteLLM dispatches success callbacks via asyncio.create_task() through GLOBAL_LOGGING_WORKER. When multiple acompletion() calls complete near-simultaneously under asyncio.gather(), the callback tasks that create OTEL spans (_handle_success → _start_primary_span) interfere with each other. The result: some spans are never created or are lost before reaching the BatchSpanProcessor.
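When reasoning about how OTEL context reaches these callback tasks, one asyncio behavior is relevant. A task created with asyncio.create_task() runs in a copy of the contextvars context as it was at creation time, and changes the task makes to that copy do not leak back to the caller or to sibling tasks. The OpenTelemetry Python SDK keeps its current span in contextvars, so it follows the same rule. The sketch below uses a plain ContextVar named current_span as a hypothetical stand-in for that OTEL context. It does not reproduce the dropped spans; it only illustrates the copy-on-create semantics.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the OpenTelemetry "current span" context variable
current_span = contextvars.ContextVar("current_span", default="<none>")

async def callback(label, seen):
    # Record what this task sees, then mutate only its own context copy
    seen[label] = current_span.get()
    current_span.set(f"span-from-{label}")

async def main():
    seen = {}
    current_span.set("batch_operation")
    # Each task snapshots the caller's context at creation time
    t1 = asyncio.create_task(callback("a", seen))
    current_span.set("changed-after-t1")
    t2 = asyncio.create_task(callback("b", seen))
    await asyncio.gather(t1, t2)
    # The tasks' set() calls did not leak back into the caller's context
    return seen, current_span.get()

seen, after = asyncio.run(main())
print(seen)   # {'a': 'batch_operation', 'b': 'changed-after-t1'}
print(after)  # changed-after-t1
```

A consequence worth keeping in mind: whichever span is current when a callback task is created is the parent that task will see, regardless of what the caller does afterwards.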

Fix / Workaround


The callback dispatch in _client_async_logging_helper uses asyncio.create_task(), which is not safe for concurrent span creation. Consider:

1. Creating OTEL spans synchronously within the callback (before dispatching to background)
2. Providing a mechanism to ensure all callback tasks complete before the caller proceeds
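The second suggestion, making sure every callback task has finished before the caller proceeds, can be sketched with plain asyncio. This is a toy model under stated assumptions: _pending_callbacks, dispatch_callback, drain_callbacks and fake_acompletion are hypothetical names, not LiteLLM APIs. The issue does not say where such a hook would live inside LiteLLM.

```python
import asyncio

_pending_callbacks = set()  # hypothetical registry of in-flight callback tasks
recorded_spans = []

async def _log_success(name):
    # Stand-in for a success callback that awaits before recording its span
    await asyncio.sleep(0.2)
    recorded_spans.append(name)

def dispatch_callback(name):
    # Fire-and-forget dispatch that still keeps a strong reference to each task
    task = asyncio.create_task(_log_success(name))
    _pending_callbacks.add(task)
    task.add_done_callback(_pending_callbacks.discard)

async def drain_callbacks():
    # Wait until every callback dispatched so far (or during the drain) is done
    while _pending_callbacks:
        await asyncio.gather(*list(_pending_callbacks))

async def fake_acompletion(name):
    await asyncio.sleep(0)   # stand-in for the LLM request
    dispatch_callback(name)  # success callback scheduled in the background
    return name

async def main():
    await asyncio.gather(*(fake_acompletion(n) for n in ("a", "b", "c")))
    before = len(recorded_spans)  # callbacks are typically still sleeping here
    await drain_callbacks()
    return before, len(recorded_spans)

before, after = asyncio.run(main())
print(before, after)
```

Without the drain, asyncio.run() cancels the still-pending callback tasks when main() returns, so recorded_spans would stay short. That failure mode is closer to the teardown variant in #25978 than to the concurrency variant described here.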


Original issue

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When multiple litellm.acompletion() calls run concurrently via asyncio.gather(), the langfuse_otel callback consistently drops ~1 in 3 OTEL spans. The spans are silently lost — no errors, no warnings. This means Langfuse traces are incomplete and cost tracking is unreliable for batch LLM operations.

This is related to #25978 (which covers the teardown/drain variant), but this is the concurrent execution variant — spans are dropped while the event loop is still alive, not during shutdown.

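Root cause

LiteLLM dispatches success callbacks via asyncio.create_task() through GLOBAL_LOGGING_WORKER. When multiple acompletion() calls complete near-simultaneously under asyncio.gather(), the callback tasks that create OTEL spans (_handle_success → _start_primary_span) interfere with each other. The result: some spans are never created or are lost before reaching the BatchSpanProcessor.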

Steps to Reproduce

import asyncio
import os

import litellm
from langfuse import observe, get_client

litellm.success_callback = ["langfuse_otel"]
os.environ["USE_OTEL_LITELLM_REQUEST_SPAN"] = "true"

async def classify(text):
    return await litellm.acompletion(
        model="claude-haiku-4-5-20251001",
        messages=[
            {"role": "system", "content": "Classify as: positive/negative. One word."},
            {"role": "user", "content": text},
        ],
        max_tokens=10,
    )

@observe(name="batch_operation")
def run_batch():
    async def _run():
        tasks = [
            classify("I love this product"),
            classify("Terrible experience"),
            classify("It was okay"),
        ]
        return await asyncio.gather(*tasks)
    return asyncio.run(_run())

run_batch()
get_client().flush()

# Force flush OTEL exporter
from opentelemetry import trace
trace.get_tracer_provider().force_flush(timeout_millis=5000)

Expected: 3 litellm_request GENERATION spans nested under batch_operation.

Actual: only 1-2 litellm_request spans appear, and which span(s) go missing varies between runs (race condition).

Verification that it's concurrency-specific

| Scenario | Spans created |
|----------|---------------|
| Sequential (`for` loop + `await`) | 3/3 ✓ |
| `asyncio.gather()` (no explicit parent) | 1/3 |
| `asyncio.gather()` + explicit `litellm_parent_otel_span` | 2/3 |
| `asyncio.gather()` + inline `force_flush()` | 2/3 |
| `asyncio.gather()` + no per-call flush | 2/3 |
| `ThreadPoolExecutor` + sync `completion()` | 3/3 ✓ |

Threading works, presumably because each thread runs in its own context. The issue is specific to asyncio task concurrency.
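The threaded workaround from the last row of the table can be sketched as follows. classify_sync is a hypothetical wrapper. In real code its body would call litellm.completion(...) with the same model, messages and max_tokens as the async repro. Here it returns a placeholder so the sketch runs without network access or credentials.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_sync(text):
    # Hypothetical wrapper: in real code this would be
    # litellm.completion(model=..., messages=[...], max_tokens=10)
    return f"classified:{text}"

def run_batch_threaded(texts, max_workers=3):
    # Each call runs in a worker thread; map() returns results in input order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_sync, texts))

results = run_batch_threaded(["I love this product", "Terrible experience", "It was okay"])
print(results)
```

This is the trade-off the issue describes: one OS thread per in-flight call instead of one coroutine, in exchange for the 3/3 span count the reporter observed.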

Impact

Cost tracking: Langfuse cannot compute cost for dropped spans → batch operation costs are underreported
Observability: Incomplete traces make debugging LLM pipelines unreliable
Workaround required: Users must switch from asyncio to threading for concurrent LLM calls, losing asyncio's efficiency benefits

Environment

LiteLLM version: latest (installed via pip)
Langfuse SDK: 4.5.0
Python: 3.11
OpenTelemetry SDK: 1.41.0
Runtime: AWS Lambda (512MB) and local
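Because "latest" is ambiguous in a bug report, the exact installed versions can be captured with the standard library's importlib.metadata. The distribution names below (litellm, langfuse, opentelemetry-sdk) are assumed to be the usual PyPI names; adjust them if your install differs.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

def installed_versions(dists=("litellm", "langfuse", "opentelemetry-sdk")):
    # Map each distribution name to its installed version, or None if missing
    found = {}
    for dist in dists:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None
    found["python"] = platform.python_version()
    return found

print(installed_versions())
```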

Suggested fix

The callback dispatch in _client_async_logging_helper uses asyncio.create_task(), which is not safe for concurrent span creation. Consider:

1. Creating OTEL spans synchronously within the callback (before dispatching to background)
2. Providing a mechanism to ensure all callback tasks complete before the caller proceeds
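The first suggestion, creating the span synchronously inside the callback and only then dispatching background work, can be illustrated with a toy model. span_buffer, exported, on_success and export_later are hypothetical names, not LiteLLM or OpenTelemetry APIs. In the real SDK the synchronous step would be starting and ending the span, and the span processor would own the export. The toy shows one property of this design: a span recorded synchronously survives even if the background work is later cancelled, which is what happens to pending tasks when asyncio.run() shuts down.

```python
import asyncio

span_buffer = []   # stand-in for the span processor's queue
exported = []      # stand-in for what actually reached the backend
_background = set()  # strong references to in-flight export tasks

async def export_later(name):
    # Slow, non-critical work; cancelled if the event loop shuts down first
    await asyncio.sleep(1)
    exported.append(name)

def on_success(name):
    # 1) Record the span synchronously, with no await before it
    span_buffer.append(name)
    # 2) Defer only the slow part to a background task
    task = asyncio.create_task(export_later(name))
    _background.add(task)
    task.add_done_callback(_background.discard)

async def fake_acompletion(name):
    await asyncio.sleep(0)  # stand-in for the LLM request
    on_success(name)
    return name

async def main():
    await asyncio.gather(*(fake_acompletion(n) for n in ("a", "b", "c")))

# asyncio.run() cancels the still-pending export tasks when main() returns
asyncio.run(main())
print(len(span_buffer), len(exported))  # 3 0
```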

Are you a ML Ops Team?

No

Twitter / LinkedIn details

No response

Extended analysis

TL;DR

To fix the issue of dropped OTEL spans when using asyncio.gather() with litellm.acompletion(), consider creating OTEL spans synchronously within the callback or ensuring all callback tasks complete before the caller proceeds.

Guidance

Review the _client_async_logging_helper function to identify where asyncio.create_task() is used to dispatch callbacks and consider modifying it to create OTEL spans synchronously.
Investigate using asyncio.wait() or asyncio.as_completed() to ensure all callback tasks complete before the caller proceeds.
Verify that the fix works by running the provided example code and checking that all expected litellm_request spans are created.
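Consider using an asyncio.Lock or Semaphore to serialize access to the OTEL span creation code. This only helps if that code contains await points: asyncio switches tasks only at an await, so a purely synchronous span-creation call cannot be interleaved by another task in the first place.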
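The asyncio.wait() idea in the guidance above can be sketched as a bounded drain. It waits for outstanding callback tasks up to a deadline, then decides what to do with stragglers. This matters in runtimes like AWS Lambda, where an invocation should not hang indefinitely. bounded_drain is a hypothetical helper. The sketch assumes the caller can get hold of the callback tasks in the first place, which LiteLLM does not necessarily expose.

```python
import asyncio

async def bounded_drain(tasks, timeout):
    # Hypothetical helper: wait for callback tasks, but at most `timeout` seconds
    if not tasks:
        return set(), set()
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    return done, pending

async def main():
    # Stand-ins for two background callback tasks: one fast, one stuck
    fast = asyncio.create_task(asyncio.sleep(0.01, result="fast"))
    slow = asyncio.create_task(asyncio.sleep(5, result="slow"))
    done, pending = await bounded_drain({fast, slow}, timeout=0.5)
    for task in pending:
        task.cancel()  # give up on stragglers (or log them instead)
    return {t.result() for t in done}, len(pending)

results, n_pending = asyncio.run(main())
print(results, n_pending)  # {'fast'} 1
```

asyncio.wait() does not cancel anything on timeout; it only reports which tasks are still pending, so the cancel-or-log decision stays with the caller.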

Example

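import asyncio

# Lock shared by all callback tasks running on this event loop
otel_span_lock = asyncio.Lock()

def _start_primary_span():
    # Hypothetical stand-in for LiteLLM's internal span-creation step
    pass

async def _handle_success():
    async with otel_span_lock:
        # Only one callback task at a time enters this block. A purely
        # synchronous call cannot be interleaved by other asyncio tasks
        # anyway, so the lock only matters if this path awaits.
        _start_primary_span()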
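Whether a lock like this can help depends on whether the protected code awaits. The sketch below uses a hypothetical shared counter in place of span state to show the case where a lock does matter: a read-modify-write with an await in the middle. Without the lock, all three tasks read the same starting value before any of them writes, so two updates are lost.

```python
import asyncio

async def read_modify_write(state):
    value = state["n"]
    await asyncio.sleep(0)  # await point: other tasks may run here
    state["n"] = value + 1

async def bump(state, lock):
    if lock is None:
        await read_modify_write(state)
    else:
        async with lock:  # serialize the whole read-await-write sequence
            await read_modify_write(state)

async def count_with(use_lock):
    state = {"n": 0}
    lock = asyncio.Lock() if use_lock else None
    await asyncio.gather(*(bump(state, lock) for _ in range(3)))
    return state["n"]

print(asyncio.run(count_with(False)))  # 1: two updates were lost
print(asyncio.run(count_with(True)))   # 3
```

If the body contained no await, both runs would print 3, which is why a lock around a purely synchronous span-creation call is unlikely to change the outcome.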

Notes

The provided example code and suggested fix are specific to the litellm and langfuse libraries, and may not be applicable to other libraries or frameworks. Additionally, the fix may require modifications to the underlying library code, which could have unintended consequences.

Recommendation

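Both options require changes inside LiteLLM's callback dispatch: create OTEL spans synchronously within the callback, or ensure all callback tasks complete before the caller proceeds. Until such a change lands, the workaround the reporter verified is to run the calls with ThreadPoolExecutor and the sync completion(), which produced 3/3 spans in the verification table.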


