litellm - ✅(Solved) Fix Feature request: public API to drain GLOBAL_LOGGING_WORKER before process/loop exit [1 pull request, 1 participant]

GitHub stats
BerriAI/litellm#25978, fetched 2026-04-18 05:52:40
Comments: 0
Participants: 1
Timeline: 1 (cross-referenced ×1)
Reactions: 0

Async OTEL/observability integrations that sit on top of LiteLLM can lose callback spans when the user's asyncio.run() returns before the background logging pipeline has drained. I'd like a public, awaitable drain API so integrators don't have to peek at GLOBAL_LOGGING_WORKER._worker_task / _running_tasks / _queue._unfinished_tasks internals.

Root Cause

The _worker_task exemption is required because the worker loop awaits queue.get() forever; otherwise the drain would never idle. This means we're coupled to a private attribute name — a LiteLLM refactor would silently regress our users' telemetry.

Fix Action

Fix / Workaround

LiteLLM dispatches success/failure callbacks via asyncio.create_task(...) (see litellm/utils.py around _client_async_logging_helper) into the GLOBAL_LOGGING_WORKER queue. The OTEL integration (litellm/integrations/opentelemetry.py) creates the litellm_request and raw_gen_ai_request spans inside those callbacks.

Workaround we've shipped (and would love to delete)

  • LiteLLM integration producing spans in async callbacks: litellm/integrations/opentelemetry.py → _handle_success → _start_primary_span.
  • Per-request span flag we rely on: USE_OTEL_LITELLM_REQUEST_SPAN=true.
  • Our current workaround (for illustration): https://github.com/Kelet-ai/python-sdk/pull/7

PR fix notes

PR #7: fix: drain LiteLLM callback tasks before agentic_session exit

Description (problem / solution / changelog)

User description

Summary

  • Add _drain_background_logging_tasks in _context.py that polls asyncio.all_tasks() minus the current task and LiteLLM's long-lived _worker_task, waiting for a ~50ms idle window or a 5s hard timeout.
  • Call it from _AgenticSessionContext.__aexit__ before self._exit() so LiteLLM's async callbacks get a turn on the live event loop.
  • Debug-log (not silently swallow) any drain failure or missing LiteLLM internals.
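The polling drain described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name drain_background_tasks, the constant names, and the bool return value are assumptions made for the example, and the exempt tasks are passed in explicitly rather than looked up from LiteLLM.

```python
import asyncio
import time
from typing import Iterable, Optional

# Hypothetical tuning constants, kept at module level so tests can shrink them.
IDLE_WINDOW_S = 0.05   # how long the task set must stay empty
HARD_TIMEOUT_S = 5.0   # upper bound on the whole drain
POLL_INTERVAL_S = 0.01


async def drain_background_tasks(
    exempt: Iterable[Optional[asyncio.Task]] = (),
    idle_window: float = IDLE_WINDOW_S,
    timeout: float = HARD_TIMEOUT_S,
) -> bool:
    """Return True once no other task was pending for `idle_window` seconds,
    or False if `timeout` elapses first."""
    skip = {t for t in exempt if t is not None}
    skip.add(asyncio.current_task())  # never wait on ourselves
    deadline = time.monotonic() + timeout
    idle_since = None
    while time.monotonic() < deadline:
        pending = [t for t in asyncio.all_tasks() if t not in skip and not t.done()]
        now = time.monotonic()
        if pending:
            idle_since = None          # activity seen, restart the idle window
        elif idle_since is None:
            idle_since = now           # first idle observation
        elif now - idle_since >= idle_window:
            return True                # stayed idle long enough
        await asyncio.sleep(POLL_INTERVAL_S)  # yield so callbacks can run
    return False
```

Passing the long-lived worker task in `exempt` is what keeps the loop from waiting forever on a task that never finishes; the hard timeout covers every other case.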

Why

Single-completion scenarios (extended-thinking, simple-reasoning, etc.) were losing their litellm_request and raw_gen_ai_request spans. LiteLLM fires its OTEL callbacks via asyncio.create_task(...) through a background queue worker. When the user's async main() returned, asyncio.run() cancelled those pending tasks before they could create the spans, so the BatchSpanProcessor had nothing to export — the user's telemetry silently dropped.

The drain happens in __aexit__ (not atexit) specifically because we need a live loop to pump the pending task set.

Upstream follow-up

Filed https://github.com/BerriAI/litellm/issues/25978 requesting a public await GLOBAL_LOGGING_WORKER.drain(...) API. Once that lands we can drop the private-attribute coupling (the one remaining probe of _worker_task).

Test plan

  • uv run pytest — 112 SDK tests pass, including 8 new drain tests:
    • idle-return path
    • pending-task wait
    • 5s timeout bound (no-hang proof)
    • LiteLLM _worker_task exemption via injected sys.modules stub
    • missing-LiteLLM-module path
    • graceful degradation when LiteLLM refactors away _worker_task (debug-log assertion)
    • __aexit__ call-order wiring
    • __aexit__ swallows drain exceptions + logs
  • e2e simple-reasoning: litellm_request + raw_gen_ai_request spans present (Bedrock reasoning text captured)
  • e2e flat (3 sibling agents) — all 4 request spans present
  • e2e nested — parent tool-call → sub-agent → tool-response flow intact
  • e2e deep-nesting (3 levels) — all 5 request spans present
  • e2e agent-return — two sequential completions under same agent
  • e2e instruction-override — system-instruction change between calls
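
The "injected sys.modules stub" technique from the test plan might look like the sketch below. It is an illustration only: find_worker_task and install_stub are hypothetical helper names, and looking up GLOBAL_LOGGING_WORKER as a top-level attribute of the module is an assumption based on the API proposed in the upstream issue, not a confirmed detail of the SDK.

```python
import logging
import sys
import types
from typing import Any, Optional

_logger = logging.getLogger(__name__)


def find_worker_task(module_name: str = "litellm") -> Optional[Any]:
    """Return the worker task to exempt, or None. Never imports the module."""
    mod = sys.modules.get(module_name)
    if mod is None:
        _logger.debug("%s is not imported; nothing to exempt", module_name)
        return None
    worker = getattr(mod, "GLOBAL_LOGGING_WORKER", None)
    task = getattr(worker, "_worker_task", None)
    if task is None:
        # Observable fallback if the private attribute is refactored away.
        _logger.debug("%s exposes no _worker_task; draining without exemption", module_name)
    return task


def install_stub(module_name: str, **worker_attrs: Any) -> None:
    """Test helper: register a fake module so no real dependency is needed."""
    stub = types.ModuleType(module_name)
    stub.GLOBAL_LOGGING_WORKER = types.SimpleNamespace(**worker_attrs)
    sys.modules[module_name] = stub
```

A test can then cover three paths with the same helper: module absent, stub with _worker_task set, and stub without it.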

Review feedback addressed

  • Moved drain into _context.py (lifecycle logic colocated; no circular-import workaround).
  • asyncio.get_event_loop() → get_running_loop().
  • LiteLLM internal surface reduced from 5 private attrs → 1 (_worker_task), with a debug log on each fallback path so refactor regressions are observable.
  • except Exception: pass in __aexit__ is now _logger.debug(..., exc_info=True).
  • Drain tuning constants factored out so tests can shrink them without patching asyncio.sleep.



Generated description

Below is a concise technical summary of the changes proposed in this PR: Implement _drain_background_logging_tasks so LiteLLM instrumentation callbacks can finish on the live event loop and log any failures before _AgenticSessionContext.__aexit__ calls _exit. Bump the SDK to 1.4.0 to ship the telemetry fix plus its regression tests.

Topics and details:

  • Other: other files. Modified files (1): src/kelet/_configure.py. Latest contributors (2): "feat: add LiteLLM and ..." (April 16, 2026); "fix(KEL-374): remove d..." (April 07, 2026).
  • Session exit drain: Implement _drain_background_logging_tasks and call it from _AgenticSessionContext.__aexit__ so pending LiteLLM OTEL callbacks on the live loop complete before _exit runs, while still exempting the long-lived worker task and surfacing drain failures via _logger. Modified files (1): src/kelet/_context.py. Latest contributors (1): "fix(KEL-342): propagat..." (March 23, 2026).
  • Drain tests: Exercise _drain_background_logging_tasks under idle, pending, timeout, LiteLLM exemption, missing module, attribute-absence, baseline snapshot, exception, and cancellation scenarios so the agentic session lifetime reliably flushes telemetry spans. Modified files (1): tests/test_drain.py. Latest contributors (0).
  • Release bump: Publish the SDK as version 1.4.0 so users receive the new drain behavior and test suite updates. Modified files (1): uv.lock. Latest contributors (1): "feat: add LiteLLM and ..." (April 16, 2026).

This pull request is reviewed by Baz.

Changed files

  • src/kelet/_configure.py (modified, +1/-3)
  • src/kelet/_context.py (modified, +134/-1)
  • tests/test_drain.py (added, +248/-0)
  • uv.lock (modified, +1/-1)


Summary

Async OTEL/observability integrations that sit on top of LiteLLM can lose callback spans when the user's asyncio.run() returns before the background logging pipeline has drained. I'd like a public, awaitable drain API so integrators don't have to peek at GLOBAL_LOGGING_WORKER._worker_task / _running_tasks / _queue._unfinished_tasks internals.

Motivation — what actually goes wrong today

LiteLLM dispatches success/failure callbacks via asyncio.create_task(...) (see litellm/utils.py around _client_async_logging_helper) into the GLOBAL_LOGGING_WORKER queue. The OTEL integration (litellm/integrations/opentelemetry.py) creates the litellm_request and raw_gen_ai_request spans inside those callbacks.

For single-completion or extended-thinking scenarios, the callback task is still pending when the user's async main() returns. asyncio.run() then cancels all pending tasks during loop teardown — the spans are never created, and a downstream BatchSpanProcessor has nothing to export. The observability signal silently disappears.
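
The failure mode can be reproduced without LiteLLM at all. The sketch below uses only the standard library; logging_callback is a hypothetical stand-in for a span-creating callback, not LiteLLM code.

```python
import asyncio
from typing import List


def run_demo() -> List[str]:
    events: List[str] = []

    async def logging_callback() -> None:
        try:
            await asyncio.sleep(0.1)  # stands in for the span-creating work
            events.append("span created")
        except asyncio.CancelledError:
            events.append("cancelled before span creation")
            raise

    async def main() -> None:
        _task = asyncio.create_task(logging_callback())
        await asyncio.sleep(0)  # let the callback start, then return right away

    # asyncio.run() cancels every task still pending when main() returns.
    asyncio.run(main())
    return events
```

Calling run_demo() records only the cancellation, never "span created", which mirrors the dropped spans described above.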

Workaround we've shipped (and would love to delete)

In the Kelet SDK we added a drain called from our session context manager's __aexit__:

# Poll asyncio.all_tasks() minus (current_task, GLOBAL_LOGGING_WORKER._worker_task)
# until the set is empty for ~50ms, or a 5s hard timeout — whichever first.

The _worker_task exemption is required because the worker loop awaits queue.get() forever; otherwise the drain would never idle. This means we're coupled to a private attribute name — a LiteLLM refactor would silently regress our users' telemetry.

Proposed API

# Option A: awaitable method on the worker itself
#   async def drain(self, timeout: float = 5.0) -> None
await litellm.GLOBAL_LOGGING_WORKER.drain(timeout=5.0)

# Option B: top-level convenience
#   async def aclose(timeout: float = 5.0) -> None
await litellm.aclose(timeout=5.0)

Semantics: block until the queue is empty AND every task in _running_tasks has completed (or been cancelled at the deadline), OR until timeout seconds have elapsed. Never hang.
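
One way these semantics could be realised is sketched below on a toy worker. SketchLoggingWorker is hypothetical and is not LiteLLM's logging_worker.py; it also returns a bool for easier testing, whereas the proposal returns None. The idea is to call task_done() only when a callback finishes, so that queue.join() covers both queued and in-flight work.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Set


class SketchLoggingWorker:
    """Hypothetical stand-in; create it inside a running event loop."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._running: Set[asyncio.Future] = set()
        self._worker: Optional[asyncio.Task] = None

    def start(self) -> None:
        if self._worker is None:
            self._worker = asyncio.create_task(self._loop())

    def submit(self, make_coro: Callable[[], Awaitable[None]]) -> None:
        self._queue.put_nowait(make_coro)

    async def _loop(self) -> None:
        while True:  # long-lived: blocks on queue.get() when idle
            make_coro = await self._queue.get()
            task = asyncio.ensure_future(make_coro())
            self._running.add(task)
            task.add_done_callback(self._on_done)

    def _on_done(self, task: asyncio.Future) -> None:
        self._running.discard(task)
        if not task.cancelled():
            task.exception()  # mark any exception as retrieved
        self._queue.task_done()  # counted only once the callback finished

    async def drain(self, timeout: float = 5.0) -> bool:
        """True if queue and in-flight callbacks emptied before `timeout`."""
        try:
            await asyncio.wait_for(self._queue.join(), timeout)
            return True
        except asyncio.TimeoutError:
            return False
```

Because drain() waits on the queue's unfinished-work counter rather than on the worker task, the caller needs no exemption for the long-lived loop and no access to private attributes.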

Why a public API beats "just add a sleep"

  • Sleeps either over- or under-shoot — the drain needs to watch queue depth + in-flight callbacks, which LiteLLM already tracks internally.
  • Integrators can't reliably detect the "worker is started" edge without peeking at GLOBAL_LOGGING_WORKER module state.
  • A public hook makes it safe for LiteLLM to refactor queue/task internals without breaking downstream OTEL integrations.

References

  • LiteLLM integration producing spans in async callbacks: litellm/integrations/opentelemetry.py → _handle_success → _start_primary_span.
  • Per-request span flag we rely on: USE_OTEL_LITELLM_REQUEST_SPAN=true.
  • Our current workaround (for illustration): https://github.com/Kelet-ai/python-sdk/pull/7

Happy to send a PR implementing Option A if you'd like — it's a short addition to logging_worker.py.

Thanks for LiteLLM!

extent analysis

TL;DR

Implement a public, awaitable drain API, such as litellm.GLOBAL_LOGGING_WORKER.drain(timeout: float = 5.0), to ensure that the logging pipeline is fully drained before the program exits.

Guidance

  • Implement the proposed drain API on the GLOBAL_LOGGING_WORKER to block until the queue is empty and all tasks have completed or timed out.
  • Use the drain API in the __aexit__ method of the session context manager to ensure that the logging pipeline is fully drained before the program exits.
  • Consider implementing a top-level convenience function, such as litellm.aclose(timeout: float = 5.0), to simplify the usage of the drain API.
  • Verify that the drain API works correctly by checking that all spans are properly created and exported after implementing the API.

Example

await litellm.GLOBAL_LOGGING_WORKER.drain(timeout=5.0)

This example shows how to use the proposed drain API to block until the logging pipeline is fully drained.
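
In a session context manager, the call could be guarded so that a missing or failing drain never blocks exit. The sketch below is illustrative: SessionSketch is a hypothetical class, and the worker passed in is assumed to expose the proposed drain(timeout=...) coroutine, which does not exist in LiteLLM at the time of this issue.

```python
import asyncio
import logging
from typing import Any

_logger = logging.getLogger(__name__)


class SessionSketch:
    """Hypothetical session wrapper that drains a worker before exiting."""

    def __init__(self, worker: Any, timeout: float = 5.0) -> None:
        self._worker = worker
        self._timeout = timeout
        self.exited = False

    async def __aenter__(self) -> "SessionSketch":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        drain = getattr(self._worker, "drain", None)
        if drain is None:
            # Older library version: log and continue instead of failing.
            _logger.debug("worker has no drain(); skipping")
        else:
            try:
                await drain(timeout=self._timeout)
            except Exception:
                # Telemetry must never break the user's program.
                _logger.debug("drain failed", exc_info=True)
        self.exited = True  # stands in for the real exit logic
        return False  # do not suppress exceptions from the with-body
```

The drain runs in __aexit__ because the event loop is still alive there, so pending callbacks get a chance to run before teardown.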

Notes

The drain API must never hang: it should return promptly when the queue is already empty and no callbacks are in flight, and it should give up once the timeout elapses.

Recommendation

Adopt the proposed drain API once it is available, since it gives integrators a reliable, supported way to ensure that the logging pipeline is fully drained before the program exits. Until then, the polling workaround called from the session context manager's __aexit__ remains the stopgap. A public drain is better than a sleep because it accounts for queue depth and in-flight callbacks, and it gives integrators a stable hook that survives internal refactors.
