litellm - ✅(Solved) Fix Feature request: public API to drain GLOBAL_LOGGING_WORKER before process/loop exit [1 pull requests, 1 participants]

AlmogBaku · 2026-04-17T20:32:12Z



BerriAI/litellm#25978•Fetched 2026-04-18 05:52:40



Async OTEL/observability integrations that sit on top of LiteLLM can lose callback spans when the user's asyncio.run() returns before the background logging pipeline has drained. I'd like a public, awaitable drain API so integrators don't have to peek at GLOBAL_LOGGING_WORKER._worker_task / _running_tasks / _queue._unfinished_tasks internals.

Root Cause

The _worker_task exemption is required because the worker loop awaits queue.get() forever; otherwise the drain would never idle. This means we're coupled to a private attribute name — a LiteLLM refactor would silently regress our users' telemetry.
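To see why the exemption matters, here is a minimal sketch using only the standard library. `worker_loop` is a hypothetical stand-in for a long-lived queue worker, not LiteLLM's actual code. Because the worker parks on `queue.get()` indefinitely, a drain that waits for every other task to finish would always see one pending task unless the worker is excluded.

```python
import asyncio


async def worker_loop(queue: asyncio.Queue) -> None:
    # Hypothetical long-lived worker: it blocks on queue.get() whenever the
    # queue is empty, so the task itself never completes on its own.
    while True:
        await queue.get()
        queue.task_done()


async def pending_after_yield(exempt_worker: bool) -> int:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(worker_loop(queue))
    await asyncio.sleep(0)  # let the worker start and park on queue.get()

    exempt = {asyncio.current_task()}
    if exempt_worker:
        exempt.add(worker)
    pending = [t for t in asyncio.all_tasks() if t not in exempt and not t.done()]

    worker.cancel()  # tidy up so asyncio.run() exits cleanly
    await asyncio.gather(worker, return_exceptions=True)
    return len(pending)


def count_pending(exempt_worker: bool) -> int:
    return asyncio.run(pending_after_yield(exempt_worker))
```

Without the exemption the pending set never becomes empty, so an idle-window drain could only end by hitting its timeout.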

Fix Action

Fix / Workaround

LiteLLM dispatches success/failure callbacks via asyncio.create_task(...) (see litellm/utils.py around _client_async_logging_helper) into the GLOBAL_LOGGING_WORKER queue. The OTEL integration (litellm/integrations/opentelemetry.py) creates the litellm_request and raw_gen_ai_request spans inside those callbacks.


PR fix notes

PR #7: fix: drain LiteLLM callback tasks before agentic_session exit

Repository: Kelet-ai/python-sdk
Author: AlmogBaku
State: closed | merged: True
Link: https://github.com/Kelet-ai/python-sdk/pull/7

Description (problem / solution / changelog)

User description

Summary

Add _drain_background_logging_tasks in _context.py that polls asyncio.all_tasks() minus the current task and LiteLLM's long-lived _worker_task, waiting for a ~50ms idle window or a 5s hard timeout.
Call it from _AgenticSessionContext.__aexit__ before self._exit() so LiteLLM's async callbacks get a turn on the live event loop.
Debug-log (not silently swallow) any drain failure or missing LiteLLM internals.

Why

Single-completion scenarios (extended-thinking, simple-reasoning, etc.) were losing their litellm_request and raw_gen_ai_request spans. LiteLLM fires its OTEL callbacks via asyncio.create_task(...) through a background queue worker. When the user's async main() returned, asyncio.run() cancelled those pending tasks before they could create the spans, so the BatchSpanProcessor had nothing to export — the user's telemetry silently dropped.
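The failure mode can be reproduced with the standard library alone. In this sketch `callback` is a hypothetical stand-in for a logging callback scheduled with `asyncio.create_task(...)`; it is not LiteLLM code. When `main()` returns without awaiting it, `asyncio.run()` cancels the task during teardown and the "span" is never recorded.

```python
import asyncio

events: list[str] = []


async def callback() -> None:
    # Stand-in for a callback that needs a few loop iterations to finish.
    try:
        await asyncio.sleep(0.05)
        events.append("span created")
    except asyncio.CancelledError:
        events.append("cancelled")
        raise


async def main(wait_for_callback: bool) -> None:
    task = asyncio.create_task(callback())
    await asyncio.sleep(0)  # yield once so the callback has started
    if wait_for_callback:
        await task


def run(wait_for_callback: bool) -> list[str]:
    events.clear()
    asyncio.run(main(wait_for_callback))
    return list(events)
```

`run(False)` records only the cancellation, while `run(True)` lets the callback complete. A drain step reproduces the second behaviour without the caller having to hold a reference to the task.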

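The drain happens in __aexit__ (not in an atexit hook) specifically because we need a live event loop to pump the pending task set; by the time atexit handlers run, asyncio.run() has already cancelled the pending tasks and closed the loop.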

Upstream follow-up

Filed https://github.com/BerriAI/litellm/issues/25978 requesting a public await GLOBAL_LOGGING_WORKER.drain(...) API. Once that lands we can drop the private-attribute coupling (the one remaining probe of _worker_task).

Test plan

uv run pytest — 112 SDK tests pass, including 8 new drain tests:
- idle-return path
- pending-task wait
- 5s timeout bound (no-hang proof)
- LiteLLM _worker_task exemption via injected sys.modules stub
- missing-LiteLLM-module path
- graceful degradation when LiteLLM refactors away _worker_task (debug-log assertion)
- aexit call-order wiring
- aexit swallows drain exceptions + logs
e2e simple-reasoning — litellm_request + raw_gen_ai_request spans present (Bedrock reasoning text captured)
e2e flat (3 sibling agents) — all 4 request spans present
e2e nested — parent tool-call → sub-agent → tool-response flow intact
e2e deep-nesting (3 levels) — all 5 request spans present
e2e agent-return — two sequential completions under same agent
e2e instruction-override — system-instruction change between calls

Review feedback addressed

Moved drain into _context.py (lifecycle logic colocated; no circular-import workaround).
asyncio.get_event_loop() → get_running_loop().
LiteLLM internal surface reduced from 5 private attrs → 1 (_worker_task), with a debug log on each fallback path so refactor regressions are observable.
except Exception: pass in aexit now _logger.debug(..., exc_info=True).
Drain tuning constants factored out so tests can shrink them without patching asyncio.sleep.
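The "debug log on each fallback path" idea might look like the sketch below. It assumes the worker is reachable as a `GLOBAL_LOGGING_WORKER` attribute of the already-imported module, as the issue text writes it; the function name and log messages are hypothetical, and the lookup goes through `sys.modules` so the library is never imported just to be probed.

```python
import asyncio
import logging
import sys
from typing import Optional

_logger = logging.getLogger("drain_sketch")


def find_worker_task(module_name: str = "litellm") -> Optional[asyncio.Task]:
    """Best-effort lookup of the long-lived worker task; None if unavailable."""
    module = sys.modules.get(module_name)  # absent means the library is unused
    if module is None:
        _logger.debug("%s is not loaded; nothing to exempt", module_name)
        return None

    # Assumption: attribute names follow the issue text. Both are probed with
    # getattr so a rename degrades to a debug log instead of an AttributeError.
    worker = getattr(module, "GLOBAL_LOGGING_WORKER", None)
    task = getattr(worker, "_worker_task", None)
    if task is None:
        _logger.debug("%s worker task not found; draining without exemption", module_name)
        return None
    return task
```

If the private attribute disappears in a refactor, the drain still runs (it just ends at its timeout), and the debug log makes the regression visible.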

🤖 Generated with Claude Code

Generated description

Below is a concise technical summary of the changes proposed in this PR: Implement _drain_background_logging_tasks so LiteLLM instrumentation callbacks can finish on the live event loop and log any failures before _AgenticSessionContext.__aexit__ calls _exit. Bump the SDK to 1.4.0 to ship the telemetry fix plus its regression tests.

Topic: Other
Details: Other files
- Modified files (1): src/kelet/_configure.py
- Latest contributors (2): [email protected], "feat: add LiteLLM and ..." (April 16, 2026); [email protected], "fix(KEL-374): remove d..." (April 07, 2026)

Topic: Session exit drain
Details: Implement _drain_background_logging_tasks and call it from _AgenticSessionContext.__aexit__ so pending LiteLLM OTEL callbacks on the live loop complete before _exit runs, while still exempting the long-lived worker task and surfacing drain failures via _logger.
- Modified files (1): src/kelet/_context.py
- Latest contributors (1): [email protected], "fix(KEL-342): propagat..." (March 23, 2026)

Topic: Drain tests
Details: Exercise _drain_background_logging_tasks under idle, pending, timeout, LiteLLM exemption, missing module, attribute-absence, baseline snapshot, exception, and cancellation scenarios so the agentic session lifetime reliably flushes telemetry spans.
- Modified files (1): tests/test_drain.py
- Latest contributors (0)

Topic: Release bump
Details: Publish the SDK as version 1.4.0 so users receive the new drain behavior and test suite updates.
- Modified files (1): uv.lock
- Latest contributors (1): [email protected], "feat: add LiteLLM and ..." (April 16, 2026)

This pull request is reviewed by Baz: https://baz.co/changes/Kelet-ai/python-sdk/7?tool=ast

Changed files

src/kelet/_configure.py (modified, +1/-3)
src/kelet/_context.py (modified, +134/-1)
tests/test_drain.py (added, +248/-0)
uv.lock (modified, +1/-1)


Summary

Motivation — what actually goes wrong today

For single-completion or extended-thinking scenarios, the callback task is still pending when the user's async main() returns. asyncio.run() then cancels all pending tasks during loop teardown — the spans are never created, and a downstream BatchSpanProcessor has nothing to export. The observability signal silently disappears.

Workaround we've shipped (and would love to delete)

In the Kelet SDK we added a drain called from our session context manager's __aexit__:

# Poll asyncio.all_tasks() minus (current_task, GLOBAL_LOGGING_WORKER._worker_task)
# until the set is empty for ~50ms, or a 5s hard timeout — whichever first.

Proposed API

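# Option A: awaitable on the worker itself
# Proposed signature: async def drain(self, timeout: float = 5.0) -> None
await litellm.GLOBAL_LOGGING_WORKER.drain(timeout=5.0)

# Option B: top-level convenience
# Proposed signature: async def aclose(timeout: float = 5.0) -> None
await litellm.aclose(timeout=5.0)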

Semantics: block until the queue is empty AND every task in _running_tasks has completed (or been cancelled at the deadline), OR until timeout seconds have elapsed, whichever comes first. Never hang.

Why a public API beats "just add a sleep"

Sleeps either over- or under-shoot — the drain needs to watch queue depth + in-flight callbacks, which LiteLLM already tracks internally.
Integrators can't reliably detect the "worker is started" edge without peeking at GLOBAL_LOGGING_WORKER module state.
A public hook makes it safe for LiteLLM to refactor queue/task internals without breaking downstream OTEL integrations.

References

LiteLLM integration producing spans in async callbacks: litellm/integrations/opentelemetry.py → _handle_success → _start_primary_span.
Per-request span flag we rely on: USE_OTEL_LITELLM_REQUEST_SPAN=true.
Our current workaround (for illustration): https://github.com/Kelet-ai/python-sdk/pull/7

Happy to send a PR implementing Option A if you'd like — it's a short addition to logging_worker.py.

Thanks for LiteLLM!

extent analysis

TL;DR

Implement a public, awaitable drain API, such as litellm.GLOBAL_LOGGING_WORKER.drain(timeout: float = 5.0), to ensure that the logging pipeline is fully drained before the program exits.

Guidance

Implement the proposed drain API on the GLOBAL_LOGGING_WORKER to block until the queue is empty and all tasks have completed or timed out.
Use the drain API in the __aexit__ method of the session context manager to ensure that the logging pipeline is fully drained before the program exits.
Consider implementing a top-level convenience function, such as litellm.aclose(timeout: float = 5.0), to simplify the usage of the drain API.
Verify that the drain API works correctly by checking that all spans are properly created and exported after implementing the API.

Example

await litellm.GLOBAL_LOGGING_WORKER.drain(timeout=5.0)

This example shows how to use the proposed drain API to block until the logging pipeline is fully drained.

Notes

The drain API must never hang: it should return immediately when the queue is already empty and no callbacks are in flight, and otherwise return no later than the timeout.

Recommendation

Until the proposed drain API exists upstream, apply the polling workaround from the linked PR; once it lands, switch to the public API. A dedicated drain is more reliable than a sleep because it tracks queue depth and in-flight callbacks, and a public hook lets integrators stop depending on private attributes.
