litellm - ✅(Solved) Fix [Bug]: LiteLLMAiohttpTransport can leak recycled aiohttp ClientSession instances [1 pull requests, 3 comments, 2 participants]

GitHub stats
BerriAI/litellm#24230, fetched 2026-04-08 01:09:09
Comments: 3
Participants: 2
Timeline: 6
Reactions: 0
Timeline (top): commented ×3, labeled ×2, cross-referenced ×1

PR fix notes

PR #24231: fix(aiohttp): reliably close recycled client sessions

Description (problem / solution / changelog)

Relevant issues

Fixes #24230

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link: pending

  • CI run for the last commit
    Link: pending

  • Merge / cherry-pick CI run
    Links: pending

Type

🐛 Bug Fix

Changes

  • Retain strong references to background ClientSession.close() tasks in LiteLLMAiohttpTransport, so recycled aiohttp sessions are not dropped on fire-and-forget close paths.
  • Close replaced sessions across loop-mismatch, loop-inspection-error, and "Session is closed" retry paths.
  • Prune completed background close tasks and log failed background closes.
  • Add regression coverage in tests/test_litellm/test_streaming_connection_cleanup.py and tests/test_litellm/llms/custom_httpx/test_aiohttp_transport.py.
  • Targeted verification run:
poetry run env PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q -p pytest_asyncio.plugin tests/test_litellm/llms/custom_httpx/test_aiohttp_transport.py -k "not sock_read_timeout_triggers and not streaming_does_not_timeout_on_total_duration"
poetry run env PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 pytest -q -p pytest_asyncio.plugin tests/test_litellm/test_streaming_connection_cleanup.py tests/test_litellm/llms/custom_httpx/test_aiohttp_cleanup_closed.py
  • Observed results:
    • 17 passed, 2 deselected
    • 11 passed
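The first three changes describe a common asyncio pattern: keep each background close task in a collection until it finishes, prune it on completion, and log failures. The following is a minimal sketch of that pattern under stated assumptions, not the code from the PR. `FakeSession` stands in for `aiohttp.ClientSession`, and `CloseTracker`, `schedule_close`, and `pending` are hypothetical names chosen for illustration.

```python
import asyncio
import logging

logger = logging.getLogger("close_tracker_sketch")


class FakeSession:
    """Stand-in for aiohttp.ClientSession; only close() and closed are modelled."""

    def __init__(self, fail=False):
        self.closed = False
        self._fail = fail

    async def close(self):
        await asyncio.sleep(0)  # yield once, as a real close may
        if self._fail:
            raise RuntimeError("simulated close failure")
        self.closed = True


class CloseTracker:
    """Hypothetical helper that keeps strong references to background close tasks."""

    def __init__(self):
        self.pending = set()

    def schedule_close(self, session):
        task = asyncio.get_running_loop().create_task(session.close())
        self.pending.add(task)  # strong reference until the task finishes
        task.add_done_callback(self._on_done)
        return task

    def _on_done(self, task):
        self.pending.discard(task)  # prune completed tasks
        if not task.cancelled() and task.exception() is not None:
            logger.warning("background session close failed: %r", task.exception())


async def demo():
    tracker = CloseTracker()
    ok, bad = FakeSession(), FakeSession(fail=True)
    tracker.schedule_close(ok)
    tracker.schedule_close(bad)
    scheduled = len(tracker.pending)
    # Wait for the tracked tasks; return_exceptions keeps the failure from propagating.
    await asyncio.gather(*list(tracker.pending), return_exceptions=True)
    await asyncio.sleep(0)  # give any remaining done callbacks a chance to run
    return scheduled, ok.closed, bad.closed, len(tracker.pending)
```

Because the caller only schedules the close and never awaits it, this shape also works from a synchronous method, as long as an event loop is running when it is called.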

Changed files

  • litellm/llms/custom_httpx/aiohttp_transport.py (modified, +52/-10)
  • tests/test_litellm/llms/custom_httpx/test_aiohttp_transport.py (modified, +161/-18)
  • tests/test_litellm/test_streaming_connection_cleanup.py (modified, +34/-0)

Code Example

import asyncio
from unittest.mock import patch

import aiohttp
from litellm.llms.custom_httpx.aiohttp_transport import LiteLLMAiohttpTransport


async def main():
    old_session = aiohttp.ClientSession()
    transport = LiteLLMAiohttpTransport(client=lambda: aiohttp.ClientSession())
    transport.client = old_session

    original_get_running_loop = asyncio.get_running_loop
    calls = 0

    def flaky_get_running_loop():
        nonlocal calls
        calls += 1
        if calls == 1:
            raise RuntimeError("simulated loop inspection failure")
        return original_get_running_loop()

    with patch(
        "litellm.llms.custom_httpx.aiohttp_transport.asyncio.get_running_loop",
        side_effect=flaky_get_running_loop,
    ):
        new_session = transport._get_valid_client_session()

    print("recycled session replaced:", new_session is not old_session)
    print("old session closed immediately:", old_session.closed)

    await asyncio.sleep(0)
    print("old session closed after loop tick:", old_session.closed)

    await new_session.close()
    if not old_session.closed:
        await old_session.close()

asyncio.run(main())

---

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When LiteLLM uses the aiohttp transport path, LiteLLMAiohttpTransport may recreate its internal aiohttp.ClientSession during loop-mismatch recovery or after a "Session is closed" retry.

The old session is currently closed via fire-and-forget asyncio.create_task(old_session.close()) without retaining a strong reference to the task, and one fallback branch can replace the session without closing the previous one first. In practice this can leave recycled ClientSession objects unclosed and produce intermittent "Unclosed client session" / connector warnings, along with unnecessary resource leakage.
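The two leak paths described above can be illustrated without aiohttp. This is a sketch under stated assumptions rather than LiteLLM's code: `FakeSession` stands in for `aiohttp.ClientSession`, and `Holder` with its two `replace_*` methods is a hypothetical class that mimics the two replacement styles.

```python
import asyncio


class FakeSession:
    """Stand-in for aiohttp.ClientSession (assumption: only close()/closed matter here)."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class Holder:
    """Hypothetical holder that swaps its session in two different ways."""

    def __init__(self):
        self.client = FakeSession()

    def replace_without_close(self):
        # Leak path 1: the old session is dropped and close() is never called.
        self.client = FakeSession()

    def replace_fire_and_forget(self):
        old = self.client
        self.client = FakeSession()
        # Leak path 2: the task object is discarded. The event loop keeps only
        # weak references to tasks, so nothing here guarantees it runs to completion.
        asyncio.get_running_loop().create_task(old.close())


async def demo():
    holder = Holder()

    first = holder.client
    holder.replace_without_close()
    await asyncio.sleep(0)
    dropped_closed = first.closed  # still open: nothing scheduled a close

    second = holder.client
    holder.replace_fire_and_forget()
    before_tick = second.closed  # the close task has not run yet
    await asyncio.sleep(0)
    after_tick = second.closed  # this trivial close finishes within one tick
    return dropped_closed, before_tick, after_tick
```

In this toy case the fire-and-forget close completes because `close()` never suspends. A real close that awaits I/O can stay pending for longer, which is when the missing strong reference becomes a risk.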

I expected recycled aiohttp sessions to be closed reliably and deterministically whenever LiteLLM replaces them, across all recovery/retry paths, without relying on best-effort background task scheduling.

Steps to Reproduce

  1. Run this script:
import asyncio
from unittest.mock import patch

import aiohttp
from litellm.llms.custom_httpx.aiohttp_transport import LiteLLMAiohttpTransport


async def main():
    old_session = aiohttp.ClientSession()
    transport = LiteLLMAiohttpTransport(client=lambda: aiohttp.ClientSession())
    transport.client = old_session

    original_get_running_loop = asyncio.get_running_loop
    calls = 0

    def flaky_get_running_loop():
        nonlocal calls
        calls += 1
        if calls == 1:
            raise RuntimeError("simulated loop inspection failure")
        return original_get_running_loop()

    with patch(
        "litellm.llms.custom_httpx.aiohttp_transport.asyncio.get_running_loop",
        side_effect=flaky_get_running_loop,
    ):
        new_session = transport._get_valid_client_session()

    print("recycled session replaced:", new_session is not old_session)
    print("old session closed immediately:", old_session.closed)

    await asyncio.sleep(0)
    print("old session closed after loop tick:", old_session.closed)

    await new_session.close()
    if not old_session.closed:
        await old_session.close()

asyncio.run(main())
  2. On an affected version, the output shows:
    • recycled session replaced: True
    • old session closed after loop tick: False
  3. Expected behavior: when LiteLLM replaces an internal aiohttp.ClientSession during recovery, the close of the previous session should be scheduled and should complete, so the session does not remain unclosed after the next event loop tick.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.2

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

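To fix the issue of unclosed aiohttp sessions, the old session must be closed reliably whenever it is replaced. The steps below do this by awaiting the close instead of using fire-and-forget. This requires an async call path: the reproduction script calls _get_valid_client_session() synchronously, and PR #24231 takes a different route, keeping the background close tasks but retaining strong references to them.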

Step-by-Step Solution

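  • Add a helper to the LiteLLMAiohttpTransport class that awaits the close (_replace_session is a hypothetical name, not an existing method):
import aiohttp


class LiteLLMAiohttpTransport:
    # ...

    async def _replace_session(self, new_session):
        old_session = self.client
        self.client = new_session
        # The constructor accepts a factory callable for client (see the
        # reproduction script), so check the type before closing.
        if isinstance(old_session, aiohttp.ClientSession) and not old_session.closed:
            # Await close() directly; wrapping it in a task just to await it adds nothing.
            await old_session.close()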
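  • Alternatively, entering and immediately exiting the session's async context manager also closes it, although an explicit close() call states the intent more clearly. This variant assumes self.client holds a session object rather than a factory:
class LiteLLMAiohttpTransport:
    # ...

    async def _replace_session(self, new_session):
        if self.client:
            async with self.client:
                pass  # Exiting the context closes the session
        self.client = new_session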
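  • Update the _get_valid_client_session method to use the _replace_session method. This makes the method async, so every caller must await it:
class LiteLLMAiohttpTransport:
    # ...

    async def _get_valid_client_session(self):
        # client_factory is an assumed attribute holding the session factory,
        # e.g. lambda: aiohttp.ClientSession(). It is a plain callable, so it
        # is called without await.
        new_session = self.client_factory()
        await self._replace_session(new_session)
        return new_session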

Verification

To verify the fix, run the provided script again and check the output:

print("old session closed immediately:", old_session.closed)
print("old session closed after loop tick:", old_session.closed)

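With the awaited approach above (the script must then await _get_valid_client_session()), both lines should print True. With background close tasks, as in PR #24231, the first line may still print False; per the expected behavior stated in the issue, the second should print True.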

Extra Tips

  • When replacing an aiohttp session, either await the close or keep a strong reference to the close task until it finishes, to prevent resource leakage.
  • For sessions with a well-defined scope, consider a context manager (async with), which closes the session on exit even when an error occurs.
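The context-manager tip can be sketched as follows. `FakeSession` is a hypothetical stand-in that mirrors only the part of `aiohttp.ClientSession` that matters here, namely that leaving an `async with` block closes the session; the failure inside the block is simulated.

```python
import asyncio


class FakeSession:
    """Stand-in for aiohttp.ClientSession; exiting the context closes it."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.close()  # runs on normal exit and on exceptions alike


async def demo():
    session = FakeSession()
    try:
        async with session:
            raise ValueError("simulated request failure")
    except ValueError:
        pass
    return session.closed
```

This suits sessions whose lifetime matches a single block of code. A long-lived transport that swaps sessions during recovery has no such block, which is why it needs an explicit close on each replacement path.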
