litellm - ✅(Solved) Fix [Feature]: Add aclose() to ResponsesAPIStreamingIterator to prevent upstream connection leak on error [2 pull requests, 1 comment, 2 participants]

GitHub stats
BerriAI/litellm#26250 · Fetched 2026-04-23 07:24:21
Comments: 1 · Participants: 2 · Timeline: 5 · Reactions: 0
Timeline (top): cross-referenced ×2, labeled ×2, commented ×1

Fix Action

Fixed

PR fix notes

PR #26273: fix(responses_api): add aclose() to streaming iterator to prevent connection leaks.

Description (problem / solution / changelog)

Ports the fix from PR #21213 to the Responses API pipeline to properly release httpx.Response connections back to the pool on client disconnect.

Relevant issues

Fixes #26250. Ports the pattern from #21213 (same fix previously applied to CustomStreamWrapper for chat-completions streams).

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details (Note: Tests added to tests/llm_responses_api_testing/ to keep them co-located with the existing base iterator tests).
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix ✅ Test

Changes

Context: When a client disconnects mid-stream from the Responses API path (litellm.aresponses(..., stream=True)), the underlying httpx.Response is never explicitly closed. This exhausts the connection pool over time. The proxy's async_data_generator already attempts to call await response.aclose() in a finally block during disconnects, but ResponsesAPIStreamingIterator lacked this method, causing the cleanup to silently no-op.
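
To see why an unreleased response matters, here is a minimal sketch of pool exhaustion. TinyPool, abandoned_stream, and count_served are hypothetical names, and an asyncio.Semaphore stands in for the real httpx connection pool; none of this is LiteLLM or httpx code.

```python
import asyncio


class TinyPool:
    """Hypothetical stand-in for a fixed-size HTTP connection pool."""

    def __init__(self, size: int) -> None:
        self._slots = asyncio.Semaphore(size)

    async def acquire(self, timeout: float) -> None:
        # Waiting for a free slot models a request hanging on a full pool.
        await asyncio.wait_for(self._slots.acquire(), timeout)

    def release(self) -> None:
        self._slots.release()


async def abandoned_stream(pool: TinyPool, close_on_exit: bool) -> None:
    """Simulate one streamed request whose client disconnects mid-stream."""
    await pool.acquire(timeout=0.05)
    try:
        pass  # the client goes away here; no further chunks are read
    finally:
        if close_on_exit:
            pool.release()  # what closing the response would do


async def count_served(close_on_exit: bool, requests: int = 5, pool_size: int = 2) -> int:
    """Return how many requests got a connection before the pool ran dry."""
    pool = TinyPool(pool_size)
    served = 0
    for _ in range(requests):
        try:
            await abandoned_stream(pool, close_on_exit)
            served += 1
        except asyncio.TimeoutError:
            break  # pool exhausted: this request would hang in production
    return served
```

With close_on_exit=False only pool_size requests are served before acquisition times out; with close_on_exit=True every request gets a slot.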

Technical Implementation:

  • Added aclose() to BaseResponsesAPIStreamingIterator: Safely releases the httpx.Response back to the connection pool.

  • Defensive Execution (anyio.CancelScope): Wrapped the network teardown in anyio.CancelScope(shield=True) to prevent Uvicorn/ASGI asyncio.CancelledError signals from aborting the cleanup halfway through.

  • Race-Condition Prevention: Explicitly nulls out self.response before awaiting the network teardown to guarantee idempotency and prevent double-close exceptions.

  • Unit Testing: Added a deterministic, fully mocked test suite verifying successful invocations, idempotency, exception swallowing during teardown, and fallback logic for synchronous .close().
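
The bullets above can be read as the following sketch. It is an interpretation of the PR description, not the merged code: IteratorSketch and logger are hypothetical names, and asyncio.shield is swapped in for anyio.CancelScope(shield=True) to keep the example stdlib-only. The two differ: asyncio.shield only keeps the inner close running if the caller is cancelled, and the caller still sees CancelledError.

```python
import asyncio
import logging

logger = logging.getLogger("aclose_sketch")  # hypothetical logger name


class IteratorSketch:
    """Hypothetical stand-in for BaseResponsesAPIStreamingIterator."""

    def __init__(self, response) -> None:
        self.response = response
        self.finished = False

    async def aclose(self) -> None:
        # Detach the response first, so a second or concurrent call
        # sees None and returns without closing twice.
        response, self.response = self.response, None
        if response is None:
            return
        try:
            aclose = getattr(response, "aclose", None)
            if aclose is not None:
                # Assumption: shield approximates the PR's CancelScope.
                await asyncio.shield(aclose())
            else:
                # Fallback for response objects that only offer close().
                response.close()
        except Exception as exc:
            # Best-effort cleanup: log the failure and carry on.
            logger.debug("aclose: error closing response: %s", exc)
        finally:
            self.finished = True
```

Swapping the attribute before the first await is what makes repeated calls harmless: no await point exists between reading self.response and clearing it.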

Changed files

  • litellm/responses/streaming_iterator.py (modified, +33/-1)
  • tests/llm_responses_api_testing/test_base_responses_api_streaming_iterator.py (modified, +80/-1)

PR #26292: fix(responses_api): add aclose() to streaming iterator to prevent con…

Description, checklist, type, changes, and changed files are identical to PR #26273 above.

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

Add an aclose() coroutine to litellm.responses.streaming_iterator.BaseResponsesAPIStreamingIterator (and by extension ResponsesAPIStreamingIterator) so that consumers can deterministically release the underlying httpx.Response when the stream is abandoned mid-iteration, matching the pattern introduced for chat-completions streaming in PR #21213.

Proposed implementation (mirrors CustomStreamWrapper.aclose from #21213):

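import anyio

from litellm._logging import verbose_logger


class BaseResponsesAPIStreamingIterator:
    async def aclose(self) -> None:
        # Nothing to release if no response was ever attached.
        response = getattr(self, "response", None)
        if response is None:
            return
        # Shield the close so task cancellation cannot interrupt it midway.
        with anyio.CancelScope(shield=True):
            try:
                await response.aclose()
            except BaseException as e:
                # Best-effort cleanup: log and swallow any close failure.
                verbose_logger.debug(
                    "ResponsesAPIStreamingIterator.aclose: error closing response: %s", e
                )
            finally:
                # Mark the stream as finished regardless of the outcome.
                self.finished = True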

Motivation, pitch

PR #21213 fixed exactly this class of bug for CustomStreamWrapper (chat completions):

When a client disconnected mid-stream, the client → proxy connection closed, but the proxy → provider connection was never released back to the pool. Over time this filled the connection pool and this would cause requests to hang.

The same problem exists today for the Responses API streaming path, and as of main (checked 2026-04-22) it remains unfixed:

  • ResponsesAPIStreamingIterator — returned by Router.aresponses(..., stream=True) / litellm.aresponses(..., stream=True) — does not expose an aclose() method.
  • Its __anext__ loop has no cleanup path for the underlying httpx.Response on exception or early abandonment.
  • Downstream consumers therefore cannot follow the try/finally: await stream.aclose() idiom already standard in this codebase (e.g. router.py stream_with_fallbacks, proxy async_data_generator). A naive await response_stream.aclose() raises AttributeError.
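
The try/finally idiom mentioned in the last bullet can be sketched as follows. FakeStream, relay, and consume_first are hypothetical names used only for illustration; they are not the code in router.py or the proxy. The getattr guard shows both failure modes: without an aclose() on the stream, guarded cleanup quietly does nothing, while an unguarded await stream.aclose() would raise AttributeError.

```python
import asyncio


class FakeStream:
    """Hypothetical streaming iterator that exposes aclose()."""

    def __init__(self, chunks) -> None:
        self._chunks = iter(chunks)
        self.closed = False

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._chunks)
        except StopIteration:
            raise StopAsyncIteration from None

    async def aclose(self) -> None:
        self.closed = True


async def relay(stream):
    """Forward chunks and always attempt cleanup, even on early exit."""
    try:
        async for chunk in stream:
            yield chunk
    finally:
        # Guarded lookup: a stream without aclose() is skipped silently.
        close = getattr(stream, "aclose", None)
        if close is not None:
            await close()


async def consume_first(stream):
    """Read one chunk, then abandon the stream like a disconnecting client."""
    gen = relay(stream)
    try:
        async for chunk in gen:
            return chunk
    finally:
        # Closing the generator runs relay's finally block.
        await gen.aclose()
```

Calling consume_first on a FakeStream returns the first chunk and leaves stream.closed set to True, which is the behavior the requested aclose() makes possible for Responses API streams.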

What part of LiteLLM is this about?

SDK (litellm Python package)

extent analysis

TL;DR

Add an aclose() coroutine to litellm.responses.streaming_iterator.BaseResponsesAPIStreamingIterator to allow consumers to release the underlying httpx.Response when the stream is abandoned mid-iteration.

Guidance

  • Implement the proposed aclose() method in BaseResponsesAPIStreamingIterator as shown in the issue, which mirrors the pattern introduced in PR #21213 for CustomStreamWrapper.
  • Verify that the aclose() method is correctly releasing the underlying httpx.Response by checking the connection pool usage over time.
  • Update downstream consumers to follow the try/finally: await stream.aclose() idiom to ensure proper cleanup.
  • Test the implementation with various scenarios, including normal completion, exceptions, and early abandonment, to ensure the aclose() method is working as expected.

Notes

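The implementation reads the response attribute with getattr and returns early when it is missing or None, so the close only takes effect if the BaseResponsesAPIStreamingIterator instance stores its httpx.Response under that name. If it does not, additional modifications may be necessary.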

Recommendation

Implement the proposed aclose() method in BaseResponsesAPIStreamingIterator; it gives consumers a deterministic way to release the underlying httpx.Response when the stream is abandoned mid-iteration.
