litellm - ✅(Solved) Fix [Feature]: Add aclose() to ResponsesAPIStreamingIterator to prevent upstream connection leak on error [2 pull requests, 1 comments, 2 participants]

thomasfloqs · 2026-04-22T13:46:02Z




GitHub stats

BerriAI/litellm#26250 • Fetched 2026-04-23 07:24:21

Author: thomasfloqs
Participants: alighazi288, thomasfloqs
Timeline (top): cross-referenced ×2, labeled ×2, commented ×1


Fix Action

Fixed

Fixed by PR: fix(responses_api): add aclose() to streaming iterator to prevent conection leaks. (https://github.com/BerriAI/litellm/pull/26273)
Fixed by PR: fix(responses_api): add aclose() to streaming iterator to prevent con… (https://github.com/BerriAI/litellm/pull/26292)

PR fix notes

PR #26273: fix(responses_api): add aclose() to streaming iterator to prevent conection leaks.

Repository: BerriAI/litellm
Author: alighazi288
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/26273

Description (problem / solution / changelog)

Ports the fix from PR #21213 to the Responses API pipeline to properly release httpx.Response connections back to the pool on client disconnect.

Relevant issues

Fixes #26250. Ports the pattern from #21213 (same fix previously applied to CustomStreamWrapper for chat-completions streams).

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

[x] I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details). Note: tests were added to tests/llm_responses_api_testing/ to keep them co-located with the existing base iterator tests.
[x] My PR passes all unit tests on make test-unit
[x] My PR's scope is as isolated as possible; it only solves 1 specific problem
[ ] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix ✅ Test

Changes

Context: When a client disconnects mid-stream from the Responses API path (litellm.aresponses(..., stream=True)), the underlying httpx.Response is never explicitly closed. This exhausts the connection pool over time. The proxy's async_data_generator already attempts to call await response.aclose() in a finally block during disconnects, but ResponsesAPIStreamingIterator lacked this method, causing the cleanup to silently no-op.

Technical Implementation:

Added aclose() to BaseResponsesAPIStreamingIterator: Safely releases the httpx.Response back to the connection pool.
Defensive Execution (anyio.CancelScope): Wrapped the network teardown in anyio.CancelScope(shield=True) to prevent Uvicorn/ASGI asyncio.CancelledError signals from aborting the cleanup halfway through.
Race-Condition Prevention: Explicitly nulls out self.response before awaiting the network teardown to guarantee idempotency and prevent double-close exceptions.
Unit Testing: Added a deterministic, fully mocked test suite verifying successful invocations, idempotency, exception swallowing during teardown, and fallback logic for synchronous .close().
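
The four points above can be sketched roughly as follows. This is an illustration reconstructed from the PR description, not the actual diff: the class `SketchStreamingIterator`, the `logger`, and the two fake response classes are hypothetical, and `asyncio.shield` is swapped in for `anyio.CancelScope(shield=True)` so that the sketch runs on the standard library alone.

```python
import asyncio
import logging

logger = logging.getLogger("aclose_sketch")  # hypothetical logger name


class SketchStreamingIterator:
    """Hypothetical stand-in for the iterator; only the cleanup path is shown."""

    def __init__(self, response):
        self.response = response
        self.finished = False

    async def aclose(self) -> None:
        # Detach the response first so that a second call sees None and returns.
        response, self.response = self.response, None
        if response is None:
            return
        try:
            closer = getattr(response, "aclose", None)
            if closer is not None:
                # asyncio.shield stands in for anyio.CancelScope(shield=True).
                await asyncio.shield(closer())
            else:
                # Fallback for objects that only offer a synchronous close().
                response.close()
        except Exception as exc:
            # Teardown errors are logged rather than raised to the caller.
            logger.debug("error closing response: %s", exc)
        finally:
            self.finished = True


class _FakeAsyncResponse:
    def __init__(self, fail=False):
        self.closed = 0
        self.fail = fail

    async def aclose(self):
        self.closed += 1
        if self.fail:
            raise RuntimeError("simulated teardown failure")


class _FakeSyncResponse:
    def __init__(self):
        self.closed = 0

    def close(self):
        self.closed += 1


async def _demo():
    results = []
    for resp in (_FakeAsyncResponse(), _FakeAsyncResponse(fail=True), _FakeSyncResponse()):
        iterator = SketchStreamingIterator(resp)
        await iterator.aclose()
        await iterator.aclose()  # second call is a no-op
        results.append((resp.closed, iterator.finished, iterator.response))
    return results


results = asyncio.run(_demo())
```

In each of the three cases the fake response is closed exactly once, even though `aclose()` is called twice and one of the fakes raises during teardown.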

Changed files

litellm/responses/streaming_iterator.py (modified, +33/-1)
tests/llm_responses_api_testing/test_base_responses_api_streaming_iterator.py (modified, +80/-1)

PR #26292: fix(responses_api): add aclose() to streaming iterator to prevent con…

Repository: BerriAI/litellm
Author: alighazi288
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26292

Description (problem / solution / changelog)

Ports the fix from PR #21213 to the Responses API pipeline to properly release httpx.Response connections back to the pool on client disconnect.

Relevant issues

Fixes #26250. Ports the pattern from #21213 (same fix previously applied to CustomStreamWrapper for chat-completions streams).

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

[x] I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement; see details). Note: tests were added to tests/llm_responses_api_testing/ to keep them co-located with the existing base iterator tests.
[x] My PR passes all unit tests on make test-unit
[x] My PR's scope is as isolated as possible; it only solves 1 specific problem
[x] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix ✅ Test

Changes

Technical Implementation:

Added aclose() to BaseResponsesAPIStreamingIterator: Safely releases the httpx.Response back to the connection pool.
Defensive Execution (anyio.CancelScope): Wrapped the network teardown in anyio.CancelScope(shield=True) to prevent Uvicorn/ASGI asyncio.CancelledError signals from aborting the cleanup halfway through.
Race-Condition Prevention: Explicitly nulls out self.response before awaiting the network teardown to guarantee idempotency and prevent double-close exceptions.
Unit Testing: Added a deterministic, fully mocked test suite verifying successful invocations, idempotency, exception swallowing during teardown, and fallback logic for synchronous .close().

Changed files

litellm/responses/streaming_iterator.py (modified, +33/-1)
tests/llm_responses_api_testing/test_base_responses_api_streaming_iterator.py (modified, +80/-1)



Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

Add an aclose() coroutine to litellm.responses.streaming_iterator.BaseResponsesAPIStreamingIterator (and by extension ResponsesAPIStreamingIterator) so that consumers can deterministically release the underlying httpx.Response when the stream is abandoned mid-iteration, matching the pattern introduced for chat-completions streaming in PR #21213.

Proposed implementation (mirrors CustomStreamWrapper.aclose from #21213):

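import anyio

from litellm._logging import verbose_logger


class BaseResponsesAPIStreamingIterator:
    async def aclose(self) -> None:
        """Release the underlying httpx.Response back to the connection pool."""
        # Nothing to close if the iterator never stored a response.
        response = getattr(self, "response", None)
        if response is None:
            return
        # Shield the teardown so a cancellation cannot interrupt it midway.
        with anyio.CancelScope(shield=True):
            try:
                await response.aclose()
            except BaseException as e:
                # Cleanup is best-effort: log the failure instead of raising.
                verbose_logger.debug(
                    "ResponsesAPIStreamingIterator.aclose: error closing response: %s", e
                )
            finally:
                # Mark the stream as finished whether or not the close succeeded.
                self.finished = True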

Motivation, pitch

PR #21213 fixed exactly this class of bug for CustomStreamWrapper (chat completions):

When a client disconnected mid-stream, the client → proxy connection closed, but the proxy → provider connection was never released back to the pool. Over time this filled the connection pool and this would cause requests to hang.

The same problem exists today for the Responses API streaming path, and as of main (checked 2026-04-22) it remains unfixed:

ResponsesAPIStreamingIterator — returned by Router.aresponses(..., stream=True) / litellm.aresponses(..., stream=True) — does not expose an aclose() method.
Its __anext__ loop has no cleanup path for the underlying httpx.Response on exception or early abandonment.
Downstream consumers therefore cannot follow the try/finally: await stream.aclose() idiom already standard in this codebase (e.g. router.py stream_with_fallbacks, proxy async_data_generator). A naive await response_stream.aclose() raises AttributeError.
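
The consumer-side idiom described above can be sketched as follows. `forward_stream`, `_StreamWithClose`, and `_StreamWithoutClose` are hypothetical stand-ins rather than LiteLLM code. The sketch only aims to show that `await stream.aclose()` in a `finally` block works when the method exists and raises `AttributeError` when it does not.

```python
import asyncio


class _StreamWithoutClose:
    """Async iterator that, like the iterator described above, has no aclose()."""

    def __init__(self, items):
        self._items = iter(items)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._items)
        except StopIteration:
            raise StopAsyncIteration from None


class _StreamWithClose(_StreamWithoutClose):
    """Same iterator, plus the aclose() coroutine the issue asks for."""

    closed = False

    async def aclose(self):
        self.closed = True


async def forward_stream(stream, limit):
    """Read at most `limit` chunks, then always try to close the stream."""
    chunks = []
    try:
        async for chunk in stream:
            chunks.append(chunk)
            if len(chunks) >= limit:
                break  # simulates a client that disconnects early
    finally:
        await stream.aclose()
    return chunks


async def _demo():
    good = _StreamWithClose(["a", "b", "c"])
    chunks = await forward_stream(good, limit=2)
    try:
        await forward_stream(_StreamWithoutClose(["a", "b", "c"]), limit=2)
        error = None
    except AttributeError as exc:
        error = type(exc).__name__
    return chunks, good.closed, error


chunks, closed, error = asyncio.run(_demo())
```

A consumer that guards the call with `getattr(stream, "aclose", None)` would avoid the `AttributeError`, but would then skip the cleanup without any signal, which matches the silent no-op described in the PR notes.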

What part of LiteLLM is this about?

SDK (litellm Python package)


extent analysis

TL;DR

Add an aclose() coroutine to litellm.responses.streaming_iterator.BaseResponsesAPIStreamingIterator to allow consumers to release the underlying httpx.Response when the stream is abandoned mid-iteration.

Guidance

Implement the proposed aclose() method in BaseResponsesAPIStreamingIterator as shown in the issue, which mirrors the pattern introduced in PR #21213 for CustomStreamWrapper.
Verify that the aclose() method is correctly releasing the underlying httpx.Response by checking the connection pool usage over time.
Update downstream consumers to follow the try/finally: await stream.aclose() idiom to ensure proper cleanup.
Test the implementation with various scenarios, including normal completion, exceptions, and early abandonment, to ensure the aclose() method is working as expected.


Notes

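The implementation assumes that the iterator stores the httpx.Response on its response attribute. Because the attribute is read with getattr(self, "response", None), aclose() returns without doing anything when the attribute is missing or None; if the response is held under a different name, the method needs to be adapted.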

Recommendation

Implement the proposed aclose() method in BaseResponsesAPIStreamingIterator; it gives consumers a deterministic way to release the underlying httpx.Response when the stream is abandoned mid-iteration.
