litellm - ✅(Solved) Fix [Bug]: Streaming responses fail in bursts aligned with httpx client ttl [1 pull request, 1 participant]

GitHub stats
BerriAI/litellm#24929 (fetched 2026-04-08 02:23:40)
Comments: 0 · Participants: 1 · Timeline events: 5 · Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1, referenced ×1

Fix Action

Fix / Workaround

I have LiteLLM Proxy deployed (self-managed) and use it for streaming completions via the /v1/messages endpoint. The client uses anthropic.AsyncAnthropic => proxy => Vertex. However, the client observes periodic large spikes of httpx.ReadTimeout errors. Timeouts spike every 3600s, which matches _DEFAULT_TTL_FOR_HTTPX_CLIENTS. When I monkeypatch this TTL in my proxy deployment to 5400s, the spikes shift to every 90m, conclusively implicating the httpx client TTL mechanism in this bug.


A workaround is to monkeypatch _DEFAULT_TTL_FOR_HTTPX_CLIENTS to a very large number.
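One way to apply this workaround without knowing exactly which LiteLLM modules bind the constant is to overwrite it on every already-imported `litellm` module that defines it, which also covers modules that copied it with `from ... import`. This is a hedged sketch, not a LiteLLM API: `override_module_constant` is a hypothetical helper, it only touches modules already in `sys.modules`, it must run before any httpx client is created, and it cannot change code that captured the old value at import time (for example as a default argument).

```python
import sys
from typing import Any


def override_module_constant(name: str, value: Any, package: str = "litellm") -> list[str]:
    """Hypothetical helper: set `name` to `value` on every already-imported module
    in `package` whose namespace defines it. Returns the patched module names."""
    patched = []
    for mod_name, module in list(sys.modules.items()):
        if module is None:  # sys.modules may contain None placeholders
            continue
        if mod_name != package and not mod_name.startswith(package + "."):
            continue
        # vars() avoids triggering any lazy module-level __getattr__
        if name in vars(module):
            setattr(module, name, value)
            patched.append(mod_name)
    return sorted(patched)


# Assumed usage in the proxy process, before any request creates an httpx client:
#   import litellm
#   override_module_constant("_DEFAULT_TTL_FOR_HTTPX_CLIENTS", 10**9)
```

Patching by name across modules is used here only because the defining module is not stated in the issue; if you know it, patching that module (and any module that re-imported the name) directly is simpler.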

PR fix notes

PR #24962: fix(http_handler): remove __del__ to prevent in-flight streaming failures on TTL eviction

Description (problem / solution / changelog)

Summary

Fixes #24929

Removes __del__ handlers from both AsyncHTTPHandler and HTTPHandler to prevent in-flight streaming responses from being killed when the TTL cache evicts httpx clients.

Root Cause

When _DEFAULT_TTL_FOR_HTTPX_CLIENTS (3600s) expires, the in-memory cache drops the last reference to the handler, triggering __del__. The __del__ handler calls close(), which propagates into the TCP connection pool and kills in-flight streaming responses. Clients experience periodic httpx.ReadTimeout spikes every TTL interval.

The reporter confirmed this by monkeypatching the TTL to 5400s: the spikes shifted to every 90 minutes, conclusively implicating the TTL eviction → __del__ → close() chain.

Fix

Remove __del__ from both AsyncHTTPHandler and HTTPHandler. The explicit close() in __del__ is unnecessary — Python's GC already cleans up httpx.AsyncClient/httpx.Client resources via their own finalizers. The __del__ handler was actively harmful:

  1. TTL eviction drops the last reference → triggers __del__
  2. __del__ calls close() → closes the TCP connection pool
  3. In-flight streaming responses lose their connection mid-transfer
  4. Clients experience ReadTimeout bursts every TTL interval
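The four-step chain above can be reproduced without LiteLLM or httpx. The following is a minimal, hypothetical model: `FakePool` stands in for the shared TCP connection pool, `LegacyHandler` mimics the pre-fix handler whose `__del__` calls `close()`, `Handler` mimics the post-fix shape, and `TTLCache` is a toy stand-in for LiteLLM's in-memory client cache. None of these names come from LiteLLM. The model relies on CPython's reference counting, which runs `__del__` as soon as the last reference disappears.

```python
class FakeClock:
    """Manually advanced clock so the TTL can expire without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0


class FakePool:
    """Stand-in for the connection pool that in-flight streams read from."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class Handler:
    """Hypothetical post-fix handler: close() runs only when called explicitly."""

    def __init__(self, pool: FakePool) -> None:
        self.pool = pool

    def close(self) -> None:
        self.pool.close()


class LegacyHandler(Handler):
    """Hypothetical pre-fix handler: finalization triggers close()."""

    def __del__(self) -> None:
        self.close()


class TTLCache:
    """Toy TTL cache: an expired entry is dropped the next time it is looked up."""

    def __init__(self, ttl: float, clock: FakeClock) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: dict = {}  # key -> (value, expires_at)

    def set(self, key, value) -> None:
        self._store[key] = (value, self.clock.now + self.ttl)

    def get(self, key):
        if key not in self._store:
            return None
        if self.clock.now >= self._store[key][1]:
            del self._store[key]  # drops the cache's (last) reference to the handler
            return None
        return self._store[key][0]


def stream_chunks(pool: FakePool, n: int):
    """A streaming response holds the pool, not the handler that created it."""
    for i in range(n):
        if pool.closed:
            raise ConnectionError("connection closed mid-stream")
        yield f"chunk-{i}"


def run(handler_cls, evict_after: int = 2, total: int = 5):
    clock = FakeClock()
    cache = TTLCache(ttl=3600, clock=clock)
    cache.set("client", handler_cls(FakePool()))
    stream = stream_chunks(cache.get("client").pool, total)
    received = []
    try:
        for i, chunk in enumerate(stream):
            received.append(chunk)
            if i + 1 == evict_after:
                clock.now += 3601    # the TTL expires while the stream is still open
                cache.get("client")  # this lookup evicts the expired handler
    except ConnectionError as exc:
        return received, str(exc)
    return received, None


print(run(LegacyHandler))  # the stream is cut after two chunks
print(run(Handler))        # all five chunks arrive
```

Removing `__del__` (the `Handler` case) lets the evicted handler be garbage-collected while the stream keeps using the pool it already holds.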

The original PR (#2665) that introduced __del__ does not include any justification for the explicit close handler.

Tests

Added 2 tests:

  • test_async_handler_no_close_on_del: verifies deleting an AsyncHTTPHandler does NOT close the underlying httpx client
  • test_sync_handler_no_close_on_del: verifies deleting an HTTPHandler does NOT close the underlying httpx client
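A sketch of what the two tests above check, written against hypothetical stand-ins rather than LiteLLM's real `HTTPHandler`/`AsyncHTTPHandler` (their constructors are not shown here). `FakeClient` only models an `is_closed` flag; a `weakref` confirms the handler really was finalized, so the assertion on the client is meaningful. The second test documents the old behavior for contrast. Test and class names here are illustrative, not the ones in the PR.

```python
import gc
import unittest
import weakref


class FakeClient:
    """Hypothetical stand-in for httpx.Client / httpx.AsyncClient."""

    def __init__(self) -> None:
        self.is_closed = False

    def close(self) -> None:
        self.is_closed = True


class FixedHandler:
    """Shape of the handler after the fix: no __del__."""

    def __init__(self) -> None:
        self.client = FakeClient()


class LegacyHandler(FixedHandler):
    """Shape of the handler before the fix: __del__ closes the client."""

    def __del__(self) -> None:
        self.client.close()


class TestDelHandlerRemovedSketch(unittest.TestCase):
    def _delete_and_collect(self, handler_cls):
        handler = handler_cls()
        client = handler.client  # an in-flight stream would keep the client alive
        ref = weakref.ref(handler)
        del handler
        gc.collect()
        self.assertIsNone(ref())  # the handler really was finalized
        return client

    def test_handler_no_close_on_del(self):
        self.assertFalse(self._delete_and_collect(FixedHandler).is_closed)

    def test_legacy_handler_closes_on_del(self):
        self.assertTrue(self._delete_and_collect(LegacyHandler).is_closed)
```

Run it with `python -m unittest` or pytest.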

Changes

  • litellm/llms/custom_httpx/http_handler.py: Removed __del__ from AsyncHTTPHandler (line 728) and HTTPHandler (line 1185)
  • tests/test_litellm/llms/custom_httpx/test_http_handler.py: Added TestDelHandlerRemoved test class

Changed files

  • litellm/llms/custom_httpx/http_handler.py (modified, +0/-12)
  • tests/test_litellm/llms/custom_httpx/test_http_handler.py (modified, +46/-0)
Original issue

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

I have LiteLLM Proxy deployed (self-managed) and use it for streaming completions via the /v1/messages endpoint. The client uses anthropic.AsyncAnthropic => proxy => Vertex. However, the client observes periodic large spikes of httpx.ReadTimeout errors. Timeouts spike every 3600s, which matches _DEFAULT_TTL_FOR_HTTPX_CLIENTS. When I monkeypatch this TTL in my proxy deployment to 5400s, the spikes shift to every 90m, conclusively implicating the httpx client TTL mechanism in this bug.

Spikes in client-experienced httpx.ReadTimeout with default settings (TTL is 60m) [image: https://github.com/user-attachments/assets/39a2bc35-a524-410d-b19e-a985c6fc9bc3]

Spikes in client-experienced httpx.ReadTimeout after TTL is monkeypatched to 90m [image: https://github.com/user-attachments/assets/0c03fe24-278a-4d94-8dd9-0dda329412b7]

Note that we confirmed that the center of each spike falls on exactly the same minute past the hour at which a new revision of the proxy service was deployed: if the revision was deployed at 11:45pm, we would see spikes at 12:45am, 1:45am, and so on.
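This alignment follows from simple arithmetic, assuming each proxy process creates its cached httpx clients as soon as it starts serving traffic and that sustained traffic recreates them right after each eviction: spikes should then land at the deployment time plus whole multiples of the TTL. The helper below is hypothetical and illustrative only; it reproduces the 11:45pm example for the default 3600s TTL and for the monkeypatched 5400s value, and the date is arbitrary.

```python
from datetime import datetime, timedelta


def predicted_spikes(deployed_at: datetime, ttl_seconds: int, count: int) -> list[datetime]:
    """Expected eviction times for clients created at deploy time and renewed on expiry."""
    return [deployed_at + timedelta(seconds=ttl_seconds * k) for k in range(1, count + 1)]


deployed = datetime(2026, 1, 1, 23, 45)  # arbitrary example date, deployed at 11:45pm
for ttl in (3600, 5400):  # default TTL, then the monkeypatched value from the issue
    print(ttl, [t.strftime("%H:%M") for t in predicted_spikes(deployed, ttl, 2)])
# 3600 ['00:45', '01:45']
# 5400 ['01:15', '02:45']
```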

I also tried setting router_settings.client_ttl to 5400, but this did NOT change the behavior: httpx.ReadTimeout errors still spiked at 3600s intervals, confirming that this setting is NOT the source of the issue.

Researching this bug, Claude and I identified the following code snippet (https://github.com/BerriAI/litellm/blob/7e4e4545c5da303ed4586a3231ccb3369d9c63d4/litellm/llms/custom_httpx/http_handler.py#L725-L729). When the httpx client is evicted at the end of the TTL, close() is called explicitly, and this propagates all the way down to the underlying TCP connection, so in-flight responses have their connection cut out from under them. From the client's perspective, it is left waiting for additional bytes on the streaming response, never receives any, and eventually fails when the read timeout expires. More discussion: https://gist.github.com/micahjsmith/c990aeb1505639466d50e14f88b09b82
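The client-side symptom described above (a stream that stops producing bytes while the connection stays open) can be modeled with the standard library alone. In this sketch, `stalled_proxy` is a hypothetical server that sends two chunks and then goes silent without closing the socket, and the client applies a per-read timeout, scaled down to 0.2s, in the same way an httpx read timeout bounds the wait for each chunk. It models the symptom only and does not reproduce LiteLLM's internals.

```python
import asyncio

READ_TIMEOUT = 0.2  # seconds; a scaled-down stand-in for the client's 30s read timeout


async def stalled_proxy(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Hypothetical proxy: streams two chunks, then goes silent without closing
    the socket, as if its upstream connection had been cut."""
    try:
        for i in range(2):
            writer.write(f"data: chunk-{i}\n".encode())
            await writer.drain()
        await asyncio.sleep(3600)  # no more bytes and no EOF
    finally:
        writer.close()


async def consume_stream() -> tuple[list[str], str]:
    server = await asyncio.start_server(stalled_proxy, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    received: list[str] = []
    try:
        while True:
            # The timeout bounds each individual read, so a stream that simply
            # stops sending bytes fails only after the full timeout elapses.
            line = await asyncio.wait_for(reader.readline(), timeout=READ_TIMEOUT)
            if not line:  # EOF: the server closed the connection cleanly
                return received, "eof"
            received.append(line.decode().strip())
    except asyncio.TimeoutError:
        return received, "read timeout"
    finally:
        writer.close()
        server.close()


if __name__ == "__main__":
    print(asyncio.run(consume_stream()))
```

The client gets a partial stream followed by a read timeout rather than a prompt connection error, which matches what anthropic.AsyncAnthropic reports as httpx.ReadTimeout.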

The likely fix is to remove the custom __del__ behavior, which doesn't seem necessary, as all the underlying resources should be cleaned up by Python GC. The PR which introduces this code https://github.com/BerriAI/litellm/pull/2665 does not include any justification for the inclusion of the explicit __del__ handler.

A workaround is to monkeypatch _DEFAULT_TTL_FOR_HTTPX_CLIENTS to a very large number.

Steps to Reproduce

  1. Start litellm proxy
  2. Stream completions from /v1/messages under high concurrency for an extended period (for example, 3h). In the client, I use httpx connect/read/write/pool timeouts of 5/30/5/5 seconds.
  3. Observe in the client that httpx.ReadTimeout occurrences spike at the 1h and 2h marks.
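To make step 3 measurable instead of eyeballing a graph, the client can record when each httpx.ReadTimeout occurs (seconds since the run started) and bucket those times into fixed windows. This is a minimal sketch: `spike_windows` is a hypothetical helper, the 60s window and 10-error threshold are arbitrary choices, and the data below is synthetic.

```python
from collections import Counter


def spike_windows(error_times: list[float], window_s: int = 60, min_count: int = 10) -> list[int]:
    """Start offsets (seconds since the run began) of windows with >= min_count errors."""
    counts = Counter(int(t // window_s) for t in error_times)
    return sorted(w * window_s for w, c in counts.items() if c >= min_count)


# Synthetic 3h run: one background timeout every 10 minutes, plus a burst
# of 30 timeouts at the start of the 1h and 2h marks.
background = [float(t) for t in range(0, 3 * 3600, 600)]
bursts = [float(h * 3600 + s) for h in (1, 2) for s in range(30)]
print(spike_windows(background + bursts))  # [3600, 7200]
```

With the fix (or the large-TTL workaround) in place, the same run should report no windows at the TTL multiples.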

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

1.81.12-stable.1

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The likely fix is to remove the custom __del__ behavior in http_handler.py. As written, __del__ explicitly closes the underlying TCP connections, which cuts off in-flight responses and results in httpx.ReadTimeout errors.

Guidance

  • Investigate the introduction of the custom __del__ handler in PR #2665 to understand the original intention and potential consequences of removing it.
  • Remove the custom __del__ behavior in http_handler.py to prevent explicit closure of underlying TCP connections.
  • As a temporary workaround, monkeypatch _DEFAULT_TTL_FOR_HTTPX_CLIENTS to a very large number to reduce the frequency of httpx.ReadTimeout errors.
  • Verify the fix by streaming completions from /v1/messages under high concurrency for an extended period and monitoring for httpx.ReadTimeout errors.

Example

No code snippet is provided as the necessary changes are specific to the LiteLLM codebase and require further investigation.

Notes

The provided analysis and suggested fix are based on the research and discussion linked in the issue. However, the root cause and fix may be specific to the LiteLLM version (1.81.12-stable.1) and custom deployment configuration.

Recommendation

Apply the workaround by monkeypatching _DEFAULT_TTL_FOR_HTTPX_CLIENTS to a very large number, as removing the custom __del__ behavior may require further investigation and testing to ensure it does not introduce other issues.
