litellm - 💡(How to fix) Fix [Bug]: max_parallel_requests not reliable with anthropic adapter


Root Cause

Claude Code's per-turn pattern sends two HTTP POSTs to /v1/messages:

  • POST A: stream: true (speculative streaming)
  • POST B: stream: false (confirmation)

As soon as POST B's response starts arriving, Claude Code cancels POST A mid-stream. The cancellation propagates as asyncio.CancelledError into:

proxy/utils.py :: ProxyLogging.async_post_call_streaming_iterator_hook
        async for chunk in current_response:
            yield chunk
        # ← post-loop code (never reached on cancel)
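The following is a minimal, self-contained sketch of why the post-loop code is skipped. It assumes only standard asyncio semantics; upstream_stream, iterator_hook and client are hypothetical stand-ins, not LiteLLM code. Cancelling the task that consumes the hook raises CancelledError inside the loop, so code after the loop never runs, while a finally block still does.

```python
import asyncio

events: list[str] = []


async def upstream_stream():
    # Hypothetical stand-in for the model's streaming response.
    for i in range(100):
        await asyncio.sleep(0.01)
        yield f"chunk-{i}"


async def iterator_hook(current_response):
    # Same shape as the hook: re-yield every chunk, then run post-loop code.
    try:
        async for chunk in current_response:
            yield chunk
        events.append("post-loop")  # reached only on natural completion
    finally:
        events.append("finally")  # also reached when the consumer is cancelled


async def client():
    async for _ in iterator_hook(upstream_stream()):
        pass


async def main():
    task = asyncio.create_task(client())
    await asyncio.sleep(0.05)  # let a few chunks flow
    task.cancel()  # like Claude Code abandoning POST A
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events


print(asyncio.run(main()))  # ['finally']
```

In this toy the cancellation lands while the generator is awaiting the next upstream chunk. If it lands while the generator is suspended at yield, the finally block still runs, but only later, when the generator is closed via aclose() or by event-loop finalization.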

pre_call_hook has already incremented the max_parallel_requests counter (+1) on entry. The matching decrement (-1) is fired by the success event only when the stream completes naturally: either via the terminal StopAsyncIteration branch of CustomStreamWrapper.__anext__ (line 2208 of litellm_core_utils/streaming_handler.py) or via the deferred-logging path (_fire_deferred_stream_logging in proxy/utils.py). On CancelledError, neither path runs. Net per Claude Code turn: 2 increments (POST A + POST B) and 1 decrement (POST B only), so the counter grows by 1 per turn.
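A toy simulation of this accounting, assuming an in-memory integer in place of the Redis key and sleeps in place of real model calls (ToyLimiter, handle_request and claude_code_turn are hypothetical names). The +1 runs on entry, the -1 runs only after the response finishes, and the client cancels the speculative stream once the confirmation request has answered.

```python
import asyncio


class ToyLimiter:
    """Hypothetical in-memory stand-in for the Redis max_parallel_requests key."""

    def __init__(self) -> None:
        self.in_flight = 0


limiter = ToyLimiter()


async def handle_request(stream: bool) -> None:
    limiter.in_flight += 1  # pre_call_hook: +1 on entry
    if stream:
        for _ in range(100):
            await asyncio.sleep(0.01)  # one streamed chunk
    else:
        await asyncio.sleep(0.02)  # one non-streaming response
    limiter.in_flight -= 1  # success event: -1, reached only on natural completion


async def claude_code_turn() -> None:
    post_a = asyncio.create_task(handle_request(stream=True))  # speculative stream
    post_b = asyncio.create_task(handle_request(stream=False))  # confirmation
    await post_b
    post_a.cancel()  # the client drops POST A once POST B has answered
    try:
        await post_a
    except asyncio.CancelledError:
        pass


async def main(turns: int) -> int:
    for _ in range(turns):
        await claude_code_turn()
    return limiter.in_flight


leaked = asyncio.run(main(5))
print(leaked)  # 5
```

Each turn leaves exactly one leaked slot, matching the +2/-1 arithmetic above; once the leaked count reaches the key's max_parallel_requests value, every new request is rejected.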

Fix Action

Workaround

In proxy/utils.py, wrap the iteration in async_post_call_streaming_iterator_hook with try/finally. On the cancellation branch, look up the v3 rate limiter via proxy_logging_obj.proxy_hook_mapping["parallel_request_limiter"] and call its async_log_failure_event(...) directly, passing the user_api_key_hash from litellm_logging_obj.model_call_details["standard_logging_object"]["metadata"]. This issues the same -1 Redis operation that the rate limiter would have queued on the natural success path. With the workaround in place, the counter stays balanced and no 429s occur under sustained Claude Code load.

Proper fix (suggested)

The decrement-on-cancellation gap is general. A cleaner fix at the source would be either:

(a) CustomStreamWrapper.__anext__ registers a cleanup hook (via weakref.finalize or an __del__/aclose) that fires the success/failure callback if the stream did not complete naturally; or

(b) the rate limiter's pre-call increment registers a try/finally-style cleanup with the request context, so the decrement is guaranteed regardless of which exit path the request takes.
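Option (b) can be sketched with an async context manager that pairs the increment with a finally-guarded decrement. This is a minimal sketch assuming an in-memory counter; ToyLimiter, parallel_slot, handle_request and run_turns are hypothetical names, not LiteLLM APIs. In a real streaming endpoint the slot would need to live as long as the response body, not just the route handler that returns it.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class ToyLimiter:
    """Hypothetical in-memory stand-in for the per-key parallel-request counter."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0

    def acquire(self) -> None:
        if self.in_flight >= self.limit:
            raise RuntimeError("429: max_parallel_requests exceeded")
        self.in_flight += 1

    def release(self) -> None:
        self.in_flight -= 1


@asynccontextmanager
async def parallel_slot(limiter: ToyLimiter) -> AsyncIterator[None]:
    limiter.acquire()  # the pre-call +1; nothing to undo if this raises
    try:
        yield
    finally:
        limiter.release()  # the -1 runs on success, error, or cancellation


async def handle_request(limiter: ToyLimiter, stream: bool) -> None:
    # The slot covers the whole (simulated) response, streamed or not.
    async with parallel_slot(limiter):
        await asyncio.sleep(1.0 if stream else 0.02)


async def run_turns(limiter: ToyLimiter, turns: int) -> int:
    for _ in range(turns):
        post_a = asyncio.create_task(handle_request(limiter, stream=True))
        post_b = asyncio.create_task(handle_request(limiter, stream=False))
        await post_b
        post_a.cancel()  # the client abandons the speculative stream
        try:
            await post_a
        except asyncio.CancelledError:
            pass
    return limiter.in_flight


limiter = ToyLimiter(limit=4)
print(asyncio.run(run_turns(limiter, turns=10)))  # 0
```

Replaying ten turns of the two-request pattern with a limit of 4 leaves the counter at 0 and never reaches the 429 branch, because the cancelled stream releases its slot on the way out.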

Either is preferable to fixing it only in the iterator hook, because the same pattern likely affects chunk_processor in proxy/pass_through_endpoints/streaming_handler.py (the passthrough path) and any other streaming exit path.
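The workaround described earlier boils down to a try/finally around the hook's loop that issues the missing -1 only when the stream did not finish naturally, so the success path and the cleanup never both decrement. A minimal sketch under that assumption; guarded_iterator_hook, release_slot and the in-memory state dict are hypothetical, and release_slot stands in for the call to the v3 limiter's async_log_failure_event(...).

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def guarded_iterator_hook(
    current_response: AsyncIterator[str],
    on_abandoned: Callable[[], Awaitable[None]],
) -> AsyncIterator[str]:
    completed = False
    try:
        async for chunk in current_response:
            yield chunk
        completed = True  # natural end: the success path already issued the -1
    finally:
        if not completed:
            # Cancelled or closed early: issue the -1 the success path skipped.
            await on_abandoned()


state = {"in_flight": 0}  # hypothetical stand-in for the Redis counter


async def upstream(n_chunks: int) -> AsyncIterator[str]:
    for i in range(n_chunks):
        await asyncio.sleep(0.01)
        yield f"chunk-{i}"
    state["in_flight"] -= 1  # success-path -1, only on natural completion


async def release_slot() -> None:
    # Stand-in for the workaround's async_log_failure_event(...) call.
    state["in_flight"] -= 1


async def consume(n_chunks: int) -> None:
    state["in_flight"] += 1  # pre_call_hook +1
    async for _ in guarded_iterator_hook(upstream(n_chunks), release_slot):
        pass


async def main() -> int:
    await consume(3)  # finishes naturally
    task = asyncio.create_task(consume(100))
    await asyncio.sleep(0.05)  # let a few chunks through
    task.cancel()  # abandoned mid-stream, like POST A
    try:
        await task
    except asyncio.CancelledError:
        pass
    return state["in_flight"]


result = asyncio.run(main())
print(result)  # 0
```

One completed stream plus one cancelled stream leaves the counter at 0; the completed flag is what prevents a double decrement. If upstream errors already trigger their own failure callback, a real patch would need to narrow the condition (for example to asyncio.CancelledError) to avoid decrementing twice.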

---

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Hello,

The max_parallel_requests counter in Redis increases monotonically when clients cancel streaming /v1/messages requests mid-stream. Eventually every request is rejected with: Limit type: max_parallel_requests. Current limit: N, Remaining: 0.


Steps to Reproduce

  1. Configure a virtual key with max_parallel_requests set (e.g. 4) and a non-Anthropic backend model (we use a Kimi/GPT-OSS endpoint routed via the OpenAI-compatible spec).
  2. Connect Claude Code to the proxy as its Anthropic endpoint.
  3. Send any prompt (even just "hello").
  4. Observe Redis: the value returned by GET '{api_key:HASH}:max_parallel_requests' grows by 1 per Claude Code turn and never decreases. Once it reaches the limit, requests fail with 429.

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.14
