litellm - [Bug]: Issues with stream via custom LLM provider


When using a custom provider registered via custom_provider_map, streaming chat completions fail with TypeError: 'coroutine' object is not an iterator. Non-streaming requests work correctly. The error originates inside LiteLLM's streaming handler before it reaches the custom handler's streaming code path.




What happened?

LiteLLM MidStreamFallbackError: 'coroutine' object is not an iterator when streaming from a CustomLLM provider


Environment

  • Python version: 3.13
  • Deployment: Azure Container Apps (also reproduced locally via litellm --config config.yaml)
  • Client: LiteLLM Playground UI and OpenAI Python SDK with stream=True

Reproduction

Config

litellm_settings:
  custom_provider_map:
    - provider: my-custom-provider
      custom_handler: my_handler.my_custom_llm

model_list:
  - model_name: my-model
    litellm_params:
      model: my-custom-provider/deployment-some-model
      api_base: https://example.com/api
      api_version: 2024-08-01-preview
      api_key: os.environ/MY_API_KEY

Custom handler (relevant parts)

The CustomLLM subclass implements completion, acompletion, embedding, aembedding, and atranscription. For streaming, acompletion returns a CustomStreamWrapper that wraps a sync generator (Iterator[ModelResponse]):

from typing import Iterator

import httpx
from litellm import CustomLLM, CustomStreamWrapper
from litellm.types.utils import ModelResponse

def _sse_stream(url, headers, payload, timeout, model) -> Iterator[ModelResponse]:
    # Blocking (sync) generator: yields one ModelResponse per parsed SSE line.
    with httpx.Client(timeout=timeout) as client:
        with client.stream("POST", url, json=payload, headers=headers) as response:
            for line in response.iter_lines():
                # parse SSE chunks, yield ModelResponse (parsing elided in the report)
                yield parsed_chunk

class MyCustomLLM(CustomLLM):
    async def acompletion(self, ..., stream: bool = False, **extra):
        if stream:
            # Streaming branch: wrap the sync generator and return the wrapper.
            generator = _sse_stream(...)
            return CustomStreamWrapper(
                completion_stream=generator,
                model=ctx.deployment,  # ctx is built in code not shown in the report
                custom_llm_provider="my-custom-provider",
                logging_obj=None,
            )
        # non-stream path works fine
        ...

# Instance referenced by custom_handler: my_handler.my_custom_llm in the config
my_custom_llm = MyCustomLLM()

Client request

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000/v1")

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)

for chunk in response:
    print(chunk)

Expected behavior

CustomStreamWrapper iterates over the provided sync generator and yields ModelResponse chunks to the client, matching the behavior documented at [Custom API Server (Custom Format)](https://docs.litellm.ai/docs/providers/custom_llm_server).
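The linked documentation page describes dedicated streaming and astreaming methods on the handler for streamed responses. Assuming the proxy routes stream=True requests to those methods rather than to acompletion (this is not verified here against v1.85.1), a handler would need an async generator shaped roughly like the sketch below. HandlerSketch and make_chunk are hypothetical stand-ins so the example runs without LiteLLM, and the chunk keys are an assumption modeled on the streaming chunk shown in the docs.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Iterator, List


def make_chunk(text: str, is_finished: bool) -> Dict[str, Any]:
    """Build a streaming-chunk dict (assumed shape, modeled on the linked docs)."""
    return {
        "text": text,
        "is_finished": is_finished,
        "finish_reason": "stop" if is_finished else "",
        "index": 0,
        "tool_use": None,
        "usage": None,
    }


class HandlerSketch:
    """Hypothetical stand-in; a real handler would subclass litellm.CustomLLM."""

    def streaming(self, *args: Any, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        # Sync generator: the `yield` makes the call return an iterator.
        words = ["Hello", " world"]
        for i, word in enumerate(words):
            yield make_chunk(word, is_finished=(i == len(words) - 1))

    async def astreaming(self, *args: Any, **kwargs: Any) -> AsyncIterator[Dict[str, Any]]:
        # Async generator: because of the `yield`, calling this method returns
        # an async iterator rather than a coroutine.
        for chunk in self.streaming(*args, **kwargs):
            yield chunk
            await asyncio.sleep(0)  # hand control back to the event loop


async def collect(handler: HandlerSketch) -> List[str]:
    return [chunk["text"] async for chunk in handler.astreaming()]


texts = asyncio.run(collect(HandlerSketch()))
```

With a blocking source such as an httpx.Client stream, the inner loop would block the event loop between chunks. An async HTTP client or a worker thread would avoid that.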

Actual behavior

The request fails with HTTP 500 and the following error:

Error: 500 litellm.MidStreamFallbackError: litellm.APIConnectionError: 'coroutine' object is not an iterator
Traceback (most recent call last):
  File "/app/.venv/lib/python3.13/site-packages/litellm/litellm_core_utils/streaming_handler.py", line 2129, in __anext__
    chunk = await asyncio.to_thread(_next_sync_or_exhausted, self.completion_stream)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/app/.venv/lib/python3.13/site-packages/litellm/litellm_core_utils/streaming_handler.py", line 72, in _next_sync_or_exhausted
    return next(it)
TypeError: 'coroutine' object is not an iterator
. Received Model Group=my-model
Available Model Group Fallbacks=None
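The last frame of the traceback is a plain next(it) call, and the message is the one CPython raises when next() receives a coroutine object instead of an iterator. The standard-library sketch below reproduces that failure mode in isolation. The function names are hypothetical, and the sketch does not establish which object LiteLLM passed as the stream in this report.

```python
import inspect


async def returns_a_value():
    """Plain coroutine function: no `yield`, so calling it returns a coroutine."""
    return "not a stream"


async def yields_chunks():
    """Async generator function: the `yield` makes the call return an async iterator."""
    yield "chunk"


def describe(obj) -> str:
    """Classify an object the way a stream consumer would need to."""
    if inspect.iscoroutine(obj):
        return "coroutine"
    if inspect.isasyncgen(obj):
        return "async generator"
    if inspect.isgenerator(obj):
        return "sync generator"
    return type(obj).__name__


def next_error_message(obj) -> str:
    """Call next() on obj and return the TypeError text, or '' if next() is accepted."""
    try:
        next(obj)
    except TypeError as exc:
        return str(exc)
    except StopIteration:
        return ""
    return ""


coro = returns_a_value()
kind = describe(coro)
message = next_error_message(coro)
coro.close()  # avoid the "coroutine was never awaited" warning
```

A check like describe() on the object handed to a stream wrapper can show whether a method that was meant to be a generator is actually returning a coroutine.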

What part of LiteLLM is this about?

UI Dashboard

What LiteLLM version are you on?

v1.85.1
