litellm - 💡(How to fix) Fix [Bug]: Vertex Gemini web search streaming crashes on 3/3.1 Flash / Flash Lite models with empty choices chunk [1 pull requests]


Error Message

litellm.MidStreamFallbackError: litellm.APIConnectionError: list index out of range


Fix Action

Fixed


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using Vertex AI Gemini with stream=True and web_search_options={}, LiteLLM fails mid-stream with:

litellm.MidStreamFallbackError: litellm.APIConnectionError: list index out of range

Confirmed failing models:

vertex_ai/gemini-3.1-flash-lite-preview
vertex_ai/gemini-3.1-flash-lite
vertex_ai/gemini-3-flash-preview

The same request pattern worked with:

vertex_ai/gemini-3.1-pro-preview

So the problem appears to be specific to the response shape that the Gemini Flash / Flash Lite models stream when web search is enabled, rather than affecting all Gemini 3.x models.

Root cause hypothesis

The failure appears to happen inside LiteLLM's streaming pipeline.

Call flow:

litellm.main.acompletion()
  -> litellm.main.completion()
  -> litellm.utils.get_optional_params()
  -> VertexGeminiConfig.map_openai_params()
  -> VertexGeminiConfig._map_web_search_options()
  -> VertexLLM.completion()
  -> VertexLLM.async_streaming()
  -> make_call()
  -> ModelResponseIterator.chunk_parser()
  -> CustomStreamWrapper.chunk_creator()
  -> CustomStreamWrapper.return_processed_chunk_logic()
  -> CustomStreamWrapper.raise_on_model_repetition()

web_search_options={} is converted by:

VertexGeminiConfig._map_web_search_options()

into a Gemini googleSearch tool.
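As a rough illustration of this mapping, the sketch below turns an OpenAI-style `web_search_options` value into a Gemini request fragment. It assumes the tool is sent as a `{"googleSearch": {}}` entry in the request's `tools` list. The function name `map_web_search_options_sketch` is hypothetical; this is not LiteLLM's actual `_map_web_search_options()` code, and it ignores any keys inside `web_search_options`.

```python
from typing import Any, Dict, List, Optional


def map_web_search_options_sketch(
    web_search_options: Optional[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical stand-in for VertexGeminiConfig._map_web_search_options().

    Assumption: any non-None web_search_options value, including an empty
    dict, enables Gemini's googleSearch tool. Option keys are ignored here.
    """
    if web_search_options is None:
        # Web search not requested: no tools are added.
        return {"tools": []}
    # An empty dict is falsy in Python but still means "enable web search",
    # so the check must be "is None", not truthiness.
    return {"tools": [{"googleSearch": {}}]}


print(map_web_search_options_sketch({}))  # {'tools': [{'googleSearch': {}}]}
```

The `is None` check matters for this issue: the reproduction passes `web_search_options={}`, which would be skipped by a plain truthiness test.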

During streaming, ModelResponseIterator.chunk_parser() creates a stream response object with empty choices first:

model_response = ModelResponseStream(choices=[], id=response_id)

and only populates choices when candidate content is present.

For Gemini Flash / Flash Lite models with web search streaming, some metadata-only chunks appear to produce a ModelResponseStream with empty choices.
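To make the empty-choices case concrete, here is a simplified, hypothetical stand-in for the parsing step. The dataclasses mimic only the fields this issue touches, and the raw chunk shapes (a `candidates` list whose entries carry `content.parts[].text`, and a metadata-only chunk carrying just `usageMetadata`) are assumptions about what Gemini streams, not captured output from the failing models.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Delta:
    content: Optional[str] = None


@dataclass
class StreamingChoice:
    delta: Delta = field(default_factory=Delta)
    index: int = 0


@dataclass
class ModelResponseStreamSketch:
    # Mirrors ModelResponseStream(choices=[], id=response_id): choices start empty.
    id: str
    choices: List[StreamingChoice] = field(default_factory=list)


def chunk_parser_sketch(raw: Dict[str, Any], response_id: str) -> ModelResponseStreamSketch:
    """Hypothetical simplification of ModelResponseIterator.chunk_parser()."""
    model_response = ModelResponseStreamSketch(id=response_id, choices=[])
    for idx, candidate in enumerate(raw.get("candidates", [])):
        content = candidate.get("content")
        if not content:
            # No candidate content: nothing is appended for this candidate.
            continue
        text = "".join(part.get("text", "") for part in content.get("parts", []))
        model_response.choices.append(StreamingChoice(delta=Delta(content=text), index=idx))
    # A metadata-only chunk falls through with choices still == [].
    return model_response


text_chunk = {"candidates": [{"content": {"parts": [{"text": "Hello"}]}}]}
metadata_chunk = {"usageMetadata": {"totalTokenCount": 42}}  # assumed shape

print(len(chunk_parser_sketch(text_chunk, "r1").choices))      # 1
print(len(chunk_parser_sketch(metadata_chunk, "r1").choices))  # 0
```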

Later, CustomStreamWrapper.raise_on_model_repetition() assumes choices are non-empty:

last_content = self.chunks[-1].choices[0].delta.content
second_to_last_content = self.chunks[-2].choices[0].delta.content

This raises:

IndexError: list index out of range

LiteLLM then wraps it as APIConnectionError, then MidStreamFallbackError.
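The crash itself can be reproduced without any network call. The sketch below copies the two indexing lines quoted above into a hypothetical `RepetitionCheckerSketch` class; the `len(self.chunks) < 2` early return and the comparison after the indexing are placeholders, not LiteLLM's exact logic. Feeding it one text chunk followed by one empty-choices chunk raises the same `IndexError`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Delta:
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: List[Choice] = field(default_factory=list)


class RepetitionCheckerSketch:
    def __init__(self) -> None:
        self.chunks: List[Chunk] = []

    def raise_on_model_repetition(self) -> None:
        if len(self.chunks) < 2:  # placeholder guard (assumption)
            return
        # The two lines quoted in the issue: both assume choices[0] exists.
        last_content = self.chunks[-1].choices[0].delta.content
        second_to_last_content = self.chunks[-2].choices[0].delta.content
        if last_content is not None and last_content == second_to_last_content:
            raise RuntimeError("model repetition detected")  # placeholder


checker = RepetitionCheckerSketch()
checker.chunks.append(Chunk(choices=[Choice(delta=Delta(content="Hello"))]))
checker.chunks.append(Chunk(choices=[]))  # metadata-only chunk with empty choices

try:
    checker.raise_on_model_repetition()
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```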

Expected behavior

LiteLLM should not fail when a Vertex Gemini streaming chunk has no choices.

Possible fixes:

  • skip empty-choice metadata-only chunks, or
  • guard raise_on_model_repetition() against chunks with empty choices, or
  • ensure ModelResponseIterator.chunk_parser() returns None instead of ModelResponseStream(choices=[]) when the chunk has no streamable delta.
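A minimal sketch of the second and third options, using hypothetical stand-in classes rather than LiteLLM's own. `GuardedRepetitionCheckerSketch` walks backwards to the last two chunks that actually carry a choice, so metadata-only chunks are skipped instead of indexed; `chunk_parser_or_none` returns `None` for a chunk with no streamable delta. The comparison and the `RuntimeError` are placeholders for whatever LiteLLM's real repetition check does, and any fix LiteLLM ships may take a different approach.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Delta:
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: List[Choice] = field(default_factory=list)


class GuardedRepetitionCheckerSketch:
    def __init__(self) -> None:
        self.chunks: List[Chunk] = []

    def raise_on_model_repetition(self) -> None:
        # Option 2: collect the two most recent chunks that have choices,
        # scanning backwards so the cost does not grow with stream length.
        recent: List[Chunk] = []
        for chunk in reversed(self.chunks):
            if chunk.choices:
                recent.append(chunk)
                if len(recent) == 2:
                    break
        if len(recent) < 2:
            return
        last_content = recent[0].choices[0].delta.content
        second_to_last_content = recent[1].choices[0].delta.content
        if last_content is not None and last_content == second_to_last_content:
            raise RuntimeError("model repetition detected")  # placeholder


def chunk_parser_or_none(chunk: Chunk) -> Optional[Chunk]:
    # Option 3: emit nothing for a chunk with no streamable delta.
    return chunk if chunk.choices else None


checker = GuardedRepetitionCheckerSketch()
checker.chunks.append(Chunk(choices=[Choice(delta=Delta(content="Hello"))]))
checker.chunks.append(Chunk(choices=[]))  # metadata-only chunk
checker.raise_on_model_repetition()  # no IndexError
print(chunk_parser_or_none(Chunk(choices=[])))  # None
```

Skipping empty chunks in the check (rather than treating them as "different content") keeps repetition detection working across interleaved metadata chunks.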

Actual behavior

LiteLLM raises mid-stream:

MidStreamFallbackError -> APIConnectionError -> IndexError: list index out of range

Environment

LiteLLM version: 1.83.14
Python: 3.12
Provider: Vertex AI
Endpoint style: Vertex AI streaming / streamGenerateContent

Failing models:
  - vertex_ai/gemini-3.1-flash-lite-preview
  - vertex_ai/gemini-3.1-flash-lite
  - vertex_ai/gemini-3-flash-preview

Working comparison model:
  - vertex_ai/gemini-3.1-pro-preview

I also checked current LiteLLM main around version 1.85.0, and the relevant code still appears to assume non-empty choices in raise_on_model_repetition():

last_content = self.chunks[-1].choices[0].delta.content
second_to_last_content = self.chunks[-2].choices[0].delta.content

Related issues

Possibly related, but not the same failure:


### Steps to Reproduce

```python
import asyncio
import os

from litellm import acompletion

# The first three models fail mid-stream; gemini-3.1-pro-preview works.
MODELS = [
    "vertex_ai/gemini-3.1-flash-lite-preview",
    "vertex_ai/gemini-3.1-flash-lite",
    "vertex_ai/gemini-3-flash-preview",
    "vertex_ai/gemini-3.1-pro-preview",
]


async def main() -> None:
    vertex_credentials = os.environ.get("GOOGLE_VERTEX_AI_CREDENTIALS")
    vertex_project = os.environ.get("GOOGLE_VERTEX_AI_PROJECT")

    for model in MODELS:
        print(f"\nTesting {model}")
        response = await acompletion(
            model=model,
            messages=[
                {
                    "role": "user",
                    "content": "Search the web and summarize the latest news about OpenAI.",
                }
            ],
            stream=True,
            web_search_options={},
            stream_options={"include_usage": True},
            vertex_credentials=vertex_credentials,
            vertex_project=vertex_project,
            vertex_location="global",
        )

        async for chunk in response:
            print(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```

### Relevant log output

```text
File ".../litellm/litellm_core_utils/streaming_handler.py", line 1587, in chunk_creator
  return self.return_processed_chunk_logic(...)

File ".../litellm/litellm_core_utils/streaming_handler.py", line 960, in return_processed_chunk_logic
  self.raise_on_model_repetition()

File ".../litellm/litellm_core_utils/streaming_handler.py", line 272, in raise_on_model_repetition
  last_content = self.chunks[-1].choices[0].delta.content
                 ~~~~~~~~~~~~~~~~~~~~~~~^^^

IndexError: list index out of range
```

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

1.83.14

Twitter / LinkedIn details

@secchinman
