litellm bug: Google-native streamGenerateContent wraps each SSE event in a Python b'...' bytes literal

When proxying a streaming request to the Google-native /v1beta/models/{model}:streamGenerateContent?alt=sse endpoint, LiteLLM corrupts the SSE output by wrapping each JSON data payload in Python's b'...' bytes literal. Any client using the Google GenAI SDK (@google/genai) fails with a JSON parse error because b'...' is not valid JSON.

Root Cause

The bug is in async_data_generator in litellm/proxy/proxy_server.py.

The Google-native streaming handler (AsyncGoogleGenAIGenerateContentStreamingIterator) yields raw bytes from the upstream Gemini SSE response. These bytes flow into async_data_generator, which wraps every chunk in an SSE data: prefix:

# https://github.com/BerriAI/litellm/blob/6ff668c7aa01a73738ed39aa64913a089a183565/litellm/proxy/proxy_server.py#L6532
yield f"data: {chunk}\n\n"

When chunk is bytes, the f-string formats it with str(chunk), which produces the Python bytes literal representation b'...'. So instead of data: {"candidates":...}, the client receives data: b'data: {"candidates":...}'. The inner data: is presumably the upstream chunk's own SSE prefix, since the handler yields the raw upstream bytes.
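
The behaviour is easy to reproduce in isolation. The following minimal sketch uses a made-up payload (not captured from a real Gemini response) to show how formatting a bytes object in an f-string yields its b'...' text, while decoding first yields plain text:

```python
# Hypothetical upstream chunk: already SSE-framed bytes, as in the reported output.
chunk = b'data: {"candidates": []}'

# Formatting bytes in an f-string falls back to str(chunk), i.e. the b'...' literal.
broken = f"data: {chunk}\n\n"

# Decoding first gives plain text, although the upstream "data: " prefix is still there.
decoded = f"data: {chunk.decode('utf-8')}\n\n"

print(repr(broken))
print(repr(decoded))
```

Note that the decoded variant still contains two data: prefixes in this sample, because the sample chunk already carries one.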

The proposed fix is to decode bytes to a string before the f-string:

if isinstance(chunk, bytes):
    chunk = chunk.decode("utf-8")
yield f"data: {chunk}\n\n"
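
One caveat: the reported output (data: b'data: {...}') suggests the upstream chunk already carries its own data: prefix, so decoding alone would likely produce data: data: {...}. A minimal sketch of a more defensive formatter follows. Here to_sse_event is a hypothetical helper for illustration, not a function in LiteLLM, and the pass-through rule for already-framed chunks is an assumption about what this route needs. It also assumes each chunk holds exactly one complete event.

```python
def to_sse_event(chunk) -> str:
    """Format one stream chunk as an SSE event (hypothetical helper, not LiteLLM code)."""
    if isinstance(chunk, bytes):
        # Decode so the f-strings below never see a bytes object.
        chunk = chunk.decode("utf-8")
    text = str(chunk).strip()
    if text.startswith("data:"):
        # Assumption: the chunk is already SSE-framed, so do not add a second prefix.
        return f"{text}\n\n"
    return f"data: {text}\n\n"


# Already-framed bytes (Google-native route) and an unframed string both come out clean.
print(to_sse_event(b'data: {"candidates": []}\r\n\r\n'), end="")
print(to_sse_event('{"id": "chatcmpl-1"}'), end="")
```

A chunk that splits an event or a multi-byte character across boundaries would need buffering, which this sketch does not attempt.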

Workaround

Until the fix lands, use the OpenAI-format endpoint instead, which is not affected:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "gemini/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "say hello in 5 words"}]
  }'


Affected version

v1.84.0-rc.1 (Docker)

Steps to Reproduce

Send a streaming request to the Google-native endpoint:

curl -s -N \
  "http://localhost:4000/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: sk-your-litellm-key" \
  -d '{
    "contents": [{"parts": [{"text": "say hello in 5 words"}]}]
  }'

Expected Behavior

Clean SSE:

data: {"candidates": [{"content": {"role": "model","parts": [{"text": "Hello!"}]},"finishReason": "STOP"}],"usageMetadata": {...}}

Actual Behavior

Each SSE event wraps the JSON in b'...':

data: b'data: {"candidates": [{"content": {"role": "model","parts": [{"text": "Hello there, how are you?"}]},"finishReason": "STOP"}],"usageMetadata": {...}}'

After stripping the outer data: prefix, the payload is b'data: {"candidates": ...}' instead of {"candidates": ...}. JSON.parse fails with Unexpected token 'b'.
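
The same failure can be shown outside the SDK. This sketch, written in Python rather than the SDK's JavaScript, strips the SSE data: prefix from a sample line and attempts to parse the rest as JSON. parse_sse_data is a hypothetical helper, and the sample lines are abbreviated stand-ins for the expected and actual events. It requires Python 3.9+ for str.removeprefix.

```python
import json


def parse_sse_data(line: str):
    """Return the parsed JSON payload of one SSE data line, or None if it is not valid JSON."""
    payload = line.removeprefix("data:").strip()
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


clean = 'data: {"candidates": []}'
corrupted = 'data: b\'data: {"candidates": []}\''

print(parse_sse_data(clean))      # parsed dict
print(parse_sse_data(corrupted))  # None: the payload starts with b'
```

A check like this could be run against each line of the curl output to confirm whether a given proxy build is affected.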

Impact

  • The @google/genai SDK fails with SyntaxError: Unexpected token 'b', "b'data: ..." is not valid JSON
  • Any other client using the Google-native streaming endpoint through LiteLLM is affected as well
  • The OpenAI-format route (/v1/chat/completions) works fine; the bug is specific to the Google-native route

