litellm - ✅(Solved) Fix [Bug]: /v1/audio/speech with stream_format=sse returns raw audio for OpenAI-compatible TTS backend instead of text/event-stream [1 pull requests, 1 comments, 1 participants]

GitHub stats
BerriAI/litellm#24301 · Fetched 2026-04-08 01:13:29
Comments: 1
Participants: 1
Timeline: 5
Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1, referenced ×1

Error Message

Observe that the response does not behave like OpenAI speech SSE. In my setup, LiteLLM returns HTTP 500 Internal Server Error for this request instead of an SSE stream.

Fix Action

Fixed

PR fix notes

PR #24353: fix: forward stream_format param in audio speech endpoint

Description (problem / solution / changelog)

Summary

Fixes #24301.

Root cause: stream_format from the request body never reached optional_params, so the OpenAI speech call ignored it. The proxy always returned audio/mpeg regardless.

Fix: Forward stream_format into optional_params after provider mapping, and set Content-Type to text/event-stream when stream_format is sse.

Changes

  • litellm/main.py: pass stream_format from kwargs into optional_params
  • litellm/proxy/proxy_server.py: override media_type to text/event-stream for SSE requests
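The two changes above can be sketched roughly as follows. This is a minimal illustration of the idea only, not the code from PR #24353: the function names `forward_stream_format` and `select_media_type` are hypothetical, and the real logic in litellm/main.py and litellm/proxy/proxy_server.py may be structured differently.

```python
from typing import Any, Dict

SSE_MEDIA_TYPE = "text/event-stream"
DEFAULT_AUDIO_MEDIA_TYPE = "audio/mpeg"


def forward_stream_format(
    kwargs: Dict[str, Any], optional_params: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of optional_params with stream_format copied from kwargs.

    Hypothetical helper: it mirrors the described fix, where stream_format
    is added after provider mapping so it is no longer dropped.
    """
    merged = dict(optional_params)
    stream_format = kwargs.get("stream_format")
    if stream_format is not None:
        merged["stream_format"] = stream_format
    return merged


def select_media_type(request_body: Dict[str, Any]) -> str:
    """Pick the response Content-Type: SSE when requested, audio/mpeg otherwise."""
    if request_body.get("stream_format") == "sse":
        return SSE_MEDIA_TYPE
    return DEFAULT_AUDIO_MEDIA_TYPE
```

Copying the dictionary instead of mutating it keeps the sketch free of side effects; whether the actual change mutates optional_params in place is not stated in the PR notes.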

Testing

  • Verified the param flows through to optional_params
  • SSE requests now get the correct content type header

Changed files

  • litellm/main.py (modified, +18/-13)
  • litellm/proxy/proxy_server.py (modified, +215/-4)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

LiteLLM proxy handles normal /v1/audio/speech requests correctly, but stream_format="sse" does not behave correctly.

I verified that a simple non-streaming TTS request through LiteLLM works:

  • model="gpt-4o-mini-tts"
  • no explicit stream_format
  • response: 200 OK
  • x-litellm-model-api-base: https://api.openai.com

However, when testing /v1/audio/speech with stream_format="sse" against an OpenAI-compatible TTS backend behind LiteLLM, the proxy does not preserve SSE behavior.

Instead of returning Content-Type: text/event-stream and data: {...} events, the proxy returns a binary audio response.

Expected behavior:

  • stream_format="sse" should return an SSE stream of audio events.

Actual behavior:

  • LiteLLM returns a normal binary audio response instead of SSE.

Steps to Reproduce

  1. Verify that normal TTS works through LiteLLM:
curl -i -sS "https://<litellm-host>/audio/speech" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <litellm-key>" \
  -d '{
    "input":"Hello from gpt-4o-mini-tts.",
    "voice":"alloy",
    "model":"gpt-4o-mini-tts",
    "response_format":"pcm"
  }'
  2. Verify that the same endpoint behaves incorrectly when SSE is requested:
curl -i -sS "https://<litellm-host>/audio/speech" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <litellm-key>" \
  -d '{
    "input":"Hello from gpt-4o-mini-tts.",
    "voice":"alloy",
    "model":"gpt-4o-mini-tts",
    "response_format":"pcm",
    "stream_format":"sse"
  }'
  3. Observe that the response does not behave like OpenAI speech SSE. In my setup, LiteLLM returns HTTP 500 Internal Server Error for this request instead of an SSE stream.

  4. Compare this with an OpenAI-compatible upstream TTS backend that supports SSE directly: when called without LiteLLM, the same /audio/speech request shape returns Content-Type: text/event-stream and data: {"type":"speech.audio.delta", ...} events.
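When running the steps above from a script instead of curl, it can help to label each response by status and Content-Type so the three outcomes mentioned in this issue (SSE, binary audio, HTTP 500) are easy to tell apart. The helper below is a small sketch; `classify_speech_response` and its return labels are made up for illustration.

```python
from typing import Mapping


def classify_speech_response(status: int, headers: Mapping[str, str]) -> str:
    """Label a /audio/speech response as 'sse', 'binary_audio', 'http_error', or 'other'."""
    if status >= 400:
        return "http_error"
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=utf-8" and compare the bare media type.
    media_type = lowered.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type == "text/event-stream":
        return "sse"
    if media_type.startswith("audio/"):
        return "binary_audio"
    return "other"
```

The status code and header mapping of any HTTP client response can be passed in directly.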

Relevant log output

Working non-streaming request through LiteLLM:

HTTP/2 200
content-type: audio/mpeg
x-litellm-model-api-base: https://api.openai.com
x-litellm-response-cost: 2.75e-05
x-litellm-version: 1.81.0


Unexpected response for stream_format="sse" through LiteLLM:

HTTP/2 200
content-type: audio/mpeg
x-litellm-version: 1.81.0
<raw audio bytes...>


Expected SSE shape:

HTTP/1.1 200 OK
content-type: text/event-stream; charset=utf-8

data: {"type":"speech.audio.delta","audio":"..."}
data: {"type":"speech.audio.done","usage":{...}}
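A client consuming the expected SSE shape could reassemble the audio along these lines. This is a simplified sketch: it assumes each event fits on a single `data:` line, as in the sample above, and that the `audio` field carries Base64-encoded bytes, which the issue does not state explicitly. The function names are illustrative.

```python
import base64
import json
from typing import Any, Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the decoded JSON payload of every single-line `data:` field."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)


def collect_audio(lines: Iterable[str]) -> bytes:
    """Concatenate audio from speech.audio.delta events until speech.audio.done."""
    chunks = []
    for event in iter_sse_events(lines):
        if event.get("type") == "speech.audio.delta":
            # Assumption: `audio` is Base64-encoded audio data.
            chunks.append(base64.b64decode(event["audio"]))
        elif event.get("type") == "speech.audio.done":
            break
    return b"".join(chunks)
```

A full SSE parser would also handle multi-line `data:` fields and `event:` names; that is left out here to keep the example short.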

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

v1.81.0

Twitter / LinkedIn details

https://x.com/tg_bomze

extent analysis

Fix Plan

To make LiteLLM preserve SSE behavior when stream_format="sse", the proxy has to pass the parameter through and label the response correctly.

Here are the steps:

  • Read the stream_format parameter from the request body and forward it to the upstream speech call. Per PR #24353, it previously never reached optional_params.
  • If stream_format="sse", set the Content-Type header to text/event-stream and return the audio events as SSE events.
  • Return the events through a streaming response; the example below uses fastapi's StreamingResponse for this.

Example code using fastapi:

import json

from fastapi import FastAPI, Request, Response
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/audio/speech")
async def speech(request: Request):
    # request.json() is a coroutine, so it must be awaited
    body = await request.json()

    if body.get("stream_format") == "sse":
        async def event_generator():
            # Placeholder events; a real handler would relay the upstream stream.
            # The usage payload is elided in the issue, so it is left empty here.
            yield "data: {}\n\n".format(json.dumps({"type": "speech.audio.delta", "audio": "..."}))
            yield "data: {}\n\n".format(json.dumps({"type": "speech.audio.done", "usage": {}}))

        # media_type sets the Content-Type header to text/event-stream
        return StreamingResponse(event_generator(), media_type="text/event-stream")

    # Non-SSE requests return binary audio (placeholder bytes)
    return Response(content=b"...", media_type="audio/mpeg")

Verification

To verify the fix, call the /audio/speech endpoint with stream_format="sse" and check that the response has Content-Type: text/event-stream and carries speech.audio.delta events followed by a final speech.audio.done event.

Example test using curl:

curl -i -sS "https://<litellm-host>/audio/speech" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <litellm-key>" \
  -d '{
    "input":"Hello from gpt-4o-mini-tts.",
    "voice":"alloy",
    "model":"gpt-4o-mini-tts",
    "response_format":"pcm",
    "stream_format":"sse"
  }'

This should return an SSE stream with the correct events:

HTTP/1.1 200 OK
content-type: text/event-stream; charset=utf-8

data: {"type":"speech.audio.delta","audio":"..."}
data: {"type":"speech.audio.done","usage":{...}}
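The event order shown above can also be checked programmatically. The function below is a sketch of such a check, assuming the stream contains only the two event types that appear in this issue; `validate_event_sequence` is a hypothetical name.

```python
from typing import Iterable, List

KNOWN_EVENT_TYPES = ("speech.audio.delta", "speech.audio.done")


def validate_event_sequence(event_types: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the sequence looks right."""
    types = list(event_types)
    problems = []
    if not types:
        problems.append("no events received")
        return problems
    if types[-1] != "speech.audio.done":
        problems.append("stream did not end with speech.audio.done")
    if types.count("speech.audio.done") > 1:
        problems.append("more than one speech.audio.done event")
    unknown = sorted({t for t in types if t not in KNOWN_EVENT_TYPES})
    if unknown:
        problems.append("unexpected event types: " + ", ".join(unknown))
    return problems
```

Returning a list of problems instead of a boolean makes a failed check easier to diagnose in test output.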
