litellm: ElevenLabs speech-to-text via LiteLLM does not appear to pass diarization options



Summary

When calling ElevenLabs Speech-to-Text through LiteLLM, diarization-related request parameters do not appear to be forwarded to the upstream ElevenLabs /v1/speech-to-text API. As a result, transcriptions come back without the expected speaker separation even when diarize=true and related ElevenLabs parameters are included.

Expected behavior

LiteLLM should forward supported ElevenLabs Speech-to-Text multipart form fields to ElevenLabs, including at least:

  • diarize
  • tag_audio_events
  • timestamps_granularity
  • num_speakers
  • diarization_threshold
  • detect_speaker_roles
  • language_code

ElevenLabs documents these fields for POST https://api.elevenlabs.io/v1/speech-to-text with model_id=scribe_v2.

Docs: https://elevenlabs.io/docs/api-reference/speech-to-text/convert
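For comparison with the gateway path, here is a minimal sketch of a direct request to the documented endpoint, built with the Python standard library only. The helper names (encode_multipart, build_direct_request), the xi-api-key header, and the "file" form field name are assumptions made for illustration and should be checked against the ElevenLabs docs linked above. The sketch only constructs the request; it does not send it.

```python
import io
import uuid
import urllib.request

ELEVENLABS_STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields:
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    # File part; the field name "file" is an assumption.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def build_direct_request(api_key, file_name, file_bytes):
    """Build (but do not send) a direct speech-to-text request."""
    fields = [
        ("model_id", "scribe_v2"),
        ("diarize", "true"),
        ("tag_audio_events", "true"),
        ("timestamps_granularity", "word"),
        ("detect_speaker_roles", "true"),
    ]
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        ELEVENLABS_STT_URL,
        data=body,
        # Header name is an assumption; verify it against the ElevenLabs docs.
        headers={"xi-api-key": api_key, "Content-Type": content_type},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Comparing that response with the one obtained through LiteLLM for the same audio is one way to check whether the gateway drops the fields.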

Actual behavior

Calling the model through LiteLLM produced a transcript without the expected diarization (speaker separation), even though the request included diarization settings. The integration is now being changed to optionally call ElevenLabs directly, because the LiteLLM gateway appears to be where the diarization settings are lost.
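A rough way to detect the missing speaker separation programmatically is sketched below. It assumes the parsed JSON response has a top-level "words" list whose entries carry a "speaker_id" key, as in a direct ElevenLabs response. That shape is an assumption: the verbose_json response returned by LiteLLM may be structured differently, and has_speaker_labels is a hypothetical helper, not part of either API.

```python
def has_speaker_labels(response: dict) -> bool:
    """Return True if any word entry in the response carries a speaker label.

    Assumes a top-level "words" list of dicts with an optional "speaker_id"
    key; adjust the key names to match the response you actually receive.
    """
    words = response.get("words") or []
    return any(
        isinstance(word, dict) and word.get("speaker_id") is not None
        for word in words
    )
```

Running this check on both the gateway response and a direct response for the same audio would show whether only the gateway path lacks labels.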

Example request shape

The application sends a multipart request to LiteLLM transcription with fields like:

model=elevenlabs/scribe_v2
response_format=verbose_json
timestamp_granularities[]=segment
timestamps_granularity=word
diarize=true
tag_audio_events=true
detect_speaker_roles=true

Optional fields that should also pass through when present:

num_speakers=<int>
diarization_threshold=<float>
language=<code>
language_code=<code>
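The request shape above can be sketched as a small Python helper that assembles the form fields and adds the optional ones only when they are set. The function name build_transcription_fields is hypothetical; the field names and values are taken from the lists above.

```python
def build_transcription_fields(
    num_speakers=None,
    diarization_threshold=None,
    language=None,
    language_code=None,
):
    """Return the multipart form fields as a list of (name, value) pairs."""
    fields = [
        ("model", "elevenlabs/scribe_v2"),
        ("response_format", "verbose_json"),
        ("timestamp_granularities[]", "segment"),
        ("timestamps_granularity", "word"),
        ("diarize", "true"),
        ("tag_audio_events", "true"),
        ("detect_speaker_roles", "true"),
    ]
    optional = {
        "num_speakers": num_speakers,
        "diarization_threshold": diarization_threshold,
        "language": language,
        "language_code": language_code,
    }
    # Only include optional fields the caller actually provided.
    for name, value in optional.items():
        if value is not None:
            fields.append((name, str(value)))
    return fields
```

The resulting list can be handed to an HTTP client as form data alongside the audio file. The gateway URL and path depend on the deployment and are not shown here.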

Environment/context

  • LiteLLM gateway model: elevenlabs/scribe_v2
  • ElevenLabs direct endpoint expected: POST /v1/speech-to-text
  • ElevenLabs model id: scribe_v2

Request

Please confirm whether LiteLLM's ElevenLabs Speech-to-Text provider currently forwards these extra multipart parameters. If not, please add pass-through support for the documented ElevenLabs transcription options, especially diarize, timestamps_granularity, and detect_speaker_roles.
