litellm - 💡(How to fix) Fix [Feature]: add support for vLLM realtime endpoint [1 participants]

litellm • 2026-03-08 10:57:59

GitHub stats

BerriAI/litellm#23102•Fetched 2026-04-08 00:38:37

Author

duhow

Participants

duhow

Timeline (top)

labeled ×3

Fix Action

Fix / Workaround

📝 The workaround so far is to set up the OpenAI provider with a custom model and a custom endpoint.

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

vLLM supports a /v1/realtime endpoint: https://docs.vllm.ai/en/latest/serving/openai_compatible_server/?h=realtime#realtime-api
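As a small illustration of where that endpoint lives, the sketch below derives the WebSocket URL for /v1/realtime from a server's HTTP base URL. The helper name realtime_ws_url and the localhost address are hypothetical; only the /v1/realtime path comes from the vLLM documentation linked above.

```python
from urllib.parse import urlsplit, urlunsplit


def realtime_ws_url(base_url: str) -> str:
    """Map an HTTP(S) server base URL to its ws(s) /v1/realtime URL."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http or https URL, got {base_url!r}")
    # WebSocket schemes mirror the HTTP ones: http -> ws, https -> wss.
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Drop a trailing "/v1" (and slashes) so "host" and "host/v1" both work.
    path = parts.path.rstrip("/")
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunsplit((scheme, parts.netloc, path + "/v1/realtime", "", ""))


# Placeholder address; substitute your own vLLM server.
print(realtime_ws_url("http://localhost:8000/v1"))
```

The same base URL that an OpenAI-compatible client would use for HTTP requests can be passed in, so one setting can drive both the HTTP and WebSocket URLs.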

📝 The workaround so far is to set up the OpenAI provider with a custom model and a custom endpoint.

Motivation, pitch

The goal is to set up the Voxtral Realtime model: https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602

What part of LiteLLM is this about?

SDK (litellm Python package)

extent analysis

Fix Plan

Until LiteLLM supports the vLLM realtime endpoint natively, the Voxtral Realtime model can be wired in through the workaround described in the issue:

Set up an OpenAI provider entry that uses the Voxtral Realtime model as a custom model
Point that entry's custom endpoint at the vLLM server
Use the entry from the LiteLLM SDK

Code Changes

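Example snippet for the workaround: register the model through LiteLLM's OpenAI-compatible provider (the "openai/" model prefix) and point api_base at the vLLM server. The alias, URL, and API key below are placeholders. Whether realtime (WebSocket) traffic is actually forwarded for such a deployment depends on your LiteLLM version, which is what this feature request is about.

from litellm import Router

# Register the vLLM-served model as an OpenAI-compatible deployment
router = Router(
    model_list=[
        {
            "model_name": "voxtral-realtime",  # placeholder alias
            "litellm_params": {
                # The "openai/" prefix selects the OpenAI-compatible provider
                "model": "openai/mistralai/Voxtral-Mini-4B-Realtime-2602",
                "api_base": "http://localhost:8000/v1",  # placeholder vLLM URL
                "api_key": "EMPTY",  # placeholder; use your server's key if set
            },
        }
    ]
)

Replace "http://localhost:8000/v1" with the base URL of your vLLM server.

Verification

As a first check, confirm that the deployment is registered under the alias:

# List the model aliases known to the router
print(router.get_model_names())

If "voxtral-realtime" appears in the output, the configuration was accepted. This does not exercise the realtime path itself; to verify that, open a WebSocket session against /v1/realtime and confirm that the server responds.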

Extra Tips

Ensure the custom endpoint is correctly configured to handle the Voxtral Realtime model's requirements.
Test the custom provider and endpoint thoroughly to avoid any regressions.
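
One way to catch misconfiguration early is to sanity-check the deployment entry before handing it to LiteLLM. The sketch below is a hypothetical helper (check_deployment is not part of LiteLLM). It assumes the workaround's shape: a model name carrying the "openai/" prefix and an absolute http(s) api_base.

```python
from urllib.parse import urlsplit


def check_deployment(litellm_params: dict) -> list[str]:
    """Return a list of problems found in a workaround-style deployment entry."""
    problems = []

    # The workaround relies on the OpenAI-compatible provider prefix.
    model = litellm_params.get("model", "")
    if not model.startswith("openai/"):
        problems.append("model should start with 'openai/'")

    # api_base must be an absolute http(s) URL pointing at the vLLM server.
    parts = urlsplit(litellm_params.get("api_base", ""))
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("api_base should be an absolute http(s) URL")

    return problems


# Placeholder values; substitute your own server address.
params = {
    "model": "openai/mistralai/Voxtral-Mini-4B-Realtime-2602",
    "api_base": "http://localhost:8000/v1",
}
print(check_deployment(params) or "no problems found")
```

A check like this only validates the shape of the entry; it does not contact the server or confirm that the model is loaded.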
