litellm - 💡(How to fix) Fix [Bug]: LiteLLM triton provider adds ~5s delay to time-to-first-token (TTFT) [1 participants]

litellm, 2026-04-28 13:04:56


GitHub issue: BerriAI/litellm#26699 (fetched 2026-04-29 06:12:42)

Author: djangodesmet

Participants: djangodesmet

Timeline: labeled ×3


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When LiteLLM uses the triton provider to proxy requests to a Triton server (vLLM backend), every request takes about 5.0 s longer to reach the first token (TTFT) than the same request sent directly to Triton.

What I see:

Request via LiteLLM → Triton: TTFT 5.10s
Same request directly to Triton: TTFT 0.10s
The ~5s delay occurs for every model served by Triton when proxied through LiteLLM.
Using LiteLLM → vLLM (with openai provider) shows no meaningful extra delay versus direct requests.
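To compare the two paths on equal terms, TTFT can be measured the same way for both. The sketch below uses only the Python standard library and assumes both endpoints stream their responses line by line (server-sent events); it treats the first non-blank line as the first token, which is an approximation, since an OpenAI-style stream may open with a role-only chunk. The helper names time_to_first_chunk and sse_lines are hypothetical, not part of LiteLLM or Triton.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, Optional


def time_to_first_chunk(open_stream: Callable[[], Iterable[bytes]]) -> float:
    """Seconds from opening a stream until its first non-blank chunk arrives."""
    start = time.perf_counter()
    for chunk in open_stream():
        if chunk.strip():  # skip blank SSE separator lines and keep-alives
            return time.perf_counter() - start
    raise RuntimeError("stream ended before any data arrived")


def sse_lines(
    url: str,
    body: dict,
    extra_headers: Optional[Dict[str, str]] = None,
    timeout_s: float = 60.0,
) -> Iterator[bytes]:
    """POST a JSON body and yield the response line by line as it streams in."""
    headers = {"Content-Type": "application/json", **(extra_headers or {})}
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
    # The connection is opened lazily, on first iteration, so the measured
    # time includes connection setup as well as time to the first line.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        for line in resp:
            yield line
```

Usage would be time_to_first_chunk(lambda: sse_lines(url, body, headers)), run once against the LiteLLM proxy (for example a placeholder http://litellm:4000/v1/chat/completions with "stream": True in the body and a Bearer key header) and once against Triton's streaming generate endpoint (assumed here to be /v2/models/<model>/generate_stream). Host names, ports, and body fields are assumptions to adapt to the actual deployment.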

Environment

LiteLLM version: v1.83.7-stable
Triton with vLLM backend
Both LiteLLM and Triton running in same Kubernetes namespace
Fully air-gapped (no Internet access)

Debugging done

Checked LiteLLM logs: no obvious errors or warnings pointing to the delay.
Observed a repeated internal API call that includes header Authorization: Bearer <invalid JWT>; unclear whether this call is related.
Timing is consistently ~5.0s, suggesting a timeout or retry behavior.
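The "consistently ~5.0s" observation can be checked numerically: if a fixed timeout is responsible, the per-request difference between proxied and direct TTFT should cluster tightly around one value instead of varying. A minimal sketch; the function name fixed_overhead and the 0.25 s spread threshold are arbitrary assumptions.

```python
from statistics import median, pstdev
from typing import Sequence, Tuple


def fixed_overhead(
    proxied: Sequence[float], direct: Sequence[float], max_spread_s: float = 0.25
) -> Tuple[float, bool]:
    """Return (median added latency in seconds, whether it looks constant).

    proxied[i] and direct[i] are TTFT samples for the same request sent
    through LiteLLM and straight to Triton, respectively.
    """
    if len(proxied) != len(direct) or not proxied:
        raise ValueError("need equal-length, non-empty sample lists")
    diffs = [p - d for p, d in zip(proxied, direct)]
    # A small population standard deviation means the overhead barely varies,
    # which fits a fixed wait better than load- or size-dependent slowness.
    return median(diffs), pstdev(diffs) <= max_spread_s
```

With the reported 5.10 s / 0.10 s pair plus two hypothetical neighbouring samples, fixed_overhead([5.10, 5.12, 5.08], [0.10, 0.11, 0.09]) gives a median of about 5.0 s and flags the overhead as constant.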

Impact

This consistent 5s TTFT addition prevents using LiteLLM + Triton in production.

Requested help

Any pointers to where LiteLLM might add a fixed ~5s wait (timeouts, retries, health checks, auth calls)?
Guidance on which additional logs, traces, or configuration settings would be most useful to capture next.
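One way to narrow down a fixed wait in an air-gapped cluster is to time name resolution and TCP connect separately from inside the LiteLLM pod. On Linux, the resolver's default per-query timeout (the resolv.conf timeout option) is 5 seconds, so a lookup that stalls until it times out would show up as a delay of about that size; whether this applies here is unverified. A minimal sketch using only the standard library; the function name probe is hypothetical.

```python
import socket
import time
from typing import Dict


def probe(host: str, port: int, timeout_s: float = 10.0) -> Dict[str, float]:
    """Time DNS resolution and TCP connect to host:port separately (seconds)."""
    t0 = time.perf_counter()
    # getaddrinfo goes through the system resolver configuration (resolv.conf).
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    t1 = time.perf_counter()
    family, socktype, proto, _canonname, addr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout_s)
        sock.connect(addr)
    t2 = time.perf_counter()
    return {"dns_s": t1 - t0, "connect_s": t2 - t1}
```

For example, probe("triton", 8000), where "triton" stands in for the actual Kubernetes service name and 8000 for its HTTP port. A dns_s close to 5.0 would point at name resolution rather than at Triton itself; the same check can be run against the host of the repeated internal call.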

Steps to Reproduce

Send a request through LiteLLM to a model hosted in Triton, using the triton provider.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.7-stable

Twitter / LinkedIn details

No response

extent analysis

TL;DR

Look for a fixed timeout or retry in LiteLLM's Triton provider path that would explain the consistent 5-second delay, and adjust it once found.

Guidance

Review LiteLLM's configuration for any fixed timeouts or retry mechanisms that could be introducing the 5-second delay, particularly in the Triton provider settings.
Enable more detailed logging in LiteLLM to capture the internal API call with the Authorization: Bearer <invalid JWT> header and determine its relation to the delay.
Compare the configuration and logs of the LiteLLM → vLLM (with OpenAI provider) setup, which shows no significant delay, to identify potential differences in timeout or retry settings.
Consider capturing network traffic or API call traces between LiteLLM and Triton to further diagnose the delay.
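For the second point, if the token in that repeated call is a standard three-part JWT, its payload can be read without verifying the signature; claims such as iss, aud, or exp may hint at which component issues the call. A minimal sketch for inspection only: it performs no signature check and must not be used for authentication. The function name jwt_claims is hypothetical.

```python
import base64
import json
from typing import Any, Dict


def jwt_claims(token: str) -> Dict[str, Any]:
    """Decode a JWT's payload WITHOUT verifying its signature (inspection only)."""
    token = token.strip()
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected 3 dot-separated segments, got {len(parts)}")
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload = parts[1] + "=" * (-len(parts[1]) % 4)
    # Malformed input raises binascii.Error or json.JSONDecodeError,
    # both of which are ValueError subclasses.
    return json.loads(base64.urlsafe_b64decode(payload))
```

Pass it the Authorization header value exactly as captured in the debug logs; a token that is not a three-part JWT raises ValueError, which is itself a useful signal.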

Example

No LiteLLM-specific fix can be shown without more context on the proxy configuration or the internal API calls involved.

Notes

The delay seems to be specific to the Triton provider and not present with the OpenAI provider, suggesting a provider-specific configuration issue. Further investigation into LiteLLM's configuration and logging is necessary to pinpoint the cause.

Recommendation

Apply workaround: once the responsible timeout or retry setting is identified, adjust it in the Triton provider configuration to reduce or eliminate the delay. A fixed timeout or retry is the most likely cause, given how consistent the 5-second delay is.
