litellm - 💡(How to fix) Fix [Bug]: LiteLLM triton provider adds ~5s delay to time-to-first-token (TTFT) [1 participants]

GitHub stats
BerriAI/litellm#26699 (fetched 2026-04-29 06:12:42)
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): labeled ×3

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When LiteLLM's Triton provider proxies requests to a Triton server (vLLM backend), every request takes roughly 5.0s longer to reach time-to-first-token (TTFT) than the same request sent to Triton directly.

What I see:

  • Request via LiteLLM → Triton: TTFT 5.10s
  • Same request directly to Triton: TTFT 0.10s
  • The ~5s delay occurs for every model served by Triton when proxied through LiteLLM.
  • Using LiteLLM → vLLM (with openai provider) shows no meaningful extra delay versus direct requests.
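
To keep the two TTFT figures comparable, both paths can be timed with the same helper. The following is a minimal sketch, not the reporter's measurement method: `time_to_first_token` and its `open_stream` parameter are hypothetical names, and the helper assumes the response arrives as an iterable of streamed chunks.

```python
import time
from typing import Callable, Iterable, Optional


def time_to_first_token(
    open_stream: Callable[[], Iterable[bytes]],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return seconds from issuing the request to the first non-empty chunk.

    `open_stream` is a hypothetical callable that sends the request and
    returns an iterable of streamed chunks. Returns None if the stream
    ends without yielding any non-empty chunk.
    """
    start = clock()
    for chunk in open_stream():
        if chunk:  # skip keep-alive blanks such as empty SSE lines
            return clock() - start
    return None
```

With the `requests` library, `open_stream` could be something like `lambda: requests.post(url, json=payload, stream=True).iter_lines()`, where `url` and `payload` are placeholders for the LiteLLM or Triton endpoint and request body. Running it once against each endpoint with an identical payload gives a like-for-like comparison.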

Environment

  • LiteLLM version: v1.83.7-stable
  • Triton with vLLM backend
  • Both LiteLLM and Triton running in same Kubernetes namespace
  • Fully air-gapped (no Internet access)

Debugging done

  • Checked LiteLLM logs: no obvious errors or warnings pointing to the delay.
  • Observed a repeated internal API call that includes the header Authorization: Bearer <invalid JWT>; it is unclear whether this call is related to the delay.
  • Timing is consistently ~5.0s, suggesting a timeout or retry behavior.
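
The inference in the last bullet (a constant delay points to a timeout or retry) can be checked numerically over several paired measurements. This is an illustrative sketch only; `summarize_overhead` is a hypothetical helper, and the interpretation in its docstring is a rule of thumb rather than a proof.

```python
from statistics import mean, pstdev
from typing import Sequence


def summarize_overhead(proxied: Sequence[float], direct: Sequence[float]) -> dict:
    """Compare paired TTFT samples (seconds) and describe the added latency.

    A large mean with a small spread is what a fixed wait (timeout, retry
    backoff) would look like; a wide spread points toward load or queuing.
    """
    if len(proxied) != len(direct) or not proxied:
        raise ValueError("need equal-length, non-empty sample lists")
    # Per-request overhead added by the proxy path.
    overhead = [p - d for p, d in zip(proxied, direct)]
    return {
        "mean_s": mean(overhead),
        "stdev_s": pstdev(overhead),
        "min_s": min(overhead),
        "max_s": max(overhead),
    }
```

If the standard deviation stays near zero while the mean sits at about 5s across many requests and models, the fixed-wait hypothesis becomes more plausible.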

Impact

This consistent 5s TTFT addition prevents using LiteLLM + Triton in production.

Requested help

  • Any pointers where LiteLLM might add a fixed ~5s wait (timeouts, retries, health checks, auth calls)?
  • Guidance on which additional logs/traces or configuration settings would be most useful to capture next

Steps to Reproduce

  1. Send a request through LiteLLM to a model hosted in Triton, using the triton provider.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.83.7-stable

Twitter / LinkedIn details

No response

extent analysis

TL;DR

Investigate and adjust the timeout or retry settings in LiteLLM's Triton provider configuration to mitigate the consistent 5-second delay.

Guidance

  • Review LiteLLM's configuration for any fixed timeouts or retry mechanisms that could be introducing the 5-second delay, particularly in the Triton provider settings.
  • Enable more detailed logging in LiteLLM to capture the internal API call with the Authorization: Bearer <invalid JWT> header and determine its relation to the delay.
  • Compare the configuration and logs of the LiteLLM → vLLM (with OpenAI provider) setup, which shows no significant delay, to identify potential differences in timeout or retry settings.
  • Consider capturing network traffic or API call traces between LiteLLM and Triton to further diagnose the delay.

Example

No specific code snippet can be provided without more context on LiteLLM's configuration or internal API calls.
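
A generic diagnostic can still narrow the search without knowing LiteLLM's internals. The sketch below times name resolution and TCP connect separately from inside the LiteLLM pod. It rests on an assumption the issue does not confirm: that the fixed wait might sit in the network path (for example, a resolver retry in the air-gapped cluster, since common resolver defaults use a 5-second per-attempt timeout). `time_connection_phases` is a hypothetical helper, and the host and port are placeholders.

```python
import socket
import time
from typing import Callable, Dict


def time_connection_phases(
    host: str,
    port: int,
    resolve: Callable = socket.getaddrinfo,
    connect: Callable = socket.create_connection,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Time name resolution and TCP connect separately, in seconds."""
    t0 = clock()
    infos = resolve(host, port, type=socket.SOCK_STREAM)
    t1 = clock()
    # Connect to the first resolved address so DNS is not counted twice.
    address = infos[0][4][:2]
    sock = connect(address, timeout=10)
    t2 = clock()
    sock.close()
    return {"resolve_s": t1 - t0, "connect_s": t2 - t1}
```

Calling `time_connection_phases("<triton-service-host>", <port>)` with the same host and port that LiteLLM's `api_base` points at shows whether either phase accounts for the delay. If both are fast, the wait is more likely inside LiteLLM's request path, such as the repeated internal call with the invalid JWT.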

Notes

The delay seems to be specific to the Triton provider and not present with the OpenAI provider, suggesting a provider-specific configuration issue. Further investigation into LiteLLM's configuration and logging is necessary to pinpoint the cause.

Recommendation

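Apply workaround: First confirm that a timeout or retry is responsible, then adjust the corresponding setting in the Triton provider configuration to reduce or eliminate the delay. The consistent ~5.0s timing makes a timeout or retry the most likely cause, but the issue does not yet identify which setting is involved.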
