litellm - [Feature]: Auto-populate max_input_tokens/max_output_tokens for hosted vLLM/OpenAI-like models

GitHub stats
BerriAI/litellm#27830 (fetched 2026-05-14 03:30:23)
Comments: 0 · Participants: 1 · Timeline events: 3 (labeled ×2, cross-referenced ×1) · Reactions: 1

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

When LiteLLM proxies self-hosted or OpenAI-compatible backends (especially vLLM), it should automatically populate model_info.max_input_tokens and model_info.max_output_tokens from upstream model metadata (for vLLM: max_model_len) instead of leaving them null.

Suggested behavior:

  1. On model registration, router refresh, or periodic cache refresh, LiteLLM calls the upstream model metadata endpoint (/v1/models for OpenAI-compatible backends).
  2. If upstream exposes a context window (max_model_len or equivalent), LiteLLM maps it to max_input_tokens.
  3. If no explicit output limit is available, LiteLLM sets max_output_tokens conservatively (or allows a configurable derivation strategy).
  4. Persist derived values in runtime model info (and optionally DB if configured), while still allowing explicit model_info overrides to take priority.
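
The four steps above could be sketched roughly as follows. This is only an illustration of the proposed behavior, not LiteLLM's actual implementation: the function names (fetch_upstream_models, derive_model_info, derive_all) and the output_fraction parameter are hypothetical, and the 25% default for max_output_tokens is an arbitrary stand-in for whatever "conservative" strategy would be chosen. It assumes the upstream /v1/models response lists model cards under "data" and that each card may carry max_model_len, as vLLM does.

```python
import json
import urllib.request
from typing import Any, Optional


def fetch_upstream_models(api_base: str, api_key: Optional[str] = None) -> dict[str, Any]:
    """Step 1: GET {api_base}/v1/models and return the parsed JSON body."""
    request = urllib.request.Request(f"{api_base.rstrip('/')}/v1/models")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def derive_model_info(
    card: dict[str, Any],
    explicit: Optional[dict[str, Any]] = None,
    output_fraction: float = 0.25,  # hypothetical default, not a LiteLLM setting
) -> dict[str, Any]:
    """Steps 2-4: map max_model_len to token limits, then apply explicit overrides."""
    derived: dict[str, Any] = {"max_input_tokens": None, "max_output_tokens": None}

    context_window = card.get("max_model_len")
    is_valid = (
        isinstance(context_window, int)
        and not isinstance(context_window, bool)
        and context_window > 0
    )
    if is_valid:
        # Step 2: the upstream context window becomes max_input_tokens.
        derived["max_input_tokens"] = context_window
        # Step 3: no explicit output limit upstream, so derive one conservatively.
        derived["max_output_tokens"] = max(1, int(context_window * output_fraction))

    # Step 4: explicit model_info values from config always take priority.
    for key, value in (explicit or {}).items():
        if value is not None:
            derived[key] = value
    return derived


def derive_all(
    models_payload: dict[str, Any],
    overrides: Optional[dict[str, dict[str, Any]]] = None,
) -> dict[str, dict[str, Any]]:
    """Build a model-id -> derived model_info mapping from a /v1/models payload."""
    overrides = overrides or {}
    return {
        card["id"]: derive_model_info(card, overrides.get(card["id"]))
        for card in models_payload.get("data", [])
        if "id" in card
    }
```

In this sketch, a refresh job would call derive_all(fetch_upstream_models(api_base), overrides) and merge the result into runtime model info. Models whose upstream card has no usable max_model_len keep null limits rather than receiving a guessed value.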

This would make /v1/model/info, /model_group/info, routing decisions, and UI model tables much more accurate for hosted models.

Motivation, pitch

Today, hosted vLLM/OpenAI-like models frequently show:

  • max_input_tokens: null
  • max_output_tokens: null

unless admins manually add model_info in config for every model.

This creates operational issues:

  • clients cannot discover real context limits from LiteLLM endpoints,
  • router/token budget checks have less accurate metadata,
  • model onboarding requires repetitive manual metadata management.

Related context:

  • #10096 asked to expose provider model info (including context size/max_model_len), but was auto-closed as stale.
  • #13009 also reflects manual model metadata pain for hosted vLLM models.

A built-in discovery + mapping path would reduce configuration drift and improve correctness for self-hosted deployments.

What part of LiteLLM is this about?

Proxy

LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

No

Twitter / LinkedIn details

@
