litellm - [Feature]: Auto-populate max_input_tokens/max_output_tokens for hosted vLLM/OpenAI-like models

renne · 2026-05-13T09:42:39Z


BerriAI/litellm#27830•Fetched 2026-05-14 03:30:23


### Check for existing issues

- [x] I have searched the existing issues and checked that my issue is not a duplicate.

### The Feature

When LiteLLM proxies self-hosted/OpenAI-compatible backends (especially vLLM), automatically populate `model_info.max_input_tokens` and `model_info.max_output_tokens` using upstream model metadata (for vLLM: `max_model_len`) instead of leaving them `null`.

Suggested behavior:

1. On model registration / router refresh / periodic cache refresh, LiteLLM calls the upstream model metadata endpoint (`/v1/models` for OpenAI-compatible backends).
2. If upstream exposes a context window (`max_model_len` or equivalent), LiteLLM maps it to `max_input_tokens`.
3. If no explicit output limit is available, set `max_output_tokens` conservatively (or allow a configurable derivation strategy).
4. Persist derived values in runtime model info (and optionally DB if configured), while still allowing explicit `model_info` overrides to take priority.
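Steps 2 to 4 can be sketched as a small pure function. This is a minimal sketch, not LiteLLM's implementation: `derive_token_limits` and `default_output_fraction` are hypothetical names, and using a quarter of the context window as the conservative output limit is an assumption made only for illustration. The upstream field name `max_model_len` is taken from the issue text.

```python
from typing import Any, Dict, Mapping, Optional


def derive_token_limits(
    upstream_card: Mapping[str, Any],
    explicit_model_info: Optional[Mapping[str, Any]] = None,
    default_output_fraction: float = 0.25,  # assumption: illustrative default only
) -> Dict[str, Optional[int]]:
    """Hypothetical helper: derive token limits for one model from its upstream card."""
    derived: Dict[str, Optional[int]] = {
        "max_input_tokens": None,
        "max_output_tokens": None,
    }

    # Step 2: map the upstream context window to max_input_tokens.
    window = upstream_card.get("max_model_len")
    if isinstance(window, int) and not isinstance(window, bool) and window > 0:
        derived["max_input_tokens"] = window
        # Step 3: no explicit output limit upstream, so take a conservative share.
        derived["max_output_tokens"] = max(1, int(window * default_output_fraction))

    # Step 4: explicit model_info values always take priority over derived ones.
    for key in derived:
        value = (explicit_model_info or {}).get(key)
        if value is not None:
            derived[key] = value
    return derived


card = {"id": "example-model", "max_model_len": 32768}  # made-up upstream card
print(derive_token_limits(card))
print(derive_token_limits(card, {"max_output_tokens": 4096}))
```

Keeping the derivation separate from any network or database code makes the precedence rule (explicit `model_info` first, derived values second, `null` last) easy to test on its own.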

This would make `/v1/model/info`, `/model_group/info`, routing decisions, and UI model tables much more accurate for hosted models.
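The discovery call from step 1 of the suggested behavior might look roughly like the sketch below. It assumes the upstream returns an OpenAI-style list (`{"object": "list", "data": [...]}`) whose entries carry `max_model_len`, as the issue describes for vLLM. `fetch_models_payload` and `context_windows` are hypothetical helper names, and the sample payload values are made up.

```python
import json
import urllib.request
from typing import Any, Dict, Mapping, Optional


def fetch_models_payload(
    models_url: str, api_key: Optional[str] = None, timeout: float = 5.0
) -> Dict[str, Any]:
    """Hypothetical helper: GET an OpenAI-compatible models listing and parse the JSON."""
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(models_url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def context_windows(payload: Mapping[str, Any]) -> Dict[str, int]:
    """Map model id -> max_model_len for entries exposing a positive integer."""
    windows: Dict[str, int] = {}
    for card in payload.get("data", []):
        model_id = card.get("id")
        window = card.get("max_model_len")
        if isinstance(model_id, str) and isinstance(window, int) and window > 0:
            windows[model_id] = window
    return windows


# Sample payload shaped like an OpenAI-compatible model list (illustrative values).
sample_payload = {
    "object": "list",
    "data": [
        {"id": "example-model-a", "object": "model", "max_model_len": 32768},
        {"id": "example-model-b", "object": "model"},  # no context window exposed
    ],
}
print(context_windows(sample_payload))
```

Against a live backend, the payload would come from `fetch_models_payload` called with the full URL of the upstream `/v1/models` endpoint. Models that do not expose a context window are left out, so their limits would stay `null` unless set explicitly.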

### Motivation, pitch

Today, hosted vLLM/OpenAI-like models frequently show:

- `max_input_tokens: null`
- `max_output_tokens: null`

unless admins manually add `model_info` in config for every model.

This creates operational issues:

- clients cannot discover real context limits from LiteLLM endpoints,
- router/token budget checks have less accurate metadata,
- model onboarding requires repetitive manual metadata management.
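To see how many models are affected in a given deployment, a client could scan the model info response for missing limits. The sketch below works on an already-parsed payload and assumes a shape of `{"data": [{"model_name": ..., "model_info": {...}}]}`. That shape, the helper name `models_missing_limits`, and the sample values are assumptions for illustration and should be checked against the actual response.

```python
from typing import Any, List, Mapping


def models_missing_limits(model_info_payload: Mapping[str, Any]) -> List[str]:
    """Hypothetical helper: names of models with a null input or output token limit."""
    missing: List[str] = []
    for entry in model_info_payload.get("data", []):
        info = entry.get("model_info") or {}
        if info.get("max_input_tokens") is None or info.get("max_output_tokens") is None:
            missing.append(entry.get("model_name", "<unknown>"))
    return missing


# Made-up payload: one model with manual model_info, one without.
sample_model_info = {
    "data": [
        {
            "model_name": "example-configured",
            "model_info": {"max_input_tokens": 32768, "max_output_tokens": 4096},
        },
        {
            "model_name": "example-unconfigured",
            "model_info": {"max_input_tokens": None, "max_output_tokens": None},
        },
    ]
}
print(models_missing_limits(sample_model_info))
```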

Related context:

- #10096 asked to expose provider model info (including context size/`max_model_len`), but was auto-closed stale.
- #13009 also reflects manual model metadata pain for hosted vLLM models.

A built-in discovery + mapping path would reduce configuration drift and improve correctness for self-hosted deployments.

### What part of LiteLLM is this about?

Proxy

### LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

No
