vllm - 💡(How to fix) Fix [Bug]: --max-model-len 0 silently accepted; engine starts cleanly, requests scheduled with negative num_new_tokens [2 pull requests]


Error Message

ValueError: max_model_len must be a positive integer or -1 (auto), got 0.

Fix Action

Fixed


Your current environment

OS: macOS 26.5 (arm64)
Python: 3.12.11
PyTorch: 2.11.0
vLLM: 0.1.dev1+g4438b6e7d (git sha: 4438b6e7d, matches current main)
Transformers: 5.9.0

Built from source via VLLM_TARGET_DEVICE=empty pip install -e . against pinned upstream commit 4438b6e7d. The bug is in pure-Python config / scheduler arithmetic; no kernel build needed.

🐛 Describe the bug

--max-model-len 0 is silently accepted by the entire config-validation chain. ModelConfig keeps the field as 0 after __post_init__, and the engine even logs Using max model len 0 as if it were a valid configuration. The 0 then reaches vllm/v1/core/sched/scheduler.py:397:

num_new_tokens = min(num_new_tokens, self.max_model_len - 1 - request.num_computed_tokens)

For a fresh request (num_computed_tokens = 0), the right-hand side is -1, so num_new_tokens becomes negative. It slips past the if num_new_tokens == 0: early return at line 425 (the value is -1, not 0), flows into kv_cache_manager.allocate_slots, and reaches cdiv(-1, block_size), which silently returns 0 instead of raising. The engine schedules the request with a negative token count and zero KV-cache blocks.
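The path described above can be modelled in a few lines of plain Python. This is a simplified sketch, not the vLLM scheduler: schedule_sketch is a hypothetical name, cdiv is a local stand-in written with the usual ceiling-division idiom (assumed to agree with vllm.utils.math_utils.cdiv for these inputs), and block_size=16 is an arbitrary choice.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division. Local stand-in, assumed to match vLLM's cdiv here."""
    return -(a // -b)


def schedule_sketch(max_model_len: int, num_computed_tokens: int = 0,
                    num_new_tokens: int = 1, block_size: int = 16):
    """Hypothetical model of the clamp, the == 0 early return, and block sizing."""
    # Same clamp expression as the one quoted from scheduler.py:397.
    num_new_tokens = min(num_new_tokens, max_model_len - 1 - num_computed_tokens)
    if num_new_tokens == 0:
        # Only an exact zero is caught; a negative value falls through.
        return None
    # Returns (tokens scheduled, KV-cache blocks that would be requested).
    return num_new_tokens, cdiv(num_new_tokens, block_size)


for max_model_len in (0, 1, 2):
    print(max_model_len, schedule_sketch(max_model_len))
```

In this model, max_model_len=1 is stopped by the early return because the clamp yields exactly 0, while max_model_len=0 yields -1 and continues with zero blocks.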

Same bug class as #43521 / #43496 (CLI int param with insufficient validation flowing into arithmetic), but a strictly more dangerous failure mode:

| | #43496 (--block-size 0) | #43521 (--hash-block-size 0) | This (--max-model-len 0) |
|---|---|---|---|
| Engine startup | Crashes (ZeroDivisionError) | Crashes (ZeroDivisionError) | Succeeds, no warning |
| When user notices | At server boot | At server boot | Possibly never: silent misbehaviour on every request |
| Diagnostic difficulty | Trivial (stack trace names cdiv) | Trivial | Hardest: server appears healthy |
| Reachability | Any model | Hybrid Mamba+Attention + prefix caching | Any model |

Distinct fix from #43521: max_model_len's -1 is a legitimate auto-derive sentinel, so the check has to whitelist -1 rather than the bare <= 0 predicate that closes #43521.
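To make the difference between the two predicates concrete, here is a small plain-Python comparison. Both function names are hypothetical and exist only for illustration; neither is taken from vLLM.

```python
def bare_nonpositive_check(v: int) -> bool:
    """Accept v unless it is <= 0 (the plain predicate)."""
    return not (v <= 0)


def sentinel_aware_check(v) -> bool:
    """Accept None, the -1 auto-derive sentinel, or any strictly positive value."""
    return v is None or v == -1 or v > 0


# Columns: value, accepted by the bare check, accepted by the sentinel-aware check.
for v in (-2, -1, 0, 1, 4096):
    print(v, bare_nonpositive_check(v), sentinel_aware_check(v))
```

The bare check would reject -1 and break auto-derivation, which is why the sentinel has to be whitelisted explicitly.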

Reproduction

End-to-end via the public ModelConfig API (3 lines, no model load required beyond the cached HF config):

from vllm.config.model import ModelConfig
cfg = ModelConfig(model="facebook/opt-125m", max_model_len=0)
assert cfg.max_model_len == 0

Output:

INFO [model.py:617] Resolved architecture: OPTForCausalLM
INFO [model.py:1752] Using max model len 0

No exception, no warning. The engine confirms it accepted 0.

The downstream scheduler arithmetic and silent cdiv are pure Python; the same 0 then produces num_new_tokens = -1 at scheduler.py:397 and cdiv(-1, 16) = 0 at the downstream call site:

from vllm.utils.math_utils import cdiv
max_model_len, num_computed_tokens, num_new_tokens = 0, 0, 1
num_new_tokens = min(num_new_tokens, max_model_len - 1 - num_computed_tokens)
assert num_new_tokens == -1
assert cdiv(num_new_tokens, 16) == 0

Expected

A clean rejection at config-construction time:

ValueError: max_model_len must be a positive integer or -1 (auto), got 0.

Why the validation chain misses 0

# vllm/config/model.py:200
max_model_len: int = Field(default=None, ge=-1)

ge=-1 admits -1 (the auto-derive sentinel) but also admits 0.
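The constraint can be checked in isolation with Pydantic's TypeAdapter, the same call listed in the trace below. This sketch assumes Pydantic v2 is installed; the accepted helper is a hypothetical wrapper added for illustration.

```python
from typing import Annotated

from pydantic import Field, TypeAdapter, ValidationError

# Same constraint as the field declaration: any integer >= -1.
adapter = TypeAdapter(Annotated[int, Field(ge=-1)])


def accepted(value: int) -> bool:
    """Return True if the ge=-1 constraint lets the value through."""
    try:
        adapter.validate_python(value)
        return True
    except ValidationError:
        return False


for value in (-2, -1, 0, 1):
    print(value, accepted(value))
```

Only -2 is rejected; 0 passes the field-level constraint just like -1 and 1.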

# vllm/config/model.py:2119  (_get_and_verify_max_len)
if max_model_len is None or max_model_len == -1:
    max_model_len = int(derived_max_model_len)              # rewrite branch
    max_model_len = current_platform.check_max_model_len(max_model_len)
elif max_model_len > derived_max_model_len:
    raise ValueError(...)                                   # over-cap branch
return int(max_model_len)                                   # passthrough — 0 lands here

Three branches: rewrite for None/-1, raise for > derived, passthrough otherwise. 0 matches neither rewrite nor raise.
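The three-way branching can be exercised without loading vLLM. The function below is a hypothetical, stripped-down stand-in for _get_and_verify_max_len: it keeps only the branch structure quoted above, omits the platform check, and uses an arbitrary derived limit of 2048.

```python
def get_and_verify_max_len_sketch(max_model_len, derived_max_model_len: int) -> int:
    """Hypothetical stand-in that mirrors only the three quoted branches."""
    if max_model_len is None or max_model_len == -1:
        # Rewrite branch: fall back to the derived limit.
        return int(derived_max_model_len)
    if max_model_len > derived_max_model_len:
        # Over-cap branch.
        raise ValueError(
            f"max_model_len {max_model_len} exceeds derived limit "
            f"{derived_max_model_len}"
        )
    # Passthrough branch: 0 (and any other value <= derived) lands here.
    return int(max_model_len)


DERIVED = 2048  # arbitrary value chosen for the sketch

for value in (None, -1, 0, 1024):
    print(value, get_and_verify_max_len_sketch(value, DERIVED))
```

Note that in this sketch any value at or below the derived limit passes through unchanged, so 0 is returned as-is.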

Additional validators that don't catch 0:

  • _skip_none_validation (model.py:766) only filters None.
  • Platform.check_max_model_len (platforms/interface.py:986) only runs in the rewrite branch (for None/-1), not for the passthrough; the base implementation is a no-op anyway.

Trace (against main @ 4438b6e7d)

| Step | File:line | What | Verified |
|---|---|---|---|
| CLI | vllm/engine/arg_utils.py:802 | --max-model-len → ModelConfig.max_model_len. Field is int = Field(default=None, ge=-1). | Pydantic TypeAdapter(Annotated[int, Field(ge=-1)]).validate_python(0) returns 0 |
| Validator chain | vllm/config/model.py:1729 → _get_and_verify_max_len (line 2119) | None of the three branches match 0; final return int(max_model_len) = 0. | ModelConfig(model="facebook/opt-125m", max_model_len=0).max_model_len == 0 |
| Scheduler | vllm/v1/core/sched/scheduler.py:397 | num_new_tokens = min(_, max_model_len - 1 - num_computed_tokens). With max_model_len = 0 and num_computed_tokens = 0, the rhs is -1. | Pure Python |
| Silent propagation | kv_cache_manager.allocate_slots, vllm/utils/math_utils.py:10 (cdiv) | -1 passes the == 0 early-return at scheduler.py:425; flows into allocate_slots; cdiv(-1, 16) = 0 does not raise. | cdiv(-1, 16) == 0 |

How this was found

ESBMC-Python verification PoC for vLLM (lucasccordeiro/vllm, AUDIT.md Finding #3). The ESBMC counterexample at harness/max_model_len_zero_cli_path.py produced max_model_len = 0, num_computed_tokens = 0, num_new_tokens (in) = 1, num_new_tokens (out) = -1. The end-to-end reproducer above confirms the counterexample empirically.

Proposed fix

Add a field_validator on ModelConfig.max_model_len:

@field_validator("max_model_len", mode="after")
@classmethod
def _check_positive_or_sentinel(cls, v):
    if v is None or v == -1:
        return v
    if v <= 0:
        raise ValueError(
            f"max_model_len must be a positive integer or -1 (auto), got {v}."
        )
    return v

Field-level ge=-1 can't express "positive OR exactly -1" without a custom validator (Pydantic has no ne=0 alongside ge=-1). Happy to PR if useful.
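The validator pattern can be tried on a toy model outside vLLM. This is a minimal sketch assuming Pydantic v2: ToyModelConfig and is_rejected are hypothetical names, and the real ModelConfig has many more fields and its own construction logic.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, field_validator


class ToyModelConfig(BaseModel):
    """Hypothetical one-field model; not the real vLLM ModelConfig."""

    max_model_len: Optional[int] = Field(default=None, ge=-1)

    @field_validator("max_model_len", mode="after")
    @classmethod
    def _check_positive_or_sentinel(cls, v):
        # None and -1 both mean "derive the length automatically".
        if v is None or v == -1:
            return v
        if v <= 0:
            raise ValueError(
                f"max_model_len must be a positive integer or -1 (auto), got {v}."
            )
        return v


def is_rejected(value) -> bool:
    """Return True if constructing the toy config raises a validation error."""
    try:
        ToyModelConfig(max_model_len=value)
        return False
    except ValidationError:
        return True


for value in (-2, -1, 0, 2048):
    print(value, "rejected" if is_rejected(value) else "accepted")
```

In this sketch, -2 is stopped by the ge=-1 constraint and 0 by the custom validator, while -1 and positive values pass through unchanged.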

Pre-submission checklist:

  • Searched existing issues for max_model_len 0 and max_model_len validation — no prior report.
  • Same bug class as #43521 / #43496 but separately fileable: max_model_len's -1 sentinel requires a different fix predicate than the <= 0 check that closes #43521.
  • Reproduced end-to-end via the public ModelConfig API (cfg = ModelConfig(model=..., max_model_len=0); assert cfg.max_model_len == 0); the engine itself logs Using max model len 0.
  • Witnessed independently by an ESBMC counterexample.
  • Trace verified against pinned commit 4438b6e7d.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
