vllm - [Bug]: --max-model-len 0 silently accepted; engine starts cleanly, requests scheduled with negative num_new_tokens

Your current environment

OS: macOS 26.5 (arm64)
Python: 3.12.11
PyTorch: 2.11.0
vLLM: 0.1.dev1+g4438b6e7d (git sha: 4438b6e7d, matches current main)
Transformers: 5.9.0

Built from source via VLLM_TARGET_DEVICE=empty pip install -e . against pinned upstream commit 4438b6e7d. The bug is in pure-Python config / scheduler arithmetic; no kernel build needed.

🐛 Describe the bug

--max-model-len 0 is silently accepted by the entire config-validation chain. ModelConfig keeps the field as 0 after __post_init__, and the engine even logs Using max model len 0 as if it were a valid configuration. The 0 then reaches vllm/v1/core/sched/scheduler.py:397:

num_new_tokens = min(num_new_tokens, self.max_model_len - 1 - request.num_computed_tokens)

For a fresh request (num_computed_tokens = 0), the right-hand side is -1, so num_new_tokens becomes negative. It slips past the if num_new_tokens == 0: early return at line 425 (the value is -1, not 0), flows into kv_cache_manager.allocate_slots, and reaches cdiv(-1, block_size), which silently returns 0 instead of raising. The engine schedules the request with a negative token count and zero KV-cache blocks.
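
The arithmetic can be replayed without a running engine. The following is a minimal sketch, not the scheduler's actual code: `clamp_new_tokens` is a hypothetical helper that copies only the expression quoted from line 397, and `cdiv` is re-implemented locally as ceiling division on the assumption that it behaves like `vllm.utils.math_utils.cdiv`.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division; assumed to mirror vllm.utils.math_utils.cdiv."""
    return -(a // -b)


def clamp_new_tokens(num_new_tokens: int, max_model_len: int,
                     num_computed_tokens: int) -> int:
    """Hypothetical helper reproducing the clamp quoted from scheduler.py:397."""
    return min(num_new_tokens, max_model_len - 1 - num_computed_tokens)


# Fresh request (nothing computed yet) under max_model_len = 0.
clamped = clamp_new_tokens(num_new_tokens=1, max_model_len=0, num_computed_tokens=0)
skipped_by_zero_check = clamped == 0  # the early return only tests for exactly 0
blocks_needed = cdiv(clamped, 16)     # block size 16, as in the reproducer

print(clamped, skipped_by_zero_check, blocks_needed)  # -1 False 0
```

The point of the sketch is that none of the three values trips an error on its own: the clamp yields -1, the equality test is false, and ceiling division rounds -1/16 up to 0.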

Same bug class as #43521 / #43496 (CLI int param with insufficient validation flowing into arithmetic), but a strictly more dangerous failure mode:

| | #43496 (`--block-size 0`) | #43521 (`--hash-block-size 0`) | This (`--max-model-len 0`) |
| --- | --- | --- | --- |
| Engine startup | Crashes (`ZeroDivisionError`) | Crashes (`ZeroDivisionError`) | Succeeds, no warning |
| When user notices | At server boot | At server boot | Possibly never; silent misbehaviour on every request |
| Diagnostic difficulty | Trivial (stack trace names `cdiv`) | Trivial | Hardest; server appears healthy |
| Reachability | Any model | Hybrid Mamba+Attention + prefix caching | Any model |

Distinct fix from #43521: max_model_len's -1 is a legitimate auto-derive sentinel, so the check has to whitelist -1 rather than the bare <= 0 predicate that closes #43521.

Reproduction

End-to-end via the public ModelConfig API (3 lines, no model load required beyond the cached HF config):

from vllm.config.model import ModelConfig
cfg = ModelConfig(model="facebook/opt-125m", max_model_len=0)
assert cfg.max_model_len == 0

Output:

INFO [model.py:617] Resolved architecture: OPTForCausalLM
INFO [model.py:1752] Using max model len 0

No exception, no warning. The engine confirms it accepted 0.

The downstream scheduler arithmetic and silent cdiv are pure Python; the same 0 then produces num_new_tokens = -1 at scheduler.py:397 and cdiv(-1, 16) = 0 at the downstream call site:

from vllm.utils.math_utils import cdiv
max_model_len, num_computed_tokens, num_new_tokens = 0, 0, 1
num_new_tokens = min(num_new_tokens, max_model_len - 1 - num_computed_tokens)
assert num_new_tokens == -1
assert cdiv(num_new_tokens, 16) == 0

Expected

A clean rejection at config-construction time:

ValueError: max_model_len must be a positive integer or -1 (auto), got 0.

Why the validation chain misses 0

# vllm/config/model.py:200
max_model_len: int = Field(default=None, ge=-1)

ge=-1 admits -1 (the auto-derive sentinel) but also admits 0.

# vllm/config/model.py:2119  (_get_and_verify_max_len)
if max_model_len is None or max_model_len == -1:
    max_model_len = int(derived_max_model_len)              # rewrite branch
    max_model_len = current_platform.check_max_model_len(max_model_len)
elif max_model_len > derived_max_model_len:
    raise ValueError(...)                                   # over-cap branch
return int(max_model_len)                                   # passthrough — 0 lands here

Three branches: rewrite for None/-1, raise for > derived, passthrough otherwise. 0 matches neither rewrite nor raise.
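
To make the fall-through concrete, here is a simplified stand-in for the branch structure quoted above. It is a sketch under assumptions: `verify_max_len_sketch` is a hypothetical name, the platform hook is omitted, and `derived_max_model_len` is passed in directly (2048 below is purely an illustrative value).

```python
from typing import Optional


def verify_max_len_sketch(max_model_len: Optional[int],
                          derived_max_model_len: int) -> int:
    """Hypothetical, simplified copy of the three branches (platform hook omitted)."""
    if max_model_len is None or max_model_len == -1:
        # Rewrite branch: fall back to the length derived from the model config.
        return int(derived_max_model_len)
    if max_model_len > derived_max_model_len:
        # Over-cap branch: the request exceeds what the model supports.
        raise ValueError(
            f"max_model_len {max_model_len} exceeds {derived_max_model_len}"
        )
    # Passthrough branch: no lower bound is checked, so 0 is returned unchanged.
    return int(max_model_len)


DERIVED = 2048  # illustrative value only

for requested in (None, -1, 0, 512):
    print(requested, "->", verify_max_len_sketch(requested, DERIVED))
```

In this sketch, 0 and 512 take exactly the same path, which is why a lower-bound check has to be added somewhere before or inside the passthrough.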

Additional validators that don't catch 0:

_skip_none_validation (model.py:766) only filters None.
Platform.check_max_model_len (platforms/interface.py:986) only runs in the rewrite branch (for None/-1), not for the passthrough; the base implementation is a no-op anyway.

Trace (against `main` @ `4438b6e7d`)

| Step | File:line | What | Confirmed by |
| --- | --- | --- | --- |
| CLI | `vllm/engine/arg_utils.py:802` | `--max-model-len` → `ModelConfig.max_model_len`. Field is `int = Field(default=None, ge=-1)`. | Pydantic `TypeAdapter(Annotated[int, Field(ge=-1)]).validate_python(0)` returns `0` |
| Validator chain | `vllm/config/model.py:1729` → `_get_and_verify_max_len` (line 2119) | Neither the rewrite nor the raise branch matches `0`; it falls through to the final `return int(max_model_len)`, which yields `0`. | `ModelConfig(model="facebook/opt-125m", max_model_len=0).max_model_len == 0` |
| Scheduler | `vllm/v1/core/sched/scheduler.py:397` | `num_new_tokens = min(_, max_model_len - 1 - num_computed_tokens)`. With `max_model_len = 0` and `num_computed_tokens = 0`, the rhs is `-1`. | Pure Python |
| Silent propagation | `kv_cache_manager.allocate_slots`, `vllm/utils/math_utils.py:10` (`cdiv`) | `-1` passes the `== 0` early-return at scheduler.py:425; flows into `allocate_slots`; `cdiv(-1, 16) = 0` does not raise. | `cdiv(-1, 16) == 0` |

How this was found

ESBMC-Python verification PoC for vLLM (lucasccordeiro/vllm, AUDIT.md Finding #3). The ESBMC counterexample at harness/max_model_len_zero_cli_path.py produced max_model_len = 0, num_computed_tokens = 0, num_new_tokens (in) = 1, num_new_tokens (out) = -1. The end-to-end reproducer above confirms this empirically.

Proposed fix

Add a field_validator on ModelConfig.max_model_len:

@field_validator("max_model_len", mode="after")
@classmethod
def _check_positive_or_sentinel(cls, v):
    if v is None or v == -1:
        return v
    if v <= 0:
        raise ValueError(
            f"max_model_len must be a positive integer or -1 (auto), got {v}."
        )
    return v

Field-level ge=-1 can't express "positive OR exactly -1" without a custom validator (Pydantic has no ne=0 alongside ge=-1). Happy to PR if useful.
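
The predicate in the proposed validator can also be exercised on its own, outside Pydantic. The sketch below uses a standard-library dataclass as a stand-in for `ModelConfig`; `ToyConfig` and `check_positive_or_sentinel` are hypothetical names, and the real fix would live in the `field_validator` shown above rather than in `__post_init__`.

```python
from dataclasses import dataclass
from typing import Optional


def check_positive_or_sentinel(v: Optional[int]) -> Optional[int]:
    """Accept None, the -1 auto sentinel, or any positive integer."""
    if v is None or v == -1:
        return v
    if v <= 0:
        raise ValueError(
            f"max_model_len must be a positive integer or -1 (auto), got {v}."
        )
    return v


@dataclass
class ToyConfig:
    """Hypothetical stand-in for ModelConfig; validates at construction time."""
    max_model_len: Optional[int] = None

    def __post_init__(self) -> None:
        self.max_model_len = check_positive_or_sentinel(self.max_model_len)


for value in (None, -1, 1, 4096, 0, -2):
    try:
        ToyConfig(max_model_len=value)
        print(value, "accepted")
    except ValueError as exc:
        print(value, "rejected:", exc)
```

In the real field, `ge=-1` would presumably reject -2 before the validator runs; the sketch rejects it in the same function only to keep the example self-contained.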

Pre-submission checklist:

Searched existing issues for max_model_len 0 and max_model_len validation — no prior report.
Same bug class as #43521 / #43496 but separately fileable: max_model_len's -1 sentinel requires a different fix predicate than the <= 0 check that closes #43521.
Reproduced end-to-end via the public ModelConfig API (cfg = ModelConfig(model=..., max_model_len=0); assert cfg.max_model_len == 0); engine itself logs using max model len 0.
Witnessed independently by an ESBMC counterexample.
Trace verified against pinned commit 4438b6e7d.
