vllm: [Bug]: Negative max_num_scheduled_tokens bypasses validation (guard gated behind speculative decoding) → bare AssertionError in the scheduler

Q: Expected behavior

A non-positive `max_num_scheduled_tokens` should be rejected with a clear `ValueError` at config-construction time **regardless** of whether speculative decoding is enabled — matching the behavior already present on the spec-decoding path.

vllm · 2026-05-31 17:09:30



Root Cause

Root cause: the `<= 0` guard is gated behind spec decoding

Fix Action

Fixed

Fixed by PR: fix(scheduler): reject non-positive max_num_scheduled_tokens at construction time (https://github.com/vllm-project/vllm/pull/44125)



Your current environment

Found by bounded model checking (ESBMC-Python - https://github.com/esbmc/esbmc) and confirmed with a behavioral reproduction on real vLLM config objects (below), in the same vein as #43521 / #43532 / #43842 / #43985. The reachability caveat is stated up front: this field is not CLI-settable, so this is a lower-severity, integrator-facing report rather than a vllm serve crash.

🐛 Describe the bug

SchedulerConfig.max_num_scheduled_tokens accepts a negative value with no validation in the non-speculative-decoding path, and that negative value then becomes the scheduler's token_budget, tripping a bare assert token_budget >= 0 deep inside schedule(). The defect is not a missing guard but a gated one — the validation exists, but only runs under speculative decoding.

Root cause: the `<= 0` guard is gated behind spec decoding

Step 1 — Field (vllm/config/scheduler.py:56):

max_num_scheduled_tokens: int | None = None

The field carries no Field(gt=...) or Field(ge=...) constraint, and SchedulerConfig.__post_init__, which raises ValueError for several other fields, never references this one. Any int, including a negative one, survives construction.

Step 2 — The only <= 0 guard (vllm/config/vllm.py), inside _set_max_num_scheduled_tokens:

if self.scheduler_config.max_num_scheduled_tokens <= 0:   # vllm.py:1566
    raise ValueError(...)

…but the entire body of that method is gated:

if self.speculative_config is not None:                   # vllm.py:1555
    ...

So without speculative decoding, the method is a no-op for this field and the <= 0 check never executes.
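The gating described in Step 2 can be sketched with stand-in classes. This is a minimal illustration, not vLLM's code: `SketchSchedulerConfig`, `SketchVllmConfig`, and `build` are hypothetical names, and the error message text is an assumption. Only the shape (a `<= 0` check nested inside `if self.speculative_config is not None:`) is taken from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchSchedulerConfig:
    # Stand-in for the real field: no constraint, so any int survives.
    max_num_scheduled_tokens: Optional[int] = None


@dataclass
class SketchVllmConfig:
    scheduler_config: SketchSchedulerConfig
    speculative_config: Optional[object] = None

    def __post_init__(self) -> None:
        self._set_max_num_scheduled_tokens()

    def _set_max_num_scheduled_tokens(self) -> None:
        # The whole body is gated, so the <= 0 check below is skipped
        # whenever speculative decoding is disabled.
        if self.speculative_config is not None:
            value = self.scheduler_config.max_num_scheduled_tokens
            if value is not None and value <= 0:
                raise ValueError(
                    f"max_num_scheduled_tokens must be positive, got {value}"
                )


def build(speculative_config: Optional[object]) -> str:
    """Build a config with -1 and report whether it was rejected."""
    try:
        SketchVllmConfig(SketchSchedulerConfig(-1), speculative_config)
    except ValueError as exc:
        return f"ValueError: {exc}"
    return "accepted"


print("without spec decoding:", build(None))
print("with spec decoding:   ", build(object()))
```

In this sketch the same `-1` is accepted on the first call and rejected on the second, which is the inconsistency the report describes.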

Step 3 — Truthiness fallback propagates the negative (vllm/v1/core/sched/scheduler.py:104):

self.max_num_scheduled_tokens = (
    self.scheduler_config.max_num_scheduled_tokens
    if self.scheduler_config.max_num_scheduled_tokens   # truthiness test, not "== 0"
    else self.scheduler_config.max_num_batched_tokens
)

0 and None are falsy and fall back to max_num_batched_tokens (safe). A negative value is truthy, so it propagates unchanged instead of falling back.
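The truthiness behavior is plain Python and can be checked in isolation. The helper below is a hypothetical stand-in (`effective_budget` is not a vLLM function); it only reuses the conditional expression quoted above, with 2048 chosen arbitrarily for `max_num_batched_tokens`.

```python
from typing import Optional


def effective_budget(max_num_scheduled_tokens: Optional[int],
                     max_num_batched_tokens: int) -> int:
    # Same shape as the quoted expression: a truthiness test, not "is None".
    return (max_num_scheduled_tokens
            if max_num_scheduled_tokens
            else max_num_batched_tokens)


for candidate in (None, 0, 512, -1):
    print(repr(candidate), "->", effective_budget(candidate, 2048))
# None -> 2048
# 0 -> 2048
# 512 -> 512
# -1 -> -1
```

Only the negative value slips through: it is truthy, so the fallback branch is never taken.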

Step 4 — Bare assert (vllm/v1/core/sched/scheduler.py:348 then 829):

token_budget = self.max_num_scheduled_tokens   # = negative
...
assert token_budget >= 0                        # bare AssertionError, no message

(and assert total_num_scheduled_tokens <= self.max_num_scheduled_tokens at scheduler.py:827).

The net effect is an inconsistency: the same invalid value is a clean ValueError under speculative decoding, but a cryptic internal AssertionError without it — the same bare-AssertionError UX class as #43842.
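To make the UX difference concrete, the following sketch contrasts a bare assert with an explicit check. The function names and the error wording are assumptions for illustration; neither function is taken from vLLM.

```python
from typing import Callable


def bare_assert(token_budget: int) -> None:
    # Mirrors the style of the scheduler assert: no message attached.
    assert token_budget >= 0


def explicit_check(token_budget: int) -> None:
    # Hypothetical alternative that names the offending value.
    if token_budget < 0:
        raise ValueError(f"token_budget must be non-negative, got {token_budget}")


def message_of(check: Callable[[int], None], value: int) -> str:
    """Run a check and describe the exception it raises, if any."""
    try:
        check(value)
    except (AssertionError, ValueError) as exc:
        return f"{type(exc).__name__}: {str(exc)!r}"
    return "no error"


print(message_of(bare_assert, -1))
print(message_of(explicit_check, -1))
```

When run directly with the standard interpreter (without `-O`, which strips asserts), the first line reports an AssertionError with an empty message, while the second names both the field and the value.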

Reproduction (programmatic; the field is not CLI-wired)

Reproduced against the source tree at 4438b6e7d (vllm.__version__ == 0.1.dev1+g4438b6e7d), CPU-only, no model/GPU needed:

import vllm
from vllm.config.scheduler import SchedulerConfig
from vllm.config.vllm import VllmConfig

# 1. SchedulerConfig accepts a negative value with no validation.
sched = SchedulerConfig(max_num_scheduled_tokens=-1,
                        max_model_len=2048, is_encoder_decoder=False)
print("[1] SchedulerConfig(max_num_scheduled_tokens=-1) ->",
      sched.max_num_scheduled_tokens, "(no validation)")

# 2. A real VllmConfig with speculative_config is None leaves the -1 intact
#    after __post_init__ -> _set_max_num_scheduled_tokens (the <= 0 guard is
#    gated behind `speculative_config is not None`).
vc = VllmConfig(scheduler_config=sched)
print("[2] VllmConfig built; speculative_config is None:",
      vc.speculative_config is None, "; field ==",
      vc.scheduler_config.max_num_scheduled_tokens, "(guard skipped)")

# 3. Scheduler.__init__ truthiness fallback (scheduler.py:104): negative is truthy.
sc = vc.scheduler_config
effective = (sc.max_num_scheduled_tokens
             if sc.max_num_scheduled_tokens else sc.max_num_batched_tokens)
print("[3] scheduler.py:104 fallback -> effective =", effective, "(propagates)")

# 4. token_budget = effective (scheduler.py:348); assert token_budget >= 0 (829).
token_budget = effective
try:
    assert token_budget >= 0
    print("[4] assertion held (unexpected)")
except AssertionError:
    print("[4] token_budget =", token_budget,
          "-> assert token_budget >= 0 FAILS (bare AssertionError)")

Observed output:

[1] SchedulerConfig(max_num_scheduled_tokens=-1) -> -1 (no validation)
[2] VllmConfig built; speculative_config is None: True ; field == -1 (guard skipped)
[3] scheduler.py:104 fallback -> effective = -1 (propagates)
[4] token_budget = -1 -> assert token_budget >= 0 FAILS (bare AssertionError)

Steps 1–2 are behavioral on real vLLM objects; step 3 evaluates the verbatim scheduler.py:104 expression on the resolved config. (Instantiating the full Scheduler/schedule() additionally needs a KVCacheConfig; line 104 only reads self.scheduler_config.max_num_scheduled_tokens, shown negative above.)

Expected behavior

A non-positive max_num_scheduled_tokens should be rejected with a clear ValueError at config-construction time regardless of whether speculative decoding is enabled — matching the behavior already present on the spec-decoding path.

Proposed fix (either)

1. Add a field constraint: max_num_scheduled_tokens: int | None = Field(default=None, ge=1); or
2. Ungate the <= 0 check: move it out of the if self.speculative_config is not None: block in _set_max_num_scheduled_tokens so it validates the field on every path.

Either makes the truthiness fallback at scheduler.py:104 safe (only None/0 remain falsy, both intended).
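The intended end state of either option can be sketched with a standard-library dataclass. This is an analogue under stated assumptions, not the merged patch: `SketchSchedulerConfig`, `effective_budget`, and `is_rejected` are hypothetical names, the check lives in `__post_init__` rather than in a `Field` constraint or in `_set_max_num_scheduled_tokens`, and the message text is invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchSchedulerConfig:
    max_num_scheduled_tokens: Optional[int] = None
    max_num_batched_tokens: int = 2048

    def __post_init__(self) -> None:
        # Validation runs on every construction, with no dependency on
        # whether speculative decoding is configured.
        value = self.max_num_scheduled_tokens
        if value is not None and value <= 0:
            raise ValueError(
                f"max_num_scheduled_tokens must be positive, got {value}"
            )


def effective_budget(config: SketchSchedulerConfig) -> int:
    # Same truthiness fallback as before; negatives can no longer reach it.
    return (config.max_num_scheduled_tokens
            if config.max_num_scheduled_tokens
            else config.max_num_batched_tokens)


def is_rejected(value: Optional[int]) -> bool:
    """Return True if constructing the config with this value raises ValueError."""
    try:
        SketchSchedulerConfig(max_num_scheduled_tokens=value)
    except ValueError:
        return True
    return False


for candidate in (None, 512, 0, -1):
    print(repr(candidate), "rejected" if is_rejected(candidate) else "accepted")
```

In this sketch `None` and `512` are accepted while `0` and `-1` raise `ValueError` at construction, so the fallback only ever sees `None` or a positive integer.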


