vllm - [Bug]: Negative max_num_scheduled_tokens bypasses validation (guard gated behind speculative decoding) → bare AssertionError in the scheduler [1 pull request]

Root Cause

Root cause: the <= 0 guard is gated behind spec decoding

Fix Action

Fixed

Your current environment

Found by bounded model checking (ESBMC-Python - https://github.com/esbmc/esbmc) and confirmed with a behavioral reproduction on real vLLM config objects (below), in the same vein as #43521 / #43532 / #43842 / #43985. The reachability caveat is stated up front: this field is not CLI-settable, so this is a lower-severity, integrator-facing report rather than a vllm serve crash.

🐛 Describe the bug

SchedulerConfig.max_num_scheduled_tokens accepts a negative value with no validation in the non-speculative-decoding path, and that negative value then becomes the scheduler's token_budget, tripping a bare assert token_budget >= 0 deep inside schedule(). The defect is not a missing guard but a gated one — the validation exists, but only runs under speculative decoding.

Root cause: the <= 0 guard is gated behind spec decoding

Step 1 — Field (vllm/config/scheduler.py:56):

max_num_scheduled_tokens: int | None = None

No Field(gt=/ge=), and SchedulerConfig.__post_init__ raises ValueError for several other fields but never references this one. Any int (including negatives) survives construction.

Step 2 — The only <= 0 guard (vllm/config/vllm.py), inside _set_max_num_scheduled_tokens:

if self.scheduler_config.max_num_scheduled_tokens <= 0:   # vllm.py:1566
    raise ValueError(...)

…but the entire body of that method is gated:

if self.speculative_config is not None:                   # vllm.py:1555
    ...

So without speculative decoding, the method is a no-op for this field and the <= 0 check never executes.
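The gating can be illustrated with a minimal sketch. It does not use vLLM: `SchedulerConfigSketch` and `VllmConfigSketch` are hypothetical stand-ins that only mirror the control flow described in Steps 1 and 2, and the error wording is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedulerConfigSketch:
    # Hypothetical stand-in for SchedulerConfig: the field has no constraint.
    max_num_scheduled_tokens: Optional[int] = None


@dataclass
class VllmConfigSketch:
    # Hypothetical stand-in for VllmConfig.
    scheduler_config: SchedulerConfigSketch
    speculative_config: Optional[object] = None

    def __post_init__(self) -> None:
        self._set_max_num_scheduled_tokens()

    def _set_max_num_scheduled_tokens(self) -> None:
        # The whole body is gated, so the guard is unreachable
        # when speculative decoding is disabled.
        if self.speculative_config is not None:
            value = self.scheduler_config.max_num_scheduled_tokens
            if value is not None and value <= 0:
                # Error wording is an assumption, not vLLM's message.
                raise ValueError(f"max_num_scheduled_tokens must be > 0, got {value}")


# Non-speculative path: the negative value survives construction.
plain = VllmConfigSketch(SchedulerConfigSketch(max_num_scheduled_tokens=-1))
print("without spec decoding:", plain.scheduler_config.max_num_scheduled_tokens)

# Speculative path: the same value is rejected.
rejected = False
try:
    VllmConfigSketch(SchedulerConfigSketch(-1), speculative_config=object())
except ValueError as exc:
    rejected = True
    print("with spec decoding:", exc)
```

The sketch omits whatever else the real method does on the speculative path; it is only meant to show that the validation outcome depends on an unrelated setting.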

Step 3 — Truthiness fallback propagates the negative (vllm/v1/core/sched/scheduler.py:104):

self.max_num_scheduled_tokens = (
    self.scheduler_config.max_num_scheduled_tokens
    if self.scheduler_config.max_num_scheduled_tokens   # truthiness test, not "== 0"
    else self.scheduler_config.max_num_batched_tokens
)

0 and None are falsy and fall back to max_num_batched_tokens (safe). A negative value is truthy, so it propagates unchanged instead of falling back.
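A small standalone check makes the truthiness behavior concrete. The helper name `effective_budget` and the value 2048 for `max_num_batched_tokens` are illustrative assumptions; only the shape of the conditional expression is taken from the snippet above.

```python
from typing import Optional


def effective_budget(max_num_scheduled_tokens: Optional[int],
                     max_num_batched_tokens: int) -> int:
    # Same shape as the conditional expression in Step 3:
    # a truthiness test, so only None and 0 trigger the fallback.
    return (max_num_scheduled_tokens
            if max_num_scheduled_tokens
            else max_num_batched_tokens)


for candidate in (None, 0, 512, -1):
    print(candidate, "->", effective_budget(candidate, 2048))
# None -> 2048
# 0 -> 2048
# 512 -> 512
# -1 -> -1
```

Because `bool(-1)` is `True` in Python, the negative value is treated like any valid positive budget.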

Step 4 — Bare assert (vllm/v1/core/sched/scheduler.py:348 then 829):

token_budget = self.max_num_scheduled_tokens   # = negative
...
assert token_budget >= 0                        # bare AssertionError, no message

(and assert total_num_scheduled_tokens <= self.max_num_scheduled_tokens at scheduler.py:827).

The net effect is an inconsistency: the same invalid value is a clean ValueError under speculative decoding, but a cryptic internal AssertionError without it — the same bare-AssertionError UX class as #43842.
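To show why the two failure modes differ for a caller, the following sketch contrasts a bare `assert` with an explicit `ValueError`. Both functions are hypothetical stand-ins, and the `ValueError` wording is an assumption rather than vLLM's actual message.

```python
def fail_with_assert(token_budget: int) -> None:
    # Mirrors the bare assert in Step 4: no message is attached.
    assert token_budget >= 0


def fail_with_value_error(max_num_scheduled_tokens: int) -> None:
    # Mirrors a config-time guard; the wording is illustrative only.
    if max_num_scheduled_tokens <= 0:
        raise ValueError(
            "max_num_scheduled_tokens must be positive, "
            f"got {max_num_scheduled_tokens}"
        )


def describe(fn, value: int) -> str:
    """Return the exception type and message raised by fn(value)."""
    try:
        fn(value)
    except (AssertionError, ValueError) as exc:
        return f"{type(exc).__name__}: {str(exc)!r}"
    return "no error"


print(describe(fail_with_assert, -1))
print(describe(fail_with_value_error, -1))
```

The `ValueError` names the offending field and value, while the assertion carries no text of its own. In addition, `assert` statements are removed when Python runs with `-O`, so an assertion cannot be relied on as input validation.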

Reproduction (programmatic; the field is not CLI-wired)

Reproduced against the source tree at 4438b6e7d (vllm.__version__ == 0.1.dev1+g4438b6e7d), CPU-only, no model/GPU needed:

import vllm
from vllm.config.scheduler import SchedulerConfig
from vllm.config.vllm import VllmConfig

# 1. SchedulerConfig accepts a negative value with no validation.
sched = SchedulerConfig(max_num_scheduled_tokens=-1,
                        max_model_len=2048, is_encoder_decoder=False)
print("[1] SchedulerConfig(max_num_scheduled_tokens=-1) ->",
      sched.max_num_scheduled_tokens, "(no validation)")

# 2. A real VllmConfig with speculative_config is None leaves the -1 intact
#    after __post_init__ -> _set_max_num_scheduled_tokens (the <= 0 guard is
#    gated behind `speculative_config is not None`).
vc = VllmConfig(scheduler_config=sched)
print("[2] VllmConfig built; speculative_config is None:",
      vc.speculative_config is None, "; field ==",
      vc.scheduler_config.max_num_scheduled_tokens, "(guard skipped)")

# 3. Scheduler.__init__ truthiness fallback (scheduler.py:104): negative is truthy.
sc = vc.scheduler_config
effective = (sc.max_num_scheduled_tokens
             if sc.max_num_scheduled_tokens else sc.max_num_batched_tokens)
print("[3] scheduler.py:104 fallback -> effective =", effective, "(propagates)")

# 4. token_budget = effective (scheduler.py:348); assert token_budget >= 0 (829).
token_budget = effective
try:
    assert token_budget >= 0
    print("[4] assertion held (unexpected)")
except AssertionError:
    print("[4] token_budget =", token_budget,
          "-> assert token_budget >= 0 FAILS (bare AssertionError)")

Observed output:

[1] SchedulerConfig(max_num_scheduled_tokens=-1) -> -1 (no validation)
[2] VllmConfig built; speculative_config is None: True ; field == -1 (guard skipped)
[3] scheduler.py:104 fallback -> effective = -1 (propagates)
[4] token_budget = -1 -> assert token_budget >= 0 FAILS (bare AssertionError)

Steps 1–2 are behavioral on real vLLM objects; step 3 evaluates the verbatim scheduler.py:104 expression on the resolved config. (Instantiating the full Scheduler/schedule() additionally needs a KVCacheConfig; line 104 only reads self.scheduler_config.max_num_scheduled_tokens, shown negative above.)

Expected behavior

A non-positive max_num_scheduled_tokens should be rejected with a clear ValueError at config-construction time regardless of whether speculative decoding is enabled — matching the behavior already present on the spec-decoding path.

Proposed fix (either)

  • Add a field constraint: max_num_scheduled_tokens: int | None = Field(default=None, ge=1); or
  • Ungate the <= 0 check — move it out of the if self.speculative_config is not None: block in _set_max_num_scheduled_tokens so it validates the field on every path.

Either makes the truthiness fallback at scheduler.py:104 safe (only None/0 remain falsy, both intended).
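A minimal sketch of the ungated variant, under stated assumptions: `ValidatedSchedulerConfig` is a hypothetical stdlib dataclass, not vLLM's class, and the check lives in `__post_init__` purely for illustration. It rejects anything below 1, which matches the `ge=1` constraint in the first option.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidatedSchedulerConfig:
    # Hypothetical stand-in; field names follow the issue text.
    max_num_scheduled_tokens: Optional[int] = None
    max_num_batched_tokens: int = 2048

    def __post_init__(self) -> None:
        value = self.max_num_scheduled_tokens
        # Runs on every path, independent of speculative decoding.
        # None still means "fall back to max_num_batched_tokens".
        if value is not None and value < 1:
            raise ValueError(
                f"max_num_scheduled_tokens must be >= 1, got {value}"
            )


results = {}
for candidate in (None, 1, 0, -1):
    try:
        ValidatedSchedulerConfig(max_num_scheduled_tokens=candidate)
        results[candidate] = "accepted"
    except ValueError:
        results[candidate] = "rejected"
print(results)
```

With validation at construction time, an invalid value never reaches the scheduler, so the assertion in `schedule()` remains an internal invariant check rather than the first point of failure.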
