vllm - ✅(Solved) Fix [Feature]: Priority scheduling supports preemption of requests in the running queue by requests in the waiting queue [1 pull requests, 1 comments, 2 participants]

vllm · 2026-04-16 11:12:13

GitHub stats

vllm-project/vllm#40004 • Fetched 2026-04-17 08:27:43

Author: kelliaao
Participants: kelliaao, robertgshaw2-redhat
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1

Fix Action

Fix proposed (the linked PR is open and not yet merged)

Proposed fix in PR: [Scheduler] Priority scheduling supports preemption of requests in the running queue by requests in the waiting queue (https://github.com/vllm-project/vllm/pull/40087)

PR fix notes

PR #40087: [Scheduler] Priority scheduling supports preemption of requests in the running queue by requests in the waiting queue

Repository: vllm-project/vllm
Author: sts07142
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40087

Description (problem / solution / changelog)

Purpose

Priority scheduling supports preemption of requests in the running queue by requests in the waiting queue

Fixes: #40004

Test Plan

pytest tests/v1/core/test_scheduler.py::test_priority_preemption_at_max_num_seqs

Test Result

======================================================== test session starts =========================================================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0
rootdir: /home/name/.test/.gpu/vllm
configfile: pyproject.toml
plugins: anyio-4.13.0
collected 1 item                                                                                                                     

tests/v1/core/test_scheduler.py .                                                                                              [100%]

========================================================== warnings summary ==========================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: 14 warnings
  /home/name/.test/.gpu/vllm/.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

tests/v1/core/test_scheduler.py::test_priority_preemption_at_max_num_seqs
  /home/name/.test/.gpu/vllm/.venv/lib/python3.12/site-packages/transformers/models/gpt2/tokenization_gpt2.py:110: DeprecationWarning: Deprecated in 0.9.0: BPE.__init__ will not create from files anymore, try `BPE.from_file` instead
    BPE(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================== 1 passed, 17 warnings in 3.86s ===================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute


Changed files

tests/v1/core/test_scheduler.py (modified, +119/-0)
vllm/v1/core/sched/scheduler.py (modified, +77/-0)

Original issue (#40004)

🚀 The feature, motivation and pitch

The current priority scheduling only supports evicting low-priority requests from the running queue when resources are insufficient. However, when scheduling the waiting queue, if pending requests cannot be scheduled (for example, when the number of requests in the running queue has reached max_num_seqs), even high-priority requests cannot preempt requests in the running queue.
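The gap described above can be reproduced in a toy model. The sketch below is not vLLM code: the `Request` class, the `schedule_waiting_no_preempt` function, and the convention that a lower `priority` value means a more urgent request are assumptions made for illustration. It shows that when the waiting queue is only admitted while a slot is free, an urgent request stays blocked once the running queue holds `max_num_seqs` requests.

```python
from dataclasses import dataclass, field
import heapq


@dataclass(order=True)
class Request:
    # Assumption for this sketch: a lower value means a more urgent request.
    priority: int
    arrival: int
    req_id: str = field(compare=False)


def schedule_waiting_no_preempt(running, waiting, max_num_seqs):
    """Admit waiting requests only while the running queue has a free slot."""
    admitted = []
    while waiting and len(running) < max_num_seqs:
        req = heapq.heappop(waiting)  # most urgent waiting request first
        running.append(req)
        admitted.append(req.req_id)
    return admitted


# Two low-priority requests already fill the running queue.
running = [Request(5, 0, "low-a"), Request(5, 1, "low-b")]
waiting = [Request(0, 2, "urgent")]
heapq.heapify(waiting)

admitted = schedule_waiting_no_preempt(running, waiting, max_num_seqs=2)
print(admitted)                      # nothing is admitted
print([r.req_id for r in running])   # the low-priority requests keep running
```

The urgent request is never compared against the running requests at all, which is the behavior the issue asks to change.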


extent analysis

TL;DR

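Extend priority scheduling so that a high-priority request in the waiting queue can preempt a lower-priority request in the running queue when the waiting request cannot otherwise be scheduled, for example when the running queue has already reached max_num_seqs. Eviction of low-priority running requests under resource pressure is already supported; the missing case is preemption triggered from the waiting queue.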

Guidance

Start with vllm/v1/core/sched/scheduler.py, the file the PR changes: find where low-priority requests are evicted from the running queue when resources are insufficient, and consider reusing that logic when scheduling the waiting queue.
Check how max_num_seqs caps the running queue. Once the cap is reached, waiting requests are not admitted regardless of priority, which is the case the issue describes.
Consider preempting only when the waiting request has strictly higher priority than the running request it would displace, so that requests of equal priority do not evict each other.
Weigh the trade-offs: more aggressive preemption improves responsiveness for high-priority requests, at a cost in throughput and in fairness to the requests that are displaced.
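The guidance above can be sketched as a toy admission loop. This is a minimal illustration under stated assumptions, not the implementation in PR #40087: `Request`, `schedule_waiting_with_preempt`, and the rule that a lower `priority` value is more urgent (ties broken by earlier arrival) are all hypothetical, and the real scheduler also has to account for resources that this model ignores.

```python
from dataclasses import dataclass, field
import heapq


@dataclass(order=True)
class Request:
    # Assumption for this sketch: lower value = more urgent, ties by arrival.
    priority: int
    arrival: int
    req_id: str = field(compare=False)


def schedule_waiting_with_preempt(running, waiting, max_num_seqs):
    """Admit waiting requests, displacing less urgent running ones at the cap."""
    admitted, preempted = [], []
    while waiting:
        candidate = waiting[0]  # most urgent waiting request
        if len(running) >= max_num_seqs:
            if not running:
                break  # cap of zero: nothing can be admitted
            # Victim is the least urgent running request.
            idx = max(range(len(running)), key=lambda i: running[i])
            if not candidate < running[idx]:
                break  # preempt only for strictly more urgent requests
            preempted.append(running.pop(idx))
        heapq.heappop(waiting)
        running.append(candidate)
        admitted.append(candidate.req_id)
    # Preempted requests go back to the waiting queue for a later step.
    for victim in preempted:
        heapq.heappush(waiting, victim)
    return admitted, [v.req_id for v in preempted]


running = [Request(5, 0, "low-a"), Request(5, 1, "low-b")]
waiting = [Request(0, 2, "urgent"), Request(5, 3, "also-low")]
heapq.heapify(waiting)

admitted, preempted = schedule_waiting_with_preempt(running, waiting, max_num_seqs=2)
print(admitted)                      # ['urgent']
print(preempted)                     # ['low-b']
print([r.req_id for r in running])   # ['low-a', 'urgent']
```

Re-queueing victims only after the loop keeps a request that was just preempted from being reconsidered in the same scheduling step, and the strict comparison means `also-low` does not displace `low-a`, which has the same priority and arrived earlier.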

Notes

The issue does not describe the scheduler internals or how max_num_seqs is enforced, so the guidance above is high-level. The concrete change is in PR #40087 (vllm/v1/core/sched/scheduler.py, +77/-0, with a new test in tests/v1/core/test_scheduler.py); treat that diff as the reference rather than this summary.

Recommendation

Follow PR #40087, which implements waiting-queue preemption and is open but not yet merged. Until it lands, expect high-priority requests to stay in the waiting queue whenever the running queue is already at max_num_seqs.
