vllm - ✅(Solved) Fix [Feature]: Priority scheduling supports preemption of requests in the running queue by requests in the waiting queue [1 pull requests, 1 comments, 2 participants]

GitHub stats
vllm-project/vllm#40004 (fetched 2026-04-17 08:27:43)
Comments: 1
Participants: 2
Timeline: 4
Reactions: 0
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1

Fix Action: Fixed

PR fix notes

PR #40087: [Scheduler] Priority scheduling supports preemption of requests in the running queue by requests in the waiting queue

Description (problem / solution / changelog)

Purpose

Priority scheduling supports preemption of requests in the running queue by requests in the waiting queue

Fixes: #40004

Test Plan

pytest tests/v1/core/test_scheduler.py::test_priority_preemption_at_max_num_seqs

Test Result

======================================================== test session starts =========================================================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0
rootdir: /home/name/.test/.gpu/vllm
configfile: pyproject.toml
plugins: anyio-4.13.0
collected 1 item                                                                                                                     

tests/v1/core/test_scheduler.py .                                                                                              [100%]

========================================================== warnings summary ==========================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: 14 warnings
  /home/name/.test/.gpu/vllm/.venv/lib/python3.12/site-packages/torch/jit/_script.py:365: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

tests/v1/core/test_scheduler.py::test_priority_preemption_at_max_num_seqs
  /home/name/.test/.gpu/vllm/.venv/lib/python3.12/site-packages/transformers/models/gpt2/tokenization_gpt2.py:110: DeprecationWarning: Deprecated in 0.9.0: BPE.__init__ will not create from files anymore, try `BPE.from_file` instead
    BPE(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================== 1 passed, 17 warnings in 3.86s ===================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

Changed files

  • tests/v1/core/test_scheduler.py (modified, +119/-0)
  • vllm/v1/core/sched/scheduler.py (modified, +77/-0)

🚀 The feature, motivation and pitch

Currently, priority scheduling evicts low-priority requests from the running queue only when resources run out. When the scheduler processes the waiting queue, however, a pending request that cannot be scheduled (for example, because the running queue already holds max_num_seqs requests) stays blocked even if it has a higher priority than requests in the running queue: waiting requests cannot preempt running ones.
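A simplified model of the requested behavior may help make it concrete. The sketch below is not vLLM's scheduler code: `Req`, `schedule_waiting`, and the heap-based waiting queue are hypothetical, only the max_num_seqs limit is modeled (no KV-cache accounting), and it assumes the convention that a lower `priority` value means higher priority, with ties broken by earlier arrival. When the running queue is full, the highest-priority waiting request replaces the lowest-priority running request only if it strictly outranks it, and the preempted request goes back to the waiting queue.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Req:
    # Hypothetical request model. Ordering compares (priority, arrival_time),
    # so a smaller tuple means "schedule first" (lower value = higher priority).
    priority: int
    arrival_time: float
    req_id: str = field(compare=False)


def schedule_waiting(waiting: list[Req], running: list[Req],
                     max_num_seqs: int) -> list[str]:
    """Admit waiting requests, preempting lower-priority running ones when full.

    `waiting` is a min-heap of Req and `running` is a plain list; both are
    mutated in place. Returns the IDs of preempted requests, which are pushed
    back onto the waiting heap.
    """
    preempted: list[str] = []
    while waiting:
        candidate = waiting[0]  # highest-priority waiting request
        if len(running) < max_num_seqs:
            # A free slot exists: admit without preempting anyone.
            running.append(heapq.heappop(waiting))
            continue
        if not running:
            break  # max_num_seqs <= 0: nothing can ever be admitted
        # Running queue is full: locate its lowest-priority request by index,
        # so the exact object is removed even if another request ties with it.
        victim_idx = max(range(len(running)), key=running.__getitem__)
        victim = running[victim_idx]
        if not candidate < victim:
            break  # candidate does not strictly outrank any running request
        running.pop(victim_idx)
        heapq.heappop(waiting)
        running.append(candidate)
        heapq.heappush(waiting, victim)  # the preempted request waits again
        preempted.append(victim.req_id)
    return preempted


if __name__ == "__main__":
    waiting: list[Req] = []
    heapq.heappush(waiting, Req(1, 2.0, "urgent-c"))
    running = [Req(5, 0.0, "low-a"), Req(3, 1.0, "mid-b")]
    print(schedule_waiting(waiting, running, max_num_seqs=2))  # ['low-a']
    print(sorted(r.req_id for r in running))  # ['mid-b', 'urgent-c']
```

Requiring a strict priority advantage keeps a preempted request from immediately preempting its replacement, so the loop terminates instead of swapping requests back and forth.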

Alternatives

No response

Additional context

No response

Extent analysis

TL;DR

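Extend priority scheduling so that a high-priority request in the waiting queue can preempt a lower-priority request in the running queue when it cannot otherwise be scheduled, for example because the running queue has already reached max_num_seqs.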

Guidance

  • Review the current priority scheduling implementation to find where low-priority requests are evicted from the running queue, and consider extending that logic so that high-priority waiting requests can also preempt running requests.
  • Examine how the max_num_seqs limit keeps pending requests from being scheduled, and whether the admission check can make room for high-priority requests by preempting lower-priority ones.
  • Consider a mechanism that adjusts the running queue by priority level, so that high-priority requests are not blocked by lower-priority ones.
  • Weigh the trade-offs between fairness, throughput, and responsiveness (for example, repeated preemption can starve low-priority requests and waste work already done on them) so that the modified policy matches system requirements.
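The fairness and responsiveness trade-off in the last point can be made tunable at victim-selection time. In this hypothetical sketch, `RunningReq`, `pick_victim`, `min_gap`, and `max_preemptions` are illustrative names, not vLLM options, and lower `priority` values are again assumed to mean higher priority. A running request is eligible for eviction only if the waiting request outranks it by at least `min_gap` levels and it has been preempted fewer than `max_preemptions` times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningReq:
    # Hypothetical fields for illustration; not vLLM's Request class.
    req_id: str
    priority: int  # assumed convention: lower value = higher priority
    arrival_time: float
    num_preemptions: int = 0


def pick_victim(candidate_priority: int, running: list[RunningReq],
                min_gap: int = 1,
                max_preemptions: int = 3) -> Optional[RunningReq]:
    """Return the running request a waiting request may preempt, or None.

    min_gap: the waiting request must outrank the victim by at least this
        many priority levels; larger values mean less churn but slower
        response for urgent requests.
    max_preemptions: requests already preempted this many times are
        protected, which bounds how often a single request can be evicted.
    """
    eligible = [
        r for r in running
        if r.num_preemptions < max_preemptions
        and r.priority - candidate_priority >= min_gap
    ]
    if not eligible:
        return None
    # Evict the lowest-priority eligible request; on ties, the latest arrival.
    return max(eligible, key=lambda r: (r.priority, r.arrival_time))
```

In this sketch the caller would invoke `pick_victim` only when the running queue is already at max_num_seqs, and would increment `num_preemptions` on the returned request. Raising `min_gap` or lowering `max_preemptions` reduces churn and protects low-priority work, at the cost of slower response for urgent requests.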

Notes

The issue does not describe how the priority scheduling algorithm is implemented or how the max_num_seqs limit is enforced, so these suggestions are high-level and may need refinement against the actual codebase and system constraints.

Recommendation

Apply workaround: modify the priority scheduling algorithm so that the running queue can be adjusted dynamically according to request priorities; this appears to be the core issue preventing high-priority requests from being scheduled promptly.
