vllm - ✅(Solved) Fix [RFC]: Make kernel/op and component tests device-agnostic for OOT plugins [2 pull requests, 7 comments, 4 participants]

vllm · 2026-03-10 07:15:03


vllm-project/vllm#36602 • Fetched 2026-04-08 00:36:03


Timeline (top): commented ×7, subscribed ×6, mentioned ×4, cross-referenced ×3

Fix Action

Fix / Workaround

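Some tests use layer.forward_native() as the reference. When an OOT plugin registers a custom op, forward_native() can dispatch to the plugin's override rather than to the base implementation, which pollutes the reference. The proposed workaround is to use the staticmethod forward_static() as the reference instead.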

PR fix notes

PR #20169: [UT][intel GPU] use current_platform instead of device hardcode in v1 tests

Repository: vllm-project/vllm
Author: Liangliang-Ma
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/20169

Description (problem / solution / changelog)

Purpose

We went through all the tests under v1/ with some local modifications on an XPU device and found that the biggest obstacle to reusing the v1 tests is hardcoded device strings. This PR replaces them with current_platform class attributes and methods. After this one, we will keep contributing changes so that XPU (Intel GPU) users are covered by the vLLM tests.

Test Plan

Reuse the v1 tests on Intel GPU.

Test Result

Passed on CUDA/XPU.

Changed files

tests/conftest.py (modified, +2/-2)
tests/v1/sample/test_rejection_sampler.py (modified, +5/-4)
tests/v1/sample/test_topk_topp_sampler.py (modified, +6/-5)
tests/v1/spec_decode/test_eagle.py (modified, +13/-10)
tests/v1/worker/test_gpu_input_batch.py (modified, +3/-1)
tests/v1/worker/test_gpu_model_runner.py (modified, +2/-1)
vllm/platforms/cuda.py (modified, +5/-1)
vllm/platforms/rocm.py (modified, +5/-0)
vllm/platforms/xpu.py (modified, +1/-1)
vllm/v1/attention/backends/mla/common.py (modified, +2/-1)

PR #36246: [CI/Build] Updated rmsnorm test to improve OOT device coverage

Repository: vllm-project/vllm
Author: romitjain
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/36246

Description (problem / solution / changelog)

Purpose

The purpose of this PR is to make the RMSNorm test (test_rms_norm) more generic across devices. Specifically, the test's device parameterization now defaults to CPU, which allows OOT hardware plugins to run the same test. The PR also uses forward_static instead of forward_native as the reference implementation: forward_static is a staticmethod, so it should serve as the gold-standard output.

Test Plan

This is an updated test, so no new tests are required.

pytest tests/kernels/core/test_layernorm.py::test_rms_norm (on a CPU installation)
pytest tests/kernels/core/test_layernorm.py::test_rms_norm (on a CUDA installation)

Test Result

The same test passes on both CUDA and CPU devices.


Edit: I have also opened an RFC proposing similar changes to the tests of other ops.

Changed files

tests/kernels/core/test_layernorm.py (modified, +16/-7)
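The PR makes CPU the default device for the RMSNorm test. A minimal sketch of one way such a default could be expressed is below. It assumes a platform object exposing device_type and device_count(), mirroring the current_platform usage proposed in the RFC; FakePlatform and select_test_devices are hypothetical names, not the code from #36246.

```python
from dataclasses import dataclass


@dataclass
class FakePlatform:
    """Hypothetical stand-in for vllm.platforms.current_platform."""
    device_type: str
    num_devices: int

    def device_count(self) -> int:
        return self.num_devices


def select_test_devices(platform, max_devices: int = 2) -> list[str]:
    # Use accelerator devices when any are visible; otherwise fall back to CPU
    # so the parametrized test still has at least one device to run on.
    devices = [
        f"{platform.device_type}:{i}"
        for i in range(min(platform.device_count(), max_devices))
    ]
    return devices or ["cpu"]


for p in (FakePlatform("cuda", 4), FakePlatform("xpu", 1), FakePlatform("cpu", 0)):
    print(p.device_type, select_test_devices(p))
```

The fallback keeps the test from collapsing to an empty parameter list on hosts without an accelerator, which is what lets the same test body serve CPU, CUDA, and OOT devices.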


Motivation.

vLLM's kernel/op tests and component tests provide extensive coverage for correctness and regression. However, many of these tests hardcode CUDA assumptions (device strings, opcheck() calls, torch.cuda.* APIs), which prevents OOT device plugins from reusing them. This forces OOT plugin developers to duplicate test logic for ops and components that are already well-tested upstream. Making these tests device-agnostic would allow any OOT plugin to validate its implementations against the same upstream suite by simply installing the plugin and running pytest.

Prior art: #20169 applied this pattern to v1 worker and sampler tests for Intel XPU. The kernel/op tests remain unfixed.

Proposed Change.

I found the following recurring CUDA-hardcoded patterns across the test suite:

1. CUDA_DEVICES list generation

Many test files generate a device list that only includes CUDA devices:

CUDA_DEVICES = [f"cuda:{i}" for i in range(1 if torch.cuda.device_count() == 1 else 2)]

Proposed change:

from vllm.platforms import current_platform
CUDA_DEVICES = [
    f"{current_platform.device_type}:{i}"
    for i in range(min(current_platform.device_count(), 2))
]

E.g., test_layernorm.py, test_activation.py, test_pos_encoding.py, test_apply_rotary_emb.py.
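The difference between the original and proposed device lists is easiest to see with the device counts pulled out as plain integers. In this sketch, cuda_only_devices and platform_devices are hypothetical helpers that mirror the two list comprehensions; the integer arguments stand in for torch.cuda.device_count() and current_platform.device_count().

```python
def cuda_only_devices(cuda_count: int) -> list[str]:
    # Original pattern: two entries unless exactly one GPU is visible,
    # and the device type is always "cuda".
    return [f"cuda:{i}" for i in range(1 if cuda_count == 1 else 2)]


def platform_devices(device_type: str, device_count: int) -> list[str]:
    # Proposed pattern: follow the platform's device type, capped at two devices.
    return [f"{device_type}:{i}" for i in range(min(device_count, 2))]


for count in (0, 1, 4):
    print(count, cuda_only_devices(count), platform_devices("cuda", count))

# On a non-CUDA platform the device type follows the plugin, e.g. Intel XPU.
print(platform_devices("xpu", 1))
```

One behavioral difference is worth checking when migrating: with zero visible devices the original expression still yields ['cuda:0', 'cuda:1'], whereas the proposed one yields an empty list. pytest treats an empty parametrize list as an empty parameter set and, by default, marks the test as skipped (controlled by the empty_parameter_set_mark ini option).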

2. Unconditional opcheck() calls

opcheck() validates CUDA custom op contracts and fails on non-CUDA devices:

opcheck(fn, (out, x))

Proposed change:

if current_platform.is_cuda_alike():
    opcheck(fn, (out, x))
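The gating can be sketched without a GPU: the numerical comparison runs on every platform, and only the opcheck() contract check is skipped off CUDA-like platforms. run_op_test, recording_opcheck, silu_op and silu_ref are hypothetical; the boolean is_cuda_alike stands in for current_platform.is_cuda_alike(), and recording_opcheck only records that it was called instead of doing what the real opcheck() does.

```python
import math
from typing import Callable, Sequence

Op = Callable[[Sequence[float]], list[float]]


def run_op_test(op: Op, ref: Op, x: Sequence[float], is_cuda_alike: bool,
                opcheck: Callable[[Op, tuple], None]) -> None:
    # Numerical correctness is checked on every platform.
    out, expected = op(x), ref(x)
    assert all(math.isclose(a, b, rel_tol=1e-6) for a, b in zip(out, expected))
    # The custom-op contract check only runs on CUDA-like platforms.
    if is_cuda_alike:
        opcheck(op, (x,))


calls = []


def recording_opcheck(fn, args):
    # Hypothetical stand-in for opcheck(): only records the invocation.
    calls.append((fn.__name__, args))


def silu_op(x):
    return [v / (1.0 + math.exp(-v)) for v in x]


def silu_ref(x):
    return [v * (1.0 / (1.0 + math.exp(-v))) for v in x]


run_op_test(silu_op, silu_ref, [0.5, -1.0], False, recording_opcheck)  # e.g. an OOT device
run_op_test(silu_op, silu_ref, [0.5, -1.0], True, recording_opcheck)   # e.g. CUDA/ROCm
print(calls)
```

Keeping the numerical check outside the condition is the point of the design: an OOT plugin loses only the CUDA-specific contract check, not the correctness coverage.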

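3. Reference implementation via forward_native()

When an OOT plugin registers a custom op, forward_native() can dispatch to the plugin's override instead of the base implementation, so the reference ends up depending on the code under test:

ref_out = layer.forward_native(..)

Proposed change (forward_static() is a staticmethod, which is why #36246 uses it as the gold-standard reference):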

ref_out = layer.forward_static(..)
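To see why the choice of reference matters, the sketch below models an OOT plugin whose forward_native() override has a deliberate bug. RMSNormLike, OOTRMSNorm, forward_oot and compare_references are hypothetical pure-Python stand-ins (not vLLM classes), and the RMS normalization is written out by hand so the example runs without torch or an accelerator.

```python
import math


class RMSNormLike:
    """Hypothetical stand-in for a vLLM-style custom op (not vLLM's RMSNorm)."""

    def __init__(self, weight, eps=1e-6):
        self.weight = weight
        self.eps = eps

    @staticmethod
    def forward_static(x, weight, eps):
        # Pure reference math: x / sqrt(mean(x^2) + eps) * weight.
        rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
        return [v / rms * w for v, w in zip(x, weight)]

    def forward_native(self, x):
        # Instance method, so a plugin subclass can override it.
        return self.forward_static(x, self.weight, self.eps)


class OOTRMSNorm(RMSNormLike):
    """Hypothetical OOT override with a deliberate bug: it drops the weight."""

    def forward_native(self, x):
        rms = math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
        return [v / rms for v in x]

    def forward_oot(self, x):
        # The device kernel under test shares the same (buggy) math.
        return self.forward_native(x)


def allclose(a, b, rel_tol=1e-6):
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=rel_tol) for p, q in zip(a, b))


def compare_references(layer, x):
    out = layer.forward_oot(x)
    polluted_ref = layer.forward_native(x)  # resolves to the plugin override
    clean_ref = layer.forward_static(x, layer.weight, layer.eps)
    return allclose(out, polluted_ref), allclose(out, clean_ref)


layer = OOTRMSNorm(weight=[2.0, 2.0, 2.0])
print(compare_references(layer, [1.0, 2.0, 3.0]))  # (True, False)
```

The forward_native() comparison passes because the output is checked against the same override that produced it, while the forward_static() comparison catches the missing weight. This only holds as long as the plugin does not also override forward_static(); calling it through the base class (RMSNormLike.forward_static in the sketch) makes that independence explicit.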

4. Hardcoded torch.device("cuda:0") in component tests

Some component tests hardcode the device directly instead of using the platform abstraction:

device = torch.device("cuda:0")

Proposed change:

device = torch.device(f"{current_platform.device_type}:0")

Examples

tests/kernels/core/test_layernorm.py: has patterns 1, 2, and 3. PoC PR: #36246.
tests/kernels/core/test_activation.py: tests SiLU, GELU, FatReLU, and other activations. Has patterns 1 and 2 (the CUDA_DEVICES list and unconditional opcheck() calls).

Feedback Period.

2 weeks

CC List.

TBA

Any Other Things.

PoC PR for test_layernorm.py: #36246
Prior art for v1 worker/sampler tests: #20169


extent analysis

Fix Plan

To make the tests device-agnostic, apply the following changes:

1. Replace hardcoded CUDA device lists with device-agnostic lists:

from vllm.platforms import current_platform
CUDA_DEVICES = [
    f"{current_platform.device_type}:{i}"
    for i in range(min(current_platform.device_count(), 2))
]

2. Run opcheck() calls only on CUDA-like devices:

if current_platform.is_cuda_alike():
    opcheck(fn, (out, x))

3. Use forward_static() instead of forward_native() as the reference implementation:

ref_out = layer.forward_static(..)

4. Replace hardcoded torch.device("cuda:0") with a device-agnostic device:

device = torch.device(f"{current_platform.device_type}:0")

Verification

To verify the fix, run the modified tests on different devices (e.g., CUDA, XPU, CPU) and confirm that they pass.

Extra Tips

Use the current_platform object to access device information and make device-agnostic decisions.
Consider adding more device-agnostic tests to cover different scenarios and edge cases.
Review prior art (e.g., #20169) for inspiration and guidance on making tests device-agnostic.
