vllm - ✅(Solved) Fix [RFC]: Replace Hardcoded Device Strings with current_platform and Implement Linting [2 pull requests, 3 comments, 3 participants]

GitHub stats
vllm-project/vllm#39871 (fetched 2026-04-17 08:24:04)
Comments: 3
Participants: 3
Timeline: 40
Reactions: 3
Timeline (top): mentioned ×18, subscribed ×18, commented ×3, labeled ×1

Error Message

Portability: Developers porting vLLM to new hardware must manually find and replace strings, which is error-prone.

PR fix notes

PR #37566: refactor hard coded device string in test files under tests/v1 and tests/lora

Description (problem / solution / changelog)

This PR replaces hardcoded "cuda" device strings with dynamic platform checks across tests/v1 and tests/lora. By utilizing current_platform.device_type, we enable these test suites to be reused across different hardware accelerators (e.g., ROCm, Gaudi, XPU) without manual modification.

Currently, many tests in the V1 engine and LoRA modules are coupled specifically to CUDA. This makes it difficult to verify feature parity on non-NVIDIA hardware. This PR generalizes the device handling to ensure that "cuda-centric" code becomes "accelerator-agnostic."

Proposed Changes

I have implemented the following systematic replacements:

  • Use DEVICE_TYPE (inferred from current_platform.device_type) in place of the hardcoded "cuda" string.
  • Use DEVICES ([f"{DEVICE_TYPE}:{i}" for i in range(1 if current_platform.device_count() == 1 else 2)]) in place of CUDA_DEVICES ([f"cuda:{i}" for i in range(1 if current_platform.device_count() == 1 else 2)]).
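
The DEVICES replacement described above can be sketched without a vLLM install. In the snippet below, FakePlatform and build_devices are hypothetical names introduced only for illustration; in the PR the values come from current_platform.device_type and current_platform.device_count(). It is a minimal sketch of the pattern, not the code from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakePlatform:
    """Hypothetical stand-in for vLLM's current_platform (illustration only)."""

    device_type: str
    num_devices: int

    def device_count(self) -> int:
        return self.num_devices


def build_devices(platform) -> list[str]:
    """Return one device string on single-device hosts, otherwise two.

    Mirrors the expression quoted in the PR description:
    range(1 if device_count() == 1 else 2).
    """
    n = 1 if platform.device_count() == 1 else 2
    return [f"{platform.device_type}:{i}" for i in range(n)]


if __name__ == "__main__":
    print(build_devices(FakePlatform("cuda", 1)))
    print(build_devices(FakePlatform("xpu", 4)))
```

Because the device prefix comes from the platform object rather than a literal, the same parametrization list can be reused on any accelerator that reports a device type.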

Impact

Test Parity: Allows non-CUDA CI pipelines to run the exact same V1 and LoRA validation logic used for NVIDIA GPUs.

Test Plan

CI

Test Result


<details> <summary> Essential Elements of an Effective PR Description Checklist </summary>
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
</details>

Changed files

  • tests/lora/test_fused_moe_lora_kernel.py (modified, +1/-1)
  • tests/lora/test_layers.py (modified, +6/-2)
  • tests/lora/test_lora_manager.py (modified, +2/-2)
  • tests/lora/test_moe_lora_align_sum.py (modified, +16/-6)
  • tests/lora/test_punica_ops.py (modified, +10/-3)
  • tests/lora/test_punica_ops_fp8.py (modified, +3/-1)
  • tests/lora/test_worker.py (modified, +4/-1)
  • tests/lora/utils.py (modified, +6/-3)
  • tests/v1/attention/test_attention_backends.py (modified, +3/-1)
  • tests/v1/attention/test_chunked_local_attention.py (modified, +4/-1)
  • tests/v1/attention/test_mla_backends.py (modified, +3/-1)
  • tests/v1/attention/test_sparse_mla_backends.py (modified, +6/-4)
  • tests/v1/attention/test_trtllm_attention_integration.py (modified, +2/-1)
  • tests/v1/cudagraph/test_cudagraph_dispatch.py (modified, +11/-9)
  • tests/v1/determinism/test_rms_norm_batch_invariant.py (modified, +10/-7)
  • tests/v1/e2e/general/test_mamba_prefix_cache.py (modified, +15/-5)
  • tests/v1/kv_offload/test_cpu_gpu.py (modified, +4/-2)
  • tests/v1/logits_processors/test_correctness.py (modified, +4/-3)
  • tests/v1/sample/test_rejection_sampler.py (modified, +60/-30)
  • tests/v1/sample/test_sampler.py (modified, +8/-7)
  • tests/v1/sample/test_topk_topp_sampler.py (modified, +8/-9)
  • tests/v1/spec_decode/test_eagle.py (modified, +10/-9)
  • tests/v1/spec_decode/test_eagle_step_kernel.py (modified, +6/-3)
  • tests/v1/spec_decode/test_extract_hidden_states.py (modified, +6/-5)
  • tests/v1/spec_decode/test_mtp.py (modified, +4/-3)
  • tests/v1/spec_decode/test_tree_attention.py (modified, +5/-3)
  • tests/v1/worker/test_gpu_input_batch.py (modified, +5/-7)
  • tests/v1/worker/test_gpu_model_runner.py (modified, +17/-17)

PR #38901: refactor hard coded device string in test files under tests/compile tests/quantization tests/models and tests/model_executor

Description (problem / solution / changelog)

This PR replaces hardcoded "cuda" device strings with dynamic platform checks across tests/compile, tests/quantization, tests/models, tests/model_executor and tests/basic_correctness. By utilizing current_platform.device_type, we enable these test suites to be reused across different hardware accelerators (e.g., ROCm, Gaudi, XPU) without manual modification.

Currently, many tests in the V1 engine and LoRA modules are coupled specifically to CUDA. This makes it difficult to verify feature parity on non-NVIDIA hardware. This PR generalizes the device handling to ensure that "cuda-centric" code becomes "accelerator-agnostic."

Proposed Changes

I have implemented the following systematic replacements:

  • Use DEVICE_TYPE (inferred from current_platform.device_type) in place of the hardcoded "cuda" string.
  • Use DEVICES ([f"{DEVICE_TYPE}:{i}" for i in range(1 if current_platform.device_count() == 1 else 2)]) in place of CUDA_DEVICES ([f"cuda:{i}" for i in range(1 if current_platform.device_count() == 1 else 2)]).

Impact

Test Parity: Allows non-CUDA CI pipelines to run the exact same validation logic used for NVIDIA GPUs.

Test Plan

CI

Test Result



Changed files

  • tests/basic_correctness/test_cumem.py (modified, +10/-8)
  • tests/compile/passes/distributed/test_async_tp.py (modified, +3/-2)
  • tests/compile/passes/distributed/test_fusion_all_reduce.py (modified, +4/-2)
  • tests/compile/passes/distributed/test_sequence_parallelism.py (modified, +4/-2)
  • tests/compile/passes/test_fusion_attn.py (modified, +2/-1)
  • tests/compile/passes/test_mla_attn_quant_fusion.py (modified, +2/-1)
  • tests/compile/passes/test_noop_elimination.py (modified, +5/-2)
  • tests/compile/passes/test_scatter_split_replace.py (modified, +4/-1)
  • tests/compile/passes/test_split_coalescing.py (modified, +4/-1)
  • tests/compile/test_config.py (modified, +4/-2)
  • tests/compile/test_graph_partition.py (modified, +5/-2)
  • tests/compile/test_rotary_embedding_compile.py (modified, +3/-1)
  • tests/compile/test_structured_logging.py (modified, +3/-1)
  • tests/model_executor/test_eagle_quantization.py (modified, +3/-2)
  • tests/models/multimodal/pooling/test_intern_vit.py (modified, +8/-4)
  • tests/models/multimodal/pooling/test_radio.py (modified, +8/-4)
  • tests/models/test_utils.py (modified, +8/-2)
  • tests/quantization/test_fp8.py (modified, +7/-3)
  • tests/quantization/test_per_token_kv_cache.py (modified, +8/-6)
  • tests/quantization/test_quark.py (modified, +7/-5)
  • tests/quantization/test_torchao.py (modified, +9/-8)
  • tests/test_config.py (modified, +4/-2)
  • tests/v1/sample/test_topk_topp_sampler.py (modified, +1/-1)
  • tests/v1/test_tensor_ipc_queue.py (modified, +6/-3)

Motivation.

Currently, the vLLM codebase contains numerous instances of hardcoded device strings such as "cuda", "cuda:0", and .to("cuda"). This hinders our goal of being a truly multi-platform LLM engine (supporting ROCm, TPU, Gaudi/HPU, etc.).

Hardcoding "cuda" creates several issues:

  • Portability: Developers porting vLLM to new hardware must manually find and replace strings, which is error-prone.
  • Consistency: Some parts of the code use cuda, while others use current_platform. We need a single source of truth.
  • Future-Proofing: As we move toward better abstraction, the device type should be a property of the environment, not a static string in the logic.

For tests that should run only on CUDA or another specific platform, it is better to use decorators that skip them on other platforms than to hardcode a device string. See https://github.com/vllm-project/vllm/issues/39158.
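
A skip decorator of the kind suggested above might look like the following. This is a dependency-free sketch built on the standard library's unittest.skipUnless; requires_device and current_device_type are hypothetical names, and in vLLM the probe would read current_platform.device_type. In a pytest suite the equivalent would normally be written with pytest.mark.skipif.

```python
import unittest


def current_device_type() -> str:
    """Hypothetical probe; in vLLM this would be current_platform.device_type."""
    return "cpu"


def requires_device(device_type: str):
    """Skip the decorated test unless the active device type matches."""
    return unittest.skipUnless(
        current_device_type() == device_type,
        f"requires {device_type}, running on {current_device_type()}",
    )


@requires_device("cuda")
def test_cuda_only_feature():
    # The body never hardcodes a device string; the decorator gates execution.
    return "ran"


@requires_device("cpu")
def test_cpu_feature():
    return "ran"
```

The platform restriction lives in one declarative place, so the test body stays free of device literals and a lint rule does not have to special-case it.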

Proposed Change.

  1. We will replace all instances of:
  2. Implement a lint check for the strings "cuda" and "cuda*", except in tests/kernel, vllm/platforms, etc.
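
One possible shape for such a lint check is sketched below. It is not the check proposed in the RFC: the function names, the regular expression, and the exclusion list are assumptions made for illustration. It parses Python source with the standard ast module and reports string literals equal to "cuda" or starting a "cuda:" device string, including the literal part of an f-string such as f"cuda:{i}".

```python
import ast
import re

# Matches "cuda", "cuda:0", and the bare "cuda:" prefix found inside f-strings.
CUDA_LITERAL = re.compile(r"^cuda(:\d*)?$")

# Exempt directories named in the RFC text; an assumption, extend as needed.
EXCLUDED_PREFIXES = ("tests/kernel", "vllm/platforms")


def is_excluded(path: str) -> bool:
    """Return True if the file lives under an exempt directory."""
    normalized = path.replace("\\", "/")
    return any(normalized.startswith(prefix) for prefix in EXCLUDED_PREFIXES)


def find_hardcoded_cuda(source: str, path: str = "<string>") -> list[tuple[int, str]]:
    """Return (line number, literal) pairs for hardcoded CUDA device strings."""
    if is_excluded(path):
        return []
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if CUDA_LITERAL.match(node.value):
                hits.append((node.lineno, node.value))
    return sorted(hits)


if __name__ == "__main__":
    sample = 'x = torch.zeros(1, device="cuda")\ny = f"cuda:{i}"\n'
    for lineno, literal in find_hardcoded_cuda(sample, "tests/v1/example.py"):
        print(f"tests/v1/example.py:{lineno}: hardcoded device string {literal!r}")
```

Working on the syntax tree rather than raw text avoids flagging comments or identifiers such as CUDA_DEVICES, at the cost of only covering Python files.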

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response


extent analysis

TL;DR

Replace hardcoded "cuda" strings with DEVICE_TYPE inferred from current_platform.device_type to improve portability and consistency.

Guidance

  • Identify and replace all instances of hardcoded "cuda" strings with DEVICE_TYPE to create a single source of truth for device types.
  • Implement a lint check to detect and prevent future usage of hardcoded "cuda" strings, excluding specific directories like tests/kernel and vllm/platforms.
  • Use decorators to skip tests on specific platforms instead of hardcoding device strings, as suggested in the referenced GitHub issue.
  • Review the proposed changes and referenced pull requests (e.g., #37566 and #38901) to understand the implementation details.

Example

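# Before: the device is fixed to CUDA
device = "cuda:0"

# After: the device type comes from the active platform
from vllm.platforms import current_platform

DEVICE_TYPE = current_platform.device_type
device = f"{DEVICE_TYPE}:0"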

Notes

The proposed change aims to improve the codebase's portability and consistency by replacing hardcoded device strings. However, the implementation details and potential edge cases should be carefully reviewed and tested.

Recommendation

Apply the proposed change by replacing hardcoded "cuda" strings with DEVICE_TYPE. This removes a CUDA-only assumption from the tests and gives the codebase a single source of truth for the device type, which is what running the same code on multiple platforms requires.
