vllm - 💡(How to fix) Fix [RFC]: Unified Device Capability Abstraction for Cross-Platform Feature Detection [1 comments, 2 participants]

GitHub stats
vllm-project/vllm#40620 · Fetched 2026-04-23 07:23:52


Motivation.

This is an AI-generated proposal and may contain errors. Corrections are appreciated.

1. Problem Summary

As raised by @tjtanaa in #39158, the current has_device_capability(int) / is_device_capability_family(int) API is fundamentally CUDA-centric and does not translate correctly to ROCm or XPU.

1.1 Problem A: device_capability is inherently a CUDA concept

torch.cuda.get_device_capability() returns (major, minor) tied to NVIDIA's SM (Streaming Multiprocessor) versioning. ROCm and XPU have completely different hardware models — any mapping to CUDA-style numbers is artificial and lossy:

| Platform | Capability Model | How it works |
| --- | --- | --- |
| CUDA | SM version (major, minor), e.g. (8,9), (9,0), (10,0) | Native torch.cuda.get_device_capability() |
| ROCm | GCN arch string (e.g. gfx942) → artificially mapped to (major, minor) | Semantic mismatch: gfx90a maps to (9,0) but has NO FP8, while CUDA's (9,0) = Hopper = has FP8 |
| XPU | No capability model → always returns None → all checks = False | All feature gates broken |
| CPU/TPU | No capability model → returns None | N/A |
| OOT | No capability model → returns None | N/A |
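
To make the mismatch concrete, the sketch below mimics the numeric comparison with small stand-ins. `DeviceCapability`, `ROCM_ARCH_TO_CAPABILITY`, and this `has_device_capability` helper are illustrative assumptions, not vLLM's actual implementation; the mapped values are the ones quoted in the table above.

```python
from typing import NamedTuple, Optional


class DeviceCapability(NamedTuple):
    """Stand-in for a CUDA-style (major, minor) capability tuple."""
    major: int
    minor: int

    def to_int(self) -> int:
        # (8, 9) -> 89, (10, 0) -> 100
        return self.major * 10 + self.minor


# Hypothetical mapping, using the values quoted in this RFC.
ROCM_ARCH_TO_CAPABILITY = {
    "gfx90a": DeviceCapability(9, 0),  # MI200: no FP8
    "gfx942": DeviceCapability(9, 4),  # MI300: FP8 (FNUZ)
}


def has_device_capability(current: Optional[DeviceCapability], required: int) -> bool:
    """Numeric gate: False when the platform has no capability model."""
    if current is None:
        return False
    return current.to_int() >= required


mi200 = ROCM_ARCH_TO_CAPABILITY["gfx90a"]
xpu = None  # XPU reports no capability at all

# Passes the "FP8" gate even though MI200 has no FP8.
fp8_gate_mi200 = has_device_capability(mi200, 89)
# Fails every gate, whatever the hardware actually supports.
fp8_gate_xpu = has_device_capability(xpu, 89)
```

Under these assumptions the gate gives a false positive on MI200 and a blanket negative on XPU, which is the failure mode the table describes.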

1.2 Problem B: Same CUDA generation, different capability numbers across SKU tiers

Even within NVIDIA's own ecosystem, the same architecture generation has different compute capability numbers depending on the product tier (server / workstation / client):

| Platform | Device | Compute Capability | Data Types Supported |
| --- | --- | --- | --- |
| Server | B200 / B300 | (10, 0) / (10, 3) | bf16 / fp8 / fp4 |
| Workstation | RTX PRO 6000 Blackwell | (12, 0) | bf16 / fp8 / fp4 |
| Client | RTX 5090 | (12, 0) | bf16 only |

This is particularly problematic because:

  1. Same generation, different numbers: B200 (10,0) and RTX PRO 6000 (12,0) are both Blackwell, both support FP8/FP4, but have completely different capability numbers. Code using is_device_capability_family(100) to gate Blackwell features will miss RTX PRO 6000.

  2. Same number, different features: RTX PRO 6000 and RTX 5090 both report (12,0), but RTX PRO supports FP8/FP4 while consumer RTX 5090 does not. Code using has_device_capability(120) to gate FP8 would be wrong on RTX 5090.

  3. Maintenance burden: Every new GPU SKU tier requires auditing all numeric capability checks. The current codebase already handles is_device_capability_family(100) and is_device_capability_family(120) separately (e.g., in cutlass_moe.py), and this will only get worse.
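
The first two points can be illustrated with a short sketch. It assumes that a "family" check compares only the major version (integer capability divided by 10); that semantics and the helper names are assumptions made for illustration, and the capability numbers are the ones listed in the table above.

```python
def capability_family(capability: int) -> int:
    """Assumed family semantics: devices sharing the same major version."""
    return capability // 10


def is_device_capability_family(current: int, family: int) -> bool:
    return capability_family(current) == capability_family(family)


# Capability integers quoted in the table above.
DEVICES = {
    "B200": 100,
    "B300": 103,
    "RTX PRO 6000 Blackwell": 120,
    "RTX 5090": 120,
}

# Which devices pass a gate written as is_device_capability_family(100)?
blackwell_server_gate = {
    name: is_device_capability_family(cap, 100) for name, cap in DEVICES.items()
}

# Both SM 12.0 devices carry the same number, so no numeric check can
# tell them apart.
same_number = DEVICES["RTX PRO 6000 Blackwell"] == DEVICES["RTX 5090"]
```

The gate accepts B200 and B300 and rejects both SM 12.0 devices, and nothing in the number separates the workstation part from the consumer part.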

1.3 Problem C: Cross-platform semantic mismatch

The same numeric value means completely different things on different platforms:

| Capability Value | CUDA Meaning | ROCm Meaning |
| --- | --- | --- |
| (9, 0) | Hopper (H100/H200): FP8 ✅, TMA ✅ | MI200 (gfx90a): FP8 ❌, BF16 only |
| (9, 4) | N/A | MI300 (gfx942): FP8 ✅ (FNUZ) |
| (9, 5) | N/A | MI355 (gfx950): FP8 ✅, FP4 ✅ |
| (12, 0) | Blackwell consumer (RTX 5090): BF16 only | RDNA4 (gfx1201): BF16 only |

1.4 Scope observation: where capability checks are used

Most usage of has_device_capability / get_device_capability falls into:

| Location | Usage Pattern | Notes |
| --- | --- | --- |
| tests/ and benchmarks/ | Test skip / gating | ~60% of call sites. vLLM CI always uses NVIDIA server GPUs, so numeric checks "happen to work" but are not correct for community contributors on consumer/workstation GPUs |
| Kernel selection code (model_executor/layers/) | Dispatch to optimal kernel implementation | ~30%. Inherently platform-specific (CUTLASS, Marlin, etc.) |
| Config / runtime (vllm/config/, vllm/v1/) | Feature availability checks | ~10%. Should be cross-platform |

This distribution suggests the migration can be prioritized: fix tests/config first (highest cross-platform impact), then gradually migrate kernel dispatch code.


2. Analysis of Current Codebase

2.1 How capability is actually used (282 call sites)

Analyzing all has_device_capability / is_device_capability* call sites across the codebase, they fall into three distinct categories:

Category A: Testing for a hardware feature (~60%)

# "Does this device support FP8?"
current_platform.has_device_capability(89)   # Actually means: supports_fp8()

# "Does this device support BF16?"
current_platform.has_device_capability(80)   # Actually means: supports_bf16()

# "Does this device support TMA / warp-group MMA?" 
current_platform.has_device_capability(90)   # Actually means: supports_hopper_features()

# "Does this device support Blackwell features (FP4, TMA v2)?"
current_platform.is_device_capability_family(100)  # Actually means: supports_blackwell_features()

These should be feature queries, not numeric comparisons.

Category B: Selecting a kernel implementation (~30%)

# "Use CUTLASS 3.x path for SM90+"
if current_platform.has_device_capability(90): use_cutlass3()

# "Use Blackwell-specific GEMM kernel"
if current_platform.is_device_capability_family(100): use_blackwell_gemm()

# "Use CUTLASS MoE for Blackwell server OR workstation"
if p.is_device_capability_family(100) or p.is_device_capability_family(120): ...

These are CUDA-kernel-specific dispatch decisions. They are inherently platform-specific and do not need cross-platform abstraction — they are always guarded behind is_cuda() check already. However, even here the Blackwell SKU split causes issues (see Section 1.2).

Category C: FP8 weight-only quantization fallback (~10%)

# fbgemm_fp8.py — Marlin WOQ fallback for non-FP8 hardware
self.use_marlin = not current_platform.has_device_capability(89)

# marlin.py — FP8 Marlin works on SM ≥ 7.5 (Turing+)
def is_fp8_marlin_supported():
    return current_platform.has_device_capability(75)

This pattern reveals that FP8 "support" is not binary — there are two distinct levels:

  • Native FP8 compute: Hardware tensor cores that natively compute in FP8 (SM ≥ 89 on CUDA, gfx942/gfx950 on ROCm)
  • FP8 weight-only quantization (WOQ): Kernels like Marlin that can load FP8-quantized weights and dequantize them to FP16/BF16 for compute (works on SM ≥ 75 on CUDA, i.e., Turing+)

The current supports_fp8() doesn't distinguish between these, which leads to suboptimal decisions. For example, FBGEMMFp8Config uses has_device_capability(89) to choose between native FP8 compute and the Marlin fallback. This is incorrect on RTX 5090 at (12,0): has_device_capability(89) returns True there, but the device may not actually have native FP8 tensor cores for all operations.

2.2 Existing feature-level APIs (the right pattern)

The codebase already has the correct abstraction for some features:

# vllm/platforms/interface.py — base class
Platform.supports_fp8()      → bool   # ✅ Cross-platform
Platform.supports_mx()       → bool   # ✅ Cross-platform
Platform.support_deep_gemm() → bool   # ✅ Cross-platform
Platform.fp8_dtype()         → dtype  # ✅ Cross-platform
Platform.is_fp8_fnuz()       → bool   # ✅ Cross-platform

Each platform overrides these with the correct implementation:

  • CUDA: supports_fp8() → has_device_capability(89)
  • ROCm: supports_fp8() → "gfx94" in arch or "gfx95" in arch or "gfx12" in arch
  • XPU: inherits False (or could override when XPU gains FP8)

This is the pattern that should be expanded — but with finer granularity (see Section 3).
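
A minimal sketch of that override pattern, using toy classes rather than vLLM's real platform classes. The class names, the hard-coded `capability` and `arch` values, and the `pick_kv_cache_dtype` helper are all hypothetical; only the per-platform conditions are taken from the bullets above.

```python
from typing import Type


class Platform:
    """Minimal stand-in for the base class: unsupported by default."""

    @classmethod
    def supports_fp8(cls) -> bool:
        return False


class FakeCudaPlatform(Platform):
    capability = 90  # pretend we are on an SM 9.0 device

    @classmethod
    def supports_fp8(cls) -> bool:
        return cls.capability >= 89


class FakeRocmPlatform(Platform):
    arch = "gfx90a"  # pretend we are on MI200

    @classmethod
    def supports_fp8(cls) -> bool:
        return any(tag in cls.arch for tag in ("gfx94", "gfx95", "gfx12"))


class FakeXpuPlatform(Platform):
    pass  # inherits False until the platform gains FP8


def pick_kv_cache_dtype(platform: Type[Platform]) -> str:
    # Caller code never sees capability numbers or arch strings.
    return "fp8" if platform.supports_fp8() else "auto"
```

The point of the design is that the numeric or string comparison lives in exactly one place per platform, and callers ask a question about the feature instead.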

2.3 Missing feature-level APIs

| Feature | How it's checked today | Missing Platform method |
| --- | --- | --- |
| BF16 support | has_device_capability(80) (CUDA-only) | supports_bf16() |
| FP8 native compute vs WOQ | Implicit: has_device_capability(89) vs has_device_capability(75) | supports_fp8_native() / supports_fp8_woq() |
| FP4 / NVFP4 | C++ cutlass_scaled_mm_supports_fp4() (CUDA-only) | supports_fp4() |
| Hopper features (TMA, wgmma) | has_device_capability(90) | supports_tma() or architecture family check |
| Blackwell features | is_device_capability_family(100); misses SM 12.0 workstation | supports_blackwell_features() |
| Flash attention FP8 | has_device_capability(89) | Covered by supports_fp8_native() |
| CUTLASS GEMM dispatch | has_device_capability(75/80/89/90/100) | Platform-specific, keep as-is |

Proposed Change.

3. Proposal: Two-Layer Capability Model

Layer 1: Feature-Based Queries (Cross-Platform) — PRIMARY

Expand the existing Platform base class with semantic feature queries that each platform implements correctly:

# vllm/platforms/interface.py

class Platform:
    # === Existing (keep) ===
    @classmethod
    def supports_fp8(cls) -> bool: ...      # keep for backward compat, alias to supports_fp8_native()

    @classmethod
    def supports_mx(cls) -> bool: ...

    @classmethod
    def support_deep_gemm(cls) -> bool: ...
    
    # === New feature queries ===
    
    @classmethod
    def supports_bf16(cls) -> bool:
        """Returns whether the current platform supports BF16 compute."""
        return False
    
    @classmethod
    def supports_fp8_native(cls) -> bool:
        """Returns whether the current platform has native FP8 tensor core compute.
        
        This means the hardware can natively perform matrix multiplication in FP8
        (e.g., NVIDIA SM ≥ 89 Ada/Hopper/Blackwell server, AMD gfx942/gfx950).
        Used for: native FP8 GEMM, FP8 KV cache, FP8 attention.
        """
        return False

    @classmethod
    def supports_fp8_woq(cls) -> bool:
        """Returns whether the platform supports FP8 weight-only quantization.
        
        This means kernels like Marlin can load FP8-quantized weights and 
        dequantize them to FP16/BF16 for compute. Works on significantly older
        hardware than native FP8 (e.g., NVIDIA SM ≥ 75 Turing+).
        Used for: Marlin FP8, fbgemm FP8 fallback path.
        """
        return False

    @classmethod
    def supports_fp4(cls) -> bool:
        """Returns whether the current platform supports FP4 quantization 
        (native compute or equivalent)."""
        return False

    @classmethod
    def supports_tma(cls) -> bool:
        """Returns whether the current platform supports 
        Tensor Memory Accelerator (or equivalent async copy engine)."""
        return False

    @classmethod
    def supports_fp8_kv_cache(cls) -> bool:
        """Returns whether the current platform supports FP8 KV cache."""
        return cls.supports_fp8_native()

    @classmethod
    def get_architecture_family(cls) -> str:
        """Returns human-readable architecture family name.
        
        Examples: 'hopper', 'blackwell', 'blackwell_consumer', 'cdna3',
                  'rdna4', 'ponte_vecchio', 'unknown'
        """
        return "unknown"
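
As a rough illustration of how the defaults compose, the sketch below wires the backward-compatible alias and the KV-cache default to `supports_fp8_native()`, then overrides single methods in two hypothetical subclasses. `SketchRocmPlatform`, `SketchTuringPlatform`, and the hard-coded `arch` value are assumptions for the example, not proposed code.

```python
class Platform:
    @classmethod
    def supports_fp8_native(cls) -> bool:
        return False

    @classmethod
    def supports_fp8_woq(cls) -> bool:
        return False

    @classmethod
    def supports_fp8(cls) -> bool:
        # Backward-compatible alias for existing callers.
        return cls.supports_fp8_native()

    @classmethod
    def supports_fp8_kv_cache(cls) -> bool:
        # Default: FP8 KV cache follows native FP8 support.
        return cls.supports_fp8_native()


class SketchRocmPlatform(Platform):
    arch = "gfx942"  # hypothetical: MI300

    @classmethod
    def supports_fp8_native(cls) -> bool:
        return cls.arch.startswith(("gfx942", "gfx950"))


class SketchTuringPlatform(Platform):
    """Hypothetical CUDA device at SM 7.5: weight-only FP8, no native FP8."""

    @classmethod
    def supports_fp8_woq(cls) -> bool:
        return True
```

Because the alias and the KV-cache query delegate through `cls`, a platform that overrides only `supports_fp8_native()` gets consistent answers from all three methods.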

Why split FP8 into native vs WOQ?

The current supports_fp8() is ambiguous. In the codebase today:

# fbgemm_fp8.py — decides whether to use native FP8 or Marlin fallback
self.use_marlin = not current_platform.has_device_capability(89)

# marlin_utils_fp8.py — Marlin FP8 works on much older hardware
def is_fp8_marlin_supported():
    return current_platform.has_device_capability(75)  # Turing+!

With the split:

# fbgemm_fp8.py — clear intent
self.use_marlin = not current_platform.supports_fp8_native()

# marlin_utils_fp8.py — explicit WOQ check
def is_fp8_marlin_supported():
    return current_platform.supports_fp8_woq()

This distinction matters for:

  • Testing: A test for native FP8 GEMM should use @requires_feature("fp8_native"), while a test for Marlin FP8 WOQ should use @requires_feature("fp8_woq")
  • Consumer GPUs: RTX 5090 (SM 12.0) may support FP8 WOQ via Marlin but not native FP8 tensor core compute
  • ROCm: MI200 (gfx90a) has no FP8 at all, MI300 (gfx942) has native FP8-FNUZ, different Marlin support

Layer 2: Numeric Capability (Platform-Specific) — KEEP BUT DEPRECATE for cross-platform use

Keep has_device_capability() / is_device_capability() / is_device_capability_family() as-is for platform-specific kernel dispatch, but:

  1. Document that these are CUDA/ROCm-specific and must always be guarded by is_cuda() / is_rocm().
  2. Emit a deprecation warning in tests when they are used without a platform guard (enforced via a lint rule).
  3. XPU/CPU/TPU continue to return None / False — this is correct behavior.
  4. For kernel dispatch: These remain the correct API when selecting between CUDA-specific kernel implementations (e.g., CUTLASS 2.x vs 3.x). Such code is inherently platform-specific and doesn't need cross-platform abstraction.
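
A guarded dispatch of the kind described in points 1 and 4 might look like the following sketch. `SketchPlatform` is a stand-in exposing only the two calls used here, and the kernel names returned are placeholders, not real vLLM identifiers.

```python
from typing import Optional


class SketchPlatform:
    """Stand-in exposing only the calls used below."""

    def __init__(self, name: str, capability: Optional[int]):
        self.name = name
        self.capability = capability

    def is_cuda(self) -> bool:
        return self.name == "cuda"

    def has_device_capability(self, required: int) -> bool:
        return self.capability is not None and self.capability >= required


def select_gemm_kernel(platform: SketchPlatform) -> str:
    # Numeric checks appear only inside the platform guard.
    if platform.is_cuda():
        if platform.has_device_capability(90):
            return "cutlass_3x"
        return "cutlass_2x"
    # Non-CUDA platforms never reach a CUDA-style comparison.
    return "platform_default"


hopper = SketchPlatform("cuda", 90)
ampere = SketchPlatform("cuda", 80)
mi200 = SketchPlatform("rocm", 90)   # mapped (9, 0), but not a CUDA device
xpu = SketchPlatform("xpu", None)
```

With the guard in place, the ROCm device whose mapped capability is also 90 is never routed to the CUDA-specific kernel.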

5. Migration Plan

Phase 0: Add feature methods to Platform (this RFC)

| Task | Effort |
| --- | --- |
| Add supports_bf16(), supports_fp8_native(), supports_fp8_woq(), supports_fp4(), supports_tma(), supports_wgmma(), get_architecture_family(), get_device_tier() to base Platform | Small |
| Implement in CudaPlatformBase, RocmPlatform, XPUPlatform | Small |
| Keep supports_fp8() as backward-compat alias → supports_fp8_native() | Trivial |
| Add requires_feature() to tests/utils.py | Small |

Phase 1: Convert feature-gated capability checks in vllm/ source

Convert the ~60% of has_device_capability calls that are actually feature checks:

# Before
if current_platform.has_device_capability(89):
    ...  # FP8 native path
elif current_platform.has_device_capability(75):
    ...  # FP8 Marlin WOQ fallback

# After
if current_platform.supports_fp8_native():
    ...  # FP8 native path
elif current_platform.supports_fp8_woq():
    ...  # FP8 Marlin WOQ fallback

Priority conversion targets (most impactful):

  • has_device_capability(89) → supports_fp8_native() (~15 sites in vllm/)
  • has_device_capability(75) for Marlin → supports_fp8_woq() (~3 sites)
  • has_device_capability(80) → supports_bf16() (~8 sites)
  • is_device_capability_family(100) → supports_fp4() or architecture check (~20 sites)
  • not has_device_capability(89) for use_marlin → not supports_fp8_native() and supports_fp8_woq() (~5 sites)
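
The last conversion changes behavior slightly, which a small truth-table sketch makes visible. Both helper functions are hypothetical names written for this comparison; the thresholds 89 and 75 are the ones quoted in this RFC.

```python
def old_should_use_marlin(capability: int) -> bool:
    """The numeric rule being replaced: not has_device_capability(89)."""
    return capability < 89


def should_use_marlin(supports_fp8_native: bool, supports_fp8_woq: bool) -> bool:
    """Fall back to Marlin only when native FP8 is absent and WOQ is available."""
    return (not supports_fp8_native) and supports_fp8_woq


# Hypothetical device below SM 7.5: no FP8 path at all.
pre_turing_old = old_should_use_marlin(70)
pre_turing_new = should_use_marlin(supports_fp8_native=False, supports_fp8_woq=False)

# Turing-class device: WOQ only.
turing_new = should_use_marlin(supports_fp8_native=False, supports_fp8_woq=True)

# Device with native FP8: no fallback needed.
native_new = should_use_marlin(supports_fp8_native=True, supports_fp8_woq=True)
```

The old rule selects Marlin on a pre-Turing device where Marlin FP8 is not supported either, while the two-part condition does not.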

Phase 2: Convert test skip patterns

Integrate with the test skip RFC (#39158) to migrate test files:

# Before (CUDA-only, broken on ROCm, wrong on consumer GPUs)
@pytest.mark.skipif(not current_platform.has_device_capability(89), reason="need fp8")

# After (cross-platform, correct on all SKUs)
@requires_feature("fp8_native")

Note: Since vLLM CI always uses NVIDIA server GPUs, the migration can be incremental — existing tests won't break, but new tests should use the feature-based API.

Phase 3: Lint enforcement

Add a pre-commit hook to warn on new has_device_capability usage in test files without an is_cuda() / is_rocm() guard.
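
One way such a check could work is a line-based scan, sketched below. This is a deliberately simple heuristic written for illustration: it only recognizes a guard that appears on the same line as the numeric check, and the function and pattern names are hypothetical. A real hook would also need to handle guards on enclosing lines.

```python
import re
from typing import List

NUMERIC_CHECK = re.compile(
    r"\b(has_device_capability|is_device_capability_family)\s*\("
)
PLATFORM_GUARD = re.compile(r"\bis_(cuda|rocm)\s*\(")


def find_unguarded_checks(source: str) -> List[int]:
    """Return 1-based line numbers with a numeric check and no guard.

    Heuristic: a guard counts only if it appears on the same line.
    """
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore trailing comments
        if NUMERIC_CHECK.search(code) and not PLATFORM_GUARD.search(code):
            flagged.append(lineno)
    return flagged


SAMPLE = """\
import pytest
@pytest.mark.skipif(not current_platform.has_device_capability(89), reason="need fp8")
def test_a(): ...
@pytest.mark.skipif(not (current_platform.is_cuda() and current_platform.has_device_capability(90)), reason="sm90")
def test_b(): ...
"""

flagged_lines = find_unguarded_checks(SAMPLE)
```

A pre-commit entry point would run this over the staged files under tests/ and print a warning for each flagged line; that wiring is omitted here.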


Feedback Period.

1-2 weeks.

CC List.

@tjtanaa

Any Other Things.

6. Comprehensive Device Capability Reference

6.1 NVIDIA CUDA — torch.cuda.get_device_capability()

Server / Data Center GPUs

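| Device | Compute Capability | Architecture | BF16 | FP8 Native | FP4 | TMA |
| --- | --- | --- | --- | --- | --- | --- |
| GB300 / B300 | (10, 3) | Blackwell | | | | |
| GB200 / B200 | (10, 0) | Blackwell | | | | |
| H100 / H200 | (9, 0) | Hopper | | | | |
| L4 / L40 / L40S | (8, 9) | Ada Lovelace | | | | |
| A40 / A10 / A16 / A2 | (8, 6) | Ampere | | | | |
| A100 | (8, 0) | Ampere | | | | |
| T4 | (7, 5) | Turing | | | | |
| V100 | (7, 0) | Volta | | | | |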

Workstation / Pro GPUs

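| Device | Compute Capability | Architecture | BF16 | FP8 Native | FP4 | TMA |
| --- | --- | --- | --- | --- | --- | --- |
| RTX PRO 6000 Blackwell | (12, 0) | Blackwell (consumer SM) | | ✅* | | |
| RTX 6000 Ada | (8, 9) | Ada Lovelace | | | | |
| RTX A6000 / A5000 / A4000 | (8, 6) | Ampere | | | | |
| Quadro RTX | (7, 5) | Turing | | | | |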

Consumer / GeForce GPUs

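| Device | Compute Capability | Architecture | BF16 | FP8 Native | FP4 | TMA |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 5090/5080/5070/5060/5050 | (12, 0) | Blackwell (consumer SM) | | ❌* | | |
| RTX 4090/4080/4070/4060 | (8, 9) | Ada Lovelace | | | | |
| RTX 3090/3080/3070/3060/3050 | (8, 6) | Ampere | | | | |
| RTX 2080/2070/2060, Titan RTX | (7, 5) | Turing | | | | |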

Key insight: RTX PRO 6000 and RTX 5090 both report (12, 0), but RTX PRO supports FP8/FP4 while consumer RTX 5090 does not. Capability number alone is insufficient for feature detection on SM 12.0.

6.2 AMD ROCm — GCN Architecture → Mapped Capability

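| Device | GCN Arch | Mapped Capability | Architecture | BF16 | FP8 Native | FP4 |
| --- | --- | --- | --- | --- | --- | --- |
| MI4xx (future) | gfx1250 | TBD | CDNA next | | | |
| MI355 | gfx950 | (9, 5) | CDNA4 | | ✅ (OCP+FNUZ) | |
| MI300/MI325 | gfx942 | (9, 4) | CDNA3 | | ✅ (FNUZ) | |
| MI200 | gfx90a | (9, 0) | CDNA2 | | | |
| Radeon (RDNA4) | gfx12xx | (12, 0) | RDNA4 | | | |
| Radeon (RDNA3) | gfx11xx | (11, x) | RDNA3 | | | |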

Key insight: ROCm's (9, 0) = MI200 (NO FP8) vs CUDA's (9, 0) = Hopper (HAS FP8). The same number has opposite meanings.

6.3 Feature to Capability Mapping — Why Numeric Checks Fail

| Feature | CUDA Numeric Gate | Why It Fails |
| --- | --- | --- |
| FP8 Native | has_device_capability(89) | ❌ RTX 5090 is (12,0) ≥ (8,9) but has no native FP8. ROCm MI200 maps to (9,0) ≥ (8,9) but has no FP8. |
| FP4 | is_device_capability_family(100) | ❌ Misses RTX PRO 6000 at (12,0). |
| BF16 | has_device_capability(80) | ⚠️ Works for CUDA server/workstation, but returns None on XPU (which does support BF16). |
| Blackwell features | is_device_capability_family(100) | ❌ Misses workstation Blackwell at (12,0). May include non-Blackwell devices if future families reuse the number. |


extent analysis

TL;DR

The most likely fix involves replacing numeric device capability checks with semantic feature queries to ensure cross-platform compatibility and accuracy.

Guidance

  1. Introduce feature-based queries: Expand the Platform base class with methods like supports_fp8_native(), supports_fp8_woq(), supports_bf16(), and supports_fp4() to provide a clear and cross-platform way to check for specific hardware features.
  2. Deprecate numeric capability checks for cross-platform use: While keeping has_device_capability() and similar methods for platform-specific kernel dispatch, deprecate their use for cross-platform feature checks and encourage the use of feature-based queries instead.
  3. Migrate existing code: Gradually convert existing has_device_capability() calls to use the new feature-based queries, prioritizing tests and config/runtime code for the highest cross-platform impact.
  4. Enforce lint rules: Implement pre-commit hooks to warn against new usage of has_device_capability() without proper platform guards, ensuring that future code adheres to the new guidelines.

Example

# Before
if current_platform.has_device_capability(89):
    ...  # FP8 native path

# After
if current_platform.supports_fp8_native():
    ...  # FP8 native path

Notes

  • The migration should be incremental, starting with the most impactful areas such as tests and config/runtime code.
  • The introduction of feature-based queries does not immediately render numeric capability checks obsolete for all use cases, especially within platform-specific kernel dispatch code.
  • Documentation and clear guidelines are crucial for a smooth transition to the new feature-based API.

Recommendation

Apply the workaround by introducing and gradually migrating to the feature-based queries, ensuring a more robust and cross-platform compatible codebase. This approach allows for clearer intent in code, reduces maintenance burdens due to changing hardware capabilities, and improves the overall reliability of feature checks across different platforms.
