vllm - 💡(How to fix) [vLLM IR] Op test & benchmark infra [1 comment, 2 participants]

vllm, 2026-04-02 03:07:56


vllm-project/vllm#38782 • Fetched 2026-04-08 02:22:44

Timeline (top)

assigned ×1, commented ×1, issue_type_added ×1, labeled ×1


We should create infra & utilities to make it very easy to add tests & benchmarks for new ops. I think we should have tests for every op that test all supported providers and check their semantics match native and that native semantics are roughly as expected (e.g. norm(2x) ~ norm(x)). Also we should opcheck each provider to check args are not mutated and outputs do not alias inputs (except in the maybe_inplace case).
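The checks described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: `native_rms_norm`, `fused_rms_norm`, the `PROVIDERS` dictionary, and the helper names are hypothetical and are not part of vLLM. The alias check is hand-rolled and coarse (it only compares base data pointers), so it stands in for a real opcheck rather than replacing one.

```python
import torch


def native_rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical reference ("native") semantics: x / sqrt(mean(x^2) + eps)."""
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)


def fused_rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Stand-in for a provider kernel; a real provider would call its own backend."""
    return x / torch.sqrt((x * x).mean(dim=-1, keepdim=True) + eps)


# Hypothetical provider table: provider name -> implementation.
PROVIDERS = {"native": native_rms_norm, "fused": fused_rms_norm}


def check_matches_native(name: str, x: torch.Tensor) -> None:
    # Each provider must agree with the native implementation within tolerance.
    torch.testing.assert_close(
        PROVIDERS[name](x), native_rms_norm(x), rtol=1e-5, atol=1e-6
    )


def check_scale_invariance(x: torch.Tensor) -> None:
    # norm(2x) ~ norm(x); only approximately equal because of eps.
    torch.testing.assert_close(
        native_rms_norm(2 * x), native_rms_norm(x), rtol=1e-4, atol=1e-5
    )


def check_no_mutation_or_alias(name: str, x: torch.Tensor) -> None:
    before = x.clone()
    out = PROVIDERS[name](x)
    assert torch.equal(x, before), f"{name} mutated its input"
    # Coarse check: catches an output that is the input buffer itself,
    # but not a view that starts at an offset into it.
    assert out.data_ptr() != x.data_ptr(), f"{name} output aliases its input"


def run_all_checks() -> bool:
    torch.manual_seed(0)
    x = torch.randn(8, 64)
    check_scale_invariance(x)
    for name in PROVIDERS:
        check_matches_native(name, x)
        check_no_mutation_or_alias(name, x)
    return True
```

In a real suite these helpers would probably be parametrized over ops, providers, dtypes, and shapes, and the maybe_inplace case would need its own variant of the mutation and alias check.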

Then we should also create utilities such that each provider can write their own tests comparing to native if they want, especially validating that supports_args works as intended. Not sure how to reduce duplication here yet.
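One possible shape for such a utility is sketched below. The `Provider` dataclass, its `impl` and `supports_args` fields, and `compare_to_native` are hypothetical names chosen for this example, and the real `supports_args` signature in vLLM may differ. The idea is that a shared helper compares a provider to native only on inputs the provider claims to support and reports the rest as skipped, so a provider test can assert on both outcomes.

```python
from dataclasses import dataclass
from typing import Callable

import torch


def native_rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical native reference implementation."""
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)


@dataclass
class Provider:
    """Hypothetical provider record; not the vLLM IR API."""
    name: str
    impl: Callable[[torch.Tensor], torch.Tensor]
    supports_args: Callable[[torch.Tensor], bool]


def compare_to_native(provider: Provider, inputs: dict[str, torch.Tensor]):
    """Return (label, status) per input, where status is 'checked' or 'skipped'."""
    results = []
    for label, x in inputs.items():
        if not provider.supports_args(x):
            results.append((label, "skipped"))
            continue
        torch.testing.assert_close(
            provider.impl(x), native_rms_norm(x), rtol=1e-5, atol=1e-6
        )
        results.append((label, "checked"))
    return results


# Toy provider that only claims float32 inputs whose hidden size is a multiple of 8.
toy = Provider(
    name="toy",
    impl=lambda x: x / torch.sqrt((x * x).mean(dim=-1, keepdim=True) + 1e-6),
    supports_args=lambda x: x.dtype == torch.float32 and x.shape[-1] % 8 == 0,
)

torch.manual_seed(0)
inputs = {
    "f32_h64": torch.randn(4, 64),
    "f32_h30": torch.randn(4, 30),
    "f64_h64": torch.randn(4, 64, dtype=torch.float64),
}
results = compare_to_native(toy, inputs)
```

A provider-specific test can then assert which labels were checked and which were skipped, which validates `supports_args` itself rather than only the kernel output.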

For benchmarks, we should have a single driver file, and each op should register a get_inputs(case: BenchmarkCase) -> list[torch.Tensor] function (BenchmarkCase(num_tokens: int, hidden_size: int)). And then we should have a fixed set of benchmark cases corresponding to hidden_sizes from common models. This should be enough to start, and then later ops can extend BenchmarkCase to add more parameters (e.g. group_size for group quant).

extent analysis

TL;DR

Create infrastructure and utilities to simplify adding tests and benchmarks for new operations, including tests for provider semantics and benchmarks with a standardized driver file.

Guidance

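Develop a test framework that runs every op against all supported providers, checks that each provider matches the native implementation, and sanity-checks the native semantics themselves (e.g. norm(2x) ~ norm(x)). Also opcheck each provider to confirm that arguments are not mutated and that outputs do not alias inputs, except in the maybe_inplace case.
Provide utilities so that each provider can write its own tests against native, particularly to validate that supports_args works as intended. How to avoid duplicating the shared tests is still an open question.
Establish a benchmarking system with a single driver file, where each op registers a get_inputs(case: BenchmarkCase) -> list[torch.Tensor] function that generates its benchmark inputs.
Define a fixed set of benchmark cases whose hidden_size values correspond to common models, so that every op is benchmarked on the same shapes. Ops can later extend BenchmarkCase with extra parameters (e.g. group_size for group quantization).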
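The single-driver idea can be sketched as a small registry plus one loop. This is a minimal illustration under stated assumptions, not the planned implementation: `REGISTRY`, `register_op`, `run_benchmarks`, and the concrete numbers in `CASES` are hypothetical, and the timing uses CPU wall-clock time only (GPU kernels would additionally need device synchronization).

```python
import time
from dataclasses import dataclass
from typing import Callable

import torch


@dataclass(frozen=True)
class BenchmarkCase:
    num_tokens: int
    hidden_size: int


# Illustrative shapes only; the issue does not list concrete values.
CASES = [
    BenchmarkCase(num_tokens=t, hidden_size=h)
    for t in (1, 256)
    for h in (2048, 4096)
]

# Hypothetical registry: op name -> (get_inputs, {provider name -> callable}).
REGISTRY: dict[str, tuple[Callable, dict[str, Callable]]] = {}


def register_op(name: str, get_inputs: Callable, providers: dict[str, Callable]) -> None:
    REGISTRY[name] = (get_inputs, providers)


def rms_norm_inputs(case: BenchmarkCase) -> list[torch.Tensor]:
    # Op-specific input generation for one benchmark case.
    return [torch.randn(case.num_tokens, case.hidden_size)]


def native_rms_norm(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)


register_op("rms_norm", rms_norm_inputs, {"native": native_rms_norm})


def run_benchmarks(iters: int = 5):
    """Time every registered provider of every op on every fixed case."""
    rows = []
    for op_name, (get_inputs, providers) in REGISTRY.items():
        for case in CASES:
            inputs = get_inputs(case)
            for provider_name, fn in providers.items():
                fn(*inputs)  # warm-up call, excluded from the timing
                start = time.perf_counter()
                for _ in range(iters):
                    fn(*inputs)
                elapsed_us = (time.perf_counter() - start) / iters * 1e6
                rows.append((op_name, provider_name, case, elapsed_us))
    return rows
```

Because the cases are fixed in the driver, adding an op only requires a `get_inputs` function and a provider table, and results stay comparable across ops.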

Example

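import torch
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkCase:
    # Shape parameters shared by every op's benchmark.
    num_tokens: int
    hidden_size: int

def get_inputs(case: BenchmarkCase) -> list[torch.Tensor]:
    # Op-specific: build the input tensors for this case.
    # Illustrative placeholder: a single (num_tokens, hidden_size) activation tensor.
    return [torch.randn(case.num_tokens, case.hidden_size)]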
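The issue also mentions that ops can later extend BenchmarkCase with more parameters, such as group_size for group quant. One way to do that, assuming BenchmarkCase is expressed as a frozen dataclass, is a subclass with a defaulted extra field. The class name `GroupQuantBenchmarkCase`, the default of 128, and the divisibility check are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCase:
    num_tokens: int
    hidden_size: int


@dataclass(frozen=True)
class GroupQuantBenchmarkCase(BenchmarkCase):
    # Extra parameter for group quantization; 128 is an illustrative default.
    group_size: int = 128

    def __post_init__(self) -> None:
        # Assumed constraint for this sketch: groups tile the hidden dimension exactly.
        if self.hidden_size % self.group_size != 0:
            raise ValueError("hidden_size must be divisible by group_size")

    @property
    def num_groups(self) -> int:
        return self.hidden_size // self.group_size


case = GroupQuantBenchmarkCase(num_tokens=16, hidden_size=4096, group_size=128)
```

Since the subclass is still a BenchmarkCase, an existing get_inputs function that only reads num_tokens and hidden_size keeps working unchanged.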

Notes

The proposed solution focuses on creating a structured approach to testing and benchmarking operations. However, the exact implementation details, such as how to reduce duplication in provider tests, are left to be determined.

Recommendation

Start with the proposed infrastructure and utilities, then refine them iteratively as individual ops and providers need more. The issue itself describes a phased approach: a minimal BenchmarkCase first, with extensions added later.
