vllm - [Feature]: General LL GEMMs with PDL Support [3 comments, 3 participants]

vllm-project/vllm#38772 (fetched 2026-04-08 02:22:47)

🚀 The feature, motivation and pitch

We currently have two highly specialized low-latency GEMMs that support PDL (Programmatic Dependent Launch) after AR (all-reduce).

These kernels:

  • currently only support specific shapes
  • only support bf16

We should create generalized versions of these low-latency, PDL-enabled GEMMs that support:

  • bf16
  • fp8
  • nvfp4
  • arbitrary shapes
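Supporting arbitrary shapes largely comes down to how a kernel tiles M, N and K. The pure-Python sketch below is a CPU reference model, not a GPU kernel, and its names and tile sizes are illustrative assumptions. It shows the usual approach: launch ceil(M/BM) × ceil(N/BN) output tiles and clip the last tile in every dimension, so no dimension has to be a multiple of the tile size.

```python
from typing import List

Matrix = List[List[float]]


def ceil_div(a: int, b: int) -> int:
    """Number of tiles of size b needed to cover a elements."""
    return (a + b - 1) // b


def tiled_gemm(a: Matrix, b: Matrix, bm: int = 4, bn: int = 4, bk: int = 8) -> Matrix:
    """C = A @ B computed tile by tile. Edge tiles are clipped (the CPU
    analogue of masked loads/stores), so M, N and K can be any size."""
    m, k = len(a), len(a[0])
    assert len(b) == k, "inner dimensions must match"
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Each (tile_m, tile_n) pair would map to one CTA in a GPU kernel.
    for tile_m in range(ceil_div(m, bm)):
        for tile_n in range(ceil_div(n, bn)):
            row0, col0 = tile_m * bm, tile_n * bn
            rows = range(row0, min(row0 + bm, m))  # clip the last tile in M
            cols = range(col0, min(col0 + bn, n))  # clip the last tile in N
            # March along K one tile at a time, accumulating into C.
            for tile_k in range(ceil_div(k, bk)):
                k0 = tile_k * bk
                ks = range(k0, min(k0 + bk, k))  # clip the last tile in K
                for i in rows:
                    for j in cols:
                        acc = 0.0
                        for kk in ks:
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] += acc
    return c
```

On a GPU the clipping becomes predicated (masked) memory accesses. The edge tiles waste some work, which matters most for the very small M typical of low-latency decode.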

Alternatives

None

Additional context

PDL fix in flashinfer: https://github.com/flashinfer-ai/flashinfer/issues/2887


Extended analysis

TL;DR

Create generalized versions of the low-latency GEMMs that support multiple data types (bf16, fp8, nvfp4) and arbitrary shapes.

Guidance

  • Identify the current limitations of dsv3_a_gemm and dsv3_router_gemm in terms of supported shapes and data types.
  • Determine the requirements for the generalized GEMMs, including support for bf16, fp8, nvfp4, and arbitrary shapes.
  • Review the PDL fix in flashinfer (issue #2887) for insights or solutions that can be applied to the generalized GEMMs.
  • Consider how supporting arbitrary shapes affects performance and latency.
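The main new ingredient for the nvfp4 path, compared with bf16, is that weights carry per-block scale factors. The minimal sketch below shows block-scaled 4-bit quantization, assuming NVFP4-style 16-element blocks and FP4 E2M1 element values. For simplicity the block scale is kept as a Python float and ties round toward the smaller magnitude; the real format stores the block scale in FP8 (E4M3) together with a per-tensor FP32 scale. The function names are hypothetical.

```python
from typing import List, Tuple

# Magnitudes representable by an FP4 E2M1 element (sign handled separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
BLOCK = 16  # assumption: NVFP4-style 16-element blocks


def round_to_e2m1(x: float) -> float:
    """Saturate to +/-6 and round to the nearest E2M1 value.
    min() returns the first match, so ties go to the smaller magnitude."""
    mag = min(abs(x), E2M1_MAX)
    best = min(E2M1_GRID, key=lambda g: abs(g - mag))
    return best if x >= 0 else -best


def quantize_blockwise(values: List[float]) -> Tuple[List[float], List[float]]:
    """Return (scales, codes): one scale per block, codes on the E2M1 grid."""
    scales: List[float] = []
    codes: List[float] = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the largest E2M1 value.
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        scales.append(scale)
        codes.extend(round_to_e2m1(v / scale) for v in block)
    return scales, codes


def dequantize_blockwise(scales: List[float], codes: List[float]) -> List[float]:
    """Multiply each code by the scale of the block it belongs to."""
    return [c * scales[i // BLOCK] for i, c in enumerate(codes)]
```

A generalized nvfp4 GEMM would typically apply these scales during accumulation (or through hardware block-scaled MMA instructions) rather than dequantizing the whole weight up front.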

Example

No concrete implementation can be given without more context. One plausible approach is to modify the existing dsv3_a_gemm and dsv3_router_gemm kernels so that they accept shapes and data types as parameters instead of hard-coding them.
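The approach above could be wired up as a thin dispatcher that keeps the hand-tuned kernels as fast paths and falls back to a generalized kernel for every other shape. This is a hypothetical sketch: `select_gemm_path`, the registry layout, the kernel names, and the example shape are illustrative assumptions, not vLLM or FlashInfer APIs.

```python
from typing import Dict, Tuple

SUPPORTED_DTYPES = ("bf16", "fp8", "nvfp4")

# Hypothetical registry: (dtype, N, K) -> name of a specialized kernel.
SpecializedTable = Dict[Tuple[str, int, int], str]


def select_gemm_path(dtype: str, m: int, n: int, k: int,
                     specialized: SpecializedTable) -> str:
    """Pick a kernel for an (M x K) @ (K x N) GEMM.

    M (the token count) is only validated: it changes every step, while
    N and K come from the weight shape, so they form the lookup key.
    """
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype}")
    if min(m, n, k) <= 0:
        raise ValueError("GEMM dimensions must be positive")
    # Keep a hand-tuned kernel as the fast path when the weight shape matches.
    fast = specialized.get((dtype, n, k))
    if fast is not None:
        return fast
    # Otherwise fall back to a generalized PDL-enabled kernel for this dtype.
    return f"generic_pdl_gemm_{dtype}"
```

Because the specialized kernels stay in the table, existing shapes keep their current latency while new shapes and dtypes get a working path.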

Notes

Generalizing these GEMMs may require significant changes to the existing codebase, and the new implementations must be checked carefully to ensure they keep the low latency of the current specialized kernels.
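One common technique for keeping latency low when a shape produces too few output tiles to fill the GPU (for example M = 1 during decode) is split-K: divide the K dimension across extra CTAs and reduce the partial sums afterwards. The sketch below is a hypothetical launch heuristic, not vLLM code; the tile sizes, the power-of-two search, and the `num_sms` parameter are assumptions.

```python
def ceil_div(a: int, b: int) -> int:
    """Number of tiles of size b needed to cover a elements."""
    return (a + b - 1) // b


def pick_split_k(m: int, n: int, k: int, num_sms: int,
                 bm: int = 64, bn: int = 128, bk: int = 128,
                 max_split: int = 16) -> int:
    """Smallest power-of-two split-K factor whose CTA count reaches num_sms,
    capped by max_split and by the number of K tiles (each split keeps at
    least one K tile of work)."""
    base_ctas = ceil_div(m, bm) * ceil_div(n, bn)  # output tiles
    k_tiles = ceil_div(k, bk)
    split = 1
    while base_ctas * split < num_sms and split * 2 <= min(max_split, k_tiles):
        split *= 2
    return split
```

Split-K is not free: the partial results need a reduction (atomics or a second pass), so a production heuristic would be tuned per architecture and shape rather than using fixed constants like these.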

Recommendation

Recommended approach: create generalized versions of the low-latency GEMMs that support multiple data types and arbitrary shapes, since this gives the most flexibility and compatibility for future use cases.
