vllm - ✅(Solved) Fix Request for attribution: Multi-ISA CPU dispatcher work (PR #35466) [3 pull requests, 6 comments, 2 participants]

vllm · 2026-04-03 20:42:48


GitHub stats

vllm-project/vllm#38942•Fetched 2026-04-08 02:44:48


Author

MekayelAnik

Participants

MekayelAnik

wjhrdy

Timeline (top)

commented ×6, subscribed ×5, mentioned ×4, cross-referenced ×2

My original work on a Python dispatcher for multi-ISA CPU support — contributed to dtrifiro/vllm PR #9 in December 2025 — was used without attribution in vllm-project/vllm PR #35466, merged on 2026-02-28. A clear lineage exists through an intermediate PR (#35346) that explicitly rebased my commits, was closed, and replaced the next day with a reimplementation that gave no credit.


PR fix notes

PR #9: fix: Add Python dispatcher for multi-ISA CPU support

Repository: dtrifiro/vllm
Author: MekayelAnik
State: closed | merged: True
Link: https://github.com/dtrifiro/vllm/pull/9

Description (problem / solution / changelog)

Summary

This PR fixes the namespace mismatch issue that causes both AVX2 and AVX512 paths to crash during inference.

Root Cause: The two extensions register to different namespaces:

_C.so (AVX2) registers → torch.ops._C.*
_C_avx512.so (AVX512) registers → torch.ops._C_avx512.*

But vLLM Python code hardcodes calls to torch.ops._C.something(), which fails when the AVX512 extension is loaded (and vice versa).

Changes

File	Changes
`vllm/_ops_dispatch.py`	NEW - Dispatcher module with `get_ops()`, `get_utils()`, `has_op()`
`vllm/_custom_ops.py`	Updated 83 `torch.ops._C.` → `get_ops().`, 13 `hasattr` → `has_op()`
`vllm/v1/worker/cpu_worker.py`	Replaced try/except hack with `get_utils()`
`docs/runtime-isa-dispatch.md`	Full documentation and implementation guide

How It Works

# _ops_dispatch.py detects which extension is loaded
def get_ops():
    if hasattr(torch.ops._C_avx512, 'silu_and_mul'):
        return torch.ops._C_avx512
    return torch.ops._C

# Call sites now use dispatcher
get_ops().silu_and_mul(out, x)  # Routes to correct extension
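
The routing idea above can be tried without PyTorch. The following is a minimal sketch, not the code from PR #9: `make_registry` is a hypothetical stand-in for `torch.ops` after exactly one extension has been loaded, and `get_ops` and `has_op` take that stand-in as an explicit argument, which the real functions presumably do not.

```python
from types import SimpleNamespace


def make_registry(loaded: str) -> SimpleNamespace:
    """Simulate torch.ops after one CPU extension ('_C' or '_C_avx512') is loaded.

    Hypothetical helper for illustration only; real code would inspect torch.ops.
    """
    def silu_and_mul(out, x):
        # Toy op: record which namespace handled the call.
        out.append((loaded, x))

    namespaces = {"_C": SimpleNamespace(), "_C_avx512": SimpleNamespace()}
    # Only the loaded extension registers its ops.
    setattr(namespaces[loaded], "silu_and_mul", silu_and_mul)
    return SimpleNamespace(**namespaces)


def get_ops(ops: SimpleNamespace) -> SimpleNamespace:
    """Return the namespace of whichever extension actually registered ops."""
    if hasattr(ops._C_avx512, "silu_and_mul"):
        return ops._C_avx512
    return ops._C


def has_op(ops: SimpleNamespace, name: str) -> bool:
    """Check for an op in the active namespace instead of a hardcoded one."""
    return hasattr(get_ops(ops), name)


# On a simulated AVX512 system, a hardcoded ops._C.silu_and_mul lookup would
# fail, while the dispatcher routes the call to the populated namespace.
ops = make_registry("_C_avx512")
out = []
get_ops(ops).silu_and_mul(out, 1.0)
```

Passing the registry in explicitly keeps the sketch testable; a real dispatcher would more likely cache the detected namespace once at import time.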

Testing Checklist

AVX2 system: import vllm succeeds
AVX2 system: LLM("facebook/opt-125m") loads
AVX2 system: llm.generate("Hello") completes
AVX512 system: import vllm succeeds
AVX512 system: LLM("facebook/opt-125m") loads
AVX512 system: llm.generate("Hello") completes

Test Command

python -c "
from vllm._ops_dispatch import _detect_cpu_extension
print(f'Detected extension: {_detect_cpu_extension()}')

from vllm import LLM
llm = LLM('facebook/opt-125m')
print(llm.generate('Hello'))
"

Changed files

docs/runtime-isa-dispatch.md (added, +1928/-0)
vllm/_custom_ops.py (modified, +101/-97)
vllm/_ops_dispatch.py (added, +199/-0)
vllm/compilation/activation_quant_fusion.py (modified, +6/-5)
vllm/compilation/collective_fusion.py (modified, +5/-4)
vllm/compilation/fix_functionalization.py (modified, +12/-11)
vllm/compilation/fusion.py (modified, +18/-17)
vllm/compilation/fusion_attn.py (modified, +2/-1)
vllm/compilation/matcher_utils.py (modified, +12/-11)
vllm/compilation/qk_norm_rope_fusion.py (modified, +2/-1)
vllm/distributed/device_communicators/cpu_communicator.py (modified, +9/-8)
vllm/distributed/parallel_state.py (modified, +3/-2)
vllm/model_executor/layers/activation.py (modified, +10/-9)
vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py (modified, +2/-1)
vllm/model_executor/layers/fused_moe/cpu_fused_moe.py (modified, +3/-2)
vllm/model_executor/layers/fused_moe/cutlass_moe.py (modified, +3/-2)
vllm/model_executor/layers/fused_moe/fused_marlin_moe.py (modified, +3/-2)
vllm/model_executor/layers/fused_moe/fused_moe.py (modified, +4/-3)
vllm/model_executor/layers/fused_moe/modular_kernel.py (modified, +4/-3)
vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py (modified, +3/-2)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py (modified, +2/-1)
vllm/model_executor/layers/quantization/gguf.py (modified, +3/-2)
vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py (modified, +3/-2)
vllm/model_executor/layers/quantization/utils/fp8_utils.py (modified, +3/-2)
vllm/model_executor/layers/quantization/utils/int8_utils.py (modified, +2/-1)
vllm/model_executor/layers/utils.py (modified, +3/-2)
vllm/model_executor/models/deepseek_v2.py (modified, +3/-2)
vllm/utils/torch_utils.py (modified, +3/-2)
vllm/v1/worker/cpu_worker.py (modified, +3/-6)

PR #35466: [CI/Build] CPU release supports both of AVX2 and AVX512

Repository: vllm-project/vllm
Author: majian4work
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/35466

Description (problem / solution / changelog)

A simple version to support multiple ISAs in one wheel.

Changed files

cmake/cpu_extension.cmake (modified, +118/-132)
csrc/cpu/torch_bindings.cpp (modified, +7/-10)
setup.py (modified, +10/-1)
vllm/_custom_ops.py (modified, +1/-3)
vllm/platforms/cpu.py (modified, +24/-0)
vllm/v1/worker/cpu_worker.py (modified, +1/-1)

PR #35346: Cpu dispatcher

Repository: vllm-project/vllm
Author: majian4work
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/35346

Description (problem / solution / changelog)

Rebase https://github.com/dtrifiro/vllm/tree/cpu-build-dispatcher-cleanup

Changed files

CMakeLists.txt (modified, +18/-17)
benchmarks/kernels/benchmark_2d_silu_mul_fp8_quant.py (modified, +2/-1)
benchmarks/kernels/benchmark_fused_collective.py (modified, +6/-7)
cmake/cpu_extension.cmake (modified, +120/-117)
requirements/common.txt (modified, +1/-1)
setup.py (modified, +3/-1)
vllm/_custom_ops.py (modified, +116/-120)
vllm/_ops_dispatch.py (added, +203/-0)
vllm/compilation/passes/fusion/act_quant_fusion.py (modified, +6/-5)
vllm/compilation/passes/fusion/allreduce_rms_fusion.py (modified, +3/-2)
vllm/compilation/passes/fusion/attn_quant_fusion.py (modified, +2/-1)
vllm/compilation/passes/fusion/collective_fusion.py (modified, +3/-2)
vllm/compilation/passes/fusion/matcher_utils.py (modified, +12/-11)
vllm/compilation/passes/fusion/qk_norm_rope_fusion.py (modified, +2/-1)
vllm/compilation/passes/fusion/rms_quant_fusion.py (modified, +18/-18)
vllm/compilation/passes/utility/fix_functionalization.py (modified, +12/-11)
vllm/compilation/passes/utility/scatter_split_replace.py (modified, +4/-3)
vllm/distributed/device_communicators/cpu_communicator.py (modified, +9/-8)
vllm/distributed/parallel_state.py (modified, +3/-2)
vllm/kernels/helion/ops/silu_mul_fp8.py (modified, +2/-1)
vllm/model_executor/kernels/linear/scaled_mm/cpu.py (modified, +3/-2)
vllm/model_executor/layers/activation.py (modified, +10/-9)
vllm/model_executor/layers/fused_moe/activation.py (modified, +5/-3)
vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py (modified, +2/-1)
vllm/model_executor/layers/fused_moe/cpu_fused_moe.py (modified, +3/-2)
vllm/model_executor/layers/fused_moe/cutlass_moe.py (modified, +3/-2)
vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py (modified, +3/-4)
vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py (modified, +2/-1)
vllm/model_executor/layers/quantization/utils/fp8_utils.py (modified, +3/-2)
vllm/model_executor/layers/quantization/utils/int8_utils.py (modified, +2/-1)
vllm/model_executor/layers/sparse_attn_indexer.py (modified, +5/-4)
vllm/model_executor/layers/utils.py (modified, +3/-2)
vllm/platforms/cpu.py (modified, +50/-0)
vllm/utils/torch_utils.py (modified, +4/-3)
vllm/v1/attention/ops/rocm_aiter_mla_sparse.py (modified, +3/-2)
vllm/v1/worker/cpu_worker.py (modified, +4/-1)


Attribution Report: Multi-ISA CPU Dispatcher Contribution

Prepared by: Mohammad Mekayel Anik (@MekayelAnik)
Date: 2026-04-04
Regarding: Unattributed use of prior work in vllm-project/vllm#35466

Summary

Timeline of Events

1. Original Contribution — December 2025

Detail	Value
Repository	dtrifiro/vllm
PR	#9 — "fix: Add Python dispatcher for multi-ISA CPU support"
Author	MekayelAnik (MD. MEKAYEL ANIK)
Created	2025-12-19
Merged	2025-12-22 (into `cpu-build-dispatcher` branch)
Commits	3 commits, all authored by MekayelAnik

What the contribution did:

Created vllm/_ops_dispatch.py — a new Python dispatcher module providing get_ops(), get_utils(), has_op(), and _detect_cpu_extension() functions
Updated 83+ torch.ops._C.* call sites across 23+ files to use the dispatcher
Replaced hasattr(torch.ops._C, ...) checks with has_op() across compilation, distributed, and model executor files

Problem solved:

The two CPU extensions (_C.so for AVX2/generic and _C_avx512.so for AVX512) registered to different torch.ops namespaces, causing runtime crashes when code assumed a single namespace. My dispatcher detected which extension was loaded and routed calls to the correct namespace at runtime — enabling multi-ISA CPU support in a single wheel.

2. First Upstream Attempt — 2026-02-26

Detail	Value
Repository	vllm-project/vllm
PR	#35346 (closed, not merged)
Author	`majian4work` (Ma Jian, Intel)
Created	2026-02-26
Status	Closed without merging

Key evidence:

The PR description explicitly states: "Rebase https://github.com/dtrifiro/vllm/tree/cpu-build-dispatcher-cleanup"
This PR directly contains all 3 of my commits with my authorship preserved
My file vllm/_ops_dispatch.py is included in this PR
All commits are re-signed as Signed-off-by: Ma Jian <[email protected]> but retain my original authorship
The PR was closed due to build issues flagged by CI/review bots

3. Reimplementation Without Attribution — 2026-02-27

Detail	Value
Repository	vllm-project/vllm
PR	#35466 — "[CI/Build] CPU release supports both of AVX2 and AVX512"
Author	`majian4work` (Ma Jian, Intel)
Created	2026-02-27 (one day after PR #35346)
Merged	2026-02-28

Key facts:

Opened one day after PR #35346, the first attempt, which contained my commits
Solves the exact same problem: multi-ISA CPU support in a single wheel
Description is simply: "A simple version to support multiple ISAs in one wheel"
Zero attribution to me (MekayelAnik), Daniele Trifiro (dtrifiro), PR #35346, or the cpu-build-dispatcher branch
Commits authored by jiang1.li / Li, Jiang (Intel) with only Signed-off-by: jiang1.li <[email protected]>

Technical Comparison

Aspect	My Approach (dtrifiro/vllm PR #9)	Merged Upstream (PR #35466)
Problem solved	Multi-ISA CPU support (AVX2 + AVX512 in one wheel)	Identical
Dispatch mechanism	Python-level dispatcher (`_ops_dispatch.py`) routing `torch.ops._C.*` calls to correct namespace at runtime	C++ level: `#define TORCH_EXTENSION_NAME _C` forces both extensions to register under `torch.ops._C`, then Python `import_kernels()` loads the right `.so`
ISA detection	Python `_detect_cpu_extension()` checking `hasattr(torch.ops, '_C_avx512')`	`torch.cpu._is_avx512_supported()` in `CpuPlatform.import_kernels()`
Python call site changes	83+ call sites modified to use `get_ops().xxx`	No Python call site changes needed
Files modified	29 files	6 files
Scope	Comprehensive Python-side refactor	Build-system-focused fix

While the final implementation uses a different mechanism (C++ macro vs. Python dispatcher), the underlying problem identification, the concept of runtime ISA detection, and the goal of multi-ISA CPU support in a single wheel are directly derived from the work on the cpu-build-dispatcher branch where I was a key contributor.
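
For contrast, the load-time selection described in the right-hand column of the table can be sketched as follows. This is an illustration under stated assumptions, not the code merged in PR #35466: the module names `vllm._C` and `vllm._C_avx512` are hypothetical, and the CPU check and the importer are passed in as parameters so the sketch runs without PyTorch.

```python
from typing import Callable

# Hypothetical module names; the report only names the files _C.so and _C_avx512.so.
AVX512_MODULE = "vllm._C_avx512"
DEFAULT_MODULE = "vllm._C"


def select_kernel_module(avx512_supported: Callable[[], bool]) -> str:
    """Pick the single binary to load for this CPU.

    Both binaries are assumed to register their ops under the same
    torch.ops._C namespace, so call sites need no changes.
    """
    return AVX512_MODULE if avx512_supported() else DEFAULT_MODULE


def import_kernels(avx512_supported: Callable[[], bool],
                   importer: Callable[[str], object]) -> str:
    """Load exactly one extension and report which one was chosen."""
    name = select_kernel_module(avx512_supported)
    importer(name)
    return name


# Record the import request instead of loading a real shared library.
loaded = []
chosen = import_kernels(lambda: True, loaded.append)
```

In a real setup the predicate would likely be the `torch.cpu._is_avx512_supported` check named in the table and the importer `importlib.import_module`. The design difference from the dispatcher is that the choice happens once at load time, so no call site has to know which ISA variant is active.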

Evidence of Lineage

The chain of evidence is unambiguous:

2025-12-19: I create the multi-ISA dispatcher solution on dtrifiro/vllm
2025-12-22: My PR #9 is merged into dtrifiro's cpu-build-dispatcher branch
2026-02-26: majian4work opens upstream PR #35346, explicitly rebasing dtrifiro's cpu-build-dispatcher branch — which includes my 3 commits
2026-02-27: PR #35346 is closed and majian4work opens PR #35466, one day after opening PR #35346; it is a "simple version" solving the same problem with no attribution whatsoever
2026-02-28: PR #35466 is merged into vllm-project/vllm

The one-day turnaround between closing PR #35346 (which contained my work) and opening PR #35466 (which reimplemented it) strongly demonstrates that PR #35466 was directly informed by my prior work.

Request

I respectfully request that the vLLM project:

Acknowledge that the multi-ISA CPU dispatcher work in PR #35466 was informed by prior work on the cpu-build-dispatcher branch of dtrifiro/vllm, to which I (MekayelAnik) was a contributor
Add attribution in the form of a comment on PR #35466, a mention in release notes, or a Co-authored-by acknowledgment
Grant contributor status — my work directly contributed to solving this problem and I should be recognized as a contributor to the vLLM project. My commits exist in PR #35346 with my authorship intact, proving my contribution. I request to be added to the project's contributors list
Establish clear guidelines for attributing prior art when reimplementing community contributions

Appendix: Verifiable Links

My original PR: https://github.com/dtrifiro/vllm/pull/9
PR containing my commits (closed): https://github.com/vllm-project/vllm/pull/35346
Merged PR without attribution: https://github.com/vllm-project/vllm/pull/35466
dtrifiro/vllm cpu-build-dispatcher branch: https://github.com/dtrifiro/vllm/tree/cpu-build-dispatcher
My GitHub profile: https://github.com/MekayelAnik

extent analysis

TL;DR

The vLLM project should acknowledge and add attribution for the multi-ISA CPU dispatcher work in PR #35466, which was informed by prior work on the cpu-build-dispatcher branch of dtrifiro/vllm.

Guidance

Review the commit history and PR descriptions to verify the lineage of the multi-ISA CPU dispatcher work.
Add a comment on PR #35466 acknowledging the prior work and attributing the contribution to MekayelAnik.
Consider adding a Co-authored-by acknowledgment or mentioning the contribution in release notes.
Establish clear guidelines for attributing prior art when reimplementing community contributions to avoid similar issues in the future.

Example

No code snippet is necessary in this case, as the issue is related to attribution and contributor recognition rather than a technical problem.

Notes

The resolution of this issue depends on the vLLM project's policies and procedures for handling contributor recognition and attribution. The project maintainers should review their guidelines and ensure that they are fair and transparent.

Recommendation

Apply workaround: Add attribution to PR #35466 and establish clear guidelines for attributing prior art to avoid similar issues in the future. This will help to recognize the contribution of MekayelAnik and maintain a positive and inclusive community around the vLLM project.
