vllm - ✅(Solved) Fix Request for attribution: Multi-ISA CPU dispatcher work (PR #35466) [3 pull requests, 6 comments, 2 participants]

GitHub stats
vllm-project/vllm#38942, fetched 2026-04-08 02:44:48
Comments: 6
Participants: 2
Timeline: 17
Reactions: 0
Timeline (top): commented ×6, subscribed ×5, mentioned ×4, cross-referenced ×2

My original work on a Python dispatcher for multi-ISA CPU support — contributed to dtrifiro/vllm PR #9 in December 2025 — was used without attribution in vllm-project/vllm PR #35466, merged on 2026-02-28. A clear lineage exists through an intermediate PR (#35346) that explicitly rebased my commits, was closed, and replaced the next day with a reimplementation that gave no credit.


PR fix notes

PR #9: fix: Add Python dispatcher for multi-ISA CPU support

Description (problem / solution / changelog)

Summary

This PR fixes the namespace mismatch issue that causes both AVX2 and AVX512 paths to crash during inference.

Root Cause: The two extensions register to different namespaces:

  • _C.so (AVX2) registers → torch.ops._C.*
  • _C_avx512.so (AVX512) registers → torch.ops._C_avx512.*

But vLLM Python code hardcodes calls to torch.ops._C.something(), which fails when the AVX512 extension is loaded (and vice versa).
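The failure mode described above can be reproduced without PyTorch. The sketch below is a simulation only: `fake_ops` is a hypothetical stand-in for `torch.ops`, and `hardcoded_call` and `call_fails` are illustrative names that do not appear in the PR. It assumes the AVX512 extension is the one that registered its kernels.

```python
from types import SimpleNamespace

# Hypothetical stand-in for torch.ops: only the AVX512 extension registered
# its kernels, so the `_C` namespace exists but holds no ops.
fake_ops = SimpleNamespace(
    _C=SimpleNamespace(),
    _C_avx512=SimpleNamespace(silu_and_mul=lambda out, x: "avx512 kernel"),
)


def hardcoded_call(ops):
    # Mirrors a call site that always goes through the `_C` namespace.
    return ops._C.silu_and_mul(None, None)


def call_fails(ops) -> bool:
    # Report whether the hardcoded call site breaks on this registry.
    try:
        hardcoded_call(ops)
    except AttributeError:
        return True
    return False


print(call_fails(fake_ops))
```

In this simulation the op is reachable through `_C_avx512` but not through `_C`, which is the mismatch the PR description attributes the crash to.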

Changes

| File | Changes |
| --- | --- |
| vllm/_ops_dispatch.py | NEW - Dispatcher module with get_ops(), get_utils(), has_op() |
| vllm/_custom_ops.py | Updated 83 torch.ops._C. → get_ops()., 13 hasattr → has_op() |
| vllm/v1/worker/cpu_worker.py | Replaced try/except hack with get_utils() |
| docs/runtime-isa-dispatch.md | Full documentation and implementation guide |

How It Works

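import torch

# _ops_dispatch.py detects which extension is loaded
def get_ops():
    # Prefer the AVX512 namespace when its kernels are registered
    if hasattr(torch.ops._C_avx512, 'silu_and_mul'):
        return torch.ops._C_avx512
    # Otherwise fall back to the default (AVX2) namespace
    return torch.ops._C

# Call sites now use dispatcher
get_ops().silu_and_mul(out, x)  # Routes to correct extension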

Testing Checklist

  • AVX2 system: import vllm succeeds
  • AVX2 system: LLM("facebook/opt-125m") loads
  • AVX2 system: llm.generate("Hello") completes
  • AVX512 system: import vllm succeeds
  • AVX512 system: LLM("facebook/opt-125m") loads
  • AVX512 system: llm.generate("Hello") completes

Test Command

python -c "
from vllm._ops_dispatch import _detect_cpu_extension
print(f'Detected extension: {_detect_cpu_extension()}')

from vllm import LLM
llm = LLM('facebook/opt-125m')
print(llm.generate('Hello'))
"

Changed files

  • docs/runtime-isa-dispatch.md (added, +1928/-0)
  • vllm/_custom_ops.py (modified, +101/-97)
  • vllm/_ops_dispatch.py (added, +199/-0)
  • vllm/compilation/activation_quant_fusion.py (modified, +6/-5)
  • vllm/compilation/collective_fusion.py (modified, +5/-4)
  • vllm/compilation/fix_functionalization.py (modified, +12/-11)
  • vllm/compilation/fusion.py (modified, +18/-17)
  • vllm/compilation/fusion_attn.py (modified, +2/-1)
  • vllm/compilation/matcher_utils.py (modified, +12/-11)
  • vllm/compilation/qk_norm_rope_fusion.py (modified, +2/-1)
  • vllm/distributed/device_communicators/cpu_communicator.py (modified, +9/-8)
  • vllm/distributed/parallel_state.py (modified, +3/-2)
  • vllm/model_executor/layers/activation.py (modified, +10/-9)
  • vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py (modified, +2/-1)
  • vllm/model_executor/layers/fused_moe/cpu_fused_moe.py (modified, +3/-2)
  • vllm/model_executor/layers/fused_moe/cutlass_moe.py (modified, +3/-2)
  • vllm/model_executor/layers/fused_moe/fused_marlin_moe.py (modified, +3/-2)
  • vllm/model_executor/layers/fused_moe/fused_moe.py (modified, +4/-3)
  • vllm/model_executor/layers/fused_moe/modular_kernel.py (modified, +4/-3)
  • vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py (modified, +3/-2)
  • vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py (modified, +2/-1)
  • vllm/model_executor/layers/quantization/gguf.py (modified, +3/-2)
  • vllm/model_executor/layers/quantization/kernels/scaled_mm/cpu.py (modified, +3/-2)
  • vllm/model_executor/layers/quantization/utils/fp8_utils.py (modified, +3/-2)
  • vllm/model_executor/layers/quantization/utils/int8_utils.py (modified, +2/-1)
  • vllm/model_executor/layers/utils.py (modified, +3/-2)
  • vllm/model_executor/models/deepseek_v2.py (modified, +3/-2)
  • vllm/utils/torch_utils.py (modified, +3/-2)
  • vllm/v1/worker/cpu_worker.py (modified, +3/-6)

PR #35466: [CI/Build] CPU release supports both of AVX2 and AVX512

Description (problem / solution / changelog)

A simple version to support multiple ISAs in one wheel.

Changed files

  • cmake/cpu_extension.cmake (modified, +118/-132)
  • csrc/cpu/torch_bindings.cpp (modified, +7/-10)
  • setup.py (modified, +10/-1)
  • vllm/_custom_ops.py (modified, +1/-3)
  • vllm/platforms/cpu.py (modified, +24/-0)
  • vllm/v1/worker/cpu_worker.py (modified, +1/-1)

PR #35346: Cpu dispatcher

Description (problem / solution / changelog)

Rebase https://github.com/dtrifiro/vllm/tree/cpu-build-dispatcher-cleanup

Changed files

  • CMakeLists.txt (modified, +18/-17)
  • benchmarks/kernels/benchmark_2d_silu_mul_fp8_quant.py (modified, +2/-1)
  • benchmarks/kernels/benchmark_fused_collective.py (modified, +6/-7)
  • cmake/cpu_extension.cmake (modified, +120/-117)
  • requirements/common.txt (modified, +1/-1)
  • setup.py (modified, +3/-1)
  • vllm/_custom_ops.py (modified, +116/-120)
  • vllm/_ops_dispatch.py (added, +203/-0)
  • vllm/compilation/passes/fusion/act_quant_fusion.py (modified, +6/-5)
  • vllm/compilation/passes/fusion/allreduce_rms_fusion.py (modified, +3/-2)
  • vllm/compilation/passes/fusion/attn_quant_fusion.py (modified, +2/-1)
  • vllm/compilation/passes/fusion/collective_fusion.py (modified, +3/-2)
  • vllm/compilation/passes/fusion/matcher_utils.py (modified, +12/-11)
  • vllm/compilation/passes/fusion/qk_norm_rope_fusion.py (modified, +2/-1)
  • vllm/compilation/passes/fusion/rms_quant_fusion.py (modified, +18/-18)
  • vllm/compilation/passes/utility/fix_functionalization.py (modified, +12/-11)
  • vllm/compilation/passes/utility/scatter_split_replace.py (modified, +4/-3)
  • vllm/distributed/device_communicators/cpu_communicator.py (modified, +9/-8)
  • vllm/distributed/parallel_state.py (modified, +3/-2)
  • vllm/kernels/helion/ops/silu_mul_fp8.py (modified, +2/-1)
  • vllm/model_executor/kernels/linear/scaled_mm/cpu.py (modified, +3/-2)
  • vllm/model_executor/layers/activation.py (modified, +10/-9)
  • vllm/model_executor/layers/fused_moe/activation.py (modified, +5/-3)
  • vllm/model_executor/layers/fused_moe/batched_deep_gemm_moe.py (modified, +2/-1)
  • vllm/model_executor/layers/fused_moe/cpu_fused_moe.py (modified, +3/-2)
  • vllm/model_executor/layers/fused_moe/cutlass_moe.py (modified, +3/-2)
  • vllm/model_executor/layers/fused_moe/unquantized_fused_moe_method.py (modified, +3/-4)
  • vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py (modified, +2/-1)
  • vllm/model_executor/layers/quantization/utils/fp8_utils.py (modified, +3/-2)
  • vllm/model_executor/layers/quantization/utils/int8_utils.py (modified, +2/-1)
  • vllm/model_executor/layers/sparse_attn_indexer.py (modified, +5/-4)
  • vllm/model_executor/layers/utils.py (modified, +3/-2)
  • vllm/platforms/cpu.py (modified, +50/-0)
  • vllm/utils/torch_utils.py (modified, +4/-3)
  • vllm/v1/attention/ops/rocm_aiter_mla_sparse.py (modified, +3/-2)
  • vllm/v1/worker/cpu_worker.py (modified, +4/-1)

Attribution Report: Multi-ISA CPU Dispatcher Contribution

Prepared by: Mohammad Mekayel Anik (@MekayelAnik)
Date: 2026-04-04
Regarding: Unattributed use of prior work in vllm-project/vllm#35466


Summary

My original work on a Python dispatcher for multi-ISA CPU support — contributed to dtrifiro/vllm PR #9 in December 2025 — was used without attribution in vllm-project/vllm PR #35466, merged on 2026-02-28. A clear lineage exists through an intermediate PR (#35346) that explicitly rebased my commits, was closed, and replaced the next day with a reimplementation that gave no credit.


Timeline of Events

1. Original Contribution — December 2025

| Detail | Value |
| --- | --- |
| Repository | dtrifiro/vllm |
| PR | #9 ("fix: Add Python dispatcher for multi-ISA CPU support") |
| Author | MekayelAnik (MD. MEKAYEL ANIK) |
| Created | 2025-12-19 |
| Merged | 2025-12-22 (into cpu-build-dispatcher branch) |
| Commits | 3 commits, all authored by MekayelAnik |

What the contribution did:

  • Created vllm/_ops_dispatch.py — a new Python dispatcher module providing get_ops(), get_utils(), has_op(), and _detect_cpu_extension() functions
  • Updated 83+ torch.ops._C.* call sites across 23+ files to use the dispatcher
  • Replaced hasattr(torch.ops._C, ...) checks with has_op() across compilation, distributed, and model executor files
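A minimal sketch of how `get_ops()` and `has_op()` could fit together is shown here. It is not the code from PR #9: `make_dispatcher` is a hypothetical factory introduced so the example runs without PyTorch, and `avx2_ops` / `avx512_ops` are simulated registries standing in for `torch.ops` on two kinds of machine. The probe op `silu_and_mul` is taken from the PR description.

```python
from types import SimpleNamespace


def make_dispatcher(ops):
    """Build get_ops()/has_op() helpers over an ops registry (hypothetical factory)."""

    def get_ops():
        # Prefer the AVX512 namespace when it exists and has a probe op registered.
        avx512 = getattr(ops, "_C_avx512", None)
        if avx512 is not None and hasattr(avx512, "silu_and_mul"):
            return avx512
        return ops._C

    def has_op(name: str) -> bool:
        # Stands in for direct hasattr(torch.ops._C, name) checks at call sites.
        return hasattr(get_ops(), name)

    return get_ops, has_op


# Simulated registries standing in for torch.ops.
avx2_ops = SimpleNamespace(_C=SimpleNamespace(silu_and_mul=lambda out, x: "avx2"))
avx512_ops = SimpleNamespace(
    _C=SimpleNamespace(),
    _C_avx512=SimpleNamespace(silu_and_mul=lambda out, x: "avx512"),
)

get_ops, has_op = make_dispatcher(avx512_ops)
print(get_ops().silu_and_mul(None, None), has_op("silu_and_mul"), has_op("missing_op"))
```

Routing every feature check through `has_op()` keeps the check and the eventual call on the same namespace, which is the property the call-site changes above rely on.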

Problem solved:

The two CPU extensions (_C.so for AVX2/generic and _C_avx512.so for AVX512) registered to different torch.ops namespaces, causing runtime crashes when code assumed a single namespace. My dispatcher detected which extension was loaded and routed calls to the correct namespace at runtime — enabling multi-ISA CPU support in a single wheel.

2. First Upstream Attempt — 2026-02-26

| Detail | Value |
| --- | --- |
| Repository | vllm-project/vllm |
| PR | #35346 (closed, not merged) |
| Author | majian4work (Ma Jian, Intel) |
| Created | 2026-02-26 |
| Status | Closed without merging |

Key evidence:

3. Reimplementation Without Attribution — 2026-02-27

| Detail | Value |
| --- | --- |
| Repository | vllm-project/vllm |
| PR | #35466 ("[CI/Build] CPU release supports both of AVX2 and AVX512") |
| Author | majian4work (Ma Jian, Intel) |
| Created | 2026-02-27 (one day after PR #35346) |
| Merged | 2026-02-28 |

Key facts:

  • Opened one day after the first attempt (PR #35346) that contained my commits was closed
  • Solves the exact same problem: multi-ISA CPU support in a single wheel
  • Description is simply: "A simple version to support multiple ISAs in one wheel"
  • Zero attribution to me (MekayelAnik), Daniele Trifiro (dtrifiro), PR #35346, or the cpu-build-dispatcher branch
  • Commits authored by jiang1.li / Li, Jiang (Intel) with only Signed-off-by: jiang1.li <[email protected]>

Technical Comparison

| Aspect | My Approach (dtrifiro/vllm PR #9) | Merged Upstream (PR #35466) |
| --- | --- | --- |
| Problem solved | Multi-ISA CPU support (AVX2 + AVX512 in one wheel) | Identical |
| Dispatch mechanism | Python-level dispatcher (_ops_dispatch.py) routing torch.ops._C.* calls to correct namespace at runtime | C++ level: #define TORCH_EXTENSION_NAME _C forces both extensions to register under torch.ops._C, then Python import_kernels() loads the right .so |
| ISA detection | Python _detect_cpu_extension() checking hasattr(torch.ops, '_C_avx512') | torch.cpu._is_avx512_supported() in CpuPlatform.import_kernels() |
| Python call site changes | 83+ call sites modified to use get_ops().xxx | No Python call site changes needed |
| Files modified | 29 files | 6 files |
| Scope | Comprehensive Python-side refactor | Build-system-focused fix |

While the final implementation uses a different mechanism (C++ macro vs. Python dispatcher), the underlying problem identification, the concept of runtime ISA detection, and the goal of multi-ISA CPU support in a single wheel are directly derived from the work on the cpu-build-dispatcher branch where I was a key contributor.
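For contrast, the load-time selection described in the "Merged Upstream" column can be sketched as follows. This is an illustration under stated assumptions, not the code merged in PR #35466: `select_extension` and the library names `_C_avx512` / `_C_avx2` are hypothetical, and the ISA probe is passed in as a callable (in place of `torch.cpu._is_avx512_supported()`) so the example runs without PyTorch.

```python
from typing import Callable


def select_extension(is_avx512_supported: Callable[[], bool]) -> str:
    """Return the name of the shared library to load (names are hypothetical)."""
    # Both builds are assumed to register under the same ops namespace,
    # so exactly one of them is chosen per process.
    return "_C_avx512" if is_avx512_supported() else "_C_avx2"


print(select_extension(lambda: True))
print(select_extension(lambda: False))
```

Because the choice is made once at import time and both builds share a namespace, call sites can keep using `torch.ops._C.*` unchanged, which matches the "No Python call site changes needed" row in the table.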


Evidence of Lineage

The chain of evidence is unambiguous:

  1. 2025-12-19: I create the multi-ISA dispatcher solution on dtrifiro/vllm
  2. 2025-12-22: My PR #9 is merged into dtrifiro's cpu-build-dispatcher branch
  3. 2026-02-26: majian4work opens upstream PR #35346, explicitly rebasing dtrifiro's cpu-build-dispatcher branch — which includes my 3 commits
  4. 2026-02-27: PR #35346 is closed; majian4work opens PR #35466, one day after opening PR #35346, as a "simple version" solving the same problem with no attribution whatsoever
  5. 2026-02-28: PR #35466 is merged into vllm-project/vllm

The one-day turnaround between closing PR #35346 (which contained my work) and opening PR #35466 (which reimplemented it) strongly demonstrates that PR #35466 was directly informed by my prior work.


Request

I respectfully request that the vLLM project:

  1. Acknowledge that the multi-ISA CPU dispatcher work in PR #35466 was informed by prior work on the cpu-build-dispatcher branch of dtrifiro/vllm, to which I (MekayelAnik) was a contributor
  2. Add attribution in the form of a comment on PR #35466, a mention in release notes, or a Co-authored-by acknowledgment
  3. Grant contributor status — my work directly contributed to solving this problem and I should be recognized as a contributor to the vLLM project. My commits exist in PR #35346 with my authorship intact, proving my contribution. I request to be added to the project's contributors list
  4. Establish clear guidelines for attributing prior art when reimplementing community contributions

Appendix: Verifiable Links

extent analysis

TL;DR

The vLLM project should acknowledge and add attribution for the multi-ISA CPU dispatcher work in PR #35466, which was informed by prior work on the cpu-build-dispatcher branch of dtrifiro/vllm.

Guidance

  • Review the commit history and PR descriptions to verify the lineage of the multi-ISA CPU dispatcher work.
  • Add a comment on PR #35466 acknowledging the prior work and attributing the contribution to MekayelAnik.
  • Consider adding a Co-authored-by acknowledgment or mentioning the contribution in release notes.
  • Establish clear guidelines for attributing prior art when reimplementing community contributions to avoid similar issues in the future.

Example

No code snippet is necessary in this case, as the issue is related to attribution and contributor recognition rather than a technical problem.

Notes

The resolution of this issue depends on the vLLM project's policies and procedures for handling contributor recognition and attribution. The project maintainers should review their guidelines and ensure that they are fair and transparent.

Recommendation

Apply workaround: Add attribution to PR #35466 and establish clear guidelines for attributing prior art to avoid similar issues in the future. This will help to recognize the contribution of MekayelAnik and maintain a positive and inclusive community around the vLLM project.
