vllm - ✅ (Solved) Fix [vLLM IR] Port activations to IR op [3 pull requests, 1 comment, 2 participants]

GitHub stats

vllm-project/vllm#38733 (fetched 2026-04-08 02:23:07)

  • Comments: 1
  • Participants: 2
  • Timeline events: 8
  • Reactions: 0
  • Assignees: (not listed)
  • Timeline (top): assigned ×2, commented ×1, issue_type_added ×1, labeled ×1

PR fix notes

PR #39453: Port activations to IR op 1/3

Description (problem / solution / changelog)

Purpose

This PR rewrites the forward_* methods in SiluAndMul with a call to new vllm.ir.ops.silu_and_mul. This is to address the first point of https://github.com/vllm-project/vllm/issues/38733
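
For orientation, the operation being ported can be sketched in plain Python. This is a reference for the math only, assuming the usual SiluAndMul definition (split the last dimension in half, apply SiLU to the first half, multiply by the second half). The function names below are illustrative and are not the signature of vllm.ir.ops.silu_and_mul.

```python
import math
from typing import List


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def silu_and_mul_reference(row: List[float]) -> List[float]:
    """Illustrative reference: out[i] = silu(row[i]) * row[i + d], d = len(row) // 2."""
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    return [silu(g) * u for g, u in zip(gate, up)]
```

A sketch like this is mainly useful as a ground truth when checking that every kernel registered for the op agrees with the native definition.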

Test Plan

The existing tests are reused for now; running pytest tests/ passes.

TODO:

  • Add new tests specifically for the new IR op, similar to those for rms_norm

Test Result

Currently pending


<details> <summary> Essential Elements of an Effective PR Description Checklist </summary>
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
</details>

Changed files

  • docs/design/vllm_ir.md (added, +626/-0)
  • tests/compile/backend.py (modified, +15/-2)
  • tests/compile/passes/distributed/test_sequence_parallelism.py (modified, +13/-16)
  • tests/compile/passes/ir/test_clone_cleanup.py (added, +370/-0)
  • tests/compile/passes/ir/test_inplace_functionalization.py (added, +403/-0)
  • tests/compile/passes/test_functionalization.py (modified, +2/-2)
  • tests/compile/passes/test_fusion.py (modified, +4/-15)
  • tests/ir/test_compile.py (added, +167/-0)
  • tests/ir/test_inplace_op.py (added, +102/-0)
  • tests/ir/test_op.py (modified, +80/-10)
  • tests/kernels/ir/test_activation.py (added, +75/-0)
  • tests/kernels/ir/test_layernorm.py (modified, +11/-6)
  • tests/test_config.py (modified, +6/-3)
  • vllm/_aiter_ops.py (modified, +1/-99)
  • vllm/compilation/backends.py (modified, +19/-0)
  • vllm/compilation/passes/fusion/allreduce_rms_fusion.py (modified, +16/-9)
  • vllm/compilation/passes/fusion/matcher_utils.py (modified, +0/-67)
  • vllm/compilation/passes/fusion/rms_quant_fusion.py (modified, +26/-12)
  • vllm/compilation/passes/fusion/rocm_aiter_fusion.py (modified, +30/-16)
  • vllm/compilation/passes/fusion/sequence_parallelism.py (modified, +11/-9)
  • vllm/compilation/passes/inductor_pass.py (modified, +4/-0)
  • vllm/compilation/passes/ir/clone_elimination.py (added, +117/-0)
  • vllm/compilation/passes/ir/inplace_functionalization.py (added, +98/-0)
  • vllm/compilation/passes/ir/lowering_pass.py (modified, +7/-35)
  • vllm/compilation/passes/ir/utils.py (added, +40/-0)
  • vllm/compilation/passes/pass_manager.py (modified, +7/-1)
  • vllm/config/kernel.py (modified, +11/-2)
  • vllm/config/vllm.py (modified, +1/-2)
  • vllm/envs.py (modified, +3/-7)
  • vllm/ir/op.py (modified, +290/-16)
  • vllm/ir/ops/__init__.py (modified, +3/-2)
  • vllm/ir/ops/activation.py (added, +13/-0)
  • vllm/ir/ops/layernorm.py (modified, +27/-3)
  • vllm/kernels/__init__.py (modified, +2/-2)
  • vllm/kernels/aiter_ops.py (modified, +71/-0)
  • vllm/kernels/oink_ops.py (modified, +46/-5)
  • vllm/kernels/triton/__init__.py (added, +3/-0)
  • vllm/kernels/triton/layernorm_batch_invariant.py (added, +59/-0)
  • vllm/kernels/vllm_c.py (modified, +37/-0)
  • vllm/kernels/xpu_ops.py (modified, +9/-0)
  • vllm/model_executor/layers/activation.py (modified, +5/-11)
  • vllm/model_executor/layers/batch_invariant.py (modified, +1/-0)
  • vllm/model_executor/layers/layernorm.py (modified, +10/-213)
  • vllm/platforms/cuda.py (modified, +10/-3)

PR #40135: [vLLM IR] Port activations (gelu) to IR op

Description (problem / solution / changelog)

GELU Algorithm Porting & Integration

Step 1: Port GELU Algorithm Implementation

  • Port the GELU algorithm implementation

Notes

  1. The "lowering test" will serve as the unified testing standard moving forward.
  2. Only the vllm_c kernel is implemented; other kernels may contain duplicate code (corrections are appreciated).
  3. No explicit priority is defined inside platform-specific code (to maintain simplicity).
  4. Benchmarks and semantic tests are not yet included.
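
The PR description does not say which GELU variants are ported. As background, the two standard definitions (the exact erf form and the tanh approximation) can be written as a plain-Python reference. The function names are illustrative and are not taken from the PR.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

A semantic test for a ported kernel would typically compare its output against a reference of this kind within a dtype-appropriate tolerance.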

Step 2: Integrate New Features

  • (Optional) Support in-place operations (not required)
  • (Optional) Support kernel fusion (not required)

Notes

  1. In-place operations are not required for this op.
  2. Kernel fusion pass is not required during this phase.

Step 3: Merge & Adapt to Unified Test Standards

  • Merge the new development branch
  • Resolve code conflicts during merge
  • Adapt the implementation to unified lowering tests
  • Align implementation with benchmarks and semantic tests

Related


General

  • Corrections and feedback are welcome.

Purpose

Test Plan

.venv/bin/python -m pytest tests/kernels/core/test_activation.py tests/kernels/ir/test_activation.py -v

Test Result



Changed files

  • tests/compile/passes/ir/test_lowering.py (modified, +73/-0)
  • tests/kernels/ir/test_activation.py (added, +87/-0)
  • vllm/config/kernel.py (modified, +9/-0)
  • vllm/ir/ops/__init__.py (modified, +2/-1)
  • vllm/ir/ops/activation.py (added, +49/-0)
  • vllm/kernels/vllm_c.py (modified, +41/-0)
  • vllm/model_executor/layers/activation.py (modified, +7/-31)

Issue description (raw)

SiluAndMul and other activation function CustomOp subclasses should be ported over to vLLM IR. This should be done in three steps:

  1. Replace forward_* methods in SiluAndMul with a call to new vllm.ir.ops.silu_and_mul.
  2. Do the same for the other activation functions.
  3. Convert CustomOp objects to PluggableLayer
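
The shape of step 1 can be illustrated with a toy model: instead of a layer carrying several forward_* methods, it has one forward that delegates to a single op object, and that op owns the native definition and any registered kernels. Everything below (ToyIrOp, register_impl, the provider argument) is hypothetical scaffolding for illustration and is not vLLM's actual IR API.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


class ToyIrOp:
    """Hypothetical stand-in for an IR op: a native definition plus optional kernels."""

    def __init__(self, native: Callable[[Vector], Vector]) -> None:
        self.native = native
        self.impls: Dict[str, Callable[[Vector], Vector]] = {}

    def register_impl(self, provider: str):
        """Register an alternative implementation under a provider name."""
        def decorator(fn: Callable[[Vector], Vector]):
            self.impls[provider] = fn
            return fn
        return decorator

    def __call__(self, x: Vector, provider: str = "native") -> Vector:
        # Fall back to the native definition when no kernel is registered.
        return self.impls.get(provider, self.native)(x)


def _silu(v: float) -> float:
    # Overflow-safe SiLU.
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


@ToyIrOp
def toy_silu_and_mul(x: Vector) -> Vector:
    """Native definition: silu(first half) * second half."""
    d = len(x) // 2
    return [_silu(g) * u for g, u in zip(x[:d], x[d:])]


class ToySiluAndMul:
    """Layer with a single forward that delegates to the op."""

    def forward(self, x: Vector, provider: str = "native") -> Vector:
        return toy_silu_and_mul(x, provider=provider)
```

In this toy the caller passes the provider explicitly; how the real system selects an implementation is not described in the issue, so that choice here is only a simplification.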

An additional challenge is the compile_native=True behavior: inside the fused_moe torch custom op, SiluAndMul.forward_native is not visible to model-level compilation, so we apply another torch.compile decorator. To work with vLLM IR, we'll have to locally disable torch wrapping (vllm.ir.enable_torch_wrap(False)), and only in the MoE case. So we should default compile_native=False and only set it to True for MoE. Moving forward, we will enable automatic compilation of all IR native implementations by default, but that requires more design & discussion: #38744
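
A general pattern for "disable a global behavior locally, and only in one case" is a context manager that saves the previous setting and restores it on exit. The sketch below shows that pattern with a hypothetical module-level flag. The names are invented for illustration, and nothing here reflects how vllm.ir.enable_torch_wrap is actually implemented or whether it can be used as a context manager.

```python
from contextlib import contextmanager
from typing import Iterator

# Hypothetical module-level flag; a stand-in, not vLLM's real state.
_torch_wrap_enabled = True


def enable_torch_wrap_toy(enabled: bool) -> None:
    """Set the toy flag globally."""
    global _torch_wrap_enabled
    _torch_wrap_enabled = enabled


def torch_wrap_enabled_toy() -> bool:
    """Read the toy flag."""
    return _torch_wrap_enabled


@contextmanager
def torch_wrap_toy(enabled: bool) -> Iterator[None]:
    """Temporarily override the flag, restoring the old value even on error."""
    previous = torch_wrap_enabled_toy()
    enable_torch_wrap_toy(enabled)
    try:
        yield
    finally:
        enable_torch_wrap_toy(previous)


def pick_compile_native(is_moe: bool) -> bool:
    """Default to False; opt in only for the MoE case, as the issue proposes."""
    return is_moe
```

Restoring in a finally block matters here: if the wrapped region raises, the flag would otherwise stay disabled for unrelated code paths.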

Step 1 is high priority; step 2 is slightly less so. Step 3 requires the compilation fix described above.

Also, once all OOT platforms migrate these ops to vLLM IR, we can remove the PluggableLayer system completely.

extent analysis

TL;DR

Port SiluAndMul and other activation functions to vLLM IR by replacing their forward methods and converting CustomOp objects to PluggableLayer.

Guidance

  • Replace the forward_* methods in SiluAndMul with a call to the new vllm.ir.ops.silu_and_mul to start the porting process.
  • Apply the same replacement to other activation functions to ensure consistency across the codebase.
  • Convert CustomOp objects to PluggableLayer to align with the vLLM IR requirements, but note that this step depends on resolving the compilation fix.
  • Consider defaulting compile_native to False and only setting it to True for MoE cases to work around the current compilation limitations.

Example

No specific code example is provided due to the lack of detailed implementation information in the issue.

Notes

The provided guidance assumes that the vLLM IR and PluggableLayer systems are already set up and functional. The compilation fix for compile_native=True behavior in the fused_moe torch custom op needs to be addressed before fully converting CustomOp objects.

Recommendation

Apply workaround: Default compile_native to False and only set it to True for MoE cases until the compilation fix is implemented, allowing for a gradual transition to vLLM IR.
