vllm - ✅(Solved) Fix [vLLM IR] Port activations to IR op [3 pull requests, 1 comment, 2 participants]

ProExpertProg · 2026-04-01T15:48:39Z



GitHub stats

vllm-project/vllm#38733•Fetched 2026-04-08 02:23:07


Timeline (top): assigned ×2, commented ×1, issue_type_added ×1, labeled ×1

PR fix notes

PR #39453: Port activations to IR op 1/3

Repository: vllm-project/vllm
Author: bohnstingl
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39453

Description (problem / solution / changelog)

Purpose

This PR replaces the forward_* methods in SiluAndMul with a call to the new vllm.ir.ops.silu_and_mul. This addresses the first point of https://github.com/vllm-project/vllm/issues/38733

Test Plan

The existing tests are reused for now; running pytest tests/ currently passes.

TODO:

Add new tests specifically for the new IR op, similar to those for rms_norm.

Test Result

Currently pending


Changed files

docs/design/vllm_ir.md (added, +626/-0)
tests/compile/backend.py (modified, +15/-2)
tests/compile/passes/distributed/test_sequence_parallelism.py (modified, +13/-16)
tests/compile/passes/ir/test_clone_cleanup.py (added, +370/-0)
tests/compile/passes/ir/test_inplace_functionalization.py (added, +403/-0)
tests/compile/passes/test_functionalization.py (modified, +2/-2)
tests/compile/passes/test_fusion.py (modified, +4/-15)
tests/ir/test_compile.py (added, +167/-0)
tests/ir/test_inplace_op.py (added, +102/-0)
tests/ir/test_op.py (modified, +80/-10)
tests/kernels/ir/test_activation.py (added, +75/-0)
tests/kernels/ir/test_layernorm.py (modified, +11/-6)
tests/test_config.py (modified, +6/-3)
vllm/_aiter_ops.py (modified, +1/-99)
vllm/compilation/backends.py (modified, +19/-0)
vllm/compilation/passes/fusion/allreduce_rms_fusion.py (modified, +16/-9)
vllm/compilation/passes/fusion/matcher_utils.py (modified, +0/-67)
vllm/compilation/passes/fusion/rms_quant_fusion.py (modified, +26/-12)
vllm/compilation/passes/fusion/rocm_aiter_fusion.py (modified, +30/-16)
vllm/compilation/passes/fusion/sequence_parallelism.py (modified, +11/-9)
vllm/compilation/passes/inductor_pass.py (modified, +4/-0)
vllm/compilation/passes/ir/clone_elimination.py (added, +117/-0)
vllm/compilation/passes/ir/inplace_functionalization.py (added, +98/-0)
vllm/compilation/passes/ir/lowering_pass.py (modified, +7/-35)
vllm/compilation/passes/ir/utils.py (added, +40/-0)
vllm/compilation/passes/pass_manager.py (modified, +7/-1)
vllm/config/kernel.py (modified, +11/-2)
vllm/config/vllm.py (modified, +1/-2)
vllm/envs.py (modified, +3/-7)
vllm/ir/op.py (modified, +290/-16)
vllm/ir/ops/__init__.py (modified, +3/-2)
vllm/ir/ops/activation.py (added, +13/-0)
vllm/ir/ops/layernorm.py (modified, +27/-3)
vllm/kernels/__init__.py (modified, +2/-2)
vllm/kernels/aiter_ops.py (modified, +71/-0)
vllm/kernels/oink_ops.py (modified, +46/-5)
vllm/kernels/triton/__init__.py (added, +3/-0)
vllm/kernels/triton/layernorm_batch_invariant.py (added, +59/-0)
vllm/kernels/vllm_c.py (modified, +37/-0)
vllm/kernels/xpu_ops.py (modified, +9/-0)
vllm/model_executor/layers/activation.py (modified, +5/-11)
vllm/model_executor/layers/batch_invariant.py (modified, +1/-0)
vllm/model_executor/layers/layernorm.py (modified, +10/-213)
vllm/platforms/cuda.py (modified, +10/-3)

PR #40135: [vLLM IR] Port activations (gelu) to IR op

Repository: vllm-project/vllm
Author: Alex-ai-future
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40135

Description (problem / solution / changelog)

GELU Algorithm Porting & Integration

Step 1: Port GELU Algorithm Implementation

Port the GELU algorithm implementation

Notes

The "lowering test" will serve as the unified testing standard moving forward.
Only the vllm_c kernel is implemented; other kernels may contain duplicate code (corrections are appreciated).
No explicit priority is defined inside platform-specific code (to maintain simplicity).
Benchmarks and semantic tests are not yet included.
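
Since semantic tests are not yet included, a plain reference for GELU may help when checking a ported kernel. The following is a minimal sketch using the standard GELU definitions (exact, via the error function, and the common tanh approximation). The function names are illustrative and are not taken from the PR, and the notes above do not state which GELU variants the PR covers.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Widely used tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_diff(xs) -> float:
    """Largest gap between the exact and approximate forms over xs."""
    return max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)


if __name__ == "__main__":
    grid = [i / 10.0 for i in range(-30, 31)]
    print("max |exact - tanh| on [-3, 3]:", max_abs_diff(grid))
```

A semantic test for a kernel could compare its output against one of these references within a dtype-appropriate tolerance; the tolerance itself would need to be chosen per dtype and is not specified here.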

Step 2: Integrate New Features

(Optional) Support in-place operations (not required)
(Optional) Support kernel fusion (not required)

Notes

In-place operations are not required in this op.
Kernel fusion pass is not required during this phase.

Step 3: Merge & Adapt to Unified Test Standards

Merge the new development branch
Resolve code conflicts during merge
Adapt the implementation to unified lowering tests
Align implementation with benchmarks and semantic tests

General

Corrections and feedback are welcome.

Purpose

Test Plan

.venv/bin/python -m pytest tests/kernels/core/test_activation.py tests/kernels/ir/test_activation.py -v

Test Result


Changed files

tests/compile/passes/ir/test_lowering.py (modified, +73/-0)
tests/kernels/ir/test_activation.py (added, +87/-0)
vllm/config/kernel.py (modified, +9/-0)
vllm/ir/ops/__init__.py (modified, +2/-1)
vllm/ir/ops/activation.py (added, +49/-0)
vllm/kernels/vllm_c.py (modified, +41/-0)
vllm/model_executor/layers/activation.py (modified, +7/-31)

Issue description

SiluAndMul and other activation function CustomOp subclasses should be ported over to vLLM IR. This should be done in three steps:

1. Replace the forward_* methods in SiluAndMul with a call to the new vllm.ir.ops.silu_and_mul.
2. Do the same for the other activation functions.
3. Convert CustomOp objects to PluggableLayer.
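
The first step can be pictured with a toy model: a layer whose per-platform forward_* methods collapse into a single call to an op object that picks an implementation. This is only a sketch in pure Python under stated assumptions. `ToyIrOp`, its `register` method, and the provider name are hypothetical and do not reflect the real vllm.ir API; only the split-then-multiply semantics of silu_and_mul (SiLU of the first half times the second half) is the standard definition.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def silu(v: float) -> float:
    """SiLU (swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def silu_and_mul_native(x: Vector) -> Vector:
    """Reference semantics: silu(first half) * second half."""
    d = len(x) // 2
    return [silu(a) * b for a, b in zip(x[:d], x[d:])]


class ToyIrOp:
    """Hypothetical stand-in for an IR op: one name, several implementations."""

    def __init__(self, name: str, native: Callable[[Vector], Vector]):
        self.name = name
        self.impls: Dict[str, Callable[[Vector], Vector]] = {"native": native}
        self.priority: List[str] = ["native"]  # first entry wins

    def register(self, provider: str, fn: Callable[[Vector], Vector]) -> None:
        """Add an implementation and give it the highest priority."""
        self.impls[provider] = fn
        self.priority.insert(0, provider)

    def __call__(self, x: Vector) -> Vector:
        return self.impls[self.priority[0]](x)


silu_and_mul = ToyIrOp("silu_and_mul", silu_and_mul_native)


class SiluAndMul:
    """Layer after the port: no forward_cuda / forward_native split."""

    def forward(self, x: Vector) -> Vector:
        return silu_and_mul(x)


if __name__ == "__main__":
    print(SiluAndMul().forward([0.0, 1.0, 2.0, 3.0]))
```

The point of the pattern is that kernel selection moves out of the layer class and into the op, so the layer no longer needs one method per platform.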

An additional challenge is the compile_native=True behavior: inside the fused_moe torch custom op, SiluAndMul.forward_native is not visible to model-level compilation, so we apply another torch.compile decorator. To work with vLLM IR, we'll have to locally disable torch wrapping (vllm.ir.enable_torch_wrap(False)), and only in the MoE case. So we should default compile_native=False and only set it to True for MoE. Moving forward, we will enable automatic compilation of all IR native implementations by default, but that requires more design & discussion: #38744

Step 1 is high priority; step 2 is slightly less so. Step 3 requires the compilation fix described above.

Also, once all OOT platforms migrate these ops to vLLM IR, we can remove the PluggableLayer system completely.

extent analysis

TL;DR

Port SiluAndMul and other activation functions to vLLM IR by replacing their forward methods and converting CustomOp objects to PluggableLayer.

Guidance

Replace the forward_* methods in SiluAndMul with a call to the new vllm.ir.ops.silu_and_mul to start the porting process.
Apply the same replacement to other activation functions to ensure consistency across the codebase.
Convert CustomOp objects to PluggableLayer to align with the vLLM IR requirements, but note that this step depends on resolving the compilation fix.
Consider defaulting compile_native to False and only setting it to True for MoE cases to work around the current compilation limitations.

Example

No specific code example is provided due to the lack of detailed implementation information in the issue.

Notes

This guidance assumes that the vLLM IR and PluggableLayer systems are already set up and functional. The compile_native=True behavior in the fused_moe torch custom op needs to be fixed before CustomOp objects can be fully converted.

Recommendation

Apply workaround: Default compile_native to False and only set it to True for MoE cases until the compilation fix is implemented, allowing for a gradual transition to vLLM IR.
