vllm - ✅(Solved) Fix [vLLM IR] Port RoPE ops to IR [1 pull request, 1 comment, 2 participants]

GitHub stats
vllm-project/vllm#38756 (fetched 2026-04-08 02:22:51)
Comments: 1
Participants: 2
Timeline: 4
Reactions: 0
Timeline (top): assigned ×1, commented ×1, issue_type_added ×1, labeled ×1

PR fix notes

PR #39488: [vLLM IR][Rope] Port RotaryEmbedding and DeepseekScalingRotaryEmbedding to IR Ops

Description (problem / solution / changelog)

Purpose

Port RotaryEmbedding and DeepseekScalingRotaryEmbedding to vLLM IR, following https://github.com/vllm-project/vllm/issues/38756. For the base RotaryEmbedding, keep the key=None path out of the IR maybe_inplace call and fall back to the existing static implementation.

This preserves cross-layer KV sharing behavior and avoids torch.library schema inference failures on optional returns.
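
The dispatch described above can be sketched in plain Python. This is only an illustration, not the PR's code: `rope_qk_op`, `rope_query_only`, and `forward` are hypothetical names, plain lists stand in for tensors, and the rotation uses the half-split pairing. The point it shows is that the op taking both query and key keeps a fixed, non-optional return signature, while the key=None case is routed around it.

```python
from typing import List, Optional, Tuple


def _rotate_inplace(x: List[float], cos: List[float], sin: List[float]) -> None:
    """Rotate x in place, pairing element i with element i + len(x) // 2."""
    half = len(x) // 2
    for i in range(half):
        a, b = x[i], x[i + half]
        x[i] = a * cos[i] - b * sin[i]
        x[i + half] = b * cos[i] + a * sin[i]


def rope_qk_op(
    query: List[float], key: List[float], cos: List[float], sin: List[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the IR op: both inputs required, both returned."""
    _rotate_inplace(query, cos, sin)
    _rotate_inplace(key, cos, sin)
    return query, key


def rope_query_only(
    query: List[float], cos: List[float], sin: List[float]
) -> List[float]:
    """Hypothetical stand-in for the existing fallback used when key is None."""
    _rotate_inplace(query, cos, sin)
    return query


def forward(
    query: List[float],
    key: Optional[List[float]],
    cos: List[float],
    sin: List[float],
) -> Tuple[List[float], Optional[List[float]]]:
    # key=None never reaches the op whose schema has a fixed two-value return.
    if key is None:
        return rope_query_only(query, cos, sin), None
    return rope_qk_op(query, key, cos, sin)
```

With a 90-degree rotation (`cos=[0.0]`, `sin=[1.0]`), `forward([1.0, 0.0], None, [0.0], [1.0])` takes the fallback branch and returns the rotated query together with `None` for the key.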

Test Plan

To be tested

Test Result



Changed files

  • docs/design/vllm_ir.md (added, +626/-0)
  • tests/compile/backend.py (modified, +15/-2)
  • tests/compile/passes/distributed/test_sequence_parallelism.py (modified, +13/-16)
  • tests/compile/passes/ir/test_clone_cleanup.py (added, +370/-0)
  • tests/compile/passes/ir/test_inplace_functionalization.py (added, +403/-0)
  • tests/compile/passes/test_functionalization.py (modified, +2/-2)
  • tests/compile/passes/test_fusion.py (modified, +4/-15)
  • tests/ir/test_inplace_op.py (added, +102/-0)
  • tests/test_config.py (modified, +6/-3)
  • vllm/_aiter_ops.py (modified, +1/-99)
  • vllm/compilation/backends.py (modified, +19/-0)
  • vllm/compilation/passes/fusion/allreduce_rms_fusion.py (modified, +16/-9)
  • vllm/compilation/passes/fusion/matcher_utils.py (modified, +0/-149)
  • vllm/compilation/passes/fusion/qk_norm_rope_fusion.py (modified, +27/-21)
  • vllm/compilation/passes/fusion/rms_quant_fusion.py (modified, +26/-12)
  • vllm/compilation/passes/fusion/rocm_aiter_fusion.py (modified, +30/-16)
  • vllm/compilation/passes/fusion/rope_kvcache_fusion.py (modified, +12/-11)
  • vllm/compilation/passes/fusion/sequence_parallelism.py (modified, +11/-9)
  • vllm/compilation/passes/inductor_pass.py (modified, +4/-0)
  • vllm/compilation/passes/ir/clone_elimination.py (added, +117/-0)
  • vllm/compilation/passes/ir/inplace_functionalization.py (added, +98/-0)
  • vllm/compilation/passes/ir/lowering_pass.py (modified, +5/-35)
  • vllm/compilation/passes/ir/utils.py (added, +40/-0)
  • vllm/compilation/passes/pass_manager.py (modified, +7/-1)
  • vllm/config/kernel.py (modified, +6/-0)
  • vllm/config/vllm.py (modified, +1/-2)
  • vllm/envs.py (modified, +3/-7)
  • vllm/ir/op.py (modified, +114/-5)
  • vllm/ir/ops/__init__.py (modified, +3/-2)
  • vllm/ir/ops/layernorm.py (modified, +22/-0)
  • vllm/ir/ops/rotary_embedding.py (added, +162/-0)
  • vllm/kernels/aiter_ops.py (modified, +117/-0)
  • vllm/kernels/oink_ops.py (modified, +46/-5)
  • vllm/kernels/triton/layernorm_batch_invariant.py (modified, +30/-0)
  • vllm/kernels/vllm_c.py (modified, +69/-0)
  • vllm/kernels/xpu_ops.py (modified, +41/-0)
  • vllm/model_executor/layers/layernorm.py (modified, +10/-202)
  • vllm/model_executor/layers/rotary_embedding/base.py (modified, +20/-70)
  • vllm/model_executor/layers/rotary_embedding/deepseek_scaling_rope.py (modified, +16/-43)
  • vllm/platforms/cpu.py (modified, +14/-0)
  • vllm/platforms/cuda.py (modified, +15/-1)
  • vllm/platforms/rocm.py (modified, +16/-1)

There are many flavors of rope, but some of them only contain a native implementation; those do not need to be ported. Additionally, the sin_cos_cache initialization logic should remain in the layer. At the very least, the following should be ported:

  • RotaryEmbedding
  • DeepseekScalingRotaryEmbedding

However, we should carefully inspect semantics if any of the ops can be consolidated, especially using simple bool params. This will help us reduce the maintenance burden and increase the coverage for rope+cache related fusions.
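
As one illustration of consolidation through a boolean parameter, the sketch below folds two common pairing conventions (half-split, often called NeoX style, and adjacent pairs, often called GPT-J style) into a single function. It is a minimal pure-Python sketch on one head vector, not the op signature chosen in vLLM IR; the names `apply_rope`, `cos_sin`, and the `is_neox_style` flag as used here are assumptions for illustration.

```python
import math
from typing import List, Tuple


def cos_sin(position: int, rot_dim: int, base: float = 10000.0) -> Tuple[List[float], List[float]]:
    """Cos/sin values for one position, using inverse frequencies base^(-2i/rot_dim)."""
    half = rot_dim // 2
    angles = [position * base ** (-2.0 * i / rot_dim) for i in range(half)]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


def apply_rope(
    x: List[float], cos: List[float], sin: List[float], is_neox_style: bool
) -> List[float]:
    """Rotate one head vector; the flag only changes which elements are paired."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        if is_neox_style:
            a, b = i, i + half        # pair first half with second half
        else:
            a, b = 2 * i, 2 * i + 1   # pair adjacent elements
        out[a] = x[a] * cos[i] - x[b] * sin[i]
        out[b] = x[b] * cos[i] + x[a] * sin[i]
    return out
```

Both branches share the same rotation arithmetic, so a single op with one flag covers both layouts. Whether the real variants differ only by such flags is exactly the semantic inspection the issue asks for.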

The final challenge for rope will be the in-place semantics: the _C implementation is fully in-place and its arguments are views, which complicates the aliasing analysis for clone elimination after the lowering pass (see #36823).
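
Why views make this hard can be shown with a small stdlib-only sketch. It assumes a fused buffer from which the query and key are sliced as views; `memoryview` over an `array` stands in for tensor views, and `scale_inplace` is a hypothetical stand-in for an in-place kernel. None of this is vLLM code.

```python
from array import array


def scale_inplace(view, factor: float) -> None:
    """Hypothetical stand-in for an in-place kernel: mutates its argument."""
    for i in range(len(view)):
        view[i] = view[i] * factor


def run_on_clone(qkv: array) -> array:
    """Functional form: copy the query slice first, then mutate only the copy."""
    q_clone = array("d", memoryview(qkv)[0:2])
    scale_inplace(memoryview(q_clone), 2.0)
    return q_clone


def run_on_view(qkv: array) -> memoryview:
    """In-place form: the query is a view, so the write lands in the base buffer."""
    q_view = memoryview(qkv)[0:2]
    scale_inplace(q_view, 2.0)
    return q_view


qkv_a = array("d", [1.0, 2.0, 3.0, 4.0])
cloned = run_on_clone(qkv_a)      # qkv_a is left untouched

qkv_b = array("d", [1.0, 2.0, 3.0, 4.0])
viewed = run_on_view(qkv_b)       # qkv_b[0:2] is overwritten through the view
```

Dropping the clone is only safe when nothing else reads the original values of the base buffer afterwards, and deciding that requires tracking which views alias which buffer.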

extent analysis

TL;DR

Porting RotaryEmbedding and DeepseekScalingRotaryEmbedding while inspecting semantics for potential consolidation is likely the next step.

Guidance

  • Port RotaryEmbedding and DeepseekScalingRotaryEmbedding first; the issue names them as the minimum set. Variants that only have a native implementation do not need to be ported.
  • Inspect the semantics of the existing ops to determine whether any can be consolidated using simple boolean parameters, which would reduce the maintenance burden and increase coverage for rope+cache related fusions.
  • Account for in-place semantics: the current _C implementation is fully in-place and its arguments are views, which complicates the aliasing analysis for clone elimination after the lowering pass.
  • Review #36823 for additional context on that aliasing analysis.

Notes

The solution may require careful analysis of the existing codebase and consideration of the trade-offs between compatibility, performance, and maintainability.

Recommendation

Port the two ops listed above and inspect their semantics for consolidation, keeping the in-place behavior in mind. PR #39488 takes this route and keeps the key=None path of the base RotaryEmbedding on the existing static implementation.
