pytorch - ✅(Solved) Fix [vllm] vllm::sparse_attn_indexer custom op output aliases input — now hard-errors with new torch.compile aliasing check (DeepSeek-V3.2) [2 pull requests, 4 comments, 2 participants]

pytorch · 2026-04-30 09:50:08

GitHub stats

pytorch/pytorch#182006 • Fetched 2026-05-01 05:32:56

Author

huydhn

Participants

huydhn

zou3519

Timeline (top)

subscribed ×25, mentioned ×14, labeled ×9, commented ×4

Under torch 2.12.0, vLLM's vllm::sparse_attn_indexer custom operator (used by DeepSeek-V3.2's sparse attention indexer) hits PyTorch's custom-op aliasing check. The op returns a tensor that aliases one of its inputs (or another return).

Behavior depends on environment:

In CI (CI env var set, e.g., GitHub Actions): hard RuntimeError, engine fails to start.
Outside CI (regular user-land): only a UserWarning, execution continues.

The check is gated by:

# torch/_functorch/config.py
check_custom_op_aliasing = True
error_on_custom_op_aliasing = bool(os.getenv("CI"))

In 2.12, the plan is to flip error_on_custom_op_aliasing to True by default (i.e., everywhere, not only in CI). At that point user-land will see the same hard error.
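To make the gating concrete, here is a minimal sketch of how the two flags combine into an effective mode. The function name aliasing_check_mode and the "off"/"warn"/"error" labels are hypothetical and used only for illustration; only the two flag expressions come from the config snippet quoted above.

```python
import os


def aliasing_check_mode(env=None):
    """Map the two flags quoted above to 'off', 'warn', or 'error' (hypothetical helper)."""
    env = os.environ if env is None else env
    # Same expressions as in the quoted torch/_functorch/config.py snippet.
    check_custom_op_aliasing = True
    error_on_custom_op_aliasing = bool(env.get("CI"))
    if not check_custom_op_aliasing:
        return "off"
    return "error" if error_on_custom_op_aliasing else "warn"


for env in ({"CI": "1"}, {}, {"CI": ""}, {"CI": "false"}):
    print(env, "->", aliasing_check_mode(env))
```

Because bool() of any non-empty string is True, a value such as CI=false still selects the error path under this expression; only an unset or empty CI variable yields the warning.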

The error message:

RuntimeError: vllm::sparse_attn_indexer (with implementation in ???): The output of this custom operator (1) must not also be an input to this custom operator and (2) may not alias any inputs to this custom operator or other returns. The most common way to trigger this error is if we have y = custom_op(x) and y and x are the same Tensor. Please instead return a clone of the offending output tensor(s) (e.g. return x.clone()) or refactor the custom operator to not return y.

The user-land warning is the same text prefixed with UserWarning: and ending with: This is deprecated and will become an error in PyTorch 2.12.
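Since the user-land signal is an ordinary UserWarning, one way to surface it locally as a failure is a standard Python warnings filter. The sketch below is an illustration under assumptions: emit_aliasing_warning is a hypothetical stand-in for the code path that warns, the message is an abbreviated copy of the text quoted above, and it assumes the real warning goes through Python's standard warnings machinery.

```python
import warnings

# Abbreviated copy of the message quoted above, with the op name from this issue.
ALIASING_TEXT = (
    "vllm::sparse_attn_indexer (with implementation in ???): "
    "The output of this custom operator (1) must not also be an input "
    "to this custom operator and (2) may not alias any inputs to this "
    "custom operator or other returns."
)


def emit_aliasing_warning():
    """Hypothetical stand-in for the code path that warns outside CI."""
    warnings.warn(ALIASING_TEXT, UserWarning, stacklevel=2)


def promoted_to_error():
    """Return True if the warning surfaces as an exception under an 'error' filter."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "error",
            message=r".*must not also be an input",
            category=UserWarning,
        )
        try:
            emit_aliasing_warning()
        except UserWarning:
            return True
    return False


print(promoted_to_error())
```

Note that a promoted warning is raised as UserWarning, not as the RuntimeError seen in CI, so any handler has to catch the warning class.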

Environment

torch: 2.12.0+cu130 (test channel)
triton: 3.7.0
CUDA: 13.0
Python: 3.12
GPU: NVIDIA B200 (linux.dgx.b200.8, TP=8)
vLLM model: deepseek-ai/DeepSeek-V3.2

Reproduction

CI=1 vllm serve deepseek-ai/DeepSeek-V3.2 --tensor-parallel-size 8 --disable-log-stats  --load-format dummy --compilation-config '{"cudagraph_mode":"PIECEWISE","use_inductor_graph_partition":true}'

(Worker_TP5 pid=47907) ERROR [multiproc_executor.py:962]
  RuntimeError: vllm::sparse_attn_indexer (with implementation in ???):
  The output of this custom operator (1) must not also be an input to this custom operator
  and (2) may not alias any inputs to this custom operator or other returns.
  ...
(APIServer pid=47461) RuntimeError: Engine core initialization failed.

Outside CI, the same model loads — only the UserWarning is emitted (per #173844's "warn by default" rollout).

Failing job: https://github.com/pytorch/pytorch/actions/runs/24839322153/job/72733144857

Why this is a 2.12-surfaced regression

The aliasing check is a deliberate PyTorch tightening for 2.12:

PR #166545 (2026-01-02) — added _check_custom_op_aliasing runtime check, gated by config flags.
PR #173844 (2026-02-03, "Warn wrong custom ops by default") — flipped on by default. PR description: "By 2.12, we want to error on custom ops that have wrong schemas."

So in 2.11 + GitHub Actions (CI=1) this is already a hard error; in 2.11 user-land it's only a warning. In 2.12 the default flips to error everywhere.

Where the offending op is registered

vllm::sparse_attn_indexer was introduced in vLLM PR vllm-project/vllm#29287 ([ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp, merged 2025-11-24). The CustomOp wraps the heavy DSA kernels (fp8_mqa_logits / fp8_paged_mqa_logits, etc.) for the DeepSeek-V3.2 sparse MLA path. Its current return tensor either is, or aliases, one of the input tensors.
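The two rules in the error message can be illustrated with a toy model. This is a conceptual sketch only, not PyTorch's implementation and not the vLLM op: ToyTensor and aliasing_violations are hypothetical names, and a plain Python list stands in for tensor storage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor; `storage` models the underlying buffer."""
    storage: list = field(default_factory=list)

    def view(self):
        # A view is a new object that shares the same storage (an alias).
        return ToyTensor(self.storage)

    def clone(self):
        # A clone copies the data into fresh storage.
        return ToyTensor(list(self.storage))


def aliasing_violations(inputs, outputs):
    """List violations of the two rules stated in the error message."""
    problems = []
    for i, out in enumerate(outputs):
        if any(out is inp for inp in inputs):
            problems.append(f"output {i} is also an input")
        elif any(out.storage is inp.storage for inp in inputs):
            problems.append(f"output {i} aliases an input")
        if any(out.storage is other.storage for other in outputs[:i]):
            problems.append(f"output {i} aliases another return")
    return problems


x = ToyTensor([1.0, 2.0, 3.0])
print(aliasing_violations([x], [x]))          # the output is the input itself
print(aliasing_violations([x], [x.view()]))   # the output shares storage with the input
print(aliasing_violations([x], [x.clone()]))  # no violations after cloning
```

In this model, returning a clone is what clears both rules, which matches the remedy suggested in the error message.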

cc @ezyang @gchanan @seemethere @malfet @pytorch/pytorch-dev-infra @chauhang @penguinwu @bdhirsh @bobrenjc93 @aorenste @atalman @zou3519 @Lucaskabela @angelayi @tugsbayasgalan

Fix Action

Fixed

Fixed by PR: Make custom op aliasing check warn (not error) in CI (https://github.com/pytorch/pytorch/pull/182068)

PR fix notes

PR #182068: Make custom op aliasing check warn (not error) in CI

Repository: pytorch/pytorch
Author: huydhn
State: open | merged: False
Link: https://github.com/pytorch/pytorch/pull/182068

Description (problem / solution / changelog)

Set error_on_custom_op_aliasing to False unconditionally so CI runs and local runs produce the same UserWarning instead of CI hard-erroring. Previously, error_on_custom_op_aliasing=bool(os.getenv("CI")) caused the check to raise RuntimeError under CI=1 while only warning elsewhere, which surfaced surprising failures (e.g. vllm::sparse_attn_indexer in DeepSeek-V3.2 hitting a hard error in GitHub Actions but warning in user-land). Fixes #182006.

This will probably conflict with @zou3519's https://github.com/pytorch/pytorch/pull/182063, so I can rebase and land this afterward.

Authored with Claude.

Changed files

torch/_functorch/config.py (modified, +2/-3)

extent analysis

TL;DR

vLLM's vllm::sparse_attn_indexer custom operator needs to be refactored so that its output no longer aliases its inputs; under PyTorch 2.12.0 with the CI environment variable set, that aliasing raises a RuntimeError.

Guidance

Identify the vllm::sparse_attn_indexer custom operator implementation and refactor it to return a clone of the output tensor instead of aliasing the input tensor.
Verify the fix by running the vllm serve command with the CI environment variable set to 1 and checking for the absence of the RuntimeError.
Consider toggling the check_custom_op_aliasing and error_on_custom_op_aliasing config flags (defined in torch/_functorch/config.py) to compare the warning and error behavior across environments.
Review the PyTorch 2.12 release notes and documentation for custom operator aliasing checks to ensure compliance with the new requirements.

Example

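import torch

# Illustrative only: the real vllm::sparse_attn_indexer takes more arguments.
def sparse_attn_indexer(input_tensor: torch.Tensor) -> torch.Tensor:
    # ... existing implementation, which currently yields a tensor aliasing an input ...
    output_tensor = input_tensor
    # Clone so the returned tensor no longer shares storage with any input.
    return output_tensor.clone()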

Notes

The fix requires modifying the vllm::sparse_attn_indexer custom operator implementation, which is registered in vLLM (introduced in vllm-project/vllm#29287), so the change belongs in the vLLM library's DeepSeek-V3.2 sparse attention path. The check_custom_op_aliasing and error_on_custom_op_aliasing config flags can be used to test the behavior in different environments.

Recommendation

Refactor the vllm::sparse_attn_indexer custom operator so that it returns a clone of the offending output tensor, or does not return it at all, as the error message suggests; this brings the op into compliance with the PyTorch 2.12 custom operator aliasing check.
