pytorch - ✅(Solved) Fix [vllm] vllm::sparse_attn_indexer custom op output aliases input — now hard-errors with new torch.compile aliasing check (DeepSeek-V3.2) [2 pull requests, 4 comments, 2 participants]

pytorch/pytorch#182006 (fetched 2026-05-01 05:32:56)

Under torch 2.12.0, vLLM's vllm::sparse_attn_indexer custom operator (used by DeepSeek-V3.2's sparse attention indexer) hits PyTorch's custom-op aliasing check. The op returns a tensor that aliases one of its inputs (or another return).

Behavior depends on environment:

  • In CI (CI env var set, e.g., GitHub Actions): hard RuntimeError, engine fails to start.
  • Outside CI (regular user-land): only a UserWarning, execution continues.

The check is gated by:

# torch/_functorch/config.py
check_custom_op_aliasing = True
error_on_custom_op_aliasing = bool(os.getenv("CI"))

In 2.12, the plan is to flip error_on_custom_op_aliasing to True by default (i.e., everywhere, not only in CI). At that point user-land will see the same hard error.
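
A side effect of computing the current default as bool(os.getenv("CI")) is that any non-empty value of CI enables the hard error, including strings such as "0" or "false". The stdlib-only sketch below mirrors that expression; the helper name error_flag_from_env is hypothetical and is not part of PyTorch.

```python
import os


def error_flag_from_env(environ=None):
    """Mirror `bool(os.getenv("CI"))` against a given environment mapping."""
    env = os.environ if environ is None else environ
    return bool(env.get("CI"))


print(error_flag_from_env({}))               # False: CI is unset
print(error_flag_from_env({"CI": ""}))       # False: empty string is falsy
print(error_flag_from_env({"CI": "1"}))      # True
print(error_flag_from_env({"CI": "false"}))  # True: any non-empty string is truthy
```

Under this expression, getting the warning-only behavior locally means leaving CI unset (or empty) rather than setting it to 0.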

The error message:

RuntimeError: vllm::sparse_attn_indexer (with implementation in ???): The output of this custom operator (1) must not also be an input to this custom operator and (2) may not alias any inputs to this custom operator or other returns. The most common way to trigger this error is if we have y = custom_op(x) and y and x are the same Tensor. Please instead return a clone of the offending output tensor(s) (e.g. return x.clone()) or refactor the custom operator to not return y.

The user-land warning is the same text prefixed with UserWarning: and ending with: This is deprecated and will become an error in PyTorch 2.12.
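
To make the warn-versus-error behavior concrete, here is a minimal pure-Python model of such a check. It is only a sketch under stated assumptions: Buf is a hypothetical stand-in for a tensor, check_aliasing is not PyTorch's _check_custom_op_aliasing, and "aliasing" is simplified to "shares the same storage object" (the issue does not show the exact criteria PyTorch uses).

```python
import warnings

CHECK_MSG = (
    "{name}: The output of this custom operator (1) must not also be an input "
    "to this custom operator and (2) may not alias any inputs to this custom "
    "operator or other returns."
)


class Buf:
    """Hypothetical stand-in for a tensor: a handle onto a storage object."""

    def __init__(self, storage):
        self.storage = storage

    def clone(self):
        # A clone copies the data into fresh storage, so it aliases nothing.
        return Buf(bytearray(self.storage))


def check_aliasing(name, inputs, outputs, error_on_aliasing):
    """Warn or raise if an output shares storage with an input or another output."""
    seen = [buf.storage for buf in inputs]
    for out in outputs:
        if any(out.storage is storage for storage in seen):
            msg = CHECK_MSG.format(name=name)
            if error_on_aliasing:
                raise RuntimeError(msg)
            warnings.warn(msg, UserWarning, stacklevel=2)
        seen.append(out.storage)


x = Buf(bytearray(b"abcd"))
# Returning a clone passes even with the error flag on.
check_aliasing("demo::ok", [x], [x.clone()], error_on_aliasing=True)
```

Calling check_aliasing("demo::bad", [x], [x], error_on_aliasing=True) raises RuntimeError in this model, while the same call with error_on_aliasing=False only emits a UserWarning, which matches the CI versus user-land split described above.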

Environment

  • torch: 2.12.0+cu130 (test channel)
  • triton: 3.7.0
  • CUDA: 13.0
  • Python: 3.12
  • GPU: NVIDIA B200 (linux.dgx.b200.8, TP=8)
  • vLLM model: deepseek-ai/DeepSeek-V3.2

Reproduction

CI=1 vllm serve deepseek-ai/DeepSeek-V3.2 --tensor-parallel-size 8 --disable-log-stats  --load-format dummy --compilation-config '{"cudagraph_mode":"PIECEWISE","use_inductor_graph_partition":true}'

(Worker_TP5 pid=47907) ERROR [multiproc_executor.py:962]
  RuntimeError: vllm::sparse_attn_indexer (with implementation in ???):
  The output of this custom operator (1) must not also be an input to this custom operator
  and (2) may not alias any inputs to this custom operator or other returns.
  ...
(APIServer pid=47461) RuntimeError: Engine core initialization failed.

Outside CI, the same model loads — only the UserWarning is emitted (per #173844's "warn by default" rollout).
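
When reproducing outside CI, one way to avoid missing the warning is to escalate it to an exception with Python's standard warnings filters. The sketch below assumes the warning text matches the message quoted above and sits on a single line; the helper name escalate_aliasing_warning is hypothetical. Filters installed this way apply only to the current process, so worker processes would need the same setup.

```python
import warnings

# Substring taken from the message quoted in the issue. The leading ".*" is
# needed because `message` filters are matched from the start of the text.
ALIASING_PATTERN = r".*may not alias any inputs to this custom operator"


def escalate_aliasing_warning():
    """Turn the custom-op aliasing UserWarning into a raised exception."""
    warnings.filterwarnings(
        "error", message=ALIASING_PATTERN, category=UserWarning
    )


# Demonstration with a stand-in warning; no PyTorch involved.
with warnings.catch_warnings():
    escalate_aliasing_warning()
    try:
        warnings.warn(
            "demo::op (with implementation in ???): The output of this custom "
            "operator (1) must not also be an input to this custom operator "
            "and (2) may not alias any inputs to this custom operator or "
            "other returns.",
            UserWarning,
        )
    except UserWarning as exc:
        print("escalated:", type(exc).__name__)
```

Unrelated UserWarnings are left alone because the filter is restricted by both category and message pattern.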

Failing job: https://github.com/pytorch/pytorch/actions/runs/24839322153/job/72733144857

Why this is a 2.12-surfaced regression

The aliasing check is a deliberate PyTorch tightening for 2.12:

  • PR #166545 (2026-01-02) — added _check_custom_op_aliasing runtime check, gated by config flags.
  • PR #173844 (2026-02-03, "Warn wrong custom ops by default") — flipped on by default. PR description: "By 2.12, we want to error on custom ops that have wrong schemas."

So in 2.11 + GitHub Actions (CI=1) this is already a hard error; in 2.11 user-land it's only a warning. In 2.12 the default flips to error everywhere.

Where the offending op is registered

vllm::sparse_attn_indexer was introduced in vLLM PR vllm-project/vllm#29287 ([ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp, merged 2025-11-24). The CustomOp wraps the heavy DSA kernels (fp8_mqa_logits / fp8_paged_mqa_logits, etc.) for the DeepSeek-V3.2 sparse MLA path. Its current return tensor either is, or aliases, one of the input tensors.

cc @ezyang @gchanan @seemethere @malfet @pytorch/pytorch-dev-infra @chauhang @penguinwu @bdhirsh @bobrenjc93 @aorenste @atalman @zou3519 @Lucaskabela @angelayi @tugsbayasgalan

Fix Action

Fixed

PR fix notes

PR #182068: Make custom op aliasing check warn (not error) in CI

Description (problem / solution / changelog)

Set error_on_custom_op_aliasing to False unconditionally so CI runs and local runs produce the same UserWarning instead of CI hard-erroring. Previously, error_on_custom_op_aliasing=bool(os.getenv("CI")) caused the check to raise RuntimeError under CI=1 while only warning elsewhere, which surfaced surprising failures (e.g. vllm::sparse_attn_indexer in DeepSeek-V3.2 hitting a hard error in GitHub Actions but warning in user-land). Fixes #182006.

This will probably conflict with @zou3519's https://github.com/pytorch/pytorch/pull/182063, so I can rebase and land this afterwards.

Authored with Claude.

Changed files

  • torch/_functorch/config.py (modified, +2/-3)

extent analysis

TL;DR

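vLLM's vllm::sparse_attn_indexer custom operator returns a tensor that aliases one of its inputs (or another return), which raises a RuntimeError under torch 2.12.0 when the CI environment variable is set. PR #182068 makes the check warn instead of error in CI; the operator itself still needs to be refactored so that its output no longer aliases its inputs.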

Guidance

  • Identify the vllm::sparse_attn_indexer custom operator implementation and refactor it to return a clone of the output tensor instead of aliasing the input tensor.
  • Verify the fix by running the vllm serve command with the CI environment variable set to 1 and checking for the absence of the RuntimeError.
  • Consider overriding the check_custom_op_aliasing and error_on_custom_op_aliasing config flags (defined in torch/_functorch/config.py) to test the behavior in different environments.
  • Review the PyTorch 2.12 release notes and documentation for custom operator aliasing checks to ensure compliance with the new requirements.

Example

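import torch

# Hypothetical sketch: the real op's signature and kernels are not shown in the issue.
def sparse_attn_indexer(input_tensor: torch.Tensor) -> torch.Tensor:
    # ... existing implementation produces `result`, which may be (or view) an input ...
    result = input_tensor  # placeholder for the existing computation
    # Return fresh storage so the output does not alias any input.
    return result.clone()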

Notes

The operator-side fix requires modifying the vllm::sparse_attn_indexer implementation in vLLM (the op was introduced in vllm-project/vllm#29287). The check_custom_op_aliasing and error_on_custom_op_aliasing config flags can be used to test the behavior in different environments.

Recommendation

Refactor the vllm::sparse_attn_indexer custom operator so that it returns a clone of the offending output tensor (or no longer returns it), which satisfies PyTorch's custom-op aliasing check.
