pytorch - ✅(Solved) Fix [Inductor] `torch._foreach_sub` silently ignores `alpha` under `torch.compile` [1 pull requests, 1 comments, 2 participants]

pytorch · 2026-05-19 18:19:03


GitHub stats

pytorch/pytorch#184415 • Fetched 2026-05-20 03:38:48

Author: wuyii8941

Participants: janeyx99, wuyii8941

Timeline: 106 (top: mentioned ×45, subscribed ×45, labeled ×11, unlabeled ×3)

Root Cause

In torch/_inductor/lowering.py, _foreach_sub.List is registered without allow_alpha=True:

# line 7953 — _foreach_add correctly passes allow_alpha:
foreach_add_list = register_foreach_pointwise(
    aten._foreach_add.List, add, allow_alpha=True
)

# line 7963 — _foreach_sub is missing allow_alpha:
register_foreach_pointwise(aten._foreach_sub.List, sub)  # ← alpha silently dropped

Inside make_foreach_pointwise (line 784), the alpha value is extracted from kwargs into scalar_val. But when allow_alpha=False (the default), the apply_fn on line 816 calls pw_fn(*args) without forwarding scalar_val, so the alpha is silently discarded.
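
The mechanism described above can be illustrated with a small pure-Python toy model. This is only a sketch of the control flow as the issue describes it, not the actual Inductor code: the bodies of make_foreach_pointwise, apply_fn, add-style helpers, and the foreach_sub_buggy / foreach_sub_fixed names are simplified stand-ins invented for illustration.

```python
# Toy model of the registration logic described in the issue.
# All function bodies here are illustrative stand-ins, NOT the real
# torch/_inductor/lowering.py implementations.

def sub(x, y, alpha=1):
    # Pointwise subtraction with an optional scaling factor on y.
    return x - alpha * y

def make_foreach_pointwise(pw_fn, allow_alpha=False):
    def inner(xs, ys, **kwargs):
        # alpha is always extracted from kwargs...
        scalar_val = kwargs.get("alpha")

        def apply_fn(x, y):
            # ...but it is only forwarded when allow_alpha is set.
            if allow_alpha and scalar_val is not None:
                return pw_fn(x, y, alpha=scalar_val)
            return pw_fn(x, y)  # alpha is dropped without any error

        return [apply_fn(x, y) for x, y in zip(xs, ys)]

    return inner

# Hypothetical names for the two registration variants.
foreach_sub_buggy = make_foreach_pointwise(sub)                    # default allow_alpha=False
foreach_sub_fixed = make_foreach_pointwise(sub, allow_alpha=True)

xs, ys = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
buggy = foreach_sub_buggy(xs, ys, alpha=2.0)  # behaves like x - y
fixed = foreach_sub_fixed(xs, ys, alpha=2.0)  # behaves like x - 2 * y
print(buggy, fixed)
```

In this toy version the buggy variant neither raises nor warns when alpha is passed, which mirrors why the mismatch in the real lowering went unnoticed until the outputs were compared against eager mode.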

Fix Action

Fix proposed (the linked PR was open and not yet merged when this page was fetched)

Proposed fix in PR: [easy][compile] fix _foreach_sub (https://github.com/pytorch/pytorch/pull/184421)

PR fix notes

PR #184421: [easy][compile] fix _foreach_sub

Repository: pytorch/pytorch
Author: khushi-411
State: open | merged: False
Link: https://github.com/pytorch/pytorch/pull/184421

Description (problem / solution / changelog)

Fixes: #184415

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

Changed files

test/inductor/test_foreach.py (modified, +14/-0)
torch/_inductor/lowering.py (modified, +2/-2)


🐛 Describe the bug

torch._foreach_sub(tensors, other_tensors, alpha=k) silently ignores the alpha parameter when compiled with torch.compile. Instead of computing a_i - alpha * b_i, the compiled code computes a_i - b_i (as if alpha=1).

The analogous torch._foreach_add with alpha works correctly under torch.compile.
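
The expected semantics can be checked by hand in eager mode. The following is a minimal sketch on CPU with small, hand-picked values (the tensors and the name reference are assumptions for illustration); it does not invoke torch.compile, so it only shows what the correct result should be.

```python
import torch

# Two small tensor lists with values that are easy to verify by hand.
a = [torch.tensor([1.0, 2.0]), torch.tensor([3.0])]
b = [torch.tensor([0.5, 1.0]), torch.tensor([1.0])]

# Eager semantics: a_i - alpha * b_i for each pair of tensors.
with_alpha = torch._foreach_sub(a, b, alpha=2.0)

# Without alpha the result is plain a_i - b_i.
no_alpha = torch._foreach_sub(a, b)

# Per-tensor reference computed with ordinary tensor arithmetic.
reference = [x - 2.0 * y for x, y in zip(a, b)]

for w, n, r in zip(with_alpha, no_alpha, reference):
    print(w.tolist(), n.tolist(), r.tolist())
```

A compiled function affected by this bug would return the no_alpha values even though alpha=2.0 was requested.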

Reproducer

import torch

def fn(a, b):
    return torch._foreach_sub(a, b, alpha=2.0)

torch.manual_seed(42)
a = [torch.randn(4, device='cuda') for _ in range(3)]
b = [torch.randn(4, device='cuda') for _ in range(3)]

eager = fn(a, b)
compiled = torch.compile(fn)([t.clone() for t in a], [t.clone() for t in b])

for i in range(3):
    print(f"eager[{i}]:    {eager[i].tolist()}")
    print(f"compiled[{i}]: {compiled[i].tolist()}")
    print(f"  match: {torch.equal(eager[i], compiled[i])}")  # False

# compiled output exactly matches alpha=1 (no alpha):
no_alpha = torch._foreach_sub(a, b)
for i in range(3):
    assert torch.equal(compiled[i], no_alpha[i]), "compiled should match no-alpha"
print("\nalpha was silently ignored: compiled == no_alpha for all tensors")

Output:

eager[0]:    [1.2315161228179932, -0.292142391204834, -1.423011302947998, 2.6724295616149902]
compiled[0]: [0.7127674221992493, 0.9346156120300293, -0.7975307703018188, 1.7607448101043701]
  match: False
...
alpha was silently ignored: compiled == no_alpha for all tensors

Root Cause

In torch/_inductor/lowering.py, _foreach_sub.List is registered without allow_alpha=True:

# line 7953 — _foreach_add correctly passes allow_alpha:
foreach_add_list = register_foreach_pointwise(
    aten._foreach_add.List, add, allow_alpha=True
)

# line 7963 — _foreach_sub is missing allow_alpha:
register_foreach_pointwise(aten._foreach_sub.List, sub)  # ← alpha silently dropped

Suggested Fix

register_foreach_pointwise(aten._foreach_sub.List, sub, allow_alpha=True)
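
On builds that do not yet include this change, one possible workaround is to avoid the alpha keyword entirely by scaling the second list first. The sketch below is an assumption-laden illustration, not part of the issue or the PR: foreach_sub_scaled is a hypothetical helper name, and the check runs in eager mode on CPU only, so it confirms numerical equivalence with the alpha form but does not by itself prove the behavior under torch.compile.

```python
import torch

def foreach_sub_scaled(a, b, alpha):
    # Hypothetical helper: compute a_i - alpha * b_i without the alpha kwarg.
    # torch._foreach_mul with a scalar returns new tensors; b is not modified.
    scaled_b = torch._foreach_mul(b, alpha)
    return torch._foreach_sub(a, scaled_b)

torch.manual_seed(0)
a = [torch.randn(4) for _ in range(3)]
b = [torch.randn(4) for _ in range(3)]

expected = torch._foreach_sub(a, b, alpha=2.0)  # eager reference
actual = foreach_sub_scaled(a, b, 2.0)

for e, got in zip(expected, actual):
    print(torch.allclose(e, got))
```

The trade-off is one extra temporary list of tensors for the scaled operand, which may matter in memory-sensitive optimizer code.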

Versions

PyTorch: 2.13.0.dev20260513+cu126
GPU: NVIDIA Tesla T4
CUDA: 12.6
Triton: 3.3.1

cc @ezyang @gchanan @kadeng @msaroufim @crcrpar @mcarilli @janeyx99 @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @muchulee8 @amjames @aakhundov @coconutruben @jataylo
