pytorch - 💡(How to fix) Fix torch.compile produces wrong results for F.multilabel_margin_loss when target contains -1

pytorch, 2026-05-28 04:09:53

Error Message

| Target pattern | Eager loss | Compiled loss | FWD diff | BWD max diff |
|---|---|---|---|---|
| `[0,1,2,3,4]` (no padding) | ✅ correct | ✅ correct | 0 | 0 |
| `[0,1,2,3,-1]` | ✅ correct | ❌ wrong | 0.049 | 0.20 |
| `[0,1,-1,-1,-1]` | ✅ correct | ❌ wrong | 1.60 | 1.20 |
| `[0,-1,-1,-1,-1]` | ✅ correct | ❌ wrong | 0.69 | 0.80 |
| `[-1,0,1,2,3]` | ✅ correct | ❌ wrong | 7.81 | 4.00 |

Root Cause

The bug is in the AOT Autograd decomposition of aten::multilabel_margin_loss_forward. The aot_eager and inductor backends both produce the same wrong result, which points to the decomposition layer rather than Inductor codegen.

The eager C++ kernel treats -1 in target as a "stop reading positive indices" marker. The decomposition does not appear to implement these stop-marker semantics correctly, so extra terms end up in the hinge loss computation.
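
To make the stop-marker semantics concrete, here is a minimal pure-Python sketch of the loss for a single sample. It is written from the formula in the `MultiLabelMarginLoss` documentation, not from the C++ kernel or the decomposition. The function name `multilabel_margin_reference` is hypothetical, and the gradient is hand-derived from that formula.

```python
def multilabel_margin_reference(x, target):
    """Loss and gradient for one sample (hypothetical reference helper).

    x: list of floats (class scores); target: list of ints padded with -1.
    """
    n = len(x)

    # Read positive class indices only up to the first negative entry.
    positives = []
    for idx in target:
        if idx < 0:
            break  # -1 is the stop marker; everything after it is ignored
        positives.append(idx)

    is_positive = [False] * n
    for j in positives:
        is_positive[j] = True

    loss = 0.0
    grad = [0.0] * n
    for j in positives:
        for i in range(n):
            if is_positive[i]:
                continue  # hinge terms pair a positive with a non-positive class
            margin = 1.0 - (x[j] - x[i])
            if margin > 0:
                loss += margin / n
                grad[j] -= 1.0 / n
                grad[i] += 1.0 / n
    return loss, grad
```

For the reproducer's input (`x = [1.0, -1.0, 0.5, 0.3, -0.2]`, `target = [0, 1, -1, -1, -1]`) this gives a loss of about 1.48 and a gradient of about `[-0.4, -0.6, 0.4, 0.4, 0.2]`, matching the eager values reported in this issue. Under this reading, a target that starts with -1 has no positive classes and yields a loss of zero.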

Code Example

import torch
import torch.nn.functional as F

device = "cuda"
x = torch.tensor([[1.0, -1.0, 0.5, 0.3, -0.2]], device=device, requires_grad=True)
target = torch.tensor([[0, 1, -1, -1, -1]], device=device, dtype=torch.long)

# Eager (correct)
x_e = x.detach().clone().requires_grad_(True)
loss_e = F.multilabel_margin_loss(x_e, target)
loss_e.backward()
print(f"Eager loss: {loss_e.item()}")       # 1.48
print(f"Eager grad: {x_e.grad}")            # [[-0.4, -0.6, 0.4, 0.4, 0.2]]

# Compiled (WRONG)
torch._dynamo.reset()
x_c = x.detach().clone().requires_grad_(True)
compiled_fn = torch.compile(lambda x: F.multilabel_margin_loss(x, target))
loss_c = compiled_fn(x_c)
loss_c.backward()
print(f"Compiled loss: {loss_c.item()}")     # 1.96 ← WRONG
print(f"Compiled grad: {x_c.grad}")          # [[-1.6, -0.6, 1.0, 1.0, 0.2]] ← WRONG


🐛 Describe the bug

torch.compile (both inductor and aot_eager backends) computes wrong forward loss and wrong backward gradients for F.multilabel_margin_loss when the target tensor contains -1 (the standard padding/ignore marker).

The eager implementation correctly treats -1 as a stop marker for positive class indices. The compiled decomposition does not handle -1 correctly, resulting in a different (wrong) loss value and gradient.


Expected behavior

Compiled output should match eager output exactly:

- Loss: 1.48 (not 1.96)
- Grad: `[[-0.4, -0.6, 0.4, 0.4, 0.2]]` (not `[[-1.6, -0.6, 1.0, 1.0, 0.2]]`)
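
The expectation above could be turned into a regression check along these lines. This is a sketch, not a test from the PyTorch suite: the helper names (`loss_and_grad`, `check_compiled_matches_eager`, `sweep`) are hypothetical, the target patterns are the ones listed in this issue, and `torch.testing.assert_close` is used with its default tolerances as an assumption in place of strict bitwise equality.

```python
import torch
import torch.nn.functional as F

# Target patterns reported in this issue (single-sample batches).
PATTERNS = [
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, -1],
    [0, 1, -1, -1, -1],
    [0, -1, -1, -1, -1],
    [-1, 0, 1, 2, 3],
]


def loss_and_grad(fn, x, target):
    """Run fn on a fresh leaf copy of x and return (loss, grad)."""
    inp = x.detach().clone().requires_grad_(True)
    loss = fn(inp, target)
    loss.backward()
    return loss.detach(), inp.grad


def check_compiled_matches_eager(x, target, backend="inductor"):
    """Raise AssertionError if compiled loss or grad differs from eager."""
    torch._dynamo.reset()
    compiled = torch.compile(F.multilabel_margin_loss, backend=backend)
    loss_e, grad_e = loss_and_grad(F.multilabel_margin_loss, x, target)
    loss_c, grad_c = loss_and_grad(compiled, x, target)
    torch.testing.assert_close(loss_c, loss_e)
    torch.testing.assert_close(grad_c, grad_e)


def sweep(x, patterns=PATTERNS, backend="inductor"):
    """Return a list of (pattern, matches_eager) pairs."""
    results = []
    for pattern in patterns:
        target = torch.tensor([pattern], dtype=torch.long, device=x.device)
        try:
            check_compiled_matches_eager(x, target, backend)
            results.append((pattern, True))
        except AssertionError:
            results.append((pattern, False))
    return results
```

Calling `sweep(x, backend="aot_eager")` and `sweep(x, backend="inductor")` with the reproducer's `x` would show which patterns diverge on a given build. Since the issue attributes the bug to the decomposition, both backends would be expected to fail on the same patterns.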

Impact

Any multi-label classification model using F.multilabel_margin_loss with -1 padding under torch.compile silently trains with incorrect gradients. There is no error or warning.

Versions

PyTorch: 2.13.0.dev20260520+cu126
CUDA: 12.6
GPU: Tesla T4
Python: 3.11

cc @chauhang @penguinwu
