pytorch - 💡(How to fix) Fix [inductor] Severe numerical inconsistency: diag_embed + erf + normalize(p=-1) under torch.compile(dynamic=True) — max diff up to 565,248

pytorch · 2026-05-29 03:30:05



🐛 Describe the bug

A model combining Conv2d → BatchNorm2d → ReLU → diag_embed → erf → normalize(p=-1) produces wildly incorrect outputs under torch.compile(dynamic=True) compared to eager mode. The maximum absolute difference reaches 565,248, indicating a severe correctness bug: not a precision issue, but a fundamentally wrong computation.

Both modes complete without errors and produce identically shaped outputs. The bug reproduces across multiple runs with varying random inputs, and every run shows a massive discrepancy.

Minimal repro

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.set_grad_enabled(False)

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = torch.diag_embed(x)
        x = torch.erf(x)
        x = F.normalize(x, p=-1, dim=-1)
        return x

model = Model()
x = torch.randn(2, 3, 16, 16)

res_eager = model(x)  # eager reference
res_compiled = torch.compile(model, dynamic=True)(x)  # Inductor with dynamic shapes

print("Max diff:", (res_eager - res_compiled).abs().max().item())

Result (3 consecutive runs, same repro)

Run 1: Max diff: 483328.0
Run 2: Max diff: 565248.0
Run 3: Max diff: 528384.0

The exact discrepancy is not deterministic (it varies with the unseeded random input), but it is always present and massive, on the order of 5e+05. The outputs have identical shape but fundamentally different values.
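An absolute difference alone does not show whether two results disagree in substance or only in rounding, because it depends on how large the eager outputs are. The helper below is a minimal sketch (`compare_outputs` is a hypothetical name, not a PyTorch API) that reports the maximum absolute difference next to the maximum eager magnitude and the maximum relative error, using only standard tensor operations.

```python
import torch


def compare_outputs(eager: torch.Tensor, compiled: torch.Tensor) -> dict:
    """Hypothetical helper: summarize how far `compiled` is from `eager`.

    Works in float64 so the comparison itself adds no float32 rounding.
    """
    ref = eager.double()
    test = compiled.double()
    abs_diff = (ref - test).abs()
    scale = ref.abs()

    # Relative error is only defined where the eager reference is nonzero.
    nonzero = scale > 0
    rel = abs_diff[nonzero] / scale[nonzero]

    return {
        "max_abs_diff": abs_diff.max().item(),
        "max_abs_eager": scale.max().item(),
        "max_rel_diff": rel.max().item() if rel.numel() > 0 else 0.0,
        # True if some entry is exactly zero in eager but not in compiled.
        "zero_mismatch": bool((abs_diff[~nonzero] > 0).any()),
    }
```

Called as `compare_outputs(res_eager, res_compiled)` on the repro's outputs, a `max_rel_diff` close to float32 rounding (machine epsilon is about 1.2e-7) combined with a very large `max_abs_eager` would point to amplification of rounding by the output scale; a large `max_rel_diff` or `zero_mismatch == True` would support the wrong-computation reading. `torch.testing.assert_close(res_compiled, res_eager)` is the built-in alternative, checking combined absolute and relative tolerances.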

Root cause hypothesis

The operator chain creates a numerically fragile computation:

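diag_embed: places each value on the diagonal of a new trailing 16 × 16 matrix, so every row of the last dimension holds at most one nonzero entry and exact zeros elsewhere (ReLU also zeroes the diagonal wherever its input was negative)
erf: maps to (-1, 1), further compressing the dynamic range; since erf(0) = 0, the off-diagonal zeros stay exactly zero
normalize(p=-1): divides by the p=-1 norm along the last dimension, (Σ|x_i|^-1)^-1, which is never larger than min(|x_i|) and is exactly 0 for any row containing a zero; F.normalize then clamps the denominator to eps (1e-12 by default), so nonzero outputs are scaled by roughly 1e12

Under dynamic=True, Inductor's fusion of diag_embed + erf may produce slightly different intermediate values for the diagonal elements. normalize(p=-1) then amplifies these tiny differences into massive absolute output discrepancies, because every value is divided by the tiny clamped denominator.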
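The normalization behavior behind this hypothesis can be checked in eager mode alone, without torch.compile. The sketch below uses illustrative sizes and values that are not taken from the report. It shows that the p=-1 norm is not min(|x|), that a single zero in a row drives the norm to 0 so F.normalize falls back to its eps clamp, and that the diag_embed → erf → normalize chain therefore produces outputs on the order of 1/eps.

```python
import torch
import torch.nn.functional as F

# 1) The p=-1 "norm" is (sum |x_i|^-1)^-1, which is smaller than min(|x_i|).
row = torch.tensor([1.0, 2.0, 4.0])
p_neg1 = torch.linalg.vector_norm(row, ord=-1)
print(p_neg1.item(), row.abs().min().item())  # 4/7 ≈ 0.5714 vs 1.0

# 2) A single exact zero gives |0|^-1 = inf, so the p=-1 norm collapses to 0.
row_with_zero = torch.tensor([0.0, 0.5])
print(torch.linalg.vector_norm(row_with_zero, ord=-1).item())  # 0.0

# F.normalize divides by norm.clamp_min(eps), so the denominator becomes eps.
eps = 1e-12  # F.normalize's default eps
normalized = F.normalize(row_with_zero.unsqueeze(0), p=-1, dim=-1, eps=eps)
print(normalized)  # second entry is 0.5 / eps, about 5e11

# 3) Every row produced by diag_embed on a last dimension of size > 1 has
#    off-diagonal zeros, and erf(0) == 0 keeps them exactly zero.
torch.manual_seed(0)
x = torch.relu(torch.randn(2, 8, 16))  # illustrative stand-in for the ReLU output
y = torch.erf(torch.diag_embed(x))     # shape (2, 8, 16, 16)
norms = torch.linalg.vector_norm(y, ord=-1, dim=-1)
out = F.normalize(y, p=-1, dim=-1)
print(norms.abs().max().item())        # 0.0: every denominator gets clamped to eps
print(out.abs().max().item())          # roughly max(erf(x)) / 1e-12
```

If this holds for the reported model, eager outputs are already scaled by about 1e12, so any difference in the diagonal values or the division between eager and Inductor is magnified by the same factor.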

Additional context

Both eager and compiled modes complete without error — silent correctness bug (INCON type).
The max diff here (483,328 to 565,248) is roughly 350× to 400× larger than that of a previously reported INCON bug involving polygamma + cumulative_trapezoid (~1,400), suggesting a different root cause.

Versions

PyTorch version: 2.13.0.dev20260521+cu126
OS: Linux (Ubuntu 20.04)
Python: 3.12
CUDA: 12.6

cc @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo
