pytorch - 💡(How to fix) Fix `torch.compile` crashes with "Failed to trace builtin operator" on valid MKLDNN unary model [2 comments, 2 participants]

pytorch • 2026-03-21 07:56:40


GitHub stats

pytorch/pytorch#178041 • Fetched 2026-04-08 01:07:34

Author: himi1008

Participants: himi1008, williamwen42

Timeline: 119

Timeline (top): mentioned ×50, subscribed ×50, labeled ×9, referenced ×3

Error Message

Failed to trace builtin operator

The error is raised when the model compiled with torch.compile(model, backend="inductor") is first called. The same model runs correctly in eager mode and returns an output of shape (8, 16, 256). The full reproducer is listed under Code Example.

Root Cause

The forward method decides whether to restore the original shape with a locals() check (if 'batch_size' in locals() and 'seq_len' in locals()). Dynamo may be unable to trace this builtin call, which makes it a plausible source of the error.
The view(-1, emb_dim) followed by view(batch_size, seq_len, -1) works in eager mode because the tensor is contiguous, but Dynamo may not handle the conditional shape logic correctly.
The GELU pattern 0.5 * x * (1 + erf(x / √2)) is a common decomposition that should be compilable, so it is unlikely to be the cause.
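
To make the first point concrete, here is a minimal pure-Python sketch of the control-flow pattern, with no PyTorch involved. The function names `restore_with_locals` and `restore_with_flag` are hypothetical and exist only for illustration. The sketch shows that an explicit `None` sentinel carries the same information as the `locals()` lookup; it does not by itself demonstrate that the crash goes away.

```python
def restore_with_locals(shape):
    """Mimics the original control flow: names are bound on one branch only."""
    if len(shape) > 2:
        batch_size, seq_len, emb_dim = shape
    # Asks the interpreter whether the names happen to exist in this frame.
    if 'batch_size' in locals() and 'seq_len' in locals():
        return (batch_size, seq_len, -1)
    return None


def restore_with_flag(shape):
    """Same decision, expressed through an explicit sentinel value."""
    lead_dims = None  # None means "input was not 3-D, nothing to restore"
    if len(shape) > 2:
        batch_size, seq_len, _emb_dim = shape
        lead_dims = (batch_size, seq_len)
    if lead_dims is not None:
        return (*lead_dims, -1)
    return None


# Both versions agree for a 3-D shape and for a 2-D shape.
for shape in [(8, 16, 128), (8, 128)]:
    assert restore_with_locals(shape) == restore_with_flag(shape)
    print(shape, "->", restore_with_flag(shape))
```

The second form keeps the branch condition in ordinary data, so nothing in the function depends on inspecting the interpreter frame.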

Code Example

import torch
import torch.nn as nn

class GELUModel(nn.Module):
    def __init__(self, in_features=128, out_features=256):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=500, embedding_dim=in_features)
        self.linear = nn.Linear(in_features, out_features)
        self.norm = nn.BatchNorm1d(out_features)

    def forward(self, x):
        x = self.embedding(x)
        if x.dim() > 2:
            batch_size, seq_len, emb_dim = x.shape
            x = x.view(-1, emb_dim)
        
        x = self.linear(x)
        # GELU via erf decomposition
        t1 = x * 0.7071067811865476
        t2 = torch.erf(t1)
        t3 = t2 + 1
        t4 = x * 0.5
        x = t4 * t3
        
        x = self.norm(x)
        
        if 'batch_size' in locals() and 'seq_len' in locals():
            x = x.view(batch_size, seq_len, -1)
        return x

device = "cuda"
model = GELUModel().to(device).eval()
x = torch.randint(0, 500, (8, 16), dtype=torch.long, device=device)

# Eager: works fine
with torch.no_grad():
    ref = model(x)
print(f"Eager: OK, shape={ref.shape}")

# Compiled: crashes
torch._dynamo.reset()
compiled = torch.compile(model, backend="inductor")
try:
    with torch.no_grad():
        out = compiled(x)
    print(f"Compiled: OK, shape={out.shape}")
except Exception as e:
    print(f"Compiled: CRASH — {e}")

---

Failed to trace builtin operator

---

PyTorch version: 2.12.0.dev20260315+cu126
Python: 3.10.12
OS: Ubuntu 22.04.5 LTS (WSL2)
GPU: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA: 12.6

🐛 Describe the bug

torch.compile with the inductor backend crashes with Failed to trace builtin operator when compiling a valid model (nn.Embedding, nn.Linear, nn.BatchNorm1d) that implements GELU through its erf-based form (0.5 * x * (1 + erf(x / √2))) and wraps the computation in a view/reshape sequence. The same model runs correctly in eager mode.

The error occurs during Dynamo tracing, suggesting that the compiler cannot trace a builtin Python operator (likely a comparison or conditional) encountered in the BatchNorm1d or view path. This was discovered via a fuzzer-generated model targeting the MKLDNN unary=4 pointwise fusion pattern.

Behavior summary

| Mode | Result |
| --- | --- |
| Eager | Succeeds, output shape `(8, 16, 256)` |
| `torch.compile(backend="inductor")` | Crashes: `Failed to trace builtin operator` |

Notes

The model uses a locals() check in a conditional (if 'batch_size' in locals()), which may confuse Dynamo tracing.
The view(-1, emb_dim) followed by view(batch_size, seq_len, -1) reshape sequence works in eager because the tensor is contiguous, but Dynamo may not handle the conditional shape logic correctly.
The GELU pattern 0.5 * x * (1 + erf(x / √2)) is a common decomposition that should be compilable.
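
As a sanity check on the last note, the following sketch mirrors the reproducer's `t1`..`t4` steps on plain Python floats using `math.erf` and compares the result with GELU written as x · Φ(x), where Φ is the standard normal CDF. The helper names are illustrative and not part of the issue. It only shows that the decomposition is the exact erf-based GELU, which supports the view that the activation itself is not the problem.

```python
import math
from statistics import NormalDist


def gelu_decomposed(x: float) -> float:
    """GELU following the reproducer's intermediate steps t1..t4."""
    t1 = x * 0.7071067811865476  # x / sqrt(2)
    t2 = math.erf(t1)
    t3 = t2 + 1
    t4 = x * 0.5
    return t4 * t3


def gelu_reference(x: float) -> float:
    """GELU as x * Phi(x), with Phi the standard normal CDF."""
    return x * NormalDist().cdf(x)


for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert math.isclose(
        gelu_decomposed(x), gelu_reference(x), rel_tol=1e-9, abs_tol=1e-12
    )
print("decomposition matches x * Phi(x) on the sampled points")
```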

Error logs

Failed to trace builtin operator

(Full traceback from torch.compile compilation phase)

Versions

PyTorch version: 2.12.0.dev20260315+cu126
Python: 3.10.12
OS: Ubuntu 22.04.5 LTS (WSL2)
GPU: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA: 12.6

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @chauhang @penguinwu @voznesenskym @EikanWang @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo

extent analysis

Fix Plan

To stop torch.compile from crashing with the inductor backend, the proposed fix rewrites forward so that it no longer relies on locals() and so that the reshape logic is unconditional:

Replace the 'batch_size' in locals() check by reading batch_size, seq_len, and emb_dim from x.shape unconditionally.
Always flatten with view(-1, emb_dim) before the linear layer and restore with view(batch_size, seq_len, -1) afterwards. This drops the original x.dim() > 2 guard, so the rewritten model assumes a 2-D index input (a 3-D embedding output).

Code Changes

import torch
import torch.nn as nn

class GELUModel(nn.Module):
    def __init__(self, in_features=128, out_features=256):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=500, embedding_dim=in_features)
        self.linear = nn.Linear(in_features, out_features)
        self.norm = nn.BatchNorm1d(out_features)

    def forward(self, x):
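        x = self.embedding(x)
        # Read the shape unconditionally instead of probing locals() later.
        # Assumes a 2-D index input, so the embedding output is (batch, seq, emb).
        batch_size, seq_len, emb_dim = x.shape

        # Flatten to (batch * seq, emb) for Linear and BatchNorm1d.
        x = x.view(-1, emb_dim)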
        
        x = self.linear(x)
        # GELU via erf decomposition
        t1 = x * 0.7071067811865476
        t2 = torch.erf(t1)
        t3 = t2 + 1
        t4 = x * 0.5
        x = t4 * t3
        
        x = self.norm(x)
        
        x = x.view(batch_size, seq_len, -1)
        return x

device = "cuda"
model = GELUModel().to(device).eval()
x = torch.randint(0, 500, (8, 16), dtype=torch.long, device=device)

# Eager: works fine
with torch.no_grad():
    ref = model(x)
print(f"Eager: OK, shape={ref.shape}")

# Compiled: should work now
torch._dynamo.reset()
compiled = torch.compile(model, backend="inductor")
try:
    with torch.no_grad():
        out = compiled(x)
    print(f"Compiled: OK, shape={out.shape}")
except Exception as e:
    print(f"Compiled: CRASH — {e}")
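
The rewrite above hard-codes a 3-D embedding output. If the `x.dim() > 2` branch of the original model needs to be kept, one possible variant is sketched below: it records the leading dimensions in an explicit variable (`lead_shape`, a name introduced here) instead of consulting `locals()`. The class name `GELUModelGuarded` is hypothetical, the example runs in eager mode on CPU, and whether it avoids the reported crash on the affected nightly build has not been verified here.

```python
import torch
import torch.nn as nn


class GELUModelGuarded(nn.Module):
    """Hypothetical variant of the fix that accepts 1-D or 2-D index inputs."""

    def __init__(self, in_features=128, out_features=256):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=500, embedding_dim=in_features)
        self.linear = nn.Linear(in_features, out_features)
        self.norm = nn.BatchNorm1d(out_features)

    def forward(self, x):
        x = self.embedding(x)
        # Explicit sentinel: None means no reshape is needed on the way out.
        lead_shape = None
        if x.dim() > 2:
            lead_shape = x.shape[:-1]
            x = x.view(-1, x.shape[-1])

        x = self.linear(x)
        # GELU via erf decomposition, as in the original model
        x = (x * 0.5) * (torch.erf(x * 0.7071067811865476) + 1)
        x = self.norm(x)

        if lead_shape is not None:
            x = x.view(*lead_shape, -1)
        return x


model = GELUModelGuarded().eval()
with torch.no_grad():
    out_3d = model(torch.randint(0, 500, (8, 16)))  # 2-D indices -> 3-D output
    out_2d = model(torch.randint(0, 500, (8,)))     # 1-D indices -> 2-D output
print(out_3d.shape, out_2d.shape)
```

The trade-off against the simpler rewrite is one extra branch in forward in exchange for keeping the original model's input flexibility.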

Verification

To verify that the fix worked, run the modified code and check that the compiled call completes without raising, that the output shape is (8, 16, 256), and that the values match the eager result within numerical tolerance.
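
A minimal sketch of such a check follows. The helper `check_compiled_matches_eager` and the small stand-in module `TinyGELU` are hypothetical names used only for illustration, and the tolerances are arbitrary. The example call uses `backend="eager"` on CPU so that it exercises only Dynamo tracing and needs neither a GPU nor a C++ toolchain; to match the issue's setup, pass the rewritten `GELUModel`, a CUDA input, and `backend="inductor"`.

```python
import torch
import torch._dynamo
import torch.nn as nn


def check_compiled_matches_eager(model, x, backend="inductor", rtol=1e-4, atol=1e-4):
    """Run the model eagerly and compiled; raise if shapes or values differ."""
    model.eval()
    with torch.no_grad():
        ref = model(x)
        torch._dynamo.reset()  # drop cached graphs from earlier attempts
        compiled = torch.compile(model, backend=backend)
        out = compiled(x)
    assert out.shape == ref.shape, f"shape mismatch: {out.shape} vs {ref.shape}"
    torch.testing.assert_close(out, ref, rtol=rtol, atol=atol)
    return out


class TinyGELU(nn.Module):
    """Hypothetical stand-in model: Linear followed by the erf-based GELU."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 3)

    def forward(self, x):
        x = self.linear(x)
        return (x * 0.5) * (torch.erf(x * 0.7071067811865476) + 1)


out = check_compiled_matches_eager(TinyGELU(), torch.randn(2, 5, 4), backend="eager")
print(f"Compiled matches eager, shape={tuple(out.shape)}")
```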

Extra Tips

When using torch.compile, keep the model's forward method free of operations or conditional checks that Dynamo cannot trace.
Using locals() for conditional checks can lead to issues with Dynamo tracing, so avoid it in the model's forward method; an explicit variable carries the same information.
Properly handling tensor shapes when using view and reshape is crucial to ensure that the model works correctly in both eager and compiled modes.
