pytorch - 💡(How to fix) Fix `torch.compile` crashes with "Failed to trace builtin operator" on valid MKLDNN unary model [2 comments, 2 participants]

GitHub stats
pytorch/pytorch#178041 · Fetched 2026-04-08 01:07:34
Comments: 2 · Participants: 2 · Timeline events: 119 · Reactions: 0
Timeline (top): mentioned ×50, subscribed ×50, labeled ×9, referenced ×3

Error Message

Failed to trace builtin operator

Raised during Dynamo tracing when the model is run through torch.compile(model, backend="inductor"). The same model runs correctly in eager mode and returns an output of shape (8, 16, 256). The full reproducer is listed under Code Example.

Root Cause

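The issue does not confirm a root cause; the points below are the reporter's hypotheses.

  • Most likely trigger: forward() tests 'batch_size' in locals() to decide whether to restore the original shape. Both the locals() call and the membership test on its result are builtin Python operations that Dynamo has to trace, which is consistent with the wording of the error.
  • The reshape pair view(-1, emb_dim) followed by view(batch_size, seq_len, -1) is valid in eager mode because the tensor is contiguous. Dynamo may mishandle the conditional shape logic, since batch_size and seq_len are bound only when x.dim() > 2.
  • The GELU decomposition 0.5 * x * (1 + erf(x / √2)) is a common pattern that should be compilable, so it is not expected to be the cause.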
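The GELU point can be checked without PyTorch. The sketch below is illustrative only: it repeats the model's five arithmetic steps on plain floats and compares them against the definition GELU(x) = x · Φ(x), where Φ is the standard normal CDF. The function names gelu_erf and gelu_reference are made up for this example.

```python
import math
from statistics import NormalDist

# Constant used in the reproducer; it is 1/sqrt(2) rounded to double precision.
INV_SQRT2 = 0.7071067811865476


def gelu_erf(x: float) -> float:
    """Same steps as the model's forward(): t1..t4, then the product."""
    t1 = x * INV_SQRT2
    t2 = math.erf(t1)
    t3 = t2 + 1
    t4 = x * 0.5
    return t4 * t3


def gelu_reference(x: float) -> float:
    """Exact GELU by definition: x times the standard normal CDF at x."""
    return x * NormalDist().cdf(x)


for value in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert math.isclose(gelu_erf(value), gelu_reference(value),
                        rel_tol=1e-9, abs_tol=1e-12)
```

Since the two agree, the erf form is the exact GELU rather than an approximation, which supports treating the activation as an unlikely source of the tracing failure.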

Code Example

import torch
import torch.nn as nn

class GELUModel(nn.Module):
    def __init__(self, in_features=128, out_features=256):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=500, embedding_dim=in_features)
        self.linear = nn.Linear(in_features, out_features)
        self.norm = nn.BatchNorm1d(out_features)

    def forward(self, x):
        x = self.embedding(x)
        if x.dim() > 2:
            batch_size, seq_len, emb_dim = x.shape
            x = x.view(-1, emb_dim)
        
        x = self.linear(x)
        # GELU via erf decomposition
        t1 = x * 0.7071067811865476
        t2 = torch.erf(t1)
        t3 = t2 + 1
        t4 = x * 0.5
        x = t4 * t3
        
        x = self.norm(x)
        
        if 'batch_size' in locals() and 'seq_len' in locals():
            x = x.view(batch_size, seq_len, -1)
        return x

device = "cuda"
model = GELUModel().to(device).eval()
x = torch.randint(0, 500, (8, 16), dtype=torch.long, device=device)

# Eager: works fine
with torch.no_grad():
    ref = model(x)
print(f"Eager: OK, shape={ref.shape}")

# Compiled: crashes
torch._dynamo.reset()
compiled = torch.compile(model, backend="inductor")
try:
    with torch.no_grad():
        out = compiled(x)
    print(f"Compiled: OK, shape={out.shape}")
except Exception as e:
    print(f"Compiled: CRASH — {e}")


🐛 Describe the bug

torch.compile with the inductor backend crashes with Failed to trace builtin operator when compiling a valid model (nn.Embedding, nn.Linear, nn.BatchNorm1d) that implements GELU through its erf decomposition, 0.5 * x * (1 + erf(x / √2)), and wraps the computation in a view/reshape sequence. The same model runs correctly in eager mode.

The error occurs during Dynamo tracing, suggesting that the compiler cannot trace a builtin Python operator (likely a comparison or conditional) encountered in the BatchNorm1d or view path. This was discovered via a fuzzer-generated model targeting the MKLDNN unary=4 pointwise fusion pattern.

Behavior summary

Mode | Result
Eager | Succeeds, output shape (8, 16, 256)
torch.compile(backend="inductor") | Crashes: Failed to trace builtin operator

Error logs

Failed to trace builtin operator

(Full traceback from torch.compile compilation phase)
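The reproducer prints only str(e), which discards the traceback that the placeholder above refers to. A minimal sketch of one way to keep it, using only the standard library; run_and_capture is a hypothetical helper introduced for this example and is not part of PyTorch.

```python
import traceback


def run_and_capture(fn, *args, **kwargs):
    """Call fn and return (result, None), or (None, traceback_text) on failure."""
    try:
        return fn(*args, **kwargs), None
    except Exception:
        # format_exc() returns the full traceback of the exception being handled.
        return None, traceback.format_exc()
```

With the reproducer's names, this would be called as out, tb = run_and_capture(compiled, x), and tb printed or attached to the report when it is not None.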

Versions

PyTorch version: 2.12.0.dev20260315+cu126
Python: 3.10.12
OS: Ubuntu 22.04.5 LTS (WSL2)
GPU: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA: 12.6

cc @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @jgong5 @mingfeima @sanchitintel @ashokei @jingxu10 @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal @chauhang @penguinwu @voznesenskym @EikanWang @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo


Fix Plan

The workaround is on the model side: remove the locals() probe so that Dynamo only sees ordinary variables and tensor operations. It does not change Dynamo itself, and the issue does not confirm that it resolves the crash.

  • Drop the locals() check in forward() and record the shape in ordinary variables right after the embedding lookup.
  • Use those recorded values for the final view, so the reshape is driven by explicit values rather than by which names happen to be defined.
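The difference between the two styles can be shown on plain tuples, without PyTorch. This is a sketch for illustration only: both function names are made up, and shapes are passed as tuples rather than tensors.

```python
def reshape_plan_with_locals(shape):
    """Mimics the original control flow: names are bound on one branch only."""
    if len(shape) > 2:
        batch_size, seq_len, emb_dim = shape
    # Probing the frame's namespace decides whether the shape is restored.
    if 'batch_size' in locals() and 'seq_len' in locals():
        return (batch_size, seq_len, -1)
    return None


def reshape_plan_explicit(shape):
    """Same decision, carried by an ordinary variable instead of locals()."""
    restore = None
    if len(shape) > 2:
        batch_size, seq_len, _ = shape
        restore = (batch_size, seq_len, -1)
    return restore


# Both versions agree for a 3-D shape and for a 2-D shape.
for shape in ((8, 16, 128), (16, 128)):
    assert reshape_plan_with_locals(shape) == reshape_plan_explicit(shape)
```

The explicit version produces the same result but needs no namespace introspection, so a tracer has nothing to follow except a normal assignment and a branch.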

Code Changes

import torch
import torch.nn as nn

class GELUModel(nn.Module):
    def __init__(self, in_features=128, out_features=256):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=500, embedding_dim=in_features)
        self.linear = nn.Linear(in_features, out_features)
        self.norm = nn.BatchNorm1d(out_features)

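    def forward(self, x):
        x = self.embedding(x)
        # Decide once, explicitly, whether the input needs flattening;
        # this replaces the 'batch_size' in locals() probe.
        restore_shape = x.dim() > 2
        if restore_shape:
            batch_size, seq_len, emb_dim = x.shape
            x = x.view(-1, emb_dim)

        x = self.linear(x)
        # GELU via erf decomposition: 0.5 * x * (1 + erf(x / sqrt(2)))
        t1 = x * 0.7071067811865476
        t2 = torch.erf(t1)
        t3 = t2 + 1
        t4 = x * 0.5
        x = t4 * t3

        x = self.norm(x)

        # Restore the original leading dimensions only if they were flattened.
        if restore_shape:
            x = x.view(batch_size, seq_len, -1)
        return x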

device = "cuda"
model = GELUModel().to(device).eval()
x = torch.randint(0, 500, (8, 16), dtype=torch.long, device=device)

# Eager: works fine
with torch.no_grad():
    ref = model(x)
print(f"Eager: OK, shape={ref.shape}")

# Compiled: should work now
torch._dynamo.reset()
compiled = torch.compile(model, backend="inductor")
try:
    with torch.no_grad():
        out = compiled(x)
    print(f"Compiled: OK, shape={out.shape}")
except Exception as e:
    print(f"Compiled: CRASH — {e}")

Verification

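To verify the fix, run the modified script. The compiled call should print Compiled: OK with shape (8, 16, 256), the same shape as the eager run. A matching shape alone is a weak check, so also compare the compiled output values against the eager reference ref.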

Extra Tips

  • torch.compile traces the Python code in forward(), so keep control flow there to constructs Dynamo can trace and avoid introspection such as locals().
  • Checking 'name' in locals() to learn which branch ran is fragile under Dynamo tracing; track that with an explicit variable instead.
  • Handle tensor shapes explicitly around view and reshape so that the model behaves the same in both eager and compiled modes.
