pytorch - `torch.compile` raises RuntimeError on valid `torch.addmm` with shape mismatch where eager succeeds [1 pull request, 1 participant]

pytorch · 2026-03-21 07:54:29


pytorch/pytorch#178040 • Fetched 2026-04-08 01:07:35


Author

himi1008


Error Message

RuntimeError: The size of tensor a (512) must match the size of tensor b (10000) at non-singleton dimension 1

Root Cause

In eager mode, torch.addmm(bias, input, weight, beta=0.0, alpha=0.1) computes 0.0 * bias + 0.1 * (input @ weight). With beta=0.0 the bias value is irrelevant and the result is simply 0.1 * (input @ weight). The eager CUDA kernel appears to skip the bias contribution entirely in this case, so no shape check is performed on the bias.
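The algebra above can be checked directly. The following is a minimal sketch that runs on CPU with a correctly shaped bias, so it does not depend on the CUDA behavior reported in this issue. The tensor sizes are arbitrary assumptions chosen to keep it fast.

```python
import torch

torch.manual_seed(0)

x = torch.randn(4, 8)      # mat1: [batch, d_model]
w = torch.randn(8, 16)     # mat2: [d_model, vocab]
bias = torch.randn(16)     # correctly shaped bias, broadcast over rows

# addmm computes beta * bias + alpha * (x @ w).
with_bias = torch.addmm(bias, x, w, beta=1.0, alpha=0.1)
no_bias = torch.addmm(bias, x, w, beta=0.0, alpha=0.1)

# With beta=0.0 the bias contributes nothing, so the result is a scaled matmul.
scaled_mm = 0.1 * (x @ w)
beta0_matches_scaled_mm = torch.allclose(no_bias, scaled_mm, atol=1e-5)
beta1_matches_sum = torch.allclose(with_bias, bias + scaled_mm, atol=1e-5)
print(beta0_matches_scaled_mm, beta1_matches_sum)
```

The torch.addmm documentation also states that when beta is 0 the input is ignored and any nan or inf in it is not propagated, which is consistent with treating the bias as unused.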

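Under torch.compile / Inductor, tensor shapes are checked ahead of execution, so the mismatch between the bias (size 512) and the matrix product (size 10000 along dimension 1) is reported regardless of the value of beta. The linked PR changes torch/_inductor/decomposition.py, which suggests the check arises in Inductor's handling of addmm rather than in the kernel itself.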
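To see why a bias of size 512 cannot be combined with a [4, 10000] product, here is a small pure-Python sketch of a broadcasting shape check. It is an illustration only: `check_broadcastable` is a hypothetical helper, not Inductor's code, and the message format is copied from the error reported in this issue.

```python
def check_broadcastable(a_shape, b_shape):
    """Return the broadcast shape, or raise RuntimeError on a mismatch."""
    ndim = max(len(a_shape), len(b_shape))
    # Left-pad the shorter shape with 1s so both have ndim entries.
    a = (1,) * (ndim - len(a_shape)) + tuple(a_shape)
    b = (1,) * (ndim - len(b_shape)) + tuple(b_shape)
    # Compare sizes starting from the trailing dimension.
    for dim in range(ndim - 1, -1, -1):
        if a[dim] != b[dim] and a[dim] != 1 and b[dim] != 1:
            raise RuntimeError(
                f"The size of tensor a ({a[dim]}) must match the size of "
                f"tensor b ({b[dim]}) at non-singleton dimension {dim}"
            )
    # A size-1 entry takes the size of the other operand.
    return tuple(q if p == 1 else p for p, q in zip(a, b))


bias_shape = (512,)          # d_model, as in the reproducer
product_shape = (4, 10000)   # [batch_size, vocab_size]

try:
    check_broadcastable(bias_shape, product_shape)
    message = ""
except RuntimeError as exc:
    message = str(exc)
print(message)
```

A check of this kind has no knowledge of beta, which is one plausible way a shape error can surface even though the bias would never be read.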

This is a valid compile regression: users can reasonably pass a dummy bias tensor with beta=0.0 to perform a scaled matrix multiply.

PR fix notes

PR #180716: Fix torch.compile addmm with beta=0 and mismatched bias

Repository: pytorch/pytorch
Author: dsashidh
State: open | merged: False
Link: https://github.com/pytorch/pytorch/pull/180716

Description (problem / solution / changelog)

Fixes #178040. Fixes torch.compile raising RuntimeError on torch.addmm calls where beta=0 and the bias shape doesn't match the output shape. When beta=0 the bias term is zeroed out, so its shape is irrelevant. This now matches eager.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @mlazos

Changed files

test/inductor/test_torchinductor.py (modified, +21/-0)
torch/_inductor/decomposition.py (modified, +6/-0)


🐛 Describe the bug

torch.compile raises RuntimeError: The size of tensor a (512) must match the size of tensor b (10000) at non-singleton dimension 1 on a valid torch.addmm call that eager mode executes successfully. This is a compile regression — the model runs fine in eager mode but fails under compilation.

The root cause is a torch.addmm(bias, input, weight.t(), beta=0.0, alpha=0.1) call where the bias tensor has a wrong size (d_model=512 instead of vocab_size=10000), but in eager mode beta=0.0 causes the bias term to be zeroed out, making its shape irrelevant. torch.compile / Inductor appears to lower this in a way that validates the bias shape even when beta=0.0.

This was discovered via a fuzzer-generated Transformer model targeting the unfuse_bias_add_to_pointwise optimization pattern.

Minimal reproducer

import torch

d_model = 512
vocab_size = 10000
batch_size = 4

x = torch.randn(batch_size, d_model, device="cuda")
weight = torch.randn(vocab_size, d_model, device="cuda")
bias = torch.zeros(d_model, device="cuda")  # shape [512], not [10000]

# Eager: succeeds because beta=0.0 zeros out the bias term
try:
    out = torch.addmm(bias, x, weight.t(), beta=0.0, alpha=0.1)
    print(f"eager: OK shape={out.shape}")
except RuntimeError as e:
    print(f"eager: ERROR — {e}")

# Compiled: raises shape mismatch error
torch._dynamo.reset()

@torch.compile(fullgraph=True)
def compiled_addmm(bias, x, weight):
    return torch.addmm(bias, x, weight.t(), beta=0.0, alpha=0.1)

try:
    out = compiled_addmm(bias, x, weight)
    print(f"compile: OK shape={out.shape}")
except RuntimeError as e:
    print(f"compile: ERROR — {e}")

Full model-level reproducer (as found by fuzzer)

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerStyleModel(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=3, vocab_size=10000, max_seq_len=512):
        super().__init__()
        self.d_model = d_model
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos_encoding = nn.Parameter(torch.randn(1, max_seq_len, d_model) * 0.02)
        self.ffn_weight1 = nn.Parameter(torch.randn(d_model * 4, d_model) * 0.02)
        self.ffn_bias1 = nn.Parameter(torch.zeros(d_model * 4))
        self.ffn_weight2 = nn.Parameter(torch.randn(d_model, d_model * 4) * 0.02)
        self.ffn_bias2 = nn.Parameter(torch.zeros(d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.output_proj = nn.Linear(d_model, vocab_size)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        batch_size, seq_len = x.shape
        x = self.embedding(x) * self.d_model ** 0.5
        x = x + self.pos_encoding[:, :seq_len, :]
        x = self.dropout(x)
        x_flat = x.view(-1, self.d_model)
        ffn_intermediate = torch.addmm(self.ffn_bias1, x_flat, self.ffn_weight1.t(), beta=1.0, alpha=1.0)
        ffn_intermediate = F.gelu(ffn_intermediate)
        ffn_intermediate = ffn_intermediate * 0.5
        ffn_intermediate = self.dropout(ffn_intermediate)
        ffn_output = torch.addmm(self.ffn_bias2, ffn_intermediate, self.ffn_weight2.t(), beta=1.0, alpha=1.0)
        ffn_output = torch.tanh(ffn_output)
        ffn_output = ffn_output + x_flat
        ffn_output = self.ln1(ffn_output)
        ffn_output = ffn_output.view(batch_size, seq_len, self.d_model)
        x_flat2 = ffn_output.view(-1, self.d_model)
        # BUG: bias shape [512] mismatches output [10000], but beta=0.0 makes it work in eager
        logits = torch.addmm(torch.zeros(self.d_model, device=x.device), x_flat2,
                              self.output_proj.weight.t(), beta=0.0, alpha=0.1)
        logits = logits + self.output_proj.bias
        logits = torch.softmax(logits, dim=-1)
        logits = logits * 0.9 + 0.05
        logits = logits.view(batch_size, seq_len, -1)
        return logits

model = TransformerStyleModel().cuda().eval()
x = torch.randint(0, 10000, (2, 32), dtype=torch.long, device="cuda")

# Eager: succeeds
try:
    with torch.no_grad():
        out = model(x)
    print(f"eager: OK shape={out.shape}")
except RuntimeError as e:
    print(f"eager: ERROR — {e}")

# Compiled: fails
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    with torch.no_grad():
        out = compiled_model(x)
    print(f"compile: OK shape={out.shape}")
except RuntimeError as e:
    print(f"compile: ERROR — {e}")

Behavior summary

| Operation | Eager | `torch.compile` | Consistent? |
| --- | --- | --- | --- |
| `torch.addmm(bias_512, x, weight_10000x512.t(), beta=0.0)` | Succeeds | `RuntimeError` (shape mismatch) | No |

Root cause analysis

This is a valid compile regression: users can reasonably pass a dummy bias tensor with beta=0.0 to perform a scaled matrix multiply.

Ablation

This bug was discovered in E6 (full system) round-3, where the advanced feedback and self-repair pipeline generated models targeting the unfuse_bias_add_to_pointwise optimization pattern.

Error logs

Eager mode (correct behavior — succeeds):

(no error — returns tensor of shape [batch*seq, vocab_size])

torch.compile (incorrect — should also succeed):

RuntimeError: The size of tensor a (512) must match the size of tensor b (10000)
at non-singleton dimension 1

Versions

PyTorch version: 2.12.0.dev20260315+cu126
OS: Ubuntu 22.04.5 LTS (x86_64)
Python version: 3.10.12
GPU: NVIDIA GeForce RTX 3060 Laptop GPU
CUDA: 12.6

cc @jianyuh @nikitaved @mruberry @walterddr @xwang233 @Lezcano @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

Fix Plan

The underlying fix belongs in Inductor and is proposed in PR #180716, which is still open. Until a build containing it is available, user code can sidestep the error by giving the unused bias a shape that broadcasts against the matrix product.

Two places need the change:

In the minimal reproducer, allocate the bias with vocab_size elements instead of d_model.
In the model reproducer, do the same for the temporary zeros tensor passed to torch.addmm in forward().

Code Changes

# Minimal reproducer: size the unused bias to the output's last dimension
bias = torch.zeros(vocab_size, device="cuda")  # shape [10000]

# Model reproducer: same change inside forward(); vocab_size is not stored on
# the module, so read it from the projection layer instead
logits = torch.addmm(torch.zeros(self.output_proj.out_features, device=x.device), x_flat2,
                      self.output_proj.weight.t(), beta=0.0, alpha=0.1)
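Another option is to avoid touching the bias at all when beta is exactly 0, since the result is then just a scaled matrix product. The sketch below uses a hypothetical helper, `addmm_or_scaled_mm`, and runs on CPU with small assumed sizes. It has not been verified under torch.compile on the build from this issue.

```python
import torch


def addmm_or_scaled_mm(bias, mat1, mat2, *, beta=1.0, alpha=1.0):
    """Hypothetical helper: skip the bias entirely when beta is exactly 0."""
    if beta == 0:
        # beta * bias vanishes, so only the scaled product remains.
        return alpha * torch.mm(mat1, mat2)
    return torch.addmm(bias, mat1, mat2, beta=beta, alpha=alpha)


torch.manual_seed(0)
x = torch.randn(4, 32)          # [batch, d_model]
weight = torch.randn(100, 32)   # [vocab, d_model]
dummy_bias = torch.zeros(32)    # wrong size on purpose; never read when beta == 0

out = addmm_or_scaled_mm(dummy_bias, x, weight.t(), beta=0.0, alpha=0.1)
print(out.shape)
```

Because beta is a plain Python number here, the branch is ordinary Python control flow and the mismatched bias never reaches a tensor operation.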

Verification

To verify the workaround, run the compiled model with the updated bias tensor shape and check that it no longer raises a RuntimeError.

# Eager: succeeds
try:
    with torch.no_grad():
        out = model(x)
    print(f"eager: OK shape={out.shape}")
except RuntimeError as e:
    print(f"eager: ERROR — {e}")

# Compiled: should now succeed
torch._dynamo.reset()
compiled_model = torch.compile(model, fullgraph=True)
try:
    with torch.no_grad():
        out = compiled_model(x)
    print(f"compile: OK shape={out.shape}")
except RuntimeError as e:
    print(f"compile: ERROR — {e}")
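The check above only confirms that no exception is raised. A slightly stricter comparison also verifies that the two paths agree numerically. The following sketch uses a hypothetical `compare_runs` helper and is demonstrated with two plain eager callables on CPU. In practice the second callable would be the torch.compile-wrapped version of the first, which is not executed here.

```python
import torch


def compare_runs(reference_fn, candidate_fn, *args, atol=1e-5):
    """Run both callables and return (status, detail) without raising."""
    results = []
    for fn in (reference_fn, candidate_fn):
        try:
            results.append(("ok", fn(*args)))
        except RuntimeError as exc:
            results.append(("error", str(exc)))
    (ref_status, ref_val), (cand_status, cand_val) = results
    if ref_status != cand_status:
        return "diverged", f"reference={ref_status}, candidate={cand_status}"
    if ref_status == "error":
        return "both_failed", ref_val
    if torch.allclose(ref_val, cand_val, atol=atol):
        return "match", ""
    return "mismatch", "outputs differ beyond tolerance"


def scaled_mm(x, w):
    return 0.1 * (x @ w)


def addmm_beta0(x, w):
    # Correctly shaped zero bias, so this also runs on CPU.
    return torch.addmm(torch.zeros(w.shape[1]), x, w, beta=0.0, alpha=0.1)


def always_fails(x, w):
    raise RuntimeError("simulated shape mismatch")


torch.manual_seed(0)
x = torch.randn(4, 32)
w = torch.randn(32, 100)

status, detail = compare_runs(scaled_mm, addmm_beta0, x, w)
diverged_status, diverged_detail = compare_runs(scaled_mm, always_fails, x, w)
print(status, diverged_status)
```

Returning a status instead of raising makes the eager-succeeds, compile-fails case from this issue show up as an explicit "diverged" result.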

Extra Tips

Until the Inductor fix lands, keep the bias passed to torch.addmm broadcastable to the output shape, even when beta=0.0.
If the bias exists only as a placeholder, create it with torch.zeros using the output's last dimension rather than the input's.
