pytorch - 💡(How to fix) Fix torch.compile mismatch for matmul followed by bfloat16 cast and fp32 add [1 comment, 2 participants]

pytorch • 2026-04-27 10:14:09


GitHub stats

pytorch/pytorch#181568 • Fetched 2026-04-28 06:24:41


Author

rookieLiu2018

Participants

IvanKobzarev

rookieLiu2018

Timeline (top)

mentioned ×21, subscribed ×21, labeled ×5, commented ×1

Error Message

out max_abs_diff = 0.0307159423828125 mean_abs_diff = 0.006347520276904106 diff_count = 510 numel = 510

Code Example

import torch

print(torch.__version__)

torch.manual_seed(119)

def f(x, w, b):
    y = (x @ w).to(torch.bfloat16) + b
    return y, y.mean(-1)

x = torch.randn(2, 15, 32)
w = torch.randn(32, 17)
b = torch.randn(17)

ref = f(x, w, b)
got = torch.compile(
    f,
    backend="inductor",
    fullgraph=True,
    dynamic=True,
)(x, w, b)

for name, r, g in zip(["out", "mean"], ref, got):
    diff = (r.float() - g.float()).abs()
    print(
        name,
        "max_abs_diff =", diff.max().item(),
        "mean_abs_diff =", diff.mean().item(),
        "diff_count =", int((diff != 0).sum()),
        "numel =", r.numel(),
    )
    torch.testing.assert_close(r, g)

---

out max_abs_diff = 0.0307159423828125 mean_abs_diff = 0.006347520276904106 diff_count = 510 numel = 510
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_38/2312913235.py in <cell line: 0>()
     30         "numel =", r.numel(),
     31     )
---> 32     torch.testing.assert_close(r, g)

/usr/local/lib/python3.11/dist-packages/torch/testing/_comparison.py in assert_close(actual, expected, allow_subclasses, rtol, atol, equal_nan, check_device, check_dtype, check_layout, check_stride, msg)
   1528     if error_metas:
   1529         # TODO: compose all metas into one AssertionError
-> 1530         raise error_metas[0].to_error(msg)
   1531 
   1532 

AssertionError: Tensor-likes are not close!

Mismatched elements: 507 / 510 (99.4%)
Greatest absolute difference: 0.0307159423828125 at index (1, 14, 12) (up to 1e-05 allowed)
Greatest relative difference: 0.24862122535705566 at index (1, 7, 5) (up to 1.3e-06 allowed)

---

PyTorch: 2.10.0+cpu


🐛 Describe the bug

torch.compile(backend="inductor") produces different results from eager execution when the result of a matmul is explicitly cast to torch.bfloat16 before adding an fp32 bias.

The explicit to(torch.bfloat16) is a semantic precision boundary: it should round/truncate the matmul result to bf16 before the following add. However, the compiled result differs from eager on almost all elements and appears consistent with the bf16 cast being removed or not preserved correctly.
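To make the precision boundary concrete, here is a minimal pure-Python sketch of what rounding to bf16 before the add does to a single value. `to_bf16` is a hypothetical helper written for this illustration, not a PyTorch API, and it only handles ordinary finite values. The numbers 10.03 and 0.5 are made up.

```python
import struct

def to_bf16(value: float) -> float:
    """Emulate an fp32 -> bf16 cast (round-to-nearest-even).

    Hypothetical helper for illustration only; NaN, inf and overflow
    are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]  # fp32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias, ties go to even
    bits &= 0xFFFF0000                   # bf16 keeps only the top 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

matmul_out = 10.03  # made-up matmul result in [8, 16)
bias = 0.5          # made-up fp32 bias

with_cast = to_bf16(matmul_out) + bias  # what the source code asks for
without_cast = matmul_out + bias        # what results if the cast is dropped

# bf16 has 8 significant bits, so in [8, 16) neighbouring values are 2**-4 apart.
half_ulp = 2.0 ** -4 / 2                # 0.03125

print(to_bf16(matmul_out))                        # 10.0
print(abs(with_cast - without_cast) <= half_ulp)  # True
```

The reported max_abs_diff of about 0.0307 sits just below 0.03125, the largest bf16 rounding error for values with magnitude between 8 and 16. That is consistent with the cast being dropped in the compiled graph, but it does not prove it.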

cc @ezyang @gchanan @kadeng @msaroufim @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

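With backend="inductor", the compiled output differs from eager, apparently because the explicit cast to torch.bfloat16 between the matmul and the fp32 add is not preserved. Resolving it means making the torch.compile backend honor that cast; user code can only work around it.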

Guidance

Confirm where the mismatch is introduced: compare the compiled output against eager runs with and without the .to(torch.bfloat16) cast. If the compiled output matches the eager run without the cast, the cast is being dropped.
Check whether the issue persists with a different backend (for example backend="aot_eager") or with compilation disabled, to isolate it to Inductor.
As a possible workaround, consider a different data type for the bias or additional explicit type conversions. Neither is confirmed in the issue, so verify any workaround against eager output.
Search the PyTorch issue tracker and release notes for known mixed-precision issues with the inductor backend.
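The backend check in the guidance above can be scripted. The following is a minimal sketch, assuming the same `f` and inputs as the reproducer; `compare_backends` is a hypothetical helper name. "eager" and "aot_eager" are Dynamo's debugging backends: the first only exercises graph capture, the second adds AOTAutograd tracing without Inductor code generation.

```python
import torch
import torch._dynamo

def f(x, w, b):
    # Same function as in the reproducer.
    y = (x @ w).to(torch.bfloat16) + b
    return y, y.mean(-1)

def compare_backends(backends=("eager", "aot_eager", "inductor")):
    """Return {backend: max abs difference from uncompiled eager}.

    Hypothetical helper for illustration, not part of PyTorch.
    """
    torch.manual_seed(119)
    x = torch.randn(2, 15, 32)
    w = torch.randn(32, 17)
    b = torch.randn(17)
    ref = f(x, w, b)  # uncompiled reference

    report = {}
    for backend in backends:
        torch._dynamo.reset()  # drop cached graphs so each backend compiles afresh
        compiled = torch.compile(f, backend=backend, fullgraph=True, dynamic=True)
        got = compiled(x, w, b)
        report[backend] = max(
            (r.float() - g.float()).abs().max().item() for r, g in zip(ref, got)
        )
    return report
```

Calling `compare_backends()` returns one number per backend. If only the "inductor" entry is clearly non-zero, the mismatch is likely introduced by Inductor's lowering or code generation rather than by graph capture.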

Notes

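The report covers PyTorch 2.10.0+cpu only; behavior on other versions or on GPU builds is not stated, and the issue may be resolved in later versions. Compiled results are not guaranteed to be bitwise identical to eager, but here 507 of 510 elements fail the default fp32 tolerances of torch.testing.assert_close (atol 1e-05, rtol 1.3e-06), with a maximum absolute difference of about 0.03.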

Recommendation

Until a fixed version of PyTorch is available, apply a workaround such as a different data type for the bias or additional explicit type conversions, and check the result against eager execution before relying on it.
