pytorch: Fix torch.compile mismatch for matmul followed by bfloat16 cast and fp32 add [1 comment, 2 participants]

GitHub stats
pytorch/pytorch#181568 · Fetched 2026-04-28 06:24:41
Comments: 1 · Participants: 2 · Timeline: 48 · Reactions: 0
Timeline (top): mentioned ×21, subscribed ×21, labeled ×5, commented ×1


🐛 Describe the bug

torch.compile(backend="inductor") produces different results from eager execution when the result of a matmul is explicitly cast to torch.bfloat16 before adding an fp32 bias.

The explicit .to(torch.bfloat16) is a semantic precision boundary: it should round the matmul result to bf16 before the following add. However, the compiled result differs from eager on almost all elements, which appears consistent with the bf16 cast being removed or not preserved correctly.
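As a rough illustration of what that precision boundary costs, the sketch below emulates a float32-to-bf16 round in pure Python and computes the bf16 grid spacing at a given magnitude. The helper names `round_to_bf16` and `bf16_spacing` are made up for this note, the round-to-nearest-even bit trick is an assumption about how the conversion behaves, and NaN and infinity are not handled.

```python
import math
import struct


def round_to_bf16(value: float) -> float:
    """Round a float to the nearest bfloat16 value (ties to even).

    Sketch only: goes through float32 bits and ignores NaN and infinity.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # bf16 keeps the top 16 bits of a float32; round on the 16 bits it drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_spacing(value: float) -> float:
    """Distance between adjacent bf16 values near `value` (normal range only)."""
    # abs(value) == m * 2**exponent with 0.5 <= m < 1
    _, exponent = math.frexp(abs(value))
    # bf16 carries 8 significant bits (1 implicit + 7 stored).
    return math.ldexp(1.0, exponent - 8)


sample = 10.03  # hypothetical matmul output with magnitude in [8, 16)
rounded = round_to_bf16(sample)
print(rounded, abs(sample - rounded), bf16_spacing(sample) / 2)
```

For magnitudes in [8, 16) the spacing is 0.0625, so a single round moves a value by at most 0.03125. The reported max_abs_diff of about 0.0307 sits just under that bound, which fits the hypothesis that the compiled graph skips this rounding step, assuming the matmul outputs in the repro reach that magnitude range.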

Minimal reproducible example

import torch

print(torch.__version__)

torch.manual_seed(119)

def f(x, w, b):
    y = (x @ w).to(torch.bfloat16) + b
    return y, y.mean(-1)

x = torch.randn(2, 15, 32)
w = torch.randn(32, 17)
b = torch.randn(17)

ref = f(x, w, b)
got = torch.compile(
    f,
    backend="inductor",
    fullgraph=True,
    dynamic=True,
)(x, w, b)

for name, r, g in zip(["out", "mean"], ref, got):
    diff = (r.float() - g.float()).abs()
    print(
        name,
        "max_abs_diff =", diff.max().item(),
        "mean_abs_diff =", diff.mean().item(),
        "diff_count =", int((diff != 0).sum()),
        "numel =", r.numel(),
    )
    torch.testing.assert_close(r, g)

Output

out max_abs_diff = 0.0307159423828125 mean_abs_diff = 0.006347520276904106 diff_count = 510 numel = 510
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_38/2312913235.py in <cell line: 0>()
     30         "numel =", r.numel(),
     31     )
---> 32     torch.testing.assert_close(r, g)

/usr/local/lib/python3.11/dist-packages/torch/testing/_comparison.py in assert_close(actual, expected, allow_subclasses, rtol, atol, equal_nan, check_device, check_dtype, check_layout, check_stride, msg)
   1528     if error_metas:
   1529         # TODO: compose all metas into one AssertionError
-> 1530         raise error_metas[0].to_error(msg)
   1531 
   1532 

AssertionError: Tensor-likes are not close!

Mismatched elements: 507 / 510 (99.4%)
Greatest absolute difference: 0.0307159423828125 at index (1, 14, 12) (up to 1e-05 allowed)
Greatest relative difference: 0.24862122535705566 at index (1, 7, 5) (up to 1.3e-06 allowed)
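
The allowed tolerances in that message (1e-05 absolute, 1.3e-06 relative) are the float32 defaults of torch.testing.assert_close. That follows from type promotion: adding a bf16 tensor to an fp32 tensor yields fp32. The short check below is a sketch of that reasoning, run in eager mode only, with tensor shapes copied from the repro.

```python
import torch

x = torch.randn(2, 15, 32)
w = torch.randn(32, 17)
b = torch.randn(17)

product = (x @ w).to(torch.bfloat16)  # explicit rounding step
out = product + b                     # bf16 + fp32 promotes to fp32

print(product.dtype, b.dtype, out.dtype)
print(torch.promote_types(torch.bfloat16, torch.float32))
```

Because `out` is float32, the comparison uses the strict float32 tolerances, so a lost bf16 rounding step shows up as a failure on nearly every element instead of being absorbed by the looser bf16 defaults.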

Versions

PyTorch: 2.10.0+cpu

cc @ezyang @gchanan @kadeng @msaroufim @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

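Under torch.compile(backend="inductor"), the explicit .to(torch.bfloat16) between the matmul and the fp32 add does not appear to be preserved, so the compiled output differs from eager on nearly every element. A real fix has to land in the backend; user code can only work around it in the meantime.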

Guidance

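  • Confirm that the cast is the trigger: compare the eager output against a variant of f without .to(torch.bfloat16). If the compiled output tracks the no-cast variant, the rounding step is being dropped during compilation.
  • Check whether the mismatch persists with another backend (for example backend="aot_eager") and with compilation disabled, to separate graph capture from Inductor code generation.
  • Changing the bias dtype or adding further explicit conversions may sidestep the mismatch, but either one changes the numerics or the result dtype, so validate any such change against eager output.
  • Search the PyTorch issue tracker and release notes for known mixed-precision issues with the inductor backend on your version.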

Notes

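The report was made against PyTorch 2.10.0+cpu; other versions and devices are not covered by it, and the issue may be resolved in later versions. Compiled code is not guaranteed to match eager bit for bit, because fusion and reordering can change floating-point rounding, but a mismatch on 507 of 510 elements with a maximum absolute difference of about 0.031 is well beyond that kind of noise.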

Recommendation

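Until a fixed version of PyTorch is available, apply a workaround, such as using a different data type for the bias or adding explicit type conversions, and confirm against eager output that it restores the expected values. Either change can alter the result dtype and numerics, so treat it as a behavior change and not as a drop-in fix.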
