pytorch - 💡(How to fix) Fix torch.compile produces different fp16 results for where + full_like + cast + add pattern

pytorch · 2026-05-14 19:33:13



🐛 Describe the bug

torch.compile with the Inductor backend produces different results from eager execution for a pure PyTorch function using torch.where, torch.full_like, fp16 casts, addition, and reduction.

The elementwise fp16 output differs at 4 of the 30 positions (max absolute difference 0.0020), and the final float32 sum also differs (62.6470 in eager vs. 62.6429 compiled).

import sys
import torch

# x = k / 7 for k = 0..29 (float32); the mask is True where k % 4 == 0.
x = torch.arange(30, dtype=torch.float32).reshape(2, 3, 5) / 7
m = (torch.arange(30) % 4 == 0).reshape(2, 3, 5)

def f(x, m):
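    # Select 0.75 where the mask is set and -0.25 elsewhere (both exact in fp16).
    z = torch.where(m, torch.full_like(x, 0.75), torch.full_like(x, -0.25))
    # Cast both operands to fp16 and add; eager materializes x rounded to fp16 first.
    y = z.to(torch.float16) + x.to(torch.float16)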
    return y, y.float().sum()

eager = f(x, m)
compiled = torch.compile(f, backend="inductor", fullgraph=True)(x, m)

diff = (eager[0].float() - compiled[0].float()).abs()

print("input_head:", x.reshape(-1)[:10])
print("mask_head:", m.reshape(-1)[:10])
print("eager_sum:", eager[1])
print("compiled_sum:", compiled[1])
print("max_abs_diff:", diff.max())
print("diff_count:", int((diff != 0).sum()))

# Exit status 0 means the mismatch reproduced; 1 means eager and compiled agree.
mismatch = not torch.equal(eager[0], compiled[0]) or not torch.equal(eager[1], compiled[1])
sys.exit(0 if mismatch else 1)
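The report does not say why the results differ. One plausible mechanism, offered here as an assumption rather than a confirmed diagnosis, is that Inductor fuses the casts and the add into one kernel and skips the fp32 → fp16 → fp32 round trip that eager performs when it materializes x.to(torch.float16). The sketch below models both numerics with the standard library only: struct's 'e' format (IEEE binary16, round half to even) stands in for torch.float16, and to_fp16, eager, and fused are hypothetical names for this illustration.

```python
import struct

def to_fp16(v: float) -> float:
    """Round v to the nearest IEEE-754 binary16 value and widen it back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", v))[0]

# Same inputs as the reproducer, flattened: x = k / 7, and the mask is k % 4 == 0.
xs = [k / 7 for k in range(30)]
zs = [0.75 if k % 4 == 0 else -0.25 for k in range(30)]

# Eager: x is rounded to fp16 before the add; the sum of two fp16 values is
# exact in a Python float, and the result is rounded to fp16 once more.
eager = [to_fp16(to_fp16(z) + to_fp16(x)) for z, x in zip(zs, xs)]

# Assumed fused kernel: x is never rounded to fp16; only the stored output is.
fused = [to_fp16(z + x) for z, x in zip(zs, xs)]

diffs = [abs(a - b) for a, b in zip(eager, fused)]
print("diff_positions:", [k for k, d in enumerate(diffs) if d != 0])  # [1, 2, 3, 29]
print("max_abs_diff:", max(diffs))                                    # 0.001953125
```

Under this assumption the differences land at flat positions 1, 2, 3, and 29, and the largest one is 2^-9 ≈ 0.00195 at position 29, which matches the diff_count of 4 and max_abs_diff of 0.0020 in the logs. Agreement with the logs supports the hypothesis but does not prove it; inspecting the generated kernel (for example with TORCH_LOGS=output_code) would be the direct check.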

Error logs

input_head: tensor([0.0000, 0.1429, 0.2857, 0.4286, 0.5714, 0.7143, 0.8571, 1.0000, 1.1429,
        1.2857])
mask_head: tensor([ True, False, False, False,  True, False, False, False,  True, False])
eager_sum: tensor(62.6470)
compiled_sum: tensor(62.6429)
max_abs_diff: tensor(0.0020)
diff_count: 4
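The logged sums can be checked with exact arithmetic. Eight of the 30 positions (k % 4 == 0) take z = 0.75 and the other 22 take z = -0.25, so with no fp16 rounding anywhere the sum is 435/7 + 8(0.75) + 22(-0.25) = 877/14 ≈ 62.642857. A minimal check with Python's fractions module:

```python
from fractions import Fraction

# Reproducer inputs, flattened: x = k / 7 for k = 0..29,
# z = 0.75 where k % 4 == 0 and -0.25 elsewhere.
exact_sum = sum(
    Fraction(k, 7) + (Fraction(3, 4) if k % 4 == 0 else Fraction(-1, 4))
    for k in range(30)
)

print(exact_sum)                             # 877/14
print(round(float(exact_sum), 4))            # 62.6429 (logged compiled_sum)
print(round(62.6470 - float(exact_sum), 4))  # 0.0041 (gap to logged eager_sum)
```

The unrounded sum matches compiled_sum to the printed precision, while eager_sum sits about 0.0041 higher, the accumulated effect of rounding each element to fp16 before the float sum. That suggests, without confirming, that the compiled reduction reads the fp32 values before the fp16 rounding.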

Versions

PyTorch version: 2.13.0.dev20260513+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.3 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: 18.1.3 (1ubuntu1)
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.11.15 (main, Mar 11 2026, 17:20:07) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-6.17.0-20-generic-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration:
GPU 0: NVIDIA RTX 6000 Ada Generation
GPU 1: NVIDIA RTX 6000 Ada Generation

Nvidia driver version: 570.211.01
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

Versions of relevant libraries:
[pip3] numpy==2.4.4
[pip3] torch==2.13.0.dev20260513+cpu
[conda] numpy 2.4.4 pypi_0 pypi
[conda] torch 2.13.0.dev20260513+cpu pypi_0 pypi

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @chauhang @penguinwu @voznesenskym @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo
