pytorch - ✅(Solved) Fix torch.compile changes fp16 overflow behavior for cast-to-float16 multiplication [1 pull requests, 1 comments, 2 participants]

GitHub stats
pytorch/pytorch#183607 (fetched 2026-05-14 03:28:08)
Comments: 1
Participants: 2
Timeline events: 45
Reactions: 0
Timeline (top): mentioned ×18, subscribed ×18, labeled ×6, commented ×1

Fix Action

Fixed

PR fix notes

PR #183639: Preserve low precision cast barriers in Inductor

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

  • -> #183639

Inductor must observe explicit fp16/bf16 casts before fused pointwise results are promoted or widened, otherwise overflow behavior can differ from eager execution. This preserves those barriers and keeps lossy cast chains from being removed.
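To illustrate why a float32 -> float16 -> float32 cast chain is lossy and cannot be treated as a no-op, here is a minimal eager-mode sketch. It does not use Inductor or any code from the PR; the input values are chosen only for illustration.

```python
import torch

# Illustrative inputs (not from the PR): one value above the float16
# range and one value that float16 cannot represent exactly.
x = torch.tensor([70000.0, 0.1], dtype=torch.float32)

# Round-trip through float16, as an explicit cast chain would do.
roundtrip = x.to(torch.float16).to(torch.float32)

# The round trip changes the data: 70000.0 overflows to inf and
# 0.1 is rounded to the nearest float16 value.
is_identity = torch.equal(roundtrip, x)

print("original: ", x.tolist())
print("roundtrip:", roundtrip.tolist())
print("identity: ", is_identity)
```

Because the round trip changes values, an optimizer that drops the intermediate cast would produce results that differ from eager execution.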

Fixes #183607. Generated by my agent.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @mlazos

Changed files

  • test/inductor/test_cpu_repro.py (modified, +28/-0)
  • test/inductor/test_pattern_matcher.py (modified, +1/-1)
  • torch/_inductor/fx_passes/joint_graph.py (modified, +1/-3)
  • torch/_inductor/lowering.py (modified, +25/-5)


🐛 Describe the bug

torch.compile with the Inductor backend produces a different result from eager execution for a cast-to-float16 multiplication pattern.

In eager mode, the input is first cast to float16 and then multiplied in float16. The product, 25,000,000, exceeds the largest finite float16 value (65504), so the result overflows to inf. In the compiled version, the result is 25000000.0, which suggests the multiplication is performed at higher precision and the intermediate fp16 overflow is not preserved.

import torch

def fn(x):
    y = x.to(torch.float16)
    return (y * y).to(torch.float32)

x = torch.tensor([5000.0])

eager = fn(x).item()

torch._dynamo.reset()
compiled = torch.compile(fn, backend="inductor")(x).item()

print("eager:", eager, "compile:", compiled)

Error logs

eager: inf compile: 25000000.0
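Both values in this log can be reproduced in eager mode alone, which may help show where they come from. The sketch below is an illustration, not the Inductor-generated code: it assumes the compiled result corresponds to multiplying in float32 after the cast, which is consistent with the reported output but is not confirmed by the issue.

```python
import torch

x = torch.tensor([5000.0])

# Largest finite float16 value.
fp16_max = torch.finfo(torch.float16).max

# Eager semantics: cast to float16, multiply in float16, then widen.
# The product exceeds fp16_max, so the float16 result is inf.
y = x.to(torch.float16)
cast_then_mul = (y * y).to(torch.float32).item()

# Assumed compiled behavior: the float16 value is widened before the
# multiplication, so the product is never rounded to float16.
y32 = y.to(torch.float32)
mul_in_fp32 = (y32 * y32).item()

print("fp16 max:", fp16_max)
print("cast then multiply in fp16:", cast_then_mul)
print("multiply in fp32:", mul_in_fp32)
```

5000.0 is exactly representable in float16, so the only difference between the two paths is the precision in which the product is formed.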

Versions

PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.10.20 (main, Mar 11 2026, 17:46:40) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU
Nvidia driver version: 545.92
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] onnx==1.21.0
[pip3] onnx2torch==1.5.15
[pip3] onnxruntime==1.23.2
[pip3] torch==2.11.0
[pip3] torchvision==0.26.0
[pip3] triton==3.6.0

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo
