pytorch - ✅(Solved) Fix torch.compile changes fp16 overflow behavior for cast-to-float16 multiplication [1 pull request, 1 comment, 2 participants]

pytorch • 2026-05-13 19:20:45

GitHub stats

pytorch/pytorch#183607•Fetched 2026-05-14 03:28:08

Author

ALinrunrun

Participants

ALinrunrun

jansel

Timeline (top)

mentioned ×18, subscribed ×18, labeled ×6, commented ×1

Fix Action

Fix proposed (the linked PR was still open and unmerged when this page was fetched)

Proposed fix: PR "Preserve low precision cast barriers in Inductor" (https://github.com/pytorch/pytorch/pull/183639)

PR fix notes

PR #183639: Preserve low precision cast barriers in Inductor

Repository: pytorch/pytorch
Author: jansel
State: open | merged: False
Link: https://github.com/pytorch/pytorch/pull/183639

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

-> #183639

Inductor must observe explicit fp16/bf16 casts before fused pointwise results are promoted or widened, otherwise overflow behavior can differ from eager execution. This preserves those barriers and keeps lossy cast chains from being removed.
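To illustrate why a float32 to float16 to float32 cast chain is lossy rather than a no-op, the sketch below emulates the rounding with Python's standard `struct` module instead of PyTorch. The helper names (`round_to_fp16`, `round_to_fp32`, `cast_chain`) are hypothetical and are not Inductor code; the sketch only models IEEE 754 round-to-nearest behavior, on the assumption that the casts follow it.

```python
import math
import struct


def round_to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value (hypothetical helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct rejects finite values that round past the fp16 range;
        # an IEEE 754 cast yields a signed infinity in that case.
        return math.copysign(math.inf, value)


def round_to_fp32(value: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value (hypothetical helper)."""
    try:
        return struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def cast_chain(value: float) -> float:
    """Model fp32 -> fp16 -> fp32, a chain that only looks like a no-op."""
    return round_to_fp32(round_to_fp16(round_to_fp32(value)))


for v in (1.0, 0.1, 2049.0, 70000.0):
    print(v, "->", cast_chain(v))
```

In this model only 1.0 survives the round trip unchanged: 0.1 loses precision, 2049.0 rounds to 2048.0, and 70000.0 becomes inf. Dropping such a chain during optimization would therefore change results.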

Fixes #183607

Generated by my agent

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @mlazos

Changed files

test/inductor/test_cpu_repro.py (modified, +28/-0)
test/inductor/test_pattern_matcher.py (modified, +1/-1)
torch/_inductor/fx_passes/joint_graph.py (modified, +1/-3)
torch/_inductor/lowering.py (modified, +25/-5)

🐛 Describe the bug

torch.compile with the Inductor backend produces a different result from eager execution for a cast-to-float16 multiplication pattern.

In eager mode, the input is first cast to float16, then multiplied in float16, which overflows to inf. In the compiled version, the result is 25000000.0, suggesting the multiplication is effectively performed without preserving the intermediate fp16 overflow behavior.

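import torch

def fn(x):
    # Explicit cast: the multiplication below should run in float16.
    y = x.to(torch.float16)
    # For x = 5000, y * y = 25,000,000 exceeds the float16 maximum (65504).
    return (y * y).to(torch.float32)

x = torch.tensor([5000.0])

# Reference result from eager execution.
eager = fn(x).item()

# Clear Dynamo state so the compiled run starts fresh.
torch._dynamo.reset()
compiled = torch.compile(fn, backend="inductor")(x).item()

print("eager:", eager, "compile:", compiled)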

Error logs

eager: inf compile: 25000000.0
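Both numbers in the log above can be reproduced with a small arithmetic model that needs only the Python standard library. This is a sketch, not the generated kernel: it assumes, as the report suggests but does not confirm, that the compiled code kept the product in a wider type instead of rounding it to float16. The function names `eager_like` and `fused_without_barrier` are hypothetical.

```python
import math
import struct


def round_to_fp16(value: float) -> float:
    """Round to the nearest IEEE half-precision value (hypothetical helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # Finite values beyond the fp16 range become a signed infinity.
        return math.copysign(math.inf, value)


def round_to_fp32(value: float) -> float:
    """Round to the nearest IEEE single-precision value (hypothetical helper)."""
    try:
        return struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def eager_like(x: float) -> float:
    y = round_to_fp16(x)            # models x.to(torch.float16)
    product = round_to_fp16(y * y)  # fp16 multiply: the result is rounded to fp16
    return round_to_fp32(product)   # models .to(torch.float32)


def fused_without_barrier(x: float) -> float:
    y = round_to_fp16(x)            # the input cast is still applied
    product = y * y                 # assumed: product stays in a wider type
    return round_to_fp32(product)


print("eager-like:", eager_like(5000.0), "fused:", fused_without_barrier(5000.0))
# prints: eager-like: inf fused: 25000000.0
```

The two models agree whenever the product fits in float16 (for example x = 200.0 gives 40000.0 in both), so the discrepancy only shows up for inputs whose intermediate result overflows or loses precision in float16.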

Versions

PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.10.20 (main, Mar 11 2026, 17:46:40) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU
Nvidia driver version: 545.92
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] onnx==1.21.0
[pip3] onnx2torch==1.5.15
[pip3] onnxruntime==1.23.2
[pip3] torch==2.11.0
[pip3] torchvision==0.26.0
[pip3] triton==3.6.0

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo
