pytorch - 💡(How to fix) Fix [Inductor] Eager and inductor produce different results for floor_divide [1 comments, 2 participants]

GitHub stats
pytorch/pytorch#177740 · Fetched 2026-04-08 00:58:00
Comments: 1
Participants: 2
Timeline: 258
Reactions: 0
Timeline (top): mentioned ×120, subscribed ×120, labeled ×7, referenced ×6


🐛 Describe the bug

Repro:

import torch

t0 = torch.tensor([84.3125], dtype=torch.float64).reshape((1,)).to(torch.float16).to("cuda:0")
t1 = torch.tensor([21.078125], dtype=torch.float64).reshape((1,)).to(torch.float16).to("cuda:0")

# Eager
eager_out = torch.floor_divide(t0, t1)

# Compiled
torch._dynamo.reset()
def fn(t0, t1):
    return torch.floor_divide(t0, t1)

compiled_fn = torch.compile(fn, fullgraph=True)
compiled_out = compiled_fn(t0, t1)

# Compare
print('eager  :', eager_out)
print('compiled:', compiled_out)
print('equal  :', torch.equal(eager_out, compiled_out))
if eager_out.is_floating_point():
    diff = (eager_out.to(torch.float64) - compiled_out.to(torch.float64)).abs()
    print('max diff:', diff.max().item())

Output:

$ python repro.py
eager  : tensor([4.], device='cuda:0', dtype=torch.float16)
compiled: tensor([3.], device='cuda:0', dtype=torch.float16)
equal  : False
max diff: 1.0

>>> torch.__version__
'2.12.0.dev20260317+cu128'

Versions

Is CUDA available: True
CUDA runtime version: 12.8.93
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: NVIDIA GB200
GPU 1: NVIDIA GB200
[conda] nvidia-nccl-cu12          2.29.3                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.8.93                  pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.8.90                  pypi_0    pypi
[conda] torch                     2.12.0.dev20260317+cu128          pypi_0    pypi
[conda] triton                    3.6.0+git9844da95          pypi_0    pypi

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

Fix Plan

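The fix involves ensuring consistent rounding behavior between eager and compiled modes in PyTorch. The two results differ by a full integer (4. versus 3.), so the first job is to establish which one is correct and to confirm that the mismatch comes from the compiled path and not from the inputs.

  1. Check PyTorch Version: The report was made on the nightly build 2.12.0.dev20260317+cu128. Re-run the repro on the newest build you can install; the issue does not state whether a fix has landed.
  2. Use Consistent Data Types: Both operands are already torch.float16 when torch.floor_divide runs; torch.float64 is only used to construct them. Both values are exactly representable in float16 and their exact quotient is 4, so the eager result (4.) is the correct one and dtype mixing does not explain the compiled result (3.).
  3. Specify Rounding Mode: torch.floor_divide already floors its result, and torch.div(t0, t1, rounding_mode="floor") is an equivalent spelling. Until the discrepancy is resolved upstream, compare compiled results against eager (or a float64 reference) wherever an off-by-one quotient would matter.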
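The claim in step 2 can be checked without a GPU. The sketch below uses only the Python standard library (Python 3.9+ for `math.nextafter`); the helper name `roundtrip_fp16` is made up for illustration. The last part shows how a quotient that lands one ulp below 4.0 floors to 3, which is one plausible mechanism for the compiled output, although the issue does not confirm the cause.

```python
import math
import struct
from fractions import Fraction


def roundtrip_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision and back (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


a, b = 84.3125, 21.078125

# Both repro inputs survive the float16 round trip unchanged,
# so the conversion from float64 loses nothing.
a_exact = roundtrip_fp16(a) == a
b_exact = roundtrip_fp16(b) == b

# Exact rational quotient of the two values.
quotient = Fraction(a) / Fraction(b)

# A quotient one ulp below 4.0 floors to 3. This is an assumed
# mechanism for the compiled result, not something the issue states.
just_below_four = math.nextafter(4.0, 0.0)
floored = math.floor(just_below_four)

print(a_exact, b_exact, quotient, floored)  # True True 4 3
```

Because the exact quotient is an integer, any division that is off by even one ulp on the low side is enough to change the floored result.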

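Example Code (the same repro with the inputs built directly as float16; it detects the mismatch and does not by itself fix it):

import torch

# Build the inputs directly as float16. Both values are exactly
# representable, so they match the tensors in the original repro.
t0 = torch.tensor([84.3125], dtype=torch.float16).to("cuda:0")
t1 = torch.tensor([21.078125], dtype=torch.float16).to("cuda:0")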

# Eager mode
eager_out = torch.floor_divide(t0, t1)

# Compiled mode
torch._dynamo.reset()
def fn(t0, t1):
    return torch.floor_divide(t0, t1)

compiled_fn = torch.compile(fn, fullgraph=True)
compiled_out = compiled_fn(t0, t1)

# Compare outputs
print('eager  :', eager_out)
print('compiled:', compiled_out)
print('equal  :', torch.equal(eager_out, compiled_out))

Verification

To verify the fix, run the repro again and compare the eager and compiled outputs. If the fix is successful, torch.equal returns True and both outputs are tensor([4.]), the exact quotient of the two inputs.
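An independent reference value can help when checking which output is right. The following is a minimal pure-Python sketch of fmod-based floor division, the scheme CPython uses for `//` on floats; `div_floor_reference` is a hypothetical name, and the assumption that eager PyTorch follows a comparable scheme is not taken from the issue. It handles finite inputs with a nonzero divisor only.

```python
import math


def div_floor_reference(a: float, b: float) -> float:
    """fmod-based floor division for finite a and nonzero finite b."""
    if b == 0:
        raise ZeroDivisionError("this sketch does not model division by zero")
    mod = math.fmod(a, b)      # exact remainder; its sign follows a
    div = (a - mod) / b        # quotient with the remainder removed
    if mod != 0 and (b < 0) != (mod < 0):
        div -= 1.0             # remainder and divisor disagree in sign
    if div == 0:
        return math.copysign(0.0, a / b)
    floordiv = math.floor(div)
    if div - floordiv > 0.5:   # div landed just below the next integer
        floordiv += 1.0
    return float(floordiv)


reference = div_floor_reference(84.3125, 21.078125)
print(reference)  # 4.0
```

Because the remainder is removed before dividing, the quotient in this scheme is not exposed to the one-ulp error that a plain floor(a / b) with an approximate division can suffer.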

Extra Tips

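  • Keep PyTorch and CUDA up to date, since newer builds often include bug fixes. The report above is against a nightly build, so retest after each upgrade instead of assuming the issue is gone.
  • Mixing dtypes in one operation triggers type promotion and can change results, but that is not what happens in this repro: both operands are float16 when the division runs.
  • fullgraph=True in torch.compile does not make the compiled code faster; it makes compilation fail on a graph break instead of silently falling back to eager. It is used in the repro so that floor_divide is guaranteed to run through the compiled path.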
