pytorch - ✅ (Solved) Fix: Eager and compile disagree for integer floor_divide with zero divisor [1 pull request, 2 comments, 1 participant]

GitHub stats for pytorch/pytorch#178013 (fetched 2026-04-08 01:07:40): 2 comments, 1 participant, 77 timeline events, 0 reactions. Top timeline events: mentioned ×26, subscribed ×26, labeled ×12, unlabeled ×5.

Fix Action

Fixed

PR fix notes

PR #178016: Fix eager/compiled mismatch for integer floor_divide with zero divisor

Description (problem / solution / changelog)

Integer floor_divide on CUDA produced different results in eager and compiled mode when the divisor was zero: integer division by zero is undefined behavior, and neither path guarded against it.

Eager (c10::div_floor_integer) executed a / b directly, and NVIDIA's 64-bit division emulation returned a truncated 32-bit result (0xFFFFFFFF in the int64 output). Compiled (Triton floordiv) executed a // b via Triton's truncdiv, which returned -1.
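The two reported values are the same all-ones bit pattern at different widths. The stdlib sketch below reinterprets them with ctypes, assuming the int64 eager output holds 0xFFFFFFFF with its upper 32 bits clear. It only illustrates the arithmetic of the reported values; it does not reproduce the CUDA behavior.

```python
import ctypes

# Bit pattern reported for the eager path (assumed: low 32 bits set, upper 32 bits clear).
EAGER_BITS = 0xFFFFFFFF

# Read as a signed 64-bit integer, the pattern is a large positive number.
eager_as_int64 = ctypes.c_int64(EAGER_BITS).value
# Read as a signed 32-bit integer (ctypes does no overflow checking), the same bits are -1.
eager_as_int32 = ctypes.c_int32(EAGER_BITS).value
# The compiled path's -1 stored as int64 has all 64 bits set.
compiled_bits = ctypes.c_uint64(-1).value

print("eager bits as int64:", eager_as_int64)
print("eager bits as int32:", eager_as_int32)
print("compiled -1 as 64-bit pattern:", hex(compiled_bits))
```

In other words, the eager value equals the low 32 bits of -1, zero-extended to 64 bits, so the two paths disagree even though both produced "all ones".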

The fix adds an early return of 0 for b == 0 to both paths:

  • c10::div_floor_integer: a simple b == 0 check before any arithmetic.
  • Triton floordiv codegen: replace b with 1 before the division, then select 0 as the final result wherever the original b was 0. The guard must precede the division because LLVM may assume divisors are non-zero (a UB-based optimization) and eliminate a post-division check entirely.
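The guard can be modeled on plain Python integers. This is a sketch of the semantics described above, not the actual C++ or Triton code; the helper names trunc_div, floor_div and guarded_floor_div are hypothetical. floor_div builds floor division from a truncating (round-toward-zero) quotient in the usual C/C++ style, and guarded_floor_div applies the "substitute 1, divide, then select 0" ordering. In Python the ordering is observable for a different reason than in LLVM: dividing by zero first would raise ZeroDivisionError instead of being optimized around.

```python
def trunc_div(a: int, b: int) -> int:
    """Quotient rounded toward zero, like C/C++ integer `/` (b must be non-zero)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def floor_div(a: int, b: int) -> int:
    """Floor division from truncation: subtract 1 when signs differ and the remainder is non-zero."""
    q = trunc_div(a, b)
    r = a - q * b  # remainder with the sign of a, like C/C++ `%`
    return q - 1 if r != 0 and (a < 0) != (b < 0) else q


def guarded_floor_div(a: int, b: int) -> int:
    """Hypothetical model of the guarded codegen: make the divisor safe *before* dividing."""
    is_zero = b == 0
    safe_b = 1 if is_zero else b  # the division below never sees a zero divisor
    q = floor_div(a, safe_b)
    return 0 if is_zero else q  # select 0 wherever the original divisor was zero


print(guarded_floor_div(-7, 2), guarded_floor_div(7, -2), guarded_floor_div(0, 0))
```

For non-zero divisors the model agrees with Python's own `//`, which also rounds toward negative infinity; for a zero divisor it returns 0, the value both paths return after the fix.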

Fixes https://github.com/pytorch/pytorch/issues/178013

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @mlazos

Changed files

  • c10/util/generic_math.h (modified, +4/-0)
  • test/inductor/test_torchinductor.py (modified, +28/-0)
  • torch/_inductor/codegen/triton.py (modified, +9/-1)


🐛 Describe the bug

import torch

# Both tensors hold a single int64 zero on CUDA, so the division is by zero.
t0 = torch.tensor([0.0], dtype=torch.float64).reshape((1,)).to(torch.int64).to("cuda:0")
t1 = torch.tensor([0.0], dtype=torch.float64).reshape((1,)).to(torch.int64).to("cuda:0")

# Eager
eager_out = torch.floor_divide(t0, t1)

# Compiled (default Inductor backend, which emits Triton code for CUDA inputs)
torch._dynamo.reset()
def fn(t0, t1):
    return torch.floor_divide(t0, t1)

compiled_fn = torch.compile(fn, fullgraph=True)
compiled_out = compiled_fn(t0, t1)

# Compare
print('eager   :', eager_out)
print('compiled:', compiled_out)
print('equal   :', torch.equal(eager_out, compiled_out))
# Not taken for these int64 outputs; only relevant to floating-point variants of the repro.
if eager_out.is_floating_point():
    diff = (eager_out.to(torch.float64) - compiled_out.to(torch.float64)).abs()
    print('max diff:', diff.max().item())


Versions

torch trunk

cc @ezyang @gchanan @kadeng @msaroufim @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @muchulee8 @amjames @aakhundov @coconutruben @jataylo @ptrblck @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

Fix Plan

The mismatch comes from integer division by zero inside torch.floor_divide: the result is undefined on CUDA, so the eager and compiled kernels are free to return different values. The fix is to check for a zero divisor before dividing.

Step-by-Step Solution

  • Check, element by element, whether the divisor is zero before performing the division
  • Where it is zero, handle the case explicitly (e.g., return a fixed value such as 0, as the upstream fix does, or raise an exception)

Example Code

import torch

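def safe_floor_divide(t0, t1):
    # Element-wise guard. A Python `if t1 == 0` only works for one-element tensors and is
    # data-dependent control flow, which causes a graph break that fullgraph=True turns into an error.
    zero_divisor = t1 == 0
    # Substitute 1 for zero divisors so the division itself never divides by zero.
    safe_t1 = torch.where(zero_divisor, torch.ones_like(t1), t1)
    quotient = torch.floor_divide(t0, safe_t1)
    # Return 0 wherever the original divisor was zero.
    return torch.where(zero_divisor, torch.zeros_like(quotient), quotient)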

t0 = torch.tensor([0.0], dtype=torch.float64).reshape((1,)).to(torch.int64).to("cuda:0")
t1 = torch.tensor([0.0], dtype=torch.float64).reshape((1,)).to(torch.int64).to("cuda:0")

# Eager
eager_out = safe_floor_divide(t0, t1)

# Compiled
torch._dynamo.reset()
def fn(t0, t1):
    return safe_floor_divide(t0, t1)

compiled_fn = torch.compile(fn, fullgraph=True)
compiled_out = compiled_fn(t0, t1)

# Compare
print('eager  :', eager_out)
print('compiled:', compiled_out)
print('equal  :', torch.equal(eager_out, compiled_out))

Verification

Run the modified code and confirm that the eager and compiled outputs agree (the equal line prints True) and that division by zero yields 0.

Extra Tips

  • Always check for potential division by zero cases in your code to avoid unexpected behavior or errors.
  • Consider adding input validation and error handling to make your code more robust.
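If raising an exception is preferred over returning 0, the validation can happen up front. The sketch below is a pure-Python illustration on lists of ints; strict_floor_divide is a hypothetical helper, not a PyTorch API. It rejects zero divisors before any division and otherwise uses `//`, which rounds toward negative infinity like torch.floor_divide.

```python
from typing import List, Sequence


def strict_floor_divide(dividends: Sequence[int], divisors: Sequence[int]) -> List[int]:
    """Hypothetical element-wise integer floor division that rejects zero divisors up front."""
    if len(dividends) != len(divisors):
        raise ValueError(f"length mismatch: {len(dividends)} vs {len(divisors)}")
    # Validate every divisor before doing any arithmetic.
    zero_positions = [i for i, d in enumerate(divisors) if d == 0]
    if zero_positions:
        raise ZeroDivisionError(f"zero divisor at positions {zero_positions}")
    # Python's // on ints is floor division (rounds toward negative infinity).
    return [a // d for a, d in zip(dividends, divisors)]


print(strict_floor_divide([7, -7, 0], [2, 2, 3]))
```

The trade-off is that the caller must handle the error, whereas the upstream fix keeps floor_divide total by defining the zero-divisor result as 0.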
