pytorch: `torch.div(rounding_mode='trunc')` produces off-by-one results under `torch.compile` on CUDA

pytorch/pytorch#184408 (fetched 2026-05-20 03:38:52)
🐛 Describe the bug

torch.div(a, b, rounding_mode='trunc') produces results that are off by 1.0 when compiled with torch.compile on CUDA. The bug affects both near-integer quotients and exact-integer quotients.

The floor division path (rounding_mode='floor') does not have this issue: it was previously changed to use correctly rounded IEEE division (_div_rn), but the analogous fix was not applied to the trunc path.
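
For readers less familiar with the two rounding modes: 'trunc' rounds the quotient toward zero, while 'floor' rounds it toward negative infinity, so the two agree for positive quotients and differ for negative non-integer ones. The pure-Python sketch below illustrates that difference with the standard math module. The helper names div_trunc_ref and div_floor_ref are hypothetical and are not PyTorch APIs, and the arithmetic runs in Python's float64 rather than float32.

```python
import math

def div_trunc_ref(a: float, b: float) -> float:
    """Reference for rounding_mode='trunc': round the quotient toward zero."""
    return float(math.trunc(a / b))

def div_floor_ref(a: float, b: float) -> float:
    """Reference for rounding_mode='floor': round the quotient toward -inf."""
    return float(math.floor(a / b))

# Positive quotient: both modes agree. Negative quotient: they differ by one.
for a, b in [(7.0, 2.0), (-7.0, 2.0)]:
    print(f"{a} / {b}: trunc={div_trunc_ref(a, b)}, floor={div_floor_ref(a, b)}")
```

Both modes apply a rounding step to an already computed quotient, which is why the accuracy of the division itself matters near integer boundaries.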

Reproducer

import torch

def div_trunc(a, b):
    return torch.div(a, b, rounding_mode='trunc')

# Case 1: near-integer quotient — compiled rounds PAST the boundary
a = torch.tensor([14.999999, 26.999999, 13.999999], device='cuda')
b = torch.tensor([3.0, 3.0, 7.0], device='cuda')

print("Near-integer quotient:")
print(f"  Eager:    {div_trunc(a, b).tolist()}")                         # [4.0, 8.0, 1.0]  ✓
print(f"  Compiled: {torch.compile(div_trunc)(a.clone(), b.clone()).tolist()}")  # [5.0, 9.0, 2.0]  ✗ off by +1

# Case 2: EXACT integer quotient — compiled rounds BELOW the true value
a2 = torch.tensor([148.0, 1073.0, 2112.0], device='cuda')
b2 = torch.tensor([37.0, 37.0, 33.0], device='cuda')

print("Exact integer quotient (148/37=4, 1073/37=29, 2112/33=64):")
print(f"  Eager:    {div_trunc(a2, b2).tolist()}")                           # [4.0, 29.0, 64.0]  ✓
print(f"  Compiled: {torch.compile(div_trunc)(a2.clone(), b2.clone()).tolist()}")  # [3.0, 28.0, 63.0]  ✗ off by -1

# The floor path does NOT have this issue (it uses _div_rn):
def div_floor(a, b):
    return torch.div(a, b, rounding_mode='floor')

print("Floor path (already fixed):")
print(f"  Eager:    {div_floor(a, b).tolist()}")
print(f"  Compiled: {torch.compile(div_floor)(a.clone(), b.clone()).tolist()}")  # matches eager ✓

Root Cause

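In torch/_inductor/lowering.py, the floor path uses _div_rn (IEEE round-to-nearest division), but the trunc path uses plain div, which lowers to Triton's default division. As the source comment below notes, that division uses an approximate reciprocal and is therefore not guaranteed to be correctly rounded: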

if rounding_mode == "floor":
    # Use div_rn (IEEE round-to-nearest) instead of truediv here because
    # Triton's default division uses an approximate reciprocal, which can
    # produce a result slightly below the true quotient and cause floor()
    # to round down by one.
    return floordiv(a, b) if both_integer else floor(_div_rn(a, b))
if rounding_mode == "trunc":
    assert not both_boolean, "truncdiv operands can not be boolean at the same time"
    return truncdiv(a, b) if both_integer else trunc(div(a, b))  # ← should use _div_rn

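The comment on the floor path explains exactly why _div_rn is needed. The same reasoning applies to trunc: approximate division can land on the wrong side of an integer boundary, either slightly below an exact integer quotient (Case 2) or up at the next integer for a quotient that is just under it (Case 1), causing trunc() to return the wrong integer.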
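
To make the boundary effect concrete, here is a minimal pure-Python sketch that models it without a GPU. It does not reproduce what Triton actually computes; it only assumes that the approximate division is off by a single float32 ulp, and shows that this is enough to change the truncated result. The helpers to_f32, step_f32 and div_rn_f32 are hypothetical names introduced for this illustration, and div_rn_f32 emulates correctly rounded float32 division by dividing in float64 and rounding once to float32.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def step_f32(x: float, ulps: int) -> float:
    """Move a positive, finite float32 value by `ulps` representable steps."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + ulps))[0]

def div_rn_f32(a: float, b: float) -> float:
    """Emulate correctly rounded float32 division of two float32 inputs."""
    return to_f32(to_f32(a) / to_f32(b))

# Exact integer quotient (Case 2): one ulp too low turns 4 into 3.
q_exact = div_rn_f32(148.0, 37.0)
print(math.trunc(q_exact), math.trunc(step_f32(q_exact, -1)))

# Near-integer quotient (Case 1): one ulp too high reaches 5.0, so 4 becomes 5.
q_near = div_rn_f32(14.999999, 3.0)
print(math.trunc(q_near), math.trunc(step_f32(q_near, +1)))
```

Because trunc() and floor() are discontinuous at integers, even the smallest possible error in the quotient can move the result by a full 1.0, which matches the magnitude of the mismatches in the reproducer.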

Suggested Fix

if rounding_mode == "trunc":
    assert not both_boolean, "truncdiv operands can not be boolean at the same time"
    return truncdiv(a, b) if both_integer else trunc(_div_rn(a, b))

Versions

PyTorch: 2.13.0.dev20260513+cu126
GPU: NVIDIA Tesla T4
CUDA: 12.6
Triton: 3.3.1
