pytorch: torch.compile (inductor/aot_eager) produces nondeterministic / incorrect results on model with `add_`, `clamp_`, and `squeeze` [2 comments, 3 participants]

GitHub stats
pytorch/pytorch#178680 (fetched 2026-04-08 01:45:11)
Comments: 2 · Participants: 3 · Timeline events: 31 · Reactions: 0
Timeline (top): mentioned ×10, subscribed ×10, labeled ×6, commented ×2


🐛 Describe the bug

When torch.compile is used with the inductor or aot_eager backend on a function that calls add_, clamp_, and squeeze, the result depends on the order in which compiled calls are executed and can be incorrect.

Here is the reproducible code and the actual problem:

import os
os.environ["TORCHINDUCTOR_FORCE_DISABLE_CACHES"] = "1"
import torch

def fn(x, y):
    x.add_(1)
    y.clamp_(min=0)
    return x.squeeze(1, 2).clone()

backend="inductor"

def run1():
    cfunc = torch.compile(fn, backend=backend)
    a = torch.full((1, 1, 1), -10.0)
    b = a.view(-1).clone()
    out1 = cfunc(a, b)
    print("out1: ", out1)  # expected: -9, actual: -9

def run2():
    cfunc = torch.compile(fn, backend=backend)
    a = torch.full((1, 1, 1), -10.0)
    b = a.view(-1)
    out2 = cfunc(a, b)
    print("out2: ", out2)  # expected: 0, actual: inductor gives 1, aot_eager gives -9

# if we directly execute run2(), out2's actual result is 0
# but if we execute run1() before run2(), out2's actual result becomes incorrect
run1()
run2()

Expected behavior:

  • run1() -> -9
  • run2() -> 0

Actual behavior:

  • If run2() is executed alone, both inductor and aot_eager output 0 (correct).
  • If run1() is executed first and then run2(), inductor gives 1 and aot_eager gives -9 for out2.
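As a reference point, the expected values follow from eager semantics: in run2(), b is a view of a, so clamp_ on b also overwrites the value that add_ just wrote into a. Here is a minimal sketch in plain eager mode, with no torch.compile involved. The helper names shares_storage and eager_reference are illustrative and do not appear in the issue.

```python
import torch

def fn(x, y):
    x.add_(1)
    y.clamp_(min=0)
    return x.squeeze(1, 2).clone()

def shares_storage(a, b):
    # True when both tensors are backed by the same underlying storage
    return a.untyped_storage().data_ptr() == b.untyped_storage().data_ptr()

def eager_reference(aliased):
    # Build the same inputs as run1() (aliased=False) or run2() (aliased=True)
    a = torch.full((1, 1, 1), -10.0)
    b = a.view(-1) if aliased else a.view(-1).clone()
    return shares_storage(a, b), fn(a, b)

for aliased in (False, True):
    shared, out = eager_reference(aliased)
    print(f"aliased={aliased} shares_storage={shared} out={out.item()}")
```

With separate storage, only add_ affects the returned value (-10 + 1 = -9). With shared storage, clamp_(min=0) then raises that -9 to 0.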

Error logs

No response

Versions

[pip3] numpy==2.4.2
[pip3] torch==2.12.0.dev20260328+cpu
[pip3] torchvision==0.26.0.dev20260223+cpu
[pip3] triton==3.6.0
[conda] numpy 2.4.2 pypi_0 pypi
[conda] torch 2.12.0.dev20260328+cpu pypi_0 pypi
[conda] torchvision 0.26.0.dev20260223+cpu pypi_0 pypi
[conda] triton 3.6.0 pypi_0 pypi

cc @chauhang @penguinwu

extent analysis

Fix Plan

The repro already sets TORCHINDUCTOR_FORCE_DISABLE_CACHES=1, so disabling inductor's caches is evidently not enough. The wrong result appears only when run1() (where b is a separate copy of a) executes before run2() (where b is a view of a). That suggests state compiled for the non-aliased inputs is reused for the aliased ones. This is a hypothesis, not a confirmed root cause; the issue does not state one.

Here are the concrete steps:

  • Keep the TORCHINDUCTOR_FORCE_DISABLE_CACHES environment variable set to 1 (already done in the provided code).
  • Call torch.compiler.reset() between runs so that run2() compiles from a fresh state, as it does when executed alone. This is an untested workaround, not a fix.

Example code:

import os
os.environ["TORCHINDUCTOR_FORCE_DISABLE_CACHES"] = "1"
import torch

def fn(x, y):
    x.add_(1)
    y.clamp_(min=0)
    return x.squeeze(1, 2).clone()

backend="inductor"

def run1():
    # torch.compile returns a callable; call it in place of fn
    cfunc = torch.compile(fn, backend=backend)
    a = torch.full((1, 1, 1), -10.0)
    b = a.view(-1).clone()  # b has its own storage
    out1 = cfunc(a, b)
    print("out1: ", out1)  # expected: -9

def run2():
    cfunc = torch.compile(fn, backend=backend)
    a = torch.full((1, 1, 1), -10.0)
    b = a.view(-1)  # b aliases a
    out2 = cfunc(a, b)
    print("out2: ", out2)  # expected: 0

run1()
torch.compiler.reset()  # drop compiled state left over from run1() (untested workaround)
run2()

Verification

Run the modified code with run1() executed first, and check that run1() prints -9 and run2() prints 0 for both the inductor and aot_eager backends.

Extra Tips

  • The order dependence is the key symptom: test any workaround with run1() executed before run2(), since run2() alone already gives the correct result.
  • If correctness matters more than speed, consider running fn without torch.compile for inputs that alias each other until the issue is resolved.
  • If issues persist, try updating to the latest version of PyTorch and related libraries.
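One way to check a candidate workaround is to compare the compiled output with an eager run on fresh inputs. The sketch below is an assumption-laden illustration, not a confirmed fix. It uses the aot_eager backend so that no C++ toolchain is needed, and it calls torch.compiler.reset() before the aliased case. The helper names make_inputs and compare_with_eager are hypothetical. It reports whether the outputs match instead of assuming they will.

```python
import torch

def fn(x, y):
    x.add_(1)
    y.clamp_(min=0)
    return x.squeeze(1, 2).clone()

def make_inputs(aliased):
    # aliased=False mirrors run1(); aliased=True mirrors run2()
    a = torch.full((1, 1, 1), -10.0)
    b = a.view(-1) if aliased else a.view(-1).clone()
    return a, b

def compare_with_eager(aliased, backend="aot_eager", reset_first=False):
    if reset_first:
        # Assumption: clearing compiled state avoids reuse across input patterns
        torch.compiler.reset()
    expected = fn(*make_inputs(aliased))            # eager reference
    compiled = torch.compile(fn, backend=backend)
    actual = compiled(*make_inputs(aliased))        # fresh inputs, since fn mutates them
    return expected, actual, torch.equal(expected, actual)

if __name__ == "__main__":
    # Same order as the issue: non-aliased inputs first, then aliased inputs
    for aliased, reset_first in [(False, False), (True, True)]:
        expected, actual, ok = compare_with_eager(aliased, reset_first=reset_first)
        print(f"aliased={aliased} expected={expected.item()} "
              f"actual={actual.item()} match={ok}")
```

Setting reset_first=False for the second call should reproduce the order-dependent mismatch on affected builds, which makes the helper usable both for confirming the bug and for checking a workaround.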
