pytorch - ✅(Solved) Fix Inductor should respect rand op order in eager function [1 pull requests, 1 participants]

GitHub stats
pytorch/pytorch#181136 · Fetched 2026-04-23 07:22:27
Comments: 0
Participants: 1
Timeline: 84
Reactions: 0
Timeline (top): mentioned ×38, subscribed ×38, labeled ×7, unlabeled ×1

PR fix notes

PR #181245: Fix Inductor scheduler reordering random ops (#181136)

Description (problem / solution / changelog)

Fix #181136 when fallback_random = True

The Inductor scheduler reorders torch.randn calls based on data dependencies, causing them to consume global RNG state out of program order. This produces silent numerical mismatches vs eager mode.

Add _add_random_dep_chain() to chain nodes with the nondeterministic_seeded tag via WeakDep, preserving program order.
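
The chaining idea described above can be sketched with a toy scheduler. This is not Inductor's code: `schedule`, `add_random_dep_chain_sketch`, and the `priority` table are hypothetical names made up for illustration, and a plain ordering edge stands in for WeakDep. The priority table is an assumed heuristic that prefers the convolution inputs, chosen so the toy reproduces the b -> c -> a order reported in the issue.

```python
import heapq
from collections import defaultdict

def schedule(nodes, deps, priority):
    """Toy topological sort; among ready nodes, lower priority value runs first."""
    indegree = {n: 0 for n in nodes}
    users = defaultdict(list)
    for node, preds in deps.items():
        for pred in preds:
            indegree[node] += 1
            users[pred].append(node)
    ready = [(priority[n], n) for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        for user in users[node]:
            indegree[user] -= 1
            if indegree[user] == 0:
                heapq.heappush(ready, (priority[user], user))
    return order

def add_random_dep_chain_sketch(nodes, deps, random_nodes):
    """Make each random node depend on the previous random node in program order."""
    chained = {n: set(deps.get(n, ())) for n in nodes}
    prev = None
    for node in nodes:  # nodes are listed in program order
        if node in random_nodes:
            if prev is not None:
                chained[node].add(prev)
            prev = node
    return chained

# Graph from the issue: y = conv2d(b, c); z = a + y
nodes = ["a", "b", "c", "y", "z"]
deps = {"y": {"b", "c"}, "z": {"a", "y"}}
random_nodes = {"a", "b", "c"}
priority = {"b": 0, "c": 1, "y": 2, "a": 3, "z": 4}  # assumed heuristic

def rand_order(order):
    return [n for n in order if n in random_nodes]

before = schedule(nodes, deps, priority)
after = schedule(nodes, add_random_dep_chain_sketch(nodes, deps, random_nodes), priority)
print(rand_order(before))  # random ops follow the heuristic
print(rand_order(after))   # random ops follow program order
```

The extra edges constrain only the relative order of the random nodes; the scheduler remains free to place the non-random nodes wherever their data dependencies allow.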

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

Changed files

  • test/inductor/test_torchinductor.py (modified, +30/-0)
  • torch/_inductor/scheduler.py (modified, +42/-0)

Code Example

import torch

def fn():
    a = torch.randn(2, 4, 4, 4, device="cuda")
    b = torch.randn(2, 3, 4, 4, device="cuda")
    c = torch.randn(4, 3, 3, 3, device="cuda")
    y = torch.nn.functional.conv2d(b, c, padding=1)
    z = a + y
    return z

f_c = torch.compile(fn)

torch.manual_seed(42)
eager_out = fn()

torch.manual_seed(42)
eager_out1 = fn()

torch.manual_seed(42)
compile_out = f_c()

torch.testing.assert_close(eager_out, eager_out1)   # PASS
torch.testing.assert_close(compile_out, eager_out)    # FAIL

The code above fails under torch.compile. In eager mode, the RNG order is a -> b -> c. In compiled code, the RNG order is b -> c -> a, so each tensor receives different values even though the seed is the same.

cc @pbelevich @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo
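
The order dependence described in this issue can be illustrated without a GPU. The sketch below uses Python's standard `random` module as a stand-in for the global torch RNG; `draw_in_order` is a hypothetical helper, not part of PyTorch. It shows that reseeding with the same value gives identical results only when the draws happen in the same order.

```python
import random

def draw_in_order(order, seed=42):
    """Reseed the global RNG, then draw one value per name in the given order."""
    random.seed(seed)
    return {name: random.random() for name in order}

eager = draw_in_order(["a", "b", "c"])        # program order
eager_again = draw_in_order(["a", "b", "c"])  # same order, same seed
reordered = draw_in_order(["b", "c", "a"])    # order reported for compiled code

print(eager == eager_again)            # same order reproduces the same values
print(reordered["b"] == eager["a"])    # the first draw goes to whichever name runs first
print(reordered["a"] == eager["a"])    # "a" now receives the third draw instead
```

The same seed produces the same stream of values in both runs; only the assignment of values to tensors changes, which is why the mismatch is silent.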

extent analysis

TL;DR

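The mismatch arises because the Inductor scheduler reorders torch.randn calls, so they consume global RNG state out of program order. PR #181245 addresses this for fallback_random = True by chaining random ops so that they keep their eager order.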

Guidance

  • Verify that torch.manual_seed is called immediately before each function call in both eager and compiled modes. The reproduction already does this, so seeding alone does not explain the mismatch.
  • Check whether the compiled code draws random tensors in a different order than eager. Per the PR description, the Inductor scheduler reorders torch.randn calls based on data dependencies.
  • Use a build that includes PR #181245 and set fallback_random = True, which is the configuration the fix targets.
  • To isolate the issue, test the compiled function with the random ops replaced by fixed inputs. If the outputs then match eager, the difference comes from RNG order rather than from the computation.

Example

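import torch
import torch._inductor.config as inductor_config
inductor_config.fallback_random = True  # the configuration PR #181245 targets
f_c = torch.compile(fn)  # torch.compile wraps a callable; it is not a context manager
torch.manual_seed(42)  # reseed immediately before the compiled call
compile_out = f_c()

This example compiles fn (as defined in the code example from the issue) with fallback_random enabled and reseeds immediately before the compiled call. On a build that includes PR #181245, this should let the compiled RNG order match eager. Reseeding alone does not fix the mismatch, since the reproduction already reseeds before every call.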

Notes

The root cause is a difference in RNG consumption order between eager and compiled code, not incorrect seeding. According to the PR description, the fix applies when fallback_random = True; behavior with that option disabled is not covered by the information on this page.

Recommendation

Apply the fix: use a PyTorch build that includes PR #181245 and enable fallback_random when compiled results must match eager numerically. Setting the seed before each call is still necessary, but it is not sufficient on its own, because the mismatch comes from the order in which random ops run.
