pytorch - 💡(How to fix) Fix `make_graphed_callables` silently drops in-place mutations on user-provided input tensors [5 comments, 4 participants]

GitHub stats
pytorch/pytorch#178586 · Fetched 2026-04-08 01:40:40
Comments: 5 · Participants: 4 · Timeline: 87 · Reactions: 0
Timeline (top): mentioned ×37, subscribed ×37, labeled ×7, commented ×5


🐛 Describe the bug

torch.cuda.make_graphed_callables silently produces incorrect results when the captured callable mutates its input tensors in-place. The mutation occurs on an internal static copy but is never propagated back to the caller's tensor. No error or warning is raised.

This contrasts with torch.compile(mode="reduce-overhead"), which detects input mutations and safely falls back to eager execution with a logged warning.

Reproduction

import torch

def fn(x):
    x.add_(1)  # in-place mutation
    return x * 2

sample = torch.randn(4, device="cuda")
graphed_fn = torch.cuda.make_graphed_callables(fn, (sample,))

x = torch.randn(4, device="cuda")
original = x.clone()
out = graphed_fn(x)

print(f"x before: {original}")
print(f"x after:  {x}")
print(f"x changed: {not torch.equal(x, original)}")
# Expected: x changed: True  (add_(1) should have mutated x)
# Actual:   x changed: False (mutation was silently lost)

Why this happens

User args are copied into a static buffer before replay, but the mutated static buffer is never copied back. The caller's tensor remains unchanged.
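The copy-in without copy-back can be mimicked on CPU in a few lines. This is a minimal sketch of the behavior described above, not the code inside torch.cuda.make_graphed_callables; `static_x` and `replay` are hypothetical stand-ins for the internal static buffer and the graph replay step.

```python
import torch

def fn(x):
    x.add_(1)  # in-place mutation
    return x * 2

# Hypothetical stand-in for the static input buffer allocated at capture time
static_x = torch.zeros(4)

def replay(user_x):
    static_x.copy_(user_x)  # copy-in: caller's values go into the static buffer
    out = fn(static_x)      # stand-in for graph replay; mutates static_x only
    return out              # no copy-back: user_x is never written

x = torch.arange(4.0)
out = replay(x)
print(x)         # tensor([0., 1., 2., 3.])  (caller's tensor is unchanged)
print(static_x)  # tensor([1., 2., 3., 4.])  (mutation landed in the buffer)
```

The result `out` is still computed from the mutated values, which is why the lost mutation is easy to miss: only the caller's input tensor is stale.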

Note on module buffers/parameters: Module parameters and buffers are not affected. They are part of static_input_surface directly (same memory address), so their mutations persist correctly.

Suggested fix: At minimum, add a .. warning:: to the docstring documenting that in-place mutations on user-provided input tensors are not propagated back. Ideally, detect the mutation (e.g., by comparing the static buffer before/after replay) and either:

  • Copy mutated values back to the original tensor, or
  • Raise an error / emit a warning
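For the warning option, one cheap heuristic is to probe the callable once in eager mode before capture and look at the tensors' version counters, which PyTorch increments on in-place operations. The sketch below is an assumption about how such a check could look, not an existing PyTorch feature: `mutates_inputs` is a hypothetical helper, it assumes `sample_args` is a flat tuple of tensors, and it relies on the semi-private `Tensor._version` attribute.

```python
import warnings

import torch

def mutates_inputs(fn, sample_args):
    """Heuristic: return True if fn runs an in-place op on any tensor input.

    fn is called once eagerly on clones, so the caller's tensors are untouched.
    """
    probes = [a.clone() for a in sample_args]
    before = [p._version for p in probes]  # version counter before the call
    fn(*probes)
    # An in-place op on a probe (or on a view of it) bumps its version counter
    return any(p._version != v for p, v in zip(probes, before))

def fn(x):
    x.add_(1)  # in-place mutation
    return x * 2

def pure_fn(x):
    return (x + 1) * 2  # same result, no mutation

sample = torch.randn(4)
if mutates_inputs(fn, (sample,)):
    warnings.warn(
        "fn mutates its inputs in place; the mutation will not reach "
        "tensors passed to the graphed callable."
    )
```

A check like this only covers the code path taken with the sample inputs, so it complements a docstring warning rather than replacing it.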

Versions

main

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @mcarilli @ezyang @eellison @penguinwu @BoyuanFeng

extent analysis

Fix Plan

To address the issue with torch.cuda.make_graphed_callables silently producing incorrect results when the captured callable mutates its input tensors in-place, we can implement a fix that detects the mutation and either copies the mutated values back to the original tensor or raises an error.

Step-by-Step Solution

  1. Detect Mutation: Compare the static buffer before and after replay to detect any mutations.
  2. Copy Mutated Values: If a mutation is detected, copy the mutated values back to the original tensor.
  3. Raise Error or Warning: Alternatively, raise an error or emit a warning to notify the user of the mutation.

Example Code

import warnings

import torch

def fn(x):
    x.add_(1)  # in-place mutation
    return x * 2

def run_with_mutation_detection(fn, args):
    # Eager-mode illustration only: no CUDA graph is captured here.
    # The clones stand in for the static input buffers used during replay.
    static_buffer = [arg.clone() for arg in args]

    # Stand-in for graph replay: run fn on the static buffers
    output = fn(*static_buffer)

    # Each arg still holds its pre-replay values, so a mismatch means fn
    # mutated the corresponding static buffer in place
    for arg, static in zip(args, static_buffer):
        if not torch.equal(arg, static):
            # Copy mutated values back to the original tensor
            arg.copy_(static)
            warnings.warn("In-place mutation detected; mutated values copied back to the original tensor.")

    return output

x = torch.randn(4, device="cuda")
original = x.clone()
out = run_with_mutation_detection(fn, (x,))

print(f"x before: {original}")
print(f"x after:  {x}")
print(f"x changed: {not torch.equal(x, original)}")

Verification

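To check the copy-back logic, run the example code and confirm that the input tensor x was mutated: x changed should be True, indicating that the in-place mutation was detected and propagated back to the original tensor. Note that the example runs fn eagerly on cloned inputs rather than through torch.cuda.make_graphed_callables, so it illustrates the logic only and does not patch the library.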

Extra Tips

  • When using torch.cuda.make_graphed_callables, be aware of the potential issue with in-place mutations on user-provided input tensors.
  • Consider using torch.compile(mode="reduce-overhead") as an alternative, which detects input mutations and safely falls back to eager execution with a logged warning.
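Until the behavior changes, one way to sidestep the problem is to keep the captured callable free of input mutations and do the write-back explicitly. The sketch below is a suggested workaround rather than something proposed in the issue; `fn_functional` and `step` are hypothetical names, and the code falls back to eager execution when CUDA is unavailable.

```python
import torch

def fn_functional(x):
    y = x + 1        # out-of-place replacement for x.add_(1)
    return y * 2, y  # also return the value x should take

device = "cuda" if torch.cuda.is_available() else "cpu"
sample = torch.randn(4, device=device)

# Graph the callable only when CUDA is available; otherwise run it eagerly
if device == "cuda":
    step = torch.cuda.make_graphed_callables(fn_functional, (sample,))
else:
    step = fn_functional

x = torch.randn(4, device=device)
original = x.clone()
out, new_x = step(x)
x.copy_(new_x)  # explicit write-back, outside the captured region

print(f"x changed: {not torch.equal(x, original)}")
```

When the graphed path is used, the returned tensors may alias memory that is reused on the next replay, so clone `out` if it needs to outlive the next call.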
