pytorch - 💡(How to fix) Fix AOTAutograd fails on distributed op




🐛 Describe the bug

import os

import torch
import torch.distributed as dist
from torch.fx.experimental.proxy_tensor import make_fx


os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("NCCL_SOCKET_IFNAME", "lo")

torch.cuda.set_device("cuda:0")
dist.init_process_group(
    "cpu:gloo,cuda:nccl",
    rank=0,
    world_size=1,
    device_id=torch.device("cuda:0"),
)

GROUP_NAME = dist.distributed_c10d._get_default_group().group_name
GROUP_SIZE = 1


@torch.compile 
def f(input, out):
    y = torch.ops._c10d_functional.all_gather_into_tensor_out.default(
        input, GROUP_SIZE, GROUP_NAME, out=out
    )
    y = torch.ops._c10d_functional.wait_tensor.default(y)
    return [y + 1]


input_t = torch.ones(4, device="cuda")
out_t = torch.empty(4 * GROUP_SIZE, device="cuda")

f(input_t, out_t)

Fails with:

[2026-05-12 13:00:49] devvm006:2374976:2374976 [0] misc/ibvwrap.cc:173 NCCL WARN lib wrapper not initialized.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torchtitan/agent_space/c10d_out_functionalize_repro.py", line 36, in <module>
[rank0]:     f(input_t, out_t)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1134, in compile_wrapper
[rank0]:     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 3075, in _call_user_compiler
[rank0]:     raise BackendCompilerFailed(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 3049, in _call_user_compiler
[rank0]:     compiled_fn = compiler_fn(gm, example_inputs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 159, in __call__
[rank0]:     compiled_gm = compiler_fn(gm, example_inputs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/__init__.py", line 2482, in __call__
[rank0]:     return compile_fx(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2764, in compile_fx
[rank0]:     return _maybe_wrap_and_compile_fx_main(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2845, in _maybe_wrap_and_compile_fx_main
[rank0]:     return _compile_fx_main(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 3043, in _compile_fx_main
[rank0]:     return dynamo_common.aot_autograd(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 123, in __call__
[rank0]:     cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1223, in aot_module_simplified
[rank0]:     aot_state = create_aot_state(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 582, in create_aot_state
[rank0]:     fw_metadata = run_functionalized_fw_and_collect_metadata(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/collect_metadata_analysis.py", line 220, in inner
[rank0]:     flat_f_outs = f(*flat_f_args)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1534, in functional_call
[rank0]:     out = PropagateUnbackedSymInts(mod).run(*args)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/fx/interpreter.py", line 197, in run
[rank0]:     self.env[node] = self.run_node(node)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 8700, in run_node
[rank0]:     result = super().run_node(n)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/fx/interpreter.py", line 294, in run_node
[rank0]:     return getattr(self, n.op)(n.target, args, kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/fx/interpreter.py", line 377, in call_function
[rank0]:     return target(*args, **kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_ops.py", line 871, in __call__
[rank0]:     return self._op(*args, **kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_compile.py", line 54, in inner
[rank0]:     return disable_fn(*args, **kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1382, in _fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py", line 611, in __torch_dispatch__
[rank0]:     outs_unwrapped = func._op_dk(
[rank0]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
[rank0]: RuntimeError: Found a custom (non-ATen) operator whose output has alias annotations: _c10d_functional::all_gather_into_tensor_out(Tensor input, int group_size, Any group_name, *, Tensor(a!) out) -> Tensor(a!). We only support functionalizing operators whose outputs do not have alias annotations (e.g. 'Tensor(a)' is a Tensor with an alias annotation whereas 'Tensor' is a Tensor without. The '(a)' is the alias annotation). The alias annotation specifies that the output Tensor shares storage with an input that has the same annotation. Please check if (1) the output needs to be an output (if not, don't return it), (2) if the output doesn't share storage with any inputs, then delete the alias annotation. (3) if the output indeed shares storage with an input, then add a .clone() before returning it to prevent storage sharing and then delete the alias annotation. Otherwise, please file an issue on GitHub.

[rank0]: While executing %y : [num_users=1] = call_function[target=torch.ops._c10d_functional.all_gather_into_tensor_out.default](args = (%l_input_, 1, 0), kwargs = {out: %l_out_})
[rank0]: Original traceback:
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torchtitan/agent_space/c10d_out_functionalize_repro.py", line 26, in f
[rank0]:     y = torch.ops._c10d_functional.all_gather_into_tensor_out.default(

[rank0]: Use tlparse to see full graph. (https://github.com/pytorch/tlparse?tab=readme-ov-file#tlparse-parse-structured-pt2-logs)

[rank0]: Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
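
The two environment variables named in the log line above can be set from a small launcher instead of by hand. The sketch below is one way to do that using only the standard library. The helper names `debug_env` and `rerun_with_debug_logs` are made up for illustration, and it assumes the repro is saved as a standalone script.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of the environment with the two debug variables set.

    `base` defaults to the current process environment; it is never mutated.
    """
    env = dict(os.environ if base is None else base)
    env["TORCHDYNAMO_VERBOSE"] = "1"  # show the internal Dynamo stack trace
    env["TORCH_LOGS"] = "+dynamo"     # extra developer-level Dynamo logging
    return env


def rerun_with_debug_logs(script_path):
    """Re-run a script in a child process with the debug variables enabled."""
    # check=False: the repro is expected to fail, we only want its logs.
    return subprocess.run(
        [sys.executable, script_path], env=debug_env(), check=False
    )
```

Calling `rerun_with_debug_logs("c10d_out_functionalize_repro.py")` would then re-run the repro file named in the traceback with both variables set, assuming the file is in the current directory.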

[rank0]:[W512 13:00:56.128309332 ProcessGroupNCCL.cpp:1648] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
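
The RuntimeError above hinges on the `(a!)` alias annotation on the return type of the `all_gather_into_tensor_out` schema. As a rough illustration of what that annotation looks like in a schema string, the sketch below pulls alias annotations out of the return part with a regular expression. It is only a text-level illustration, not PyTorch's actual check. The helper `output_alias_annotations` and the `mylib::gather` schema are hypothetical.

```python
import re
from typing import List

# Schema string copied from the error message above.
SCHEMA = (
    "_c10d_functional::all_gather_into_tensor_out("
    "Tensor input, int group_size, Any group_name, *, Tensor(a!) out)"
    " -> Tensor(a!)"
)

# Hypothetical schema whose return type carries no alias annotation.
PLAIN_SCHEMA = "mylib::gather(Tensor input, int group_size) -> Tensor"

# Matches a type name followed by an alias annotation such as "(a)" or "(a!)".
ALIAS_RE = re.compile(r"\w+\((\w+)(!?)\)")


def output_alias_annotations(schema: str) -> List[str]:
    """Return the alias annotations found in the return part of a schema."""
    _, arrow, returns = schema.rpartition("->")
    if not arrow:
        # No "->" means there is no return declaration to inspect.
        return []
    return ["(" + name + bang + ")" for name, bang in ALIAS_RE.findall(returns)]


if __name__ == "__main__":
    print(output_alias_annotations(SCHEMA))
    print(output_alias_annotations(PLAIN_SCHEMA))
```

For the schema in the error message this yields `['(a!)']`, while a schema whose return is a plain `Tensor` yields an empty list, which is the shape the error message says functionalization supports.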

Versions

main

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @bdhirsh @ezyang @chauhang @penguinwu @bobrenjc93 @aorenste
