pytorch - AOTAutograd fails on distributed op

pytorch · 2026-05-12 20:02:28

🐛 Describe the bug

```python
import os

import torch
import torch.distributed as dist
from torch.fx.experimental.proxy_tensor import make_fx


os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("NCCL_SOCKET_IFNAME", "lo")

torch.cuda.set_device("cuda:0")
dist.init_process_group(
    "cpu:gloo,cuda:nccl",
    rank=0,
    world_size=1,
    device_id=torch.device("cuda:0"),
)

GROUP_NAME = dist.distributed_c10d._get_default_group().group_name
GROUP_SIZE = 1


@torch.compile
def f(input, out):
    y = torch.ops._c10d_functional.all_gather_into_tensor_out.default(
        input, GROUP_SIZE, GROUP_NAME, out=out
    )
    y = torch.ops._c10d_functional.wait_tensor.default(y)
    return [y + 1]


input_t = torch.ones(4, device="cuda")
out_t = torch.empty(4 * GROUP_SIZE, device="cuda")

f(input_t, out_t)
```

Fails with:

[2026-05-12 13:00:49] devvm006:2374976:2374976 [0] misc/ibvwrap.cc:173 NCCL WARN lib wrapper not initialized.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torchtitan/agent_space/c10d_out_functionalize_repro.py", line 36, in <module>
[rank0]:     f(input_t, out_t)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1134, in compile_wrapper
[rank0]:     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 3075, in _call_user_compiler
[rank0]:     raise BackendCompilerFailed(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 3049, in _call_user_compiler
[rank0]:     compiled_fn = compiler_fn(gm, example_inputs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 159, in __call__
[rank0]:     compiled_gm = compiler_fn(gm, example_inputs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/__init__.py", line 2482, in __call__
[rank0]:     return compile_fx(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2764, in compile_fx
[rank0]:     return _maybe_wrap_and_compile_fx_main(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2845, in _maybe_wrap_and_compile_fx_main
[rank0]:     return _compile_fx_main(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 3043, in _compile_fx_main
[rank0]:     return dynamo_common.aot_autograd(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 123, in __call__
[rank0]:     cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1223, in aot_module_simplified
[rank0]:     aot_state = create_aot_state(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 582, in create_aot_state
[rank0]:     fw_metadata = run_functionalized_fw_and_collect_metadata(
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/collect_metadata_analysis.py", line 220, in inner
[rank0]:     flat_f_outs = f(*flat_f_args)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/graph_capture_wrappers.py", line 1534, in functional_call
[rank0]:     out = PropagateUnbackedSymInts(mod).run(*args)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/fx/interpreter.py", line 197, in run
[rank0]:     self.env[node] = self.run_node(node)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/fx/experimental/symbolic_shapes.py", line 8700, in run_node
[rank0]:     result = super().run_node(n)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/fx/interpreter.py", line 294, in run_node
[rank0]:     return getattr(self, n.op)(n.target, args, kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/fx/interpreter.py", line 377, in call_function
[rank0]:     return target(*args, **kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_ops.py", line 871, in __call__
[rank0]:     return self._op(*args, **kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_compile.py", line 54, in inner
[rank0]:     return disable_fn(*args, **kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1382, in _fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/home/tmanlaibaatar/.conda/envs/titan/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py", line 611, in __torch_dispatch__
[rank0]:     outs_unwrapped = func._op_dk(
[rank0]: torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
[rank0]: RuntimeError: Found a custom (non-ATen) operator whose output has alias annotations: _c10d_functional::all_gather_into_tensor_out(Tensor input, int group_size, Any group_name, *, Tensor(a!) out) -> Tensor(a!). We only support functionalizing operators whose outputs do not have alias annotations (e.g. 'Tensor(a)' is a Tensor with an alias annotation whereas 'Tensor' is a Tensor without. The '(a)' is the alias annotation). The alias annotation specifies that the output Tensor shares storage with an input that has the same annotation. Please check if (1) the output needs to be an output (if not, don't return it), (2) if the output doesn't share storage with any inputs, then delete the alias annotation. (3) if the output indeed shares storage with an input, then add a .clone() before returning it to prevent storage sharing and then delete the alias annotation. Otherwise, please file an issue on GitHub.

[rank0]: While executing %y : [num_users=1] = call_function[target=torch.ops._c10d_functional.all_gather_into_tensor_out.default](args = (%l_input_, 1, 0), kwargs = {out: %l_out_})
[rank0]: Original traceback:
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torchtitan/agent_space/c10d_out_functionalize_repro.py", line 26, in f
[rank0]:     y = torch.ops._c10d_functional.all_gather_into_tensor_out.default(

[rank0]: Use tlparse to see full graph. (https://github.com/pytorch/tlparse?tab=readme-ov-file#tlparse-parse-structured-pt2-logs)

[rank0]: Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

[rank0]:[W512 13:00:56.128309332 ProcessGroupNCCL.cpp:1648] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
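
The `RuntimeError` above hinges on the alias annotation in the operator schema: `Tensor(a!) out` marks `out` as mutated, and the matching `-> Tensor(a!)` says the returned tensor shares storage with it. The sketch below is a rough illustration, not part of PyTorch: it reads the schema string copied from the error with a plain regular expression and reports which arguments and returns carry an alias annotation. The helper names (`split_schema`, `alias_annotations`, `has_aliased_output`) and the second schema `example::my_op` are hypothetical and exist only for this illustration. The splitting is deliberately naive and assumes a simple schema without nested commas.

```python
import re

# Schema string copied verbatim from the RuntimeError above.
SCHEMA = (
    "_c10d_functional::all_gather_into_tensor_out("
    "Tensor input, int group_size, Any group_name, *, Tensor(a!) out) -> Tensor(a!)"
)

# Hypothetical schema without alias annotations, used only for contrast.
PLAIN_SCHEMA = "example::my_op(Tensor x) -> Tensor"

# Matches an alias annotation such as "(a)" or "(a!)" after a type name.
ALIAS_RE = re.compile(r"\((\w+)(!?)\)")


def split_schema(schema):
    """Split 'ns::name(args) -> returns' into (name, args, returns)."""
    signature, returns = schema.rsplit("->", 1)
    name, args = signature.split("(", 1)
    args = args.strip()
    if args.endswith(")"):
        args = args[:-1]  # drop the parenthesis that closes the argument list
    returns = returns.strip()
    # A tuple return is written "(A, B)"; a single return has no outer parens.
    if returns.startswith("(") and returns.endswith(")"):
        returns = returns[1:-1]
    arg_list = [a.strip() for a in args.split(",") if a.strip()]
    ret_list = [r.strip() for r in returns.split(",") if r.strip()]
    return name.strip(), arg_list, ret_list


def alias_annotations(entries):
    """Map each annotated entry to (alias_set_name, is_mutated)."""
    found = {}
    for entry in entries:
        match = ALIAS_RE.search(entry)
        if match:
            found[entry] = (match.group(1), match.group(2) == "!")
    return found


def has_aliased_output(schema):
    """True if any return value in the schema carries an alias annotation."""
    _, _, returns = split_schema(schema)
    return bool(alias_annotations(returns))


name, args, returns = split_schema(SCHEMA)
aliased_args = alias_annotations(args)
aliased_returns = alias_annotations(returns)

print("operator:", name)
print("aliased arguments:", aliased_args)
print("aliased returns:", aliased_returns)
print("output aliases an input:", has_aliased_output(SCHEMA))
print("plain schema aliases:", has_aliased_output(PLAIN_SCHEMA))
```

For the schema in this report, both the `out` argument and the single return belong to alias set `a` with the mutation marker `!`, which is the combination the functionalization pass rejects for custom (non-ATen) operators according to the message. The three options listed in that message (drop the return, drop the annotation, or clone before returning) all amount to making the return side of the schema a plain `Tensor`.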

Versions

main

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @bdhirsh @ezyang @chauhang @penguinwu @bobrenjc93 @aorenste
