pytorch - [torch.compile] AOTAutograd fails to coerce plain Tensor tangent for expected AsyncCollectiveTensor



🐛 Describe the bug

  1. Actual behavior: during backward, AOTAutograd receives a plain tensor where it expected an AsyncCollectiveTensor tangent.
     - Expected type: torch.distributed._functional_collectives.AsyncCollectiveTensor
     - Runtime type: torch.Tensor
     - Shape: torch.Size([1, 2048, 4096])

  2. Trigger condition: torch.compile applied to a model submodule, combined with TP/DTensor functional collectives, followed by backward.

  3. Code path: torch/_functorch/_aot_autograd/runtime_wrappers.py, AOTDispatchAutograd.process_runtime_tangent()

  4. Why this seems like a PyTorch issue: AsyncCollectiveTensor already implements coercion in one direction (runtime AsyncCollectiveTensor -> expected torch.Tensor), but the reverse runtime case is not handled:
     - runtime tangent: torch.Tensor
     - expected_type = AsyncCollectiveTensor
     - expected_metadata = None
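The asymmetry in point 4 can be illustrated with a simplified, torch-free model of the coercion step. This is a sketch under stated assumptions, not PyTorch's actual code: `PlainTensor` and `AsyncCollectiveTensorStub` are hypothetical stand-ins, and the hook name `__coerce_same_metadata_as_tangent__` is an assumption about what `process_runtime_tangent()` looks up on the runtime tangent.

```python
# Simplified, torch-free model of the tangent-coercion step in
# AOTDispatchAutograd.process_runtime_tangent(). PlainTensor and
# AsyncCollectiveTensorStub are hypothetical stand-ins, not torch types.
from typing import Any, Optional


class PlainTensor:
    """Stand-in for torch.Tensor: carries no coercion hook."""

    def __init__(self, shape: tuple) -> None:
        self.shape = shape


class AsyncCollectiveTensorStub(PlainTensor):
    """Stand-in for AsyncCollectiveTensor: wraps one inner tensor, no metadata."""

    def __init__(self, elem: PlainTensor) -> None:
        super().__init__(elem.shape)
        self.elem = elem

    # Assumed hook name; only the wrapper -> plain direction is handled.
    def __coerce_same_metadata_as_tangent__(
        self, expected_metadata: Any, expected_type: Optional[type] = None
    ) -> Optional[PlainTensor]:
        if expected_type is not PlainTensor:
            return None
        return self.elem  # models "wait, then hand back the plain tensor"


def maybe_coerce(x: PlainTensor, expected_type: type, expected_meta: Any = None):
    """Return a tangent matching the traced guess, or None (which AOTAutograd
    turns into the 'guessed its metadata incorrectly' RuntimeError)."""
    runtime_meta = None  # both types in this report have metadata None
    if type(x) is expected_type and runtime_meta == expected_meta:
        return x
    hook = getattr(x, "__coerce_same_metadata_as_tangent__", None)
    if hook is None:
        return None  # a plain tensor has no hook: the case in this report
    return hook(expected_meta, expected_type)


if __name__ == "__main__":
    inner = PlainTensor((1, 2048, 4096))
    # Handled: runtime wrapper, expected plain tensor.
    print(maybe_coerce(AsyncCollectiveTensorStub(inner), PlainTensor) is inner)  # True
    # Not handled: runtime plain tensor, expected wrapper.
    print(maybe_coerce(inner, AsyncCollectiveTensorStub))  # None
```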

Error logs

[rank0]:[rank0]: Traceback (most recent call last):
[rank0]:[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:[rank0]:   File "/data/z00944403/torchtitan-npu/torchtitan_npu/entry.py", line 110, in <module>
[rank0]:[rank0]:     trainer.train()
[rank0]:[rank0]:   File "/data/z00944403/torchtitan-npu/torchtitan_npu/train.py", line 158, in wrapper_train
[rank0]:[rank0]:     return _original(self)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 362, in wrapper
[rank0]:[rank0]:     return f(*args, **kwargs)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torchtitan/train.py", line 675, in train
[rank0]:[rank0]:     self.train_step(data_iterator)
[rank0]:[rank0]:   File "/data/z00944403/torchtitan-npu/torchtitan_npu/train.py", line 87, in wrapper_train_step
[rank0]:[rank0]:     result = _original_train_step(self, *args, **kwargs)
[rank0]:[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torchtitan/train.py", line 584, in train_step
[rank0]:[rank0]:     loss = self.forward_backward_step(
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torchtitan/train.py", line 542, in forward_backward_step
[rank0]:[rank0]:     loss.backward()
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/_tensor.py", line 630, in backward
[rank0]:[rank0]:     torch.autograd.backward(
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/autograd/__init__.py", line 364, in backward
[rank0]:[rank0]:     _engine_run_backward(
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/autograd/graph.py", line 865, in _engine_run_backward
[rank0]:[rank0]:     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/autograd/function.py", line 317, in apply
[rank0]:[rank0]:     return user_fn(self, *args)
[rank0]:[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2276, in backward
[rank0]:[rank0]:     all_args = _backward_prologue_functional(
[rank0]:[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1751, in _backward_prologue_functional
[rank0]:[rank0]:     flat_processed_tangents = list(
[rank0]:[rank0]:                               ^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1754, in <genexpr>
[rank0]:[rank0]:     AOTDispatchAutograd.process_runtime_tangent(
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2046, in process_runtime_tangent
[rank0]:[rank0]:     new_elem, elem_leaves = AOTDispatchAutograd.process_runtime_tangent(
[rank0]:[rank0]:                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:[rank0]:   File "/usr/local/python3.11.15/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2017, in process_runtime_tangent
[rank0]:[rank0]:     raise RuntimeError(
[rank0]:[rank0]: RuntimeError:
[rank0]:[rank0]: During the backward, we encountered a tensor subclass where we guessed its
[rank0]:[rank0]: metadata incorrectly.
[rank0]:
[rank0]:[rank0]: Expected metadata: None, expected type: <class 'torch.distributed._functional_collectives.AsyncCollectiveTensor'>
[rank0]:
[rank0]:[rank0]: Runtime metadata: None, runtime type: <class 'torch.Tensor'>
[rank0]:
[rank0]:[rank0]: shape: torch.Size([1, 2048, 4096])
[rank0]:[rank0]: To fix this, your tensor subclass must implement the dunder method __force_to_same_metadata__.
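The error's advice targets the tensor subclass, but here the runtime tangent is a plain `torch.Tensor`, so there is no subclass on which to implement anything. One possible direction, sketched below as a torch-free model and not as a confirmed patch, is a fallback inside the coercion step: when the expected type is a single-tensor wrapper whose expected metadata is `None` (as the log reports for AsyncCollectiveTensor) and the runtime tangent is plain, re-wrap the plain tensor. `PlainTensor`, `AsyncCollectiveTensorStub`, `WRAPPABLE_TYPES`, and `maybe_coerce_with_fallback` are hypothetical names, and whether re-wrapping is safe for the compiled backward graph is an assumption that would need checking against the real implementation.

```python
# Hypothetical fallback for the reverse case (plain runtime tangent, wrapper
# expected). Torch-free model; all names here are illustrative stand-ins.
from typing import Any, Optional


class PlainTensor:
    """Stand-in for torch.Tensor."""

    def __init__(self, shape: tuple) -> None:
        self.shape = shape


class AsyncCollectiveTensorStub(PlainTensor):
    """Stand-in for a wrapper subclass that flattens to one inner tensor."""

    def __init__(self, elem: PlainTensor) -> None:
        super().__init__(elem.shape)
        self.elem = elem


# Hypothetical allow-list of wrapper types that may re-wrap a plain tangent.
WRAPPABLE_TYPES = {AsyncCollectiveTensorStub}


def maybe_coerce_with_fallback(
    x: PlainTensor, expected_type: type, expected_meta: Any = None
) -> Optional[PlainTensor]:
    if type(x) is expected_type:
        return x
    # Existing path: subclasses defining the (assumed) hook coerce themselves.
    hook = getattr(x, "__coerce_same_metadata_as_tangent__", None)
    if hook is not None:
        return hook(expected_meta, expected_type)
    # Reverse case: plain runtime tangent, metadata-free wrapper expected.
    if (
        type(x) is PlainTensor
        and expected_meta is None
        and expected_type in WRAPPABLE_TYPES
    ):
        return expected_type(x)
    return None  # other mismatches still surface as an error


if __name__ == "__main__":
    plain = PlainTensor((1, 2048, 4096))
    coerced = maybe_coerce_with_fallback(plain, AsyncCollectiveTensorStub)
    print(type(coerced).__name__, coerced.elem is plain)  # AsyncCollectiveTensorStub True
```

Restricting the fallback to an allow-list keeps genuinely mismatched subclasses failing loudly instead of being silently re-wrapped.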

Versions

I am still working on reducing this to a minimal repro from a large distributed training workload. The failure currently occurs in torchtitan-npu (DeepSeek-V4) with tensor parallelism (TP) and model compilation enabled.

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @ezyang @albanD @chauhang @penguinwu @tianyu-l @XilunWu @SherlockNoMad @ppwwyyxx @bdhirsh @bobrenjc93 @aorenste
