pytorch - PG group + compile + COOR doesn't work on implicit PG group [2 comments, 3 participants]

pytorch/pytorch#181890 (fetched 2026-04-30 06:17:58)


🐛 Describe the bug


import os
import tempfile

import torch
import torch.distributed as dist


os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29522")
with tempfile.NamedTemporaryFile(delete=False) as fd:
    path = fd.name
dist.init_process_group(
    backend="gloo", rank=0, world_size=1, store=dist.FileStore(path, 1)
)


def f(t):
    t = t.clone()
    dist.all_reduce(t)
    return t + 1


x = torch.arange(4, dtype=torch.float32)

torch._dynamo.reset()
with dist.config.patch(compile_on_one_rank=True):
    opt = torch.compile(f, backend="inductor", fullgraph=True)
    out = opt(x)

Fails with:

[rank0]:     return tracer.inline_call_()
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/symbolic_convert.py", line 5530, in inline_call_
[rank0]:     self.run()
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/symbolic_convert.py", line 1839, in run
[rank0]:     while self.step():
[rank0]:           ^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/symbolic_convert.py", line 1506, in step
[rank0]:     self.dispatch_table[inst.opcode](self, inst)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/symbolic_convert.py", line 1043, in wrapper
[rank0]:     return inner_fn(self, inst)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/symbolic_convert.py", line 4216, in CALL
[rank0]:     self._call(inst)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/symbolic_convert.py", line 4207, in _call
[rank0]:     self.call_function(fn, args, kwargs)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/symbolic_convert.py", line 1407, in call_function
[rank0]:     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/variables/lazy.py", line 306, in realize_and_forward
[rank0]:     return getattr(self.realize(), name)(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/variables/torch.py", line 2729, in call_function
[rank0]:     *proxy_args_kwargs(args, kwargs),
[rank0]:      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/utils.py", line 1440, in proxy_args_kwargs
[rank0]:     proxy_args = tuple(arg.as_proxy() for arg in args)
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/utils.py", line 1440, in <genexpr>
[rank0]:     proxy_args = tuple(arg.as_proxy() for arg in args)
[rank0]:                        ^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/variables/script_object.py", line 280, in as_proxy
[rank0]:     assert is_opaque_value_type(type(self.proxy))
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AssertionError: 

[rank0]: from user code:
[rank0]:    File "/data/users/tmanlaibaatar/pytorch/agent_space/probe_compile_one_rank.py", line 19, in f
[rank0]:     dist.all_reduce(t)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 1882, in _remapped_allreduce
[rank0]:     all_reduce_inplace(*args, **kwargs)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 1688, in all_reduce_inplace
[rank0]:     return tensor.copy_(all_reduce(tensor, op, group, tag))
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 179, in all_reduce
[rank0]:     tensor = torch.ops._c10d_functional.all_reduce(

Versions

main

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo @azahed98

extent analysis

TL;DR

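The failure most likely comes from how Dynamo handles the implicit (default) process group when torch.compile runs with `compile_on_one_rank=True` (COOR). Tracing `dist.all_reduce(t)` without a `group` argument reaches `assert is_opaque_value_type(type(self.proxy))` in `torch/_dynamo/variables/script_object.py` and aborts compilation. A potential workaround is to keep the collective out of the compiled region instead of tracing it with torch.compile.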

Guidance

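  • Verify that the issue is specific to torch.compile by running the same code without compilation, and then compiled but without `compile_on_one_rank`.
  • Check whether the backend matters at all: the traceback ends inside Dynamo's tracer (`torch/_dynamo/symbolic_convert.py` down to `script_object.py`), which runs before the "inductor" backend is invoked, so switching backends alone is unlikely to help.
  • Consider disabling compilation for the specific function that calls dist.all_reduce (for example with `torch.compiler.disable`), at the cost of a graph break, which also rules out `fullgraph=True`.
  • Test the code with a different PyTorch build (the report is against main) to see whether the issue is version-specific.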

Example

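# Test the code without compilation (eager mode) to confirm the collective itself works.
import tempfile
import torch
import torch.distributed as dist

with tempfile.NamedTemporaryFile(delete=False) as fd:
    path = fd.name
dist.init_process_group(backend="gloo", rank=0, world_size=1, store=dist.FileStore(path, 1))

def f(t):
    t = t.clone()
    dist.all_reduce(t)
    return t + 1

x = torch.arange(4, dtype=torch.float32)
out = f(x)  # with world_size=1 the sum all-reduce leaves t unchanged, so out equals x + 1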

Notes

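The snippet compiles with the "inductor" backend, but the assertion is raised while Dynamo traces the function (`symbolic_convert.py` → `proxy_args_kwargs` → `as_proxy`), before any backend is called, so "inductor" itself is unlikely to be the cause. The issue is more plausibly specific to `compile_on_one_rank` combined with the implicit process group, or to the main-branch build being tested.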

Recommendation

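Apply a workaround: until this is fixed, avoid tracing distributed collectives with torch.compile when `compile_on_one_rank` is enabled, for example by keeping `dist.all_reduce` outside the compiled region, since that path currently fails on the implicit process group.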
