pytorch - How to fix: COOR + process group + compile crashes in inductor

GitHub issue: pytorch/pytorch#181893 (fetched 2026-04-30 06:17:51)

Error Message

torch._inductor.exc.InductorError: LoweringException: NotImplementedError: realize NYI on <class 'torch._inductor.ir.TorchBindObject'>
  target: _c10d_functional.all_reduce.default
  args[0]: TensorBox(StorageBox(
    InputBuffer(name='arg0_1', layout=FixedLayout('cpu', torch.float32, size=[4], stride=[1]))
  ))
  args[1]: sum
  args[2]: TorchBindObject(name='_opaque_obj0', value=<torch.distributed.distributed_c10d.ProcessGroup object at 0x7f9c4eb88df0>)

Code Example

import os
import tempfile
import traceback

import torch
import torch.distributed as dist


os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29530")
with tempfile.NamedTemporaryFile(delete=False) as fd:
    p = fd.name
dist.init_process_group(
    backend="gloo", rank=0, world_size=1, store=dist.FileStore(p, 1)
)
import torch.distributed.tensor  # ensure ProcessGroup is registered as opaque

pg = dist.distributed_c10d._get_default_group()


def f(t, group):
    t = t.clone()
    dist.all_reduce(t, group=group)
    return t + 1


x = torch.arange(4, dtype=torch.float32)

torch._dynamo.reset()
with dist.config.patch(compile_on_one_rank=True):
    opt = torch.compile(f, backend="inductor", fullgraph=True)
    out = opt(x, pg)

Fails with:

[rank0]:W0429 11:20:46.908000 3802376 /data/users/tmanlaibaatar/pytorch/torch/_inductor/codecache.py:891] [0/0]     self.dump(obj)
[rank0]:W0429 11:20:46.908000 3802376 /data/users/tmanlaibaatar/pytorch/torch/_inductor/codecache.py:891] [0/0] TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object
[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/agent_space/probe_pg_as_input.py", line 35, in <module>
[rank0]:     out = opt(x, pg)
[rank0]:           ^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/eval_frame.py", line 1069, in compile_wrapper
[rank0]:     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/compile_fx.py", line 1047, in _compile_fx_inner
[rank0]:     raise InductorError(e, currentframe()).with_traceback(
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/compile_fx.py", line 1039, in _compile_fx_inner
[rank0]:     mb_compiled_graph = fx_codegen_and_compile(
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/compile_fx.py", line 1845, in fx_codegen_and_compile
[rank0]:     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/compile_fx.py", line 1521, in codegen_and_compile
[rank0]:     graph.run(*example_inputs)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/graph.py", line 1063, in run
[rank0]:     return super().run(*args)
[rank0]:            ^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/fx/interpreter.py", line 197, in run
[rank0]:     self.env[node] = self.run_node(node)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/graph.py", line 1914, in run_node
[rank0]:     result = super().run_node(n)
[rank0]:              ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/fx/interpreter.py", line 294, in run_node
[rank0]:     return getattr(self, n.op)(n.target, args, kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/graph.py", line 1486, in call_function
[rank0]:     raise LoweringException(
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/graph.py", line 1463, in call_function
[rank0]:     out = lowerings[target](*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/lowering.py", line 518, in wrapped
[rank0]:     out = decomp_fn(*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/comm_lowering.py", line 242, in _all_reduce
[rank0]:     ir._AllReduce_Kernel.create_inplace(
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/ir.py", line 10551, in create_inplace
[rank0]:     tensor_arg.realize()
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/ir.py", line 767, in realize
[rank0]:     raise NotImplementedError(f"realize NYI on {type(self)}")
[rank0]: torch._inductor.exc.InductorError: LoweringException: NotImplementedError: realize NYI on <class 'torch._inductor.ir.TorchBindObject'>
[rank0]:   target: _c10d_functional.all_reduce.default
[rank0]:   args[0]: TensorBox(StorageBox(
[rank0]:     InputBuffer(name='arg0_1', layout=FixedLayout('cpu', torch.float32, size=[4], stride=[1]))
[rank0]:   ))
[rank0]:   args[1]: sum
[rank0]:   args[2]: TorchBindObject(name='_opaque_obj0', value=<torch.distributed.distributed_c10d.ProcessGroup object at 0x7f9c4eb88df0>)NotImplementedError: realize NYI on <class 'torch._inductor.ir.TorchBindObject'>
[rank0]:   target: _c10d_functional.all_reduce.default
[rank0]:   args[0]: TensorBox(StorageBox(
[rank0]:     InputBuffer(name='arg0_1', layout=FixedLayout('cpu', torch.float32, size=[4], stride=[1]))
[rank0]:   ))
[rank0]:   args[1]: sum
[rank0]:   args[2]: TorchBindObject(name='_opaque_obj0', value=<torch.distributed.distributed_c10d.ProcessGroup object at 0x7f9c4eb88df0>)
[rank0]: Found from : 
[rank0]:    File "/data/users/tmanlaibaatar/pytorch/agent_space/probe_pg_as_input.py", line 26, in f
[rank0]:     dist.all_reduce(t, group=group)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 1882, in _remapped_allreduce
[rank0]:     all_reduce_inplace(*args, **kwargs)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 1688, in all_reduce_inplace
[rank0]:     return tensor.copy_(all_reduce(tensor, op, group, tag))
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 179, in all_reduce
[rank0]:     tensor = torch.ops._c10d_functional.all_reduce(

Versions

main

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

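The exception that aborts compilation is a LoweringException wrapping "NotImplementedError: realize NYI on TorchBindObject". It is raised while Inductor lowers _c10d_functional.all_reduce, whose args[2] is the ProcessGroup wrapped as a TorchBindObject. The "cannot pickle ... ProcessGroup" TypeError appears only in warning-level (W) log lines from codecache.py and is not the exception shown in the traceback.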
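
One way to check which message is merely logged and which is actually raised is to split the pasted log by line shape. The sketch below is a stdlib-only illustration. The helper name triage and both regular expressions are assumptions written for this log's format ("[rankN]:W<MMDD> ..." for warning records, "[rankN]: Some.Error: ..." for the final exception line); they are not part of PyTorch.

```python
import re

# Log records look like "[rank0]:W0429 11:20:46.908000 <pid> <file>:<line>] [0/0] message".
# The leading "[" is optional because the first pasted line has lost it.
LOG_RECORD = re.compile(
    r"^\[?rank\d+\]:(?P<level>[IWE])\d{4} \S+ .*?\] (?:\[\d+/\d+\] )?(?P<msg>.*)$"
)
# The last traceback line looks like "[rank0]: package.SomeError: message".
EXC_LINE = re.compile(
    r"^\[?rank\d+\]: (?P<exc>[\w.]+(?:Error|Exception)): (?P<msg>.*)$"
)


def triage(lines):
    """Return (warning messages, [(exception type, message), ...]) found in the log."""
    warnings, raised = [], []
    for line in lines:
        record = LOG_RECORD.match(line)
        if record:
            if record.group("level") == "W":
                warnings.append(record.group("msg").strip())
            continue
        exc = EXC_LINE.match(line)
        if exc:
            raised.append((exc.group("exc"), exc.group("msg")))
    return warnings, raised


if __name__ == "__main__":
    sample = [
        "[rank0]:W0429 11:20:46.908000 3802376 /x/codecache.py:891] [0/0] "
        "TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object",
        "[rank0]: Traceback (most recent call last):",
        "[rank0]: torch._inductor.exc.InductorError: LoweringException: "
        "NotImplementedError: realize NYI on <class 'torch._inductor.ir.TorchBindObject'>",
    ]
    for kind, items in zip(("warnings", "raised"), triage(sample)):
        print(kind, items)
```

Applied to the log in this issue, the pickle TypeError lands in the warnings list and the InductorError in the raised list, which is why the two should be treated as separate symptoms.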

Guidance

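  • The failure occurs while compiling f with backend="inductor" under compile_on_one_rank=True. The traceback runs through comm_lowering.py (_all_reduce) into ir._AllReduce_Kernel.create_inplace, where tensor_arg.realize() is called on a TorchBindObject.
  • In the lowering dump, args[2] is TorchBindObject(name='_opaque_obj0', ...) wrapping the ProcessGroup, so the process group reaches the all_reduce lowering as an opaque object on which realize() is not implemented.
  • The ProcessGroup also cannot be pickled, but that is reported as a warning from codecache.py rather than as the raised exception.
  • A possible workaround is to change f so that the ProcessGroup is not passed in as an argument. This is untested against the repro.
  • To narrow the problem down, check whether the same error occurs for other functions that receive a ProcessGroup argument and call dist.all_reduce or another collective.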
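
To narrow down where compilation fails, the same function can be compiled with several backends and the outcomes compared. This is a minimal sketch, assuming the backend names "eager", "aot_eager" and "inductor" accepted by torch.compile. The helper probe_backends and its compile_fn and reset parameters are hypothetical; the compile function is injected so the loop can be exercised without a process group.

```python
def probe_backends(compile_fn, fn, args,
                   backends=("eager", "aot_eager", "inductor"), reset=None):
    """Compile and call fn once per backend; record "ok" or the first error line."""
    outcomes = {}
    for backend in backends:
        if reset is not None:
            reset()  # e.g. torch._dynamo.reset, to drop state from the previous probe
        try:
            compile_fn(fn, backend=backend, fullgraph=True)(*args)
            outcomes[backend] = "ok"
        except Exception as exc:
            text = str(exc)
            first_line = text.splitlines()[0] if text else ""
            outcomes[backend] = f"{type(exc).__name__}: {first_line}"
    return outcomes


if __name__ == "__main__":
    # Stand-in for torch.compile so the sketch runs anywhere (assumption, not the real API).
    def fake_compile(fn, backend, fullgraph):
        def run(*call_args):
            if backend == "inductor":
                raise NotImplementedError("realize NYI (simulated)")
            return fn(*call_args)
        return run

    print(probe_backends(fake_compile, lambda v: v + 1, (1,)))
```

With the repro from this issue, the call would be probe_backends(torch.compile, f, (x, pg), reset=torch._dynamo.reset), placed inside the dist.config.patch(compile_on_one_rank=True) block. If only "inductor" fails, that would be consistent with the traceback, which ends in Inductor's lowering code.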

Example

# Speculative variant of f that does not take the ProcessGroup as an argument.
# dist.all_reduce uses the default process group when group is omitted.
import torch.distributed as dist

def f(t):
    t = t.clone()
    dist.all_reduce(t)
    return t + 1

Note that this example is speculative and may not work as-is, but it illustrates the idea of modifying the f function to avoid using the ProcessGroup object directly.

Notes

  • The issue was reported against main (see Versions), so behavior may differ in released builds.
  • The pickling warning and the lowering error both involve the ProcessGroup but are distinct messages. Only the lowering error (realize NYI on TorchBindObject) is raised to the caller.
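
The "cannot pickle" message is the standard TypeError that pickle raises for objects backed by native handles. The sketch below reproduces that class of error with a threading.Lock standing in for the ProcessGroup and shows a catch-and-warn pattern. try_cache_key is a hypothetical helper, and whether Inductor's cache handles the failure in exactly this way is an assumption based only on the W-level log lines in this issue.

```python
import logging
import pickle
import threading

log = logging.getLogger("cache_sketch")


def try_cache_key(obj):
    """Return pickled bytes for obj, or None (after a warning) if it cannot be pickled."""
    try:
        return pickle.dumps(obj)
    except (TypeError, pickle.PicklingError) as exc:
        # Non-fatal: the caller can continue without a cache entry.
        log.warning("skipping cache: %s", exc)
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(try_cache_key([1, 2, 3]) is not None)  # plain data pickles fine
    print(try_cache_key(threading.Lock()))       # native handle: warning, then None
```

The point of the pattern is that a pickling failure handled this way produces a log line but does not stop the program, which matches the pickle TypeError showing up as a warning rather than in the traceback.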

Recommendation

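Apply the workaround where feasible: change f so that it does not receive the ProcessGroup as an argument, for example by relying on the default group. This is untested against the repro, may require significant code changes, and does not apply when a non-default group is needed. Otherwise, follow pytorch/pytorch#181893 for a fix.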
