pytorch - 💡(How to fix) Fix COOR + process group + compile crashes in inductor [1 participants]

pytorch 2026-04-29 18:21:57


GitHub stats

pytorch/pytorch#181893 (fetched 2026-04-30 06:17:51)

Timeline events: 191

Author: tugsbayasgalan

Participants: tugsbayasgalan

Assignees: aorenste

Timeline (top): mentioned ×90, subscribed ×90, labeled ×7, cross-referenced ×2

Error Message

torch._inductor.exc.InductorError: LoweringException: NotImplementedError: realize NYI on <class 'torch._inductor.ir.TorchBindObject'>
  target: _c10d_functional.all_reduce.default

Before the exception, a warning is logged from torch/_inductor/codecache.py:

TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object

Fix Action

Fix / Workaround

No fix is recorded on the issue as of the fetch date (it is assigned to aorenste). For untested workarounds, see the Recommendation section below.

Code Example

import os
import tempfile
import traceback

import torch
import torch.distributed as dist


os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29530")
with tempfile.NamedTemporaryFile(delete=False) as fd:
    p = fd.name
dist.init_process_group(
    backend="gloo", rank=0, world_size=1, store=dist.FileStore(p, 1)
)
import torch.distributed.tensor  # ensure ProcessGroup is registered as opaque

pg = dist.distributed_c10d._get_default_group()


def f(t, group):
    t = t.clone()
    dist.all_reduce(t, group=group)
    return t + 1


x = torch.arange(4, dtype=torch.float32)

torch._dynamo.reset()
with dist.config.patch(compile_on_one_rank=True):
    opt = torch.compile(f, backend="inductor", fullgraph=True)
    out = opt(x, pg)

---

[rank0]:W0429 11:20:46.908000 3802376 /data/users/tmanlaibaatar/pytorch/torch/_inductor/codecache.py:891] [0/0]     self.dump(obj)
[rank0]:W0429 11:20:46.908000 3802376 /data/users/tmanlaibaatar/pytorch/torch/_inductor/codecache.py:891] [0/0] TypeError: cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object
[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/agent_space/probe_pg_as_input.py", line 35, in <module>
[rank0]:     out = opt(x, pg)
[rank0]:           ^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_dynamo/eval_frame.py", line 1069, in compile_wrapper
[rank0]:     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/compile_fx.py", line 1047, in _compile_fx_inner
[rank0]:     raise InductorError(e, currentframe()).with_traceback(
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/compile_fx.py", line 1039, in _compile_fx_inner
[rank0]:     mb_compiled_graph = fx_codegen_and_compile(
[rank0]:                         ^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/compile_fx.py", line 1845, in fx_codegen_and_compile
[rank0]:     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/compile_fx.py", line 1521, in codegen_and_compile
[rank0]:     graph.run(*example_inputs)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/graph.py", line 1063, in run
[rank0]:     return super().run(*args)
[rank0]:            ^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/fx/interpreter.py", line 197, in run
[rank0]:     self.env[node] = self.run_node(node)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/graph.py", line 1914, in run_node
[rank0]:     result = super().run_node(n)
[rank0]:              ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/fx/interpreter.py", line 294, in run_node
[rank0]:     return getattr(self, n.op)(n.target, args, kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/graph.py", line 1486, in call_function
[rank0]:     raise LoweringException(
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/graph.py", line 1463, in call_function
[rank0]:     out = lowerings[target](*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/lowering.py", line 518, in wrapped
[rank0]:     out = decomp_fn(*args, **kwargs)
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/comm_lowering.py", line 242, in _all_reduce
[rank0]:     ir._AllReduce_Kernel.create_inplace(
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/ir.py", line 10551, in create_inplace
[rank0]:     tensor_arg.realize()
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/_inductor/ir.py", line 767, in realize
[rank0]:     raise NotImplementedError(f"realize NYI on {type(self)}")
[rank0]: torch._inductor.exc.InductorError: LoweringException: NotImplementedError: realize NYI on <class 'torch._inductor.ir.TorchBindObject'>
[rank0]:   target: _c10d_functional.all_reduce.default
[rank0]:   args[0]: TensorBox(StorageBox(
[rank0]:     InputBuffer(name='arg0_1', layout=FixedLayout('cpu', torch.float32, size=[4], stride=[1]))
[rank0]:   ))
[rank0]:   args[1]: sum
[rank0]:   args[2]: TorchBindObject(name='_opaque_obj0', value=<torch.distributed.distributed_c10d.ProcessGroup object at 0x7f9c4eb88df0>)NotImplementedError: realize NYI on <class 'torch._inductor.ir.TorchBindObject'>
[rank0]:   target: _c10d_functional.all_reduce.default
[rank0]:   args[0]: TensorBox(StorageBox(
[rank0]:     InputBuffer(name='arg0_1', layout=FixedLayout('cpu', torch.float32, size=[4], stride=[1]))
[rank0]:   ))
[rank0]:   args[1]: sum
[rank0]:   args[2]: TorchBindObject(name='_opaque_obj0', value=<torch.distributed.distributed_c10d.ProcessGroup object at 0x7f9c4eb88df0>)
[rank0]: Found from : 
[rank0]:    File "/data/users/tmanlaibaatar/pytorch/agent_space/probe_pg_as_input.py", line 26, in f
[rank0]:     dist.all_reduce(t, group=group)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 1882, in _remapped_allreduce
[rank0]:     all_reduce_inplace(*args, **kwargs)
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 1688, in all_reduce_inplace
[rank0]:     return tensor.copy_(all_reduce(tensor, op, group, tag))
[rank0]:   File "/data/users/tmanlaibaatar/pytorch/torch/distributed/_functional_collectives.py", line 179, in all_reduce
[rank0]:     tensor = torch.ops._c10d_functional.all_reduce(

RAW_BUFFERClick to expand / collapse

🐛 Describe the bug


The issue body contains the same reproducer and traceback shown under Code Example above.

Versions

main

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

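Compilation crashes in Inductor's lowering of _c10d_functional.all_reduce. In this reproducer (compile_on_one_rank=True, ProcessGroup passed as an argument to f), the ProcessGroup reaches Inductor as an opaque graph input (a TorchBindObject), and the lowering calls realize() on it, which raises NotImplementedError: realize NYI. The earlier "cannot pickle ProcessGroup" TypeError is a separate, non-fatal warning from Inductor's code cache (codecache.py).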
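To make the mechanism concrete, here is a toy model in plain Python. Every name in it (IRNode, TensorNode, OpaqueNode, split_args_naive, split_args_opaque_aware, lower_all_reduce) is a hypothetical stand-in, not Inductor's real IR or lowering code. The variable name tensor_arg in the traceback suggests the opaque object ended up in the list of arguments to realize; the sketch assumes that and shows how keeping opaque objects out of that list avoids the error.

```python
# Toy model of the crash. All names here are hypothetical stand-ins,
# not Inductor's real IR classes or lowering functions.


class IRNode:
    def realize(self):
        # Default behavior, echoing the message in the traceback.
        raise NotImplementedError(f"realize NYI on {type(self)}")


class TensorNode(IRNode):
    def __init__(self, name):
        self.name = name
        self.realized = False

    def realize(self):
        self.realized = True  # pretend to materialize the buffer


class OpaqueNode(IRNode):
    """Stands in for a TorchBindObject wrapping a ProcessGroup."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def split_args_naive(args):
    # Failing pattern: every IR node is treated as a tensor argument.
    tensor_args = [a for a in args if isinstance(a, IRNode)]
    other_args = [a for a in args if not isinstance(a, IRNode)]
    return tensor_args, other_args


def split_args_opaque_aware(args):
    # Alternative: only tensor nodes are realized; opaque objects pass through.
    tensor_args = [a for a in args if isinstance(a, TensorNode)]
    other_args = [a for a in args if not isinstance(a, TensorNode)]
    return tensor_args, other_args


def lower_all_reduce(args, split):
    tensor_args, other_args = split(args)
    for t in tensor_args:
        t.realize()
    return tensor_args, other_args


# Mirrors args[0..2] from the traceback: input tensor, reduce op, process group.
args = [TensorNode("arg0_1"), "sum", OpaqueNode("_opaque_obj0", value="pg")]

naive_error = None
try:
    lower_all_reduce(args, split_args_naive)
except NotImplementedError as e:
    naive_error = str(e)

tensors, others = lower_all_reduce(args, split_args_opaque_aware)
print("naive:", naive_error)
print("opaque-aware realized:", [t.name for t in tensors])
```

In this model the fix is a classification change, not a serialization change: OpaqueNode never needs to be picklable for the opaque-aware path to succeed.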

Guidance

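The failure happens at compile time, during Inductor lowering: dist.all_reduce is traced (via _remapped_allreduce in torch/distributed/_functional_collectives.py) into torch.ops._c10d_functional.all_reduce, and Inductor's lowering (comm_lowering._all_reduce → ir._AllReduce_Kernel.create_inplace) calls realize() on the ProcessGroup argument, which reaches Inductor as a TorchBindObject.
The "cannot pickle 'torch._C._distributed_c10d.ProcessGroup' object" TypeError is logged as a warning (W-level) from torch/_inductor/codecache.py; it is not the exception that aborts compilation.
The proper fix belongs in Inductor: the all_reduce lowering needs to pass an opaque ProcessGroup argument through instead of realizing it. On the user side, a possible (untested) workaround is to avoid passing the ProcessGroup into the compiled function.
To narrow the problem down, rerun the reproducer with backend="aot_eager" instead of "inductor", and without dist.config.patch(compile_on_one_rank=True). If those runs succeed, the failure is confined to Inductor's handling of the opaque ProcessGroup input.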
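To compare configurations (for example "inductor" vs. "aot_eager", with compile_on_one_rank on and off) without stopping at the first crash, a small harness can run a reproducer under every combination and record which ones raise. run_matrix and fake_repro are hypothetical helpers written for this sketch; fake_repro only simulates a failure so the harness can be exercised without a distributed setup, and its outcomes say nothing about real PyTorch behavior.

```python
import itertools


def run_matrix(repro, backends, coor_values):
    """Call repro(backend, compile_on_one_rank) for every combination.

    Returns a dict mapping (backend, flag) to "ok" or the name of the
    exception type raised. Failures are recorded, not re-raised, so one
    crashing configuration does not hide the others.
    """
    results = {}
    for backend, coor in itertools.product(backends, coor_values):
        try:
            repro(backend, coor)
            results[(backend, coor)] = "ok"
        except Exception as e:
            results[(backend, coor)] = type(e).__name__
    return results


def fake_repro(backend, coor):
    # Placeholder for the real reproducer. Only ("inductor", True) matches
    # the reported failure; every other outcome here is made up for the demo.
    if backend == "inductor" and coor:
        raise RuntimeError("simulated lowering failure")


results = run_matrix(fake_repro, ["aot_eager", "inductor"], [False, True])
for config, outcome in sorted(results.items()):
    print(config, outcome)
```

To use it for real, run it after the reproducer and pass a function that calls torch._dynamo.reset(), enters dist.config.patch(compile_on_one_rank=coor), and calls torch.compile(f, backend=backend, fullgraph=True)(x, pg). Resetting Dynamo inside each run keeps one configuration's compiled code from being reused by the next. For the failing Inductor run, the recorded name would be the wrapper type InductorError, as in the traceback.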

Example

# Workaround sketch (untested): omit the explicit group so that
# all_reduce uses the default process group.
import torch.distributed as dist


def f(t):
    t = t.clone()
    dist.all_reduce(t)  # no group= argument
    return t + 1

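This change is untested. It relies on the intended group being the default process group (true in this single-rank reproducer), so the compiled function is called as opt(x) instead of opt(x, pg). The issue does not confirm whether this avoids the crash under compile_on_one_rank=True.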

Notes

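The failure is specific to Inductor: the traceback ends in torch/_inductor/comm_lowering.py, and the issue is reported against main with dist.config.patch(compile_on_one_rank=True), where the ProcessGroup reaches Inductor as an opaque graph input (TorchBindObject '_opaque_obj0').
Making ProcessGroup picklable would at most silence the codecache warning; it would not fix the crash, which is raised during lowering rather than during serialization.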
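The difference between the two symptoms can be modelled in plain Python: a cache-key step that fails to pickle an input logs a warning and skips caching, while compilation continues until the lowering step itself succeeds or fails. cache_key and compile_graph are hypothetical names, and threading.Lock is used only because, like ProcessGroup, it cannot be pickled. The sketch assumes Inductor's cache handles the error in this warn-and-continue way, which the W-level log line suggests but the issue does not spell out.

```python
import logging
import pickle
import threading

log = logging.getLogger("toy_cache")


def cache_key(graph_inputs):
    """Return a cache key, or None when an input cannot be pickled."""
    try:
        return hash(pickle.dumps(graph_inputs))
    except (TypeError, pickle.PicklingError) as e:
        # Non-fatal: warn and bypass the cache.
        log.warning("cache bypass: %s", e)
        return None


def compile_graph(graph_inputs, lower):
    key = cache_key(graph_inputs)
    # Compilation continues whether or not a key was produced;
    # only lower() can abort it.
    return key, lower(graph_inputs)


unpicklable = threading.Lock()  # stand-in for a ProcessGroup
key, result = compile_graph([1.0, unpicklable], lower=len)
print("key:", key, "lowered:", result)
```

Making the input picklable would change only the key here; if lower() raised, compilation would still fail, which is why the crash in this issue needs a lowering fix rather than a serialization fix.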

Recommendation

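Apply workaround: No fix is available yet; track pytorch/pytorch#181893 for a change to Inductor's all_reduce lowering. In the meantime, if the collective targets the default group, try omitting group= as in the example above (untested). Otherwise, compiling this function without compile_on_one_rank=True, or with a non-Inductor backend, may avoid the lowering path that crashes.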
