pytorch - 💡(How to fix) Fix DeviceMesh pickling warning spam in DCP process [1 participants]

pytorch · 2026-05-01 08:57:45


pytorch/pytorch#182102 · Fetched 2026-05-02 05:27:14


Author

awaelchli

Participants

awaelchli

Timeline (top)

mentioned ×15, subscribed ×15, labeled ×4

Root Cause

The warning appears multiple times per DTensor in the state dict.
The commit that introduced the warning: https://github.com/pytorch/pytorch/commit/7c747a71f2670b7bfb3e3cc03d56bb249d0a03b6
The warning is emitted because each DTensor is unpickled after being sent to the checkpointer subprocess, which (understandably) has no process groups (PGs) set up.
It is a false positive: the checkpointer subprocess neither looks up PGs nor runs torch.compile.
Suppressing the warning is possible but somewhat non-trivial for end users, because the logger that emits it runs in the subprocess.
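To make the mechanism concrete, here is a toy model in plain Python (no torch required). `ToyMesh`, `ToyDTensor`, `PGS_INITIALIZED`, and `simulate_checkpointer_subprocess` are hypothetical stand-ins, not PyTorch APIs. The sketch only assumes what the issue describes: the "PGs not created" check runs eagerly at unpickle time, in a process that has no PGs.

```python
import logging
import pickle

logger = logging.getLogger("toy_device_mesh")  # hypothetical logger name, not PyTorch's

# Hypothetical stand-in for "have the process groups (PGs) been created in this
# process?". False models the DCP checkpointer subprocess.
PGS_INITIALIZED = False


class ToyMesh:
    """Hypothetical stand-in for DeviceMesh that checks for PGs eagerly on unpickle."""

    def __init__(self, shape):
        self.shape = shape

    def __setstate__(self, state):
        # Called by pickle when the object is reconstructed.
        self.__dict__.update(state)
        if not PGS_INITIALIZED:
            logger.warning("ToyMesh was unpickled before the PGs were created")


class ToyDTensor:
    """Hypothetical stand-in for a DTensor: local data plus the mesh it lives on."""

    def __init__(self, data, mesh):
        self.data = data
        self.mesh = mesh


def simulate_checkpointer_subprocess(state_dict):
    # pickle.dumps models sending the state dict to the checkpointer subprocess;
    # pickle.loads models that subprocess reconstructing it without any PGs.
    return pickle.loads(pickle.dumps(state_dict))


state_dict = {
    "w1": ToyDTensor([1.0, 2.0], ToyMesh((2, 4))),
    "w2": ToyDTensor([3.0, 4.0], ToyMesh((2, 4))),
}
restored = simulate_checkpointer_subprocess(state_dict)  # logs the warning once per mesh
```

In this model the warning fires for every unpickled mesh object even though nothing ever looks up a PG, which is the false-positive pattern reported here.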

Fix Action

Fix / Workaround

The issue suggests two upstream fixes: filter the warning explicitly in the async checkpointer subprocess, or delay it until torch.compile is actually used. Until one of them lands, users can suppress the warning through logging configuration, keeping in mind that it is emitted in the checkpointer subprocess.

CPU environment from the reporter's collect_env output:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
Model name: AMD EPYC 7R13 Processor
BIOS Model name: AMD EPYC 7R13 Processor CPU @ 2.6GHz
BIOS CPU family: 107
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
Stepping: 1
BogoMIPS: 5299.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 3 MiB (96 instances)
L1i cache: 3 MiB (96 instances)
L2 cache: 48 MiB (96 instances)
L3 cache: 384 MiB (12 instances)
NUMA node(s): 4
NUMA node0 CPU(s): 0-23,96-119
NUMA node1 CPU(s): 24-47,120-143
NUMA node2 CPU(s): 48-71,144-167
NUMA node3 CPU(s): 72-95,168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsa: Mitigation; Clear CPU buffers
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected


🐛 Describe the bug

When saving a checkpoint with dcp.async_save using the process-based async checkpointer, DeviceMesh emits a flood of false-positive warnings:

[rank1]:W0430 22:51:37.612000 1258290 torch/distributed/device_mesh.py:385] It seems like pickling/unpickling of the DeviceMesh occurred before the PGs were created. This will cause PG lookup to fail when torch.compile is enabled

Repro:

import os
import torch
import torch.multiprocessing as mp
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor
import torch.distributed.checkpoint as dcp


def main(rank):
    os.environ["LOCAL_RANK"] = str(rank)
    torch.distributed.init_process_group("cpu:gloo,cuda:nccl", init_method="tcp://localhost:52000", world_size=8, rank=rank)
    device = torch.device("cuda", rank)
    mesh = init_device_mesh("cuda", mesh_shape=(2, 4), mesh_dim_names=["dim0", "dim1"])
    tensor = torch.rand(16, 16, device=device)
    dtensor = distribute_tensor(tensor, mesh, placements=(Shard(0), Shard(1)))
    state_dict = {"dtensor": dtensor}
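    # PROCESS mode sends the state dict to a checkpointer subprocess, where each
    # DTensor's DeviceMesh is unpickled without PGs; this is where the warning fires.
    future = dcp.async_save(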
        state_dict,
        checkpoint_id="/tmp/checkpoints/debug",
        async_checkpointer_type=dcp.state_dict_saver.AsyncCheckpointerType.PROCESS,
    )
    future.result()
    torch.distributed.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(main, nprocs=8)


Suggestions for possible fixes:

Filter the warning explicitly in the async checkpointer subprocess
Delay the warning to when torch.compile is actually used
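The second suggestion (delay the warning until something actually needs the PGs) can be sketched with a toy class that only records the condition at unpickle time and warns on the first PG lookup. `LazyWarnMesh`, `lookup_process_group`, and `PGS_INITIALIZED` are hypothetical names for illustration; this is not how DeviceMesh is implemented, and a real fix would hook whatever path torch.compile uses to resolve PGs.

```python
import logging
import pickle

logger = logging.getLogger("toy_lazy_mesh")  # hypothetical logger name

# Hypothetical stand-in for "have the PGs been created in this process?".
# False models the checkpointer subprocess.
PGS_INITIALIZED = False


class LazyWarnMesh:
    """Hypothetical mesh that defers the 'unpickled before PGs' warning."""

    def __init__(self, shape):
        self.shape = shape
        self._needs_warning = False  # constructed normally: nothing to report

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Record the condition instead of warning eagerly during unpickling.
        self._needs_warning = not PGS_INITIALIZED

    def lookup_process_group(self):
        # Hypothetical PG lookup: the only place where the condition matters.
        if self._needs_warning:
            logger.warning("Mesh was unpickled before the PGs were created; PG lookup may fail")
            self._needs_warning = False  # warn at most once per mesh
        return None  # a real implementation would return a process group here


mesh = pickle.loads(pickle.dumps(LazyWarnMesh((2, 4))))  # silent: no lookup yet
mesh.lookup_process_group()  # warns once
mesh.lookup_process_group()  # silent: already warned
```

With this design, a checkpointer subprocess that only unpickles and writes tensors never triggers the warning, while code that really attempts the lookup still sees it once.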

More generally, in my opinion PyTorch should avoid emitting warnings at a point where it is not yet known whether the user will actually combine features in a problematic way.

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @LucasLLC @pradeepfn @angelayi

Versions

PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.4 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: version 4.2.0
Libc version: glibc-2.39

Python version: 3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.12.58-82.121.amzn2023.x86_64-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: NVIDIA H200
GPU 1: NVIDIA H200
GPU 2: NVIDIA H200
GPU 3: NVIDIA H200
GPU 4: NVIDIA H200
GPU 5: NVIDIA H200
GPU 6: NVIDIA H200
GPU 7: NVIDIA H200

Nvidia driver version: 580.105.08
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect

Extent analysis

TL;DR

The warning is a false positive. The proposed upstream fixes are to filter it explicitly in the async checkpointer subprocess or to delay it until torch.compile is actually used; in the meantime, users can suppress it through logging configuration.

Guidance

The warning is a false positive because the checkpointer subprocess neither looks up PGs nor runs torch.compile.
To filter the warning, you can modify the logging configuration in the subprocess to ignore this specific warning.
Delaying the warning to when torch.compile is used would require modifying the PyTorch codebase to only emit the warning when necessary.
As a temporary workaround, you can suppress the warning by configuring the logging module to ignore warnings from the torch.distributed.device_mesh module.

Example

import logging
logging.getLogger('torch.distributed.device_mesh').setLevel(logging.ERROR)

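This suppresses every message below ERROR level from the torch.distributed.device_mesh logger, including this false positive, but only in the process where the line runs. Because the warning is emitted in the checkpointer subprocess, the setting has to take effect there as well.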
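Since `setLevel(logging.ERROR)` hides every warning from that logger, a narrower option is a `logging.Filter` that drops only this one message. This is a sketch under two assumptions: the warning is logged through the `torch.distributed.device_mesh` logger (the same logger name used in the setLevel example), and the message still contains the substring quoted in this issue. `DropDeviceMeshPickleWarning` and `install_filter` are hypothetical helper names, not PyTorch APIs.

```python
import logging

# Substring of the DeviceMesh warning text quoted in this issue (assumed stable).
_PICKLE_WARNING = "pickling/unpickling of the DeviceMesh occurred before the PGs were created"


class DropDeviceMeshPickleWarning(logging.Filter):
    """Drop only the DeviceMesh pickling warning; let every other record through."""

    def filter(self, record: logging.LogRecord) -> bool:
        return _PICKLE_WARNING not in record.getMessage()


def install_filter() -> None:
    # A filter on the logger itself applies to records logged directly on that
    # logger, before they reach any handler. Assumes the logger name below.
    logging.getLogger("torch.distributed.device_mesh").addFilter(DropDeviceMeshPickleWarning())
```

Like the setLevel approach, the filter only affects the process where `install_filter()` runs, so it must run in the checkpointer subprocess to silence this warning. If that subprocess is started with Python's spawn start method, calling it at the top level of the training script (outside the `if __name__ == "__main__":` guard) may achieve this, because spawn re-imports the main module in child processes.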

Notes

The warning was introduced by commit 7c747a71f2670b7bfb3e3cc03d56bb249d0a03b6, and the suggested fixes require changing either PyTorch itself or the logging configuration.
The issue reported here involves dcp.async_save with the process-based async checkpointer (AsyncCheckpointerType.PROCESS); the warning itself comes from the torch.distributed.device_mesh module.

Recommendation

Apply a workaround (filter or suppress the warning), since modifying PyTorch itself may not be feasible for all users.
