pytorch - 💡(How to fix) Fix DeviceMesh pickling warning spam in DCP process [1 participants]

GitHub stats
pytorch/pytorch#182102 · Fetched 2026-05-02 05:27:14
Comments: 0 · Participants: 1 · Timeline events: 34 · Reactions: 0
Timeline (top): mentioned ×15, subscribed ×15, labeled ×4

Root Cause

  • The warning appears multiple times per DTensor in the state dict.
  • It was introduced by this commit: https://github.com/pytorch/pytorch/commit/7c747a71f2670b7bfb3e3cc03d56bb249d0a03b6
  • It is emitted because the DTensor is unpickled after being sent to the checkpointer subprocess, which understandably does not have the PGs (process groups) set up.
  • It is a false positive because the checkpointer subprocess neither looks up PGs nor runs torch.compile.
  • Suppressing it is possible but non-trivial for end users, because the logger is used inside the subprocess.


🐛 Describe the bug

When saving a checkpoint with dcp.async_save, DeviceMesh emits a flood of false-positive warnings:

[rank1]:W0430 22:51:37.612000 1258290 torch/distributed/device_mesh.py:385] It seems like pickling/unpickling of the DeviceMesh occurred before the PGs were created. This will cause PG lookup to fail when torch.compile is enabled

Repro:

import os
import torch
import torch.multiprocessing as mp
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor
import torch.distributed.checkpoint as dcp


def main(rank):
    os.environ["LOCAL_RANK"] = str(rank)
    # One process per GPU: gloo for CPU collectives, NCCL for CUDA collectives.
    torch.distributed.init_process_group(
        "cpu:gloo,cuda:nccl",
        init_method="tcp://localhost:52000",
        world_size=8,
        rank=rank,
    )
    device = torch.device("cuda", rank)
    # 2 x 4 mesh over the 8 ranks.
    mesh = init_device_mesh("cuda", mesh_shape=(2, 4), mesh_dim_names=["dim0", "dim1"])
    tensor = torch.rand(16, 16, device=device)
    # Shard rows across dim0 and columns across dim1.
    dtensor = distribute_tensor(tensor, mesh, placements=(Shard(0), Shard(1)))
    state_dict = {"dtensor": dtensor}
    # PROCESS runs the save in a separate checkpointer subprocess. The DTensor is
    # unpickled there, which is where the warning is emitted.
    future = dcp.async_save(
        state_dict,
        checkpoint_id="/tmp/checkpoints/debug",
        async_checkpointer_type=dcp.state_dict_saver.AsyncCheckpointerType.PROCESS,
    )
    future.result()  # Block until the checkpoint has been written.
    torch.distributed.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(main, nprocs=8)

Further notes:

  • Warning appears multiple times per DTensor in the state dict
  • The commit that introduced the warning: https://github.com/pytorch/pytorch/commit/7c747a71f2670b7bfb3e3cc03d56bb249d0a03b6
  • Warning is emitted here because the DTensor is unpickled after sending it to the checkpointer subprocess. The subprocess does not have the PGs set up, understandably so.
  • False positive because the checkpointer subprocess does not look up PGs nor run torch.compile.
  • Suppressing the warning is possible but is somewhat non-trivial for end user since logger is used in subprocess

Suggestions for possible fixes:

  • Filter the warning explicitly in the async checkpointer subprocess
  • Delay the warning to when torch.compile is actually used

Generally, in my opinion PyTorch should avoid warnings that get emitted at a time when it is not even known the user is going to use a problematic combination of features.
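The second suggestion, delaying the warning, can be illustrated with a small self-contained sketch. LazyWarnMesh, its fields, and lookup_group are hypothetical names written for this example; this is not PyTorch's DeviceMesh and not how PyTorch implements the warning. The idea is only that unpickling records the condition silently, and the warning fires the first time the risky lookup actually happens.

```python
import logging

logger = logging.getLogger("lazy_mesh_demo")  # hypothetical logger name


class LazyWarnMesh:
    """Hypothetical mesh-like object; NOT torch.distributed.DeviceMesh."""

    def __init__(self, group_names):
        self.group_names = list(group_names)
        self._restored_without_groups = False

    def __getstate__(self):
        # Only the plain data is pickled.
        return {"group_names": self.group_names}

    def __setstate__(self, state):
        # pickle calls this on load. Record the condition instead of warning.
        # (Assumption: a real implementation would check whether groups exist.)
        self.group_names = state["group_names"]
        self._restored_without_groups = True

    def lookup_group(self, index):
        # Warn only where a missing group would actually matter, and only once.
        if self._restored_without_groups:
            logger.warning("mesh was unpickled before groups existed; lookup may fail")
            self._restored_without_groups = False
        return self.group_names[index]
```

With this shape, a process that unpickles the object but never performs a lookup, like the checkpointer subprocess described above, would log nothing.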

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @LucasLLC @pradeepfn @angelayi

Versions

Collecting environment information...
PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.4 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: version 4.2.0
Libc version: glibc-2.39

Python version: 3.12.3 (main, Mar 23 2026, 19:04:32) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.12.58-82.121.amzn2023.x86_64-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: NVIDIA H200
GPU 1: NVIDIA H200
GPU 2: NVIDIA H200
GPU 3: NVIDIA H200
GPU 4: NVIDIA H200
GPU 5: NVIDIA H200
GPU 6: NVIDIA H200
GPU 7: NVIDIA H200

Nvidia driver version: 580.105.08
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
Model name: AMD EPYC 7R13 Processor
BIOS Model name: AMD EPYC 7R13 Processor CPU @ 2.6GHz
BIOS CPU family: 107
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 48
Socket(s): 2
Stepping: 1
BogoMIPS: 5299.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 3 MiB (96 instances)
L1i cache: 3 MiB (96 instances)
L2 cache: 48 MiB (96 instances)
L3 cache: 384 MiB (12 instances)
NUMA node(s): 4
NUMA node0 CPU(s): 0-23,96-119
NUMA node1 CPU(s): 24-47,120-143
NUMA node2 CPU(s): 48-71,144-167
NUMA node3 CPU(s): 72-95,168-191
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsa: Mitigation; Clear CPU buffers
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected

Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect

extent analysis

TL;DR

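The issue proposes two fixes on the PyTorch side: filter the warning explicitly in the async checkpointer subprocess, or delay it until torch.compile is actually used. As a user-side workaround, the warning can be suppressed through logging configuration, provided that configuration also applies in the subprocess.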

Guidance

  • The warning is a false positive in this setting because the checkpointer subprocess neither looks up PGs nor runs torch.compile.
  • Filtering the warning means changing the logging configuration inside the checkpointer subprocess. Configuration applied only in the training process may not reach it, which is why the issue calls suppression non-trivial for end users.
  • Delaying the warning until torch.compile is used would require a change in PyTorch itself, so that the warning is emitted only when the PG lookup is actually needed.
  • As a temporary workaround, you can raise the level of the torch.distributed.device_mesh logger so that its warnings are dropped, as long as this also runs in the subprocess.

Example

import logging
logging.getLogger('torch.distributed.device_mesh').setLevel(logging.ERROR)

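This raises the logger's threshold to ERROR, which drops every warning from the torch.distributed.device_mesh module, not only the false positive. It takes effect only in the process where it runs, so to silence the warning from this issue it must also execute in the checkpointer subprocess.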
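A narrower option than changing the logger level is a logging filter that drops only this one message. The sketch below is a minimal illustration using the standard logging module. The names DropDeviceMeshPickleWarning and install_device_mesh_filter are made up for this example, and it assumes the warning is emitted through the logger named torch.distributed.device_mesh, as the log line in the issue suggests.

```python
import logging

# Substring taken from the warning text quoted in the issue.
PICKLE_WARNING_SNIPPET = (
    "pickling/unpickling of the DeviceMesh occurred before the PGs were created"
)


class DropDeviceMeshPickleWarning(logging.Filter):
    """Drop only the DeviceMesh pickling warning; let other records through."""

    def filter(self, record: logging.LogRecord) -> bool:
        return PICKLE_WARNING_SNIPPET not in record.getMessage()


def install_device_mesh_filter(
    logger_name: str = "torch.distributed.device_mesh",  # assumed logger name
) -> logging.Filter:
    """Attach the filter to the named logger and return it (hypothetical helper)."""
    flt = DropDeviceMeshPickleWarning()
    logging.getLogger(logger_name).addFilter(flt)
    return flt
```

The same caveat applies as for the level change: the helper only affects the process that calls it. One possibility, not verified against DCP, is to call it at module level in the entry script (outside the `if __name__ == "__main__":` guard), since Python's spawn start method re-imports the main module in child processes.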

Notes

  • The warning was introduced by the commit linked in the issue; the suggested fixes require changing either PyTorch or the logging configuration of the subprocess.
  • The repro triggers it through dcp.async_save with AsyncCheckpointerType.PROCESS, and the warning itself originates in torch/distributed/device_mesh.py.

Recommendation

Apply workaround: filtering or suppressing the warning, as modifying the PyTorch codebase may not be feasible for all users.
