pytorch - 💡(How to fix) Fix [DTensor] backward fails for cumprod, cummax, cummin [1 pull request]

Error Message

backward fails with a mixed Tensor / DTensor error

Fix Action

Fixed

Code Example

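# Run under torchrun with the op name as the only argument (script name is illustrative):
#   torchrun --nproc_per_node=2 repro.py cumprod
# Ops exercised: cumprod, masked_cumprod, cummax, cummin.
import os
import sys

import torch
import torch.distributed as dist
from torch.distributed.tensor import Replicate, Shard, distribute_tensor, init_device_mesh

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))  # LOCAL_RANK is set by torchrun
mesh = init_device_mesh("cuda", (dist.get_world_size(),))  # 1-D mesh over all ranks

op = sys.argv[1]

if op == "cumprod":
    x_local = torch.rand(12, 8, device="cuda").add_(0.1)  # strictly positive entries
    # Zero one row so the input contains zeros, presumably to exercise
    # cumprod's separate zero-handling backward path.
    x_local.select(0, x_local.size(0) // 2).zero_()
    # Shard along dim 0, the same dimension the cumulative op scans.
    x = distribute_tensor(x_local.requires_grad_(True), mesh, [Shard(0)])
    out = torch.cumprod(x, dim=0)
elif op == "masked_cumprod":
    x = distribute_tensor(
        torch.rand(12, 8, device="cuda").add_(0.1).requires_grad_(True),
        mesh,
        [Shard(0)],
    )
    # Checkerboard boolean mask, replicated on every rank.
    row = torch.arange(12, device="cuda").unsqueeze(1)
    col = torch.arange(8, device="cuda").unsqueeze(0)
    mask = distribute_tensor((row + col) % 2 == 0, mesh, [Replicate()])
    out = torch.masked.cumprod(x, mask=mask, dim=0)
else:
    # cummax / cummin return (values, indices); backpropagate through the values only.
    x = distribute_tensor(
        torch.randn(12, 8, device="cuda").requires_grad_(True),
        mesh,
        [Shard(0)],
    )
    out = getattr(torch, op)(x, dim=0)[0]

# Before the fix, this backward raises the mixed Tensor / DTensor error.
out.sum().backward()
print(f"[rank {dist.get_rank()}] {op} backward done", flush=True)

dist.destroy_process_group()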
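When checking gradients on a build that includes the fix, a plain reference is handy. The following is a pure-Python sketch (the helper `cumprod_grad_reference` is hypothetical, not a PyTorch API) of the gradient of `cumprod` along one dimension, given the upstream gradient. It builds each partial product directly instead of dividing by `x[j]`, so it stays correct when the input contains zeros, which is the situation the `cumprod` branch of the repro sets up by zeroing a row.

```python
from typing import List


def cumprod_grad_reference(x: List[float], grad_out: List[float]) -> List[float]:
    """Gradient w.r.t. x of y = cumprod(x) for a 1-D input, given dL/dy.

    y[i] = x[0] * ... * x[i], so for j <= i, dy[i]/dx[j] is the product of
    x[0..i] with x[j] left out. Building that product incrementally avoids
    dividing by x[j], so zeros in x are handled correctly. O(n^2); meant only
    for checking small cases.
    """
    n = len(x)
    grad_in = [0.0] * n
    for j in range(n):
        # Start with the product of everything before x[j].
        partial = 1.0
        for k in range(j):
            partial *= x[k]
        # Extend the product past j one element at a time, skipping x[j] itself.
        for i in range(j, n):
            if i > j:
                partial *= x[i]
            grad_in[j] += grad_out[i] * partial
    return grad_in


if __name__ == "__main__":
    # All-ones upstream gradient, as produced by out.sum().backward().
    print(cumprod_grad_reference([2.0, 0.0, 3.0], [1.0, 1.0, 1.0]))  # [1.0, 8.0, 0.0]
```

To compare with the repro, one option is to apply it column by column (scan dim 0) to the gathered input with an all-ones upstream gradient and check the result against `x.grad.full_tensor()`.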

Added a PR fixing this.

Repro: the script under Code Example above, launched with the op name as its first argument.

backward fails with a mixed Tensor / DTensor error
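For `cummax` / `cummin`, the repro backpropagates only through the values output. The expected gradient is a scatter-add: each upstream gradient entry flows entirely to the input position that holds the running max (or min) at that step. A pure-Python sketch of that expectation follows; the helper names are hypothetical, NaNs are not handled, and ties are broken toward the later index as an assumption about PyTorch's behavior (the example uses distinct values, so ties do not arise).

```python
from typing import List, Tuple


def cum_extreme(x: List[float], use_max: bool) -> Tuple[List[float], List[int]]:
    """Running max (or min) of a 1-D list plus the index that produced each value.

    Ties go to the later index (assumed tie-breaking); NaNs are not handled.
    """
    values: List[float] = []
    indices: List[int] = []
    best, best_idx = x[0], 0
    for i, v in enumerate(x):
        better = v >= best if use_max else v <= best
        if better:
            best, best_idx = v, i
        values.append(best)
        indices.append(best_idx)
    return values, indices


def cum_extreme_grad_reference(
    x: List[float], grad_out: List[float], use_max: bool
) -> List[float]:
    """Gradient w.r.t. x of the values output of cummax (use_max) or cummin.

    Each grad_out[i] is added to the input position indices[i], i.e. a
    scatter-add along the scan dimension.
    """
    _, indices = cum_extreme(x, use_max)
    grad_in = [0.0] * len(x)
    for i, idx in enumerate(indices):
        grad_in[idx] += grad_out[i]
    return grad_in


if __name__ == "__main__":
    x = [1.0, 3.0, 2.0, 5.0, 4.0]
    ones = [1.0] * len(x)
    print(cum_extreme_grad_reference(x, ones, use_max=True))   # [1.0, 2.0, 0.0, 2.0, 0.0]
    print(cum_extreme_grad_reference(x, ones, use_max=False))  # [5.0, 0.0, 0.0, 0.0, 0.0]
```

With the all-ones upstream gradient from `out.sum().backward()`, each input element's gradient simply counts how many output positions it supplied.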

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @tianyu-l @XilunWu @SherlockNoMad @ppwwyyxx
