pytorch - [DTensor] backward fails for cumprod, cummax, cummin [1 pull request]

pytorch · 2026-05-27 12:53:06


Error Message

Calling backward() fails with a mixed torch.Tensor / DTensor error.

Fix Action

Fixed

Fixed by PR: [DTensor] Fix backward support for cumprod, cummax, cummin (https://github.com/pytorch/pytorch/pull/185228)

Code Example

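# Launch with torchrun (one process per GPU) and pass the op name, for example:
#   torchrun --nproc_per_node=2 repro.py cumprod
# where repro.py is a placeholder filename and the op is one of
# cumprod, masked_cumprod, cummax, cummin.
import os
import sys

import torch
import torch.distributed as dist
from torch.distributed.tensor import Replicate, Shard, distribute_tensor, init_device_mesh

dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))  # LOCAL_RANK is set by torchrun
mesh = init_device_mesh("cuda", (dist.get_world_size(),))  # 1-D mesh over all ranks

op = sys.argv[1]

if op == "cumprod":
    # Shift values away from zero, then zero one middle row: cumprod's gradient
    # needs special handling when the input contains zeros.
    x_local = torch.rand(12, 8, device="cuda").add_(0.1)
    x_local.select(0, x_local.size(0) // 2).zero_()
    x = distribute_tensor(x_local.requires_grad_(True), mesh, [Shard(0)])
    out = torch.cumprod(x, dim=0)
elif op == "masked_cumprod":
    x = distribute_tensor(
        torch.rand(12, 8, device="cuda").add_(0.1).requires_grad_(True),
        mesh,
        [Shard(0)],
    )
    # Checkerboard boolean mask, replicated on every rank.
    row = torch.arange(12, device="cuda").unsqueeze(1)
    col = torch.arange(8, device="cuda").unsqueeze(0)
    mask = distribute_tensor((row + col) % 2 == 0, mesh, [Replicate()])
    out = torch.masked.cumprod(x, mask=mask, dim=0)
else:
    # cummax / cummin return a (values, indices) pair; backprop through the values.
    x = distribute_tensor(
        torch.randn(12, 8, device="cuda").requires_grad_(True),
        mesh,
        [Shard(0)],
    )
    out = getattr(torch, op)(x, dim=0)[0]

# The input is sharded along dim 0, the same dim the scan runs over.
# Before the fix, this raised the mixed Tensor / DTensor error.
out.sum().backward()
print(f"[rank {dist.get_rank()}] {op} backward done", flush=True)

dist.destroy_process_group()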
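When checking DTensor results from the cummax / cummin branch against a single-device run, it helps to know what the gradient should be. Here is a minimal sketch on a plain tensor; the helper cummax_grad_reference is hypothetical.

Each output of cummax copies the input element at its running-max index. The upstream gradient is therefore scatter-added back to those indices. The assertion compares this hand-written version with what autograd computes.

```python
import torch


def cummax_grad_reference(x: torch.Tensor, dim: int) -> torch.Tensor:
    """Hypothetical helper: gradient of cummax(x, dim).values.sum() w.r.t. x.

    Each output position copies x at its running-max index, so the upstream
    gradient (all ones for .sum()) is scatter-added back to those indices.
    """
    values, indices = torch.cummax(x, dim=dim)
    grad_out = torch.ones_like(values)
    return torch.zeros_like(x).scatter_add_(dim, indices, grad_out)


# Compare with autograd on a regular (non-distributed) tensor.
x = torch.randn(12, 8, requires_grad=True)
torch.cummax(x, dim=0)[0].sum().backward()
assert torch.equal(x.grad, cummax_grad_reference(x.detach(), dim=0))
```

The same construction with torch.cummin covers the cummin case.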


Added a PR fixing this.



cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @tianyu-l @XilunWu @SherlockNoMad @ppwwyyxx
