pytorch - `_common_pointwise_single_dim_strategy` should include `Partial()` as a valid output placement for unary ops

pytorch · 2026-05-11 20:29:06

Root Cause

Because `_common_pointwise_single_dim_strategy` does not list `Partial()` among its output placements, unary pointwise ops like `convert_element_type` (dtype cast) cannot preserve a `Partial()` input placement on their output. If a tensor has placement `(Partial(), Partial())` and passes through a dtype cast, the DTensor op strategy forces a reduce before the cast.

🐛 Describe the bug

`_common_pointwise_single_dim_strategy` in `torch/distributed/tensor/_ops/_pointwise_ops.py` generates only `Shard(i)` placements as output strategies for pointwise ops. It never generates `Partial()` as a valid output placement.

This is semantically incorrect: casting a partial-sum tensor from bf16 to f32 produces a valid f32 partial-sum tensor, so the reduction should be deferrable past the cast.
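The gap can be pictured with a toy enumeration. This is only a sketch: `Shard` and `Partial` below are small stand-in dataclasses, not the `torch.distributed.tensor` classes, and `unary_pointwise_strategies` with its `allow_partial` flag is a hypothetical name, not the signature of `_common_pointwise_single_dim_strategy`. Whether a given unary op can safely forward a partial sum depends on the op; the flag stands in for that decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    """Toy stand-in: tensor is split along `dim` on this mesh dimension."""
    dim: int


@dataclass(frozen=True)
class Partial:
    """Toy stand-in: each rank holds a partial value pending a reduction."""
    reduce_op: str = "sum"


def unary_pointwise_strategies(ndim, allow_partial=False):
    """Enumerate (output, input) placement pairs for one mesh dimension.

    Hypothetical helper: with allow_partial=False it mirrors the behaviour
    described in the report (only Shard(i) -> Shard(i) pairs). With
    allow_partial=True it also lets a partial sum pass straight through.
    """
    strategies = [(Shard(d), Shard(d)) for d in range(ndim)]
    if allow_partial:
        strategies.append((Partial(), Partial()))
    return strategies


current = unary_pointwise_strategies(ndim=2)
proposed = unary_pointwise_strategies(ndim=2, allow_partial=True)

print("current: ", current)
print("proposed:", proposed)
```

With only the `Shard` pairs available, a planner that receives a `Partial()` input has no matching strategy and must redistribute (reduce) first; the extra pair in `proposed` is what would let the reduction be deferred.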

Impact in practice: In AutoParallel's sharding optimizer for LLaMA-3 8B on a 2D mesh (DP=8, TP=8), backward weight gradients are P(sum)P(sum) in bf16, i.e. partial sums on both mesh dimensions, and need to reach S(0)S(0) in f32, i.e. sharded on tensor dim 0 along both mesh dimensions, after a dtype cast for mixed-precision training. The ideal path is to cast bf16→f32 first, then do a single fused reduce-scatter in f32 for numerical accuracy. But because `Partial()` can't pass through the dtype cast node, the optimizer is forced to split the reduction: one reduce-scatter in bf16 before the cast, one in f32 after. This prevents the reduction from being fused into a single collective and forces part of the gradient reduction to happen in the lower-precision dtype.
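The accuracy cost of the split path can be illustrated without any distributed setup. The sketch below is an assumption-laden simulation, not DTensor code: `to_bf16` emulates bfloat16 rounding with `struct`, the 2x2 grid of values stands in for per-rank partial gradients on a tiny (DP, TP) mesh, and the sequential summation order is chosen for illustration (real collectives do not guarantee it). Only the summation part of a reduce-scatter is modelled.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x):
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    Emulation for finite inputs only; NaN/inf handling is omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)   # rounding bias on the dropped 16 bits
    bits = (bits >> 16) << 16             # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def reduce_sum(values, cast):
    """Sequentially sum `values`, rounding every intermediate with `cast`."""
    acc = 0.0
    for v in values:
        acc = cast(acc + cast(v))
    return acc


# Hypothetical per-rank partial gradients: partials[dp][tp], all exact in bf16.
partials = [[256.0, 1.0],
            [1.0, 1.0]]

# Split path: reduce over TP in bf16, cast to f32, then reduce over DP in f32.
tp_reduced_bf16 = [reduce_sum(row, to_bf16) for row in partials]
split = reduce_sum([to_f32(v) for v in tp_reduced_bf16], to_f32)

# Fused path: cast every partial to f32 first, then do one reduction in f32.
fused = reduce_sum([to_f32(v) for row in partials for v in row], to_f32)

print("split path:", split)   # 258.0
print("fused path:", fused)   # 259.0
```

In bf16, 256 + 1 rounds back to 256 because values between 256 and 512 are spaced 2 apart, so the split path loses one unit that the fused f32 reduction keeps.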

Versions

PyTorch 2.13.0.dev20260509+cu130

cc @wanchaol @tianyu-l @wz337 @XilunWu @d4l3k @pragupta @SherlockNoMad @ppwwyyxx @weifengpy
