pytorch - ✅(Solved) Fix [DTensor] Dead double-shard validation in propagate_shape_and_sharding [1 pull requests, 1 participants]

GitHub stats
pytorch/pytorch#177972 · Fetched 2026-04-08 01:07:45
Comments: 0
Participants: 1
Timeline: 49
Reactions: 0
Timeline (top): mentioned ×16, subscribed ×16, referenced ×10, labeled ×3

Fix Action

Fixed

PR fix notes

PR #177973: [DTensor] Fix double-shard validation in propagate_shape_and_sharding

Description (problem / solution / changelog)

Fixes https://github.com/pytorch/pytorch/issues/177972.

shard.dim == in_dim compares an int (Shard.dim) to an InputDim dataclass, which is always False. This makes the [Shard(0), Shard(0)] double-sharding submesh_size calculation dead code in the Split handler. Fix: compare against in_dim.input_dim. This activates previously-dead validation that correctly rejects incompatible double-sharding configs (e.g. reshape (12,)→(3,4) with [Shard(0), Shard(0)] on mesh (2,3)) which were previously silently producing incorrect sharding.
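The always-False comparison can be reproduced without PyTorch. The sketch below uses stand-in dataclasses that only mirror the field names mentioned in the PR description (`Shard.dim`, `InputDim.input_dim`); they are assumptions for illustration, not the real classes from `torch.distributed.tensor`.

```python
from dataclasses import dataclass


@dataclass
class InputDim:
    """Stand-in for the InputDim dataclass; only the field name is taken from the PR."""
    input_dim: int


@dataclass
class Shard:
    """Stand-in for the Shard placement; only the `dim` attribute matters here."""
    dim: int


in_dim = InputDim(input_dim=0)
shard = Shard(dim=0)

# Buggy form: an int compared with a dataclass instance.
# Neither side knows how to compare with the other, so Python falls back to False.
buggy = shard.dim == in_dim

# Fixed form: int compared with the int stored on the dataclass.
fixed = shard.dim == in_dim.input_dim

print(buggy, fixed)
```

Because the buggy form never raises, the branch it guards simply never runs, which is why the validation could stay dead without any visible failure.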

Also remove trailing commas so that the error messages are treated as strings and not tuples.
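The effect of a trailing comma on an error message is a general Python behavior and can be shown in isolation. The message text and the helper `assertion_text` below are made up for this illustration; they are not the strings or code used in `_view_ops.py`.

```python
def assertion_text(message):
    """Return the text an AssertionError carries when raised with `message`."""
    try:
        raise AssertionError(message)
    except AssertionError as exc:
        return str(exc)


out_size, submesh_size = 3, 6

# The trailing comma inside the parentheses turns the message into a 1-tuple.
with_comma = (
    f"Size {out_size} is not divisible by submesh size {submesh_size}.",
)

# Without the comma, the parentheses only group a plain string.
without_comma = (
    f"Size {out_size} is not divisible by submesh size {submesh_size}."
)

print(type(with_comma).__name__, assertion_text(with_comma))
print(type(without_comma).__name__, assertion_text(without_comma))
```

The tuple version still raises, but the rendered message is wrapped in tuple syntax, which is harder to read and can break tests that match on the message text.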

Changed files

  • test/distributed/tensor/test_view_ops.py (modified, +25/-0)
  • torch/distributed/tensor/_ops/_view_ops.py (modified, +8/-5)

Code Example

# Silently returns invalid sharding instead of erroring
from torch.distributed.tensor._ops._view_ops import propagate_shape_and_sharding, dim_maps
from torch.distributed.tensor.placement_types import Shard
import torch

propagate_shape_and_sharding(
    [Shard(0), Shard(0)], (12,),
    dim_maps[torch.Tensor.view](torch.empty(12), [3, 4]),
    (2, 3),  # split dim size 3 not divisible by submesh 2*3=6
)
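The arithmetic in the comment above (size 3 versus submesh 2*3=6) can be worked through with a simplified stand-in for the check. This is a sketch under stated assumptions: `submesh_size` and its `fixed` flag are hypothetical names, the dataclasses are stand-ins, and the real logic in `propagate_shape_and_sharding` may differ in detail.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Shard:
    dim: int  # stand-in for the real Shard placement


@dataclass(frozen=True)
class InputDim:
    input_dim: int  # stand-in for the real InputDim dataclass


def submesh_size(placements, mesh_sizes, in_dim, *, fixed):
    """Product of mesh sizes whose placement shards the given input dim.

    fixed=False reproduces the buggy int-vs-dataclass comparison;
    fixed=True compares against in_dim.input_dim as the PR describes.
    """
    target = in_dim.input_dim if fixed else in_dim
    return prod(
        size
        for size, placement in zip(mesh_sizes, placements)
        if isinstance(placement, Shard) and placement.dim == target
    )


placements = [Shard(0), Shard(0)]
mesh_sizes = (2, 3)
in_dim = InputDim(0)
out_size = 3  # leading size after reshaping (12,) to (3, 4)

buggy = submesh_size(placements, mesh_sizes, in_dim, fixed=False)
fixed = submesh_size(placements, mesh_sizes, in_dim, fixed=True)

buggy_accepts = out_size % buggy == 0  # nothing matched, so the product is 1
fixed_accepts = out_size % fixed == 0  # 3 is not divisible by 6

print(buggy, buggy_accepts, fixed, fixed_accepts)
```

With the buggy comparison no mesh dimension ever matches, so the product stays at 1 and every size passes the divisibility check.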

shard.dim == in_dim compares int to InputDim dataclass — always False. Makes the [Shard(0), Shard(0)] submesh check dead code, silently accepting invalid configs.

https://github.com/pytorch/pytorch/blob/d428a3f9c9e/torch/distributed/tensor/_ops/_view_ops.py#L676

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @xmfan @tianyu-l @XilunWu @SherlockNoMad @ppwwyyxx

extent analysis

Fix Plan

To fix the issue, the comparison must use the integer index stored on the InputDim dataclass rather than the dataclass instance itself. The merged PR does this by comparing against in_dim.input_dim.

Code Changes

The comparison in torch/distributed/tensor/_ops/_view_ops.py changes as follows:

# Compare Shard.dim (an int) with the int held by InputDim
if shard.dim == in_dim.input_dim:
    ...  # rest of the code remains the same

Alternatively, a helper method on the InputDim dataclass would make the comparison more explicit. This is a hypothetical variant and not what the PR does:

# Hypothetical helper added to the InputDim dataclass
class InputDim:
    # existing code
    def matches(self, dim: int) -> bool:
        return self.input_dim == dim

# Update the comparison
if in_dim.matches(shard.dim):
    ...  # rest of the code remains the same

Verification

To verify the fix, call propagate_shape_and_sharding with the same input as in the reproduction:

from torch.distributed.tensor._ops._view_ops import propagate_shape_and_sharding, dim_maps
from torch.distributed.tensor.placement_types import Shard
import torch

propagate_shape_and_sharding(
    [Shard(0), Shard(0)], (12,),
    dim_maps[torch.Tensor.view](torch.empty(12), [3, 4]),
    (2, 3),  # split dim size 3 not divisible by submesh 2*3=6
)

With the fix applied, this call should raise an error instead of silently returning an invalid sharding.

Extra Tips

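The PR adds 25 lines to test/distributed/tensor/test_view_ops.py, so the fix ships with test coverage. Because the fix activates validation that previously never ran, consider adding further cases for other mesh shapes and reshape targets, and check whether any existing callers relied on the configurations that are now rejected.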
