pytorch - [DTensor] nn.GroupNorm with affine=False crashes (1 pull request)


Error Message

RuntimeError: Expected Optional[Tensor] for 'weight' but got int

Raised by gn(dt) when nn.GroupNorm(2, 4, affine=False) is applied to a DTensor sharded with Shard(0).

Fix Action

Fixed

Code Example

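import torch
import torch.distributed as dist
from torch.distributed._tensor import Shard, distribute_tensor, init_device_mesh
# Test-only helper: importing this module registers the "fake" backend
from torch.testing._internal.distributed.fake_pg import FakeStore

# Single-process fake process group that pretends to have 2 ranks
dist.init_process_group("fake", store=FakeStore(), rank=0, world_size=2)
mesh = init_device_mesh("cpu", (2,))
dt = distribute_tensor(torch.randn(4, 4, 8, 8), mesh, [Shard(0)])

gn = torch.nn.GroupNorm(2, 4, affine=False)  # weight and bias are None
gn(dt)  # before the fix: RuntimeError: Expected Optional[Tensor] for 'weight' but got int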
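The repro shards the input along dim 0, the batch dimension. Group norm computes its mean and variance per sample and per group, so a batch-sharded input can be normalized shard by shard with no communication. The single-process sketch below illustrates this on plain tensors: torch.chunk stands in for the two Shard(0) shards of the 2-device mesh. It does not use DTensor or the strategy from the linked PR.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 4, 8, 8)  # same shape as the repro input

# Full (unsharded) result, no affine weight/bias
full = F.group_norm(x, num_groups=2, eps=1e-5)

# Simulate Shard(0) over a 2-device mesh: each "rank" holds half the batch
shards = torch.chunk(x, 2, dim=0)
local_outs = [F.group_norm(s, num_groups=2, eps=1e-5) for s in shards]

# Reassembling the local outputs along dim 0 reproduces the full result,
# because group-norm statistics never mix samples
stitched = torch.cat(local_outs, dim=0)
print(torch.allclose(full, stitched, atol=1e-6))  # True
```

Sharding along dim 1 (channels) would not have this property in general, since a group can span channels held by different ranks.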

While adding a backward strategy for nn.GroupNorm, I found that GroupNorm crashes when affine=False is passed, because the resulting None weight and bias are not handled properly: as the error shows, an int ends up where the Optional[Tensor] weight is expected.


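With affine=False, nn.GroupNorm registers weight and bias as None, and the functional op accepts None for both. As a reference for what the sharded call should compute once fixed, here is a plain-tensor sketch (no DTensor) comparing the module output with a manual per-group normalization. The manual formula is the standard GroupNorm definition written out for illustration, not code from the linked PR.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 4, 8, 8)

gn = torch.nn.GroupNorm(num_groups=2, num_channels=4, affine=False)
# With affine=False there are no learnable parameters: both are None
assert gn.weight is None and gn.bias is None

# The functional op accepts None weight/bias, matching what the module passes
out = F.group_norm(x, num_groups=2, weight=None, bias=None, eps=gn.eps)

# Manual reference: for each sample and group, normalize over the group's
# channels and all spatial positions using the biased variance
N, C, H, W = x.shape
xg = x.reshape(N, 2, -1)
mean = xg.mean(dim=-1, keepdim=True)
var = xg.var(dim=-1, unbiased=False, keepdim=True)
ref = ((xg - mean) / torch.sqrt(var + gn.eps)).reshape(N, C, H, W)

print(torch.allclose(out, ref, atol=1e-5))  # True
print(torch.allclose(gn(x), out, atol=1e-6))  # True
```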

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @tianyu-l @XilunWu @SherlockNoMad @ppwwyyxx
