pytorch - Add distributed_backend() hook to DeviceTypeTestBase [1 pull request]

Fix Action

Fixed

🚀 The feature, motivation and pitch

Motivation

Distributed tests currently contain many hardcoded mappings between device types and distributed backends, for example:

if "cuda" in device:
    return "nccl"
elif "hpu" in device:
    return "hccl"
elif "xpu" in device:
    return "xccl"
else:
    return "gloo"

These hardcoded mappings block out-of-tree backends from reusing distributed test infrastructure and existing distributed test cases.

Default distributed backend selection is fundamentally a device-type-level property, but today the logic is duplicated across distributed test helpers instead of being provided by DeviceTypeTestBase.

Proposal

Add a default distributed backend hook to DeviceTypeTestBase:

class DeviceTypeTestBase(TestCase):
    @classmethod
    def distributed_backend(cls) -> str:
        import torch.distributed as dist
        return dist.get_default_backend_for_device(cls.device_type)
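
To illustrate how test helpers could rely on such a hook instead of an if/elif chain, here is a minimal sketch that runs without PyTorch. Everything in it is a stand-in and an assumption for illustration: `FakeDeviceTypeTestBase`, `FakeCUDATestBase`, `MyAcceleratorTestBase`, the `"myaccel"` device, the `"myccl"` backend, and `backend_for_test` are hypothetical names, and the lookup table only imitates `torch.distributed.get_default_backend_for_device`.

```python
# Stand-in for torch.distributed.get_default_backend_for_device. The real
# function consults PyTorch's backend registry; this table is only illustrative.
_DEFAULT_BACKENDS = {"cuda": "nccl", "hpu": "hccl", "xpu": "xccl", "cpu": "gloo"}


def default_backend_for_device(device_type: str) -> str:
    # Fall back to gloo, mirroring the else-branch of the hardcoded mapping.
    return _DEFAULT_BACKENDS.get(device_type, "gloo")


class FakeDeviceTypeTestBase:
    """Hypothetical stand-in for DeviceTypeTestBase."""

    device_type = "cpu"

    @classmethod
    def distributed_backend(cls) -> str:
        return default_backend_for_device(cls.device_type)


class FakeCUDATestBase(FakeDeviceTypeTestBase):
    device_type = "cuda"


class MyAcceleratorTestBase(FakeDeviceTypeTestBase):
    """Hypothetical out-of-tree device that overrides the hook."""

    device_type = "myaccel"

    @classmethod
    def distributed_backend(cls) -> str:
        return "myccl"


def backend_for_test(test_cls) -> str:
    # A test helper asks the device-specific base class instead of
    # string-matching on the device name.
    return test_cls.distributed_backend()
```

With this shape, an out-of-tree backend only needs to override one classmethod on its own test base; the shared helpers stay unchanged.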

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @weifengpy @fffrog

Alternatives

No response

Additional context

No response
