pytorch - 💡(How to fix) Fix UNSTABLE trunk / linux-jammy-rocm-py3.10-mi355 / test (distributed) [2 comments, 2 participants]

GitHub stats
pytorch/pytorch#178884 · Fetched 2026-04-08 01:57:10
Comments: 2 · Participants: 2 · Timeline events: 121 · Reactions: 0
Timeline (top): subscribed ×70, mentioned ×37, labeled ×7, added_to_project_v2 ×2

ROCm trunk distributed started timing out due to rocshmem tests. Moving to disabled while we work on fixing the issue.

cc @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @seemethere @malfet @pytorch/pytorch-dev-infra @mruberry

extent analysis

TL;DR

Disable the rocshmem tests so they stop timing out the ROCm trunk distributed job while a fix is worked on.

Guidance

  • Identify which rocshmem tests are timing out, so the root cause can be narrowed down.
  • Temporarily disable those tests so the distributed job can finish within its time limit.
  • Notify the people cc'd on the issue (@sunway513, @jithunnair-amd, etc.) so the fix can be coordinated.
  • Consider debugging the rocshmem tests in a separate environment, so the investigation does not affect the main ROCm trunk distributed job.
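
The "temporarily disable" step above can be sketched with Python's standard `unittest` skip decorator. This is only an illustration, not the mechanism PyTorch CI uses for this issue: the environment variable `SKIP_ROCSHMEM_TESTS`, the class `RocshmemSmokeTest`, and both test names are hypothetical, since the issue does not name the affected tests.

```python
import os
import unittest

# Hypothetical switch (not a real PyTorch or ROCm variable).
# Defaults to "1", i.e. the rocshmem test is skipped unless re-enabled.
SKIP_ROCSHMEM = os.environ.get("SKIP_ROCSHMEM_TESTS", "1") == "1"


class RocshmemSmokeTest(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(
        SKIP_ROCSHMEM,
        "rocshmem tests time out on ROCm trunk distributed (pytorch/pytorch#178884)",
    )
    def test_rocshmem_collective(self):
        # Placeholder for the real test body, which the issue does not show.
        self.fail("the real rocshmem test would run here")

    def test_unrelated(self):
        # Tests without the decorator keep running as before.
        self.assertTrue(True)


def run_suite():
    """Run the example suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RocshmemSmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    print("skipped:", [reason for _, reason in result.skipped])
```

Putting the issue number in the skip reason keeps the workaround traceable, so the skip can be removed once the underlying timeout is fixed.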

Notes

The issue does not name the failing tests or describe the environment beyond the job name, so more specific guidance is not possible from the available information.

Recommendation

Apply the workaround: disabling the problematic tests is the most direct way to stop the timeouts, and it leaves room to investigate and debug the underlying issue.
