pytorch - ✅(Solved) Fix UNSTABLE rocm-nightly / linux-noble-rocm-nightly-py3.12-gfx942 / test (default) [2 pull requests, 1 comments, 2 participants]

GitHub stats
pytorch/pytorch#181647 (fetched 2026-04-28 06:24:07)
Comments: 1
Participants: 2
Timeline: 77
Reactions: 0
Timeline (top): subscribed ×46, mentioned ×24, labeled ×5, added_to_project_v2 ×1

PR fix notes

PR #168377: [ROCm][CI] Build theRock with pytorch nightly

Description (problem / solution / changelog)

This PR builds the PyTorch nightly branch against TheRock nightly ROCm packages. A new rocm-nightly.yml workflow was added to run CI builds targeting gfx942 (MI300); MI355 support will be added in the future. TheRock installs via wheels, so the traditional /opt/rocm/ installation path is no longer valid. A follow-up PR will enable testing with TheRock: https://github.com/pytorch/pytorch/pull/170105

NOTE: PyTorch is built without Magma support for nightly ROCm as of this PR.
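
Because a wheel-based install does not live under /opt/rocm/, tooling that needs the ROCm root cannot hardcode that path. The following is a minimal sketch, not the PR's actual logic: the helper name `resolve_rocm_root` is hypothetical, and the use of a `ROCM_PATH` environment variable as an override is an assumption (the source does not say which variables the CI scripts consult).

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional


def resolve_rocm_root(
    env: Mapping[str, str] = os.environ,
    fallbacks: Iterable[str] = ("/opt/rocm",),
) -> Optional[Path]:
    """Return the first candidate ROCm root that exists as a directory.

    Hypothetical helper: an explicit override (assumed to be ROCM_PATH)
    is tried first, then each fallback path in order.
    """
    candidates = []
    override = env.get("ROCM_PATH")  # assumed variable name
    if override:
        candidates.append(override)
    candidates.extend(fallbacks)

    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    # Nothing found: the caller decides whether this is fatal.
    return None
```

Returning `None` instead of raising lets a build script decide whether a missing ROCm root is an error or simply means a non-ROCm build.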

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @EikanWang @jgong5 @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @xmfan @dllehr-amd

Changed files

  • .ci/docker/build.sh (modified, +19/-1)
  • .ci/docker/common/install_amdsmi.sh (modified, +9/-1)
  • .ci/docker/common/install_cache.sh (modified, +39/-39)
  • .ci/docker/common/install_rocm.sh (modified, +100/-0)
  • .ci/docker/common/install_ucc.sh (modified, +12/-2)
  • .ci/docker/ubuntu-rocm/Dockerfile (modified, +7/-9)
  • .ci/pytorch/build.sh (modified, +11/-6)
  • .ci/pytorch/common.sh (modified, +6/-0)
  • .ci/pytorch/common_utils.sh (modified, +0/-3)
  • .github/workflows/docker-builds.yml (modified, +1/-0)
  • .github/workflows/rocm-nightly.yml (added, +35/-0)
  • cmake/Dependencies.cmake (modified, +6/-0)
RAW_BUFFER

Default config testing for rocm-nightly was enabled as of https://github.com/pytorch/pytorch/pull/168377. Marking rocm-nightly workflow jobs as unstable until we resolve the failures.

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @seemethere @malfet @pytorch/pytorch-dev-infra

extent analysis

TL;DR

Investigate and resolve the failures in the rocm-nightly workflow jobs to stabilize them.

Guidance

  • Review the changes introduced in the pull request https://github.com/pytorch/pytorch/pull/168377 to understand the potential causes of the failures.
  • Check the workflow job logs for specific error messages or patterns that could indicate the root cause of the instability.
  • Collaborate with the mentioned team members and stakeholders to gather more information and expertise on resolving the issues.
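
The log-review step can be partly automated. The sketch below tallies lines from a downloaded job log by failure category. The pattern set and the name `summarize_failures` are illustrative assumptions, not taken from the issue, and the code assumes any timestamp prefix has already been stripped from each line.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative patterns only; tune them to what the real logs contain.
FAILURE_PATTERNS = {
    "test_failure": re.compile(r"^(FAILED|ERROR)\b"),
    "import_error": re.compile(r"\b(ImportError|ModuleNotFoundError)\b"),
    "timeout": re.compile(r"timed out", re.IGNORECASE),
}


def summarize_failures(lines: Iterable[str]) -> Counter:
    """Count how many log lines match each failure category."""
    counts: Counter = Counter()
    for line in lines:
        text = line.strip()
        for label, pattern in FAILURE_PATTERNS.items():
            if pattern.search(text):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "FAILED test_ops.py::test_add - AssertionError",
        "PASSED test_ops.py::test_mul",
        "ModuleNotFoundError: No module named 'foo'",
        "Process timed out after 30 minutes",
    ]
    for label, count in sorted(summarize_failures(sample).items()):
        print(f"{label}: {count}")
```

Comparing these counts across several nightly runs can help separate consistent failures from intermittent ones before deciding what to fix first.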

Notes

The issue lacks specific technical details about the failures, so a more detailed analysis of the workflow job logs and collaboration with the team are necessary to provide a more accurate fix.

Recommendation

The issue does not identify a specific fix or workaround. Keep the rocm-nightly workflow jobs marked unstable while the failures are investigated, and re-evaluate once the root cause is known.
