pytorch - ✅(Solved) Fix [ROCm][Release Blocker] All ROCm 7.1/7.2 nightly builds broken since March 29 [7 pull requests, 1 comments, 2 participants]

GitHub stats
pytorch/pytorch#179524 (fetched 2026-04-08 03:00:38)
Comments: 1 · Participants: 2 · Timeline events: 64 · Reactions: 0
Timeline (top): subscribed ×31, mentioned ×19, labeled ×8, added_to_project_v2 ×2

All ROCm nightly builds (both ROCm 7.1 and ROCm 7.2) have been broken since ~March 29, 2026. No ROCm nightly wheels have been published for over a week. This is a release blocker for PyTorch 2.12.0 — we cannot cut a ROCm release if the nightly build pipeline is red.

Two distinct failure modes are active:

  1. linux-binary-manywheel ROCm builds: All 14 ROCm build jobs fail at the "Build PyTorch binary" step (compilation failure). Consistently broken April 4–6, with intermittent failures March 29–April 2.
  2. rocm-nightly workflow: startup_failure March 29–31, then cancelled every day April 1–6. The linux-noble-rocm-nightly-py3.12-gfx942 build is never reached.

Failure Timeline

| Date | `linux-binary-manywheel` (ROCm jobs) | `rocm-nightly` workflow |
|------|--------------------------------------|------------------------|
| March 28 | All green | Success |
| March 29–31 | All 14 ROCm builds FAIL | startup_failure |
| April 1 | All green (after reverts) | cancelled |
| April 2 | All 14 ROCm builds FAIL | cancelled |
| April 3 | All green (after reverts) | cancelled |
| April 4–6 | **All 14 ROCm builds FAIL** | cancelled |

Root Causes

Manywheel build failures (April 4–present)

The primary cause is "Enforce C++20 minimum in CMake build files" (#178662), which landed in the April 4 nightly. ROCm's clang does not pick up devtoolset's gcc-toolchain, so C++20 standard library headers such as <version> are not found during compilation.

Error:

2026-04-06T08:51:49.0855345Z FAILED: [code=1] c10/hip/test/CMakeFiles/c10_cuda_HIPAssertionsTest_1_var_test.dir/impl/c10_cuda_HIPAssertionsTest_1_var_test_generated_HIPAssertionsTest_1_var_test.hip.o /pytorch/build/c10/hip/test/CMakeFiles/c10_cuda_HIPAssertionsTest_1_var_test.dir/impl/c10_cuda_HIPAssertionsTest_1_var_test_generated_HIPAssertionsTest_1_var_test.hip.o 
2026-04-06T08:51:49.0861142Z cd /pytorch/build/c10/hip/test/CMakeFiles/c10_cuda_HIPAssertionsTest_1_var_test.dir/impl && /opt/_internal/cpython-3.10.20/lib/python3.10/site-packages/cmake/data/bin/cmake -E make_directory /pytorch/build/c10/hip/test/CMakeFiles/c10_cuda_HIPAssertionsTest_1_var_test.dir/impl/. && /opt/_internal/cpython-3.10.20/lib/python3.10/site-packages/cmake/data/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/pytorch/build/c10/hip/test/CMakeFiles/c10_cuda_HIPAssertionsTest_1_var_test.dir/impl/./c10_cuda_HIPAssertionsTest_1_var_test_generated_HIPAssertionsTest_1_var_test.hip.o -P /pytorch/build/c10/hip/test/CMakeFiles/c10_cuda_HIPAssertionsTest_1_var_test.dir/impl/c10_cuda_HIPAssertionsTest_1_var_test_generated_HIPAssertionsTest_1_var_test.hip.o.cmake
2026-04-06T08:51:49.0865607Z In file included from /pytorch/c10/hip/test/impl/HIPAssertionsTest_1_var_test.hip:3:
2026-04-06T08:51:49.0866519Z In file included from /pytorch/cmake/../third_party/googletest/googlemock/include/gmock/gmock.h:56:
2026-04-06T08:51:49.0867511Z In file included from /pytorch/cmake/../third_party/googletest/googlemock/include/gmock/gmock-actions.h:146:
2026-04-06T08:51:49.0868633Z In file included from /pytorch/cmake/../third_party/googletest/googlemock/include/gmock/internal/gmock-internal-utils.h:50:
2026-04-06T08:51:49.0869871Z In file included from /pytorch/cmake/../third_party/googletest/googlemock/include/gmock/internal/gmock-port.h:58:
2026-04-06T08:51:49.0871065Z /pytorch/cmake/../third_party/googletest/googletest/include/gtest/internal/gtest-port.h:291:10: fatal error: 'version' file not found
2026-04-06T08:51:49.0871985Z   291 | #include <version>  // C++20 or <version> support.
2026-04-06T08:51:49.0872411Z       |          ^~~~~~~~~
2026-04-06T08:51:49.0872731Z 1 error generated when compiling for host.
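When triaging many failed jobs, it can help to pull the missing-header diagnostics out of the raw log rather than reading each include chain by hand. The following is a minimal sketch, assuming the log lines carry the GitHub Actions timestamp prefix shown above; the function name and regular expressions are illustrative and not part of any PyTorch tooling.

```python
import re

# GitHub Actions prefixes each log line with an ISO-8601 timestamp ending in "Z".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z\s?")

# clang's diagnostic for a header it cannot locate:
#   <file>:<line>:<col>: fatal error: '<header>' file not found
MISSING_HEADER = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): "
    r"fatal error: '(?P<header>[^']+)' file not found"
)


def find_missing_headers(log_text):
    """Return (including_file, line_number, header) for each missing-header error."""
    hits = []
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw)  # drop the CI timestamp, if present
        match = MISSING_HEADER.match(line)
        if match:
            hits.append((match["file"], int(match["line"]), match["header"]))
    return hits


# Three lines copied from the log excerpt above.
SAMPLE = """\
2026-04-06T08:51:49.0869871Z In file included from /pytorch/cmake/../third_party/googletest/googlemock/include/gmock/internal/gmock-port.h:58:
2026-04-06T08:51:49.0871065Z /pytorch/cmake/../third_party/googletest/googletest/include/gtest/internal/gtest-port.h:291:10: fatal error: 'version' file not found
2026-04-06T08:51:49.0871985Z   291 | #include <version>  // C++20 or <version> support.
"""

for path, line_no, header in find_missing_headers(SAMPLE):
    print(f"{path}:{line_no} cannot find <{header}>")
```

Only the `fatal error` line matches; the include-chain and source-echo lines are skipped, so the result for this excerpt is a single entry naming `gtest-port.h`, line 291, and the header `version`.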

Previous (March 29–April 2) intermittent failures were caused by:

  • [ROCm] Add hipDNN backend support for convolution (#178515) — reverted
  • [ROCm] amdgcnspirv support (#172559) — reverted, relanded, reverted again
  • [CD] Update CMake version in Dockerfile (#179052) — CMake distribution removed upstream

rocm-nightly workflow failures (March 29–present)

TheRock nightly tarball changes introduced new dependencies (libdrm, liblzma via pkg-config) that are missing from the Docker image. See #179009 for details:
CMake Error: The following required packages were not found: - libdrm
/usr/bin/ld: warning: librocm_sysdeps_liblzma.so.5, needed by libaotriton_v2.so, not found

Failing Jobs (April 6 nightly)

All 14 ROCm manywheel builds fail — 0% pass rate:

  • manywheel-py3_10-rocm7_1-build
  • manywheel-py3_10-rocm7_2-build
  • manywheel-py3_11-rocm7_1-build
  • manywheel-py3_11-rocm7_2-build
  • manywheel-py3_12-rocm7_1-build
  • manywheel-py3_12-rocm7_2-build
  • manywheel-py3_13-rocm7_1-build
  • manywheel-py3_13-rocm7_2-build
  • manywheel-py3_13t-rocm7_1-build
  • manywheel-py3_13t-rocm7_2-build
  • manywheel-py3_14-rocm7_1-build
  • manywheel-py3_14-rocm7_2-build
  • manywheel-py3_14t-rocm7_1-build
  • manywheel-py3_14t-rocm7_2-build
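The 14 job names above are the cross product of seven Python tags and two ROCm versions. A small sketch of that matrix, useful for checking a HUD export against the expected set, is shown here; the function and constant names are hypothetical, and the tag lists are read off the job names in this issue rather than taken from the workflow definition.

```python
from itertools import product

# Tags as they appear in the job names listed in this issue.
PYTHON_TAGS = ["3_10", "3_11", "3_12", "3_13", "3_13t", "3_14", "3_14t"]
ROCM_TAGS = ["7_1", "7_2"]


def rocm_manywheel_jobs():
    """Build every manywheel ROCm job name: one per (Python, ROCm) pair."""
    return [
        f"manywheel-py{py}-rocm{rocm}-build"
        for py, rocm in product(PYTHON_TAGS, ROCM_TAGS)
    ]


def missing_jobs(reported):
    """Return expected job names that are absent from a reported collection."""
    seen = set(reported)
    return [job for job in rocm_manywheel_jobs() if job not in seen]
```

Calling `missing_jobs(...)` with the names scraped from a nightly run returns whichever of the 14 builds did not report at all, which is a different signal from a build that ran and failed.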

Example failed run: https://github.com/pytorch/pytorch/actions/runs/24024207448

HUD view: https://hud.pytorch.org/hud/pytorch/pytorch/nightly/1?per_page=50&name_filter=rocm7

Fix Attempts (as of April 6)

The following fixes have landed on main but are not yet in a nightly release:

  • #179504 [ROCm][CI] set --gcc-toolchain path for manywheel build (addresses C++20/devtoolset issue)
  • #179353 [ROCm][CI] additional wheel dependencies (libdw.so.1, libhsa-amd-aqlprofile64.so)
  • Revert "[ROCm] amdgcnspirv support (#172559)" — reverted again on April 6

Still open / not merged:

  • #179517 [DO NOT MERGE][Fix][ROCm][CI] Install libtbb-dev in ROCm CI image
  • #179274 [ROCM][CI] Add build_env check for rocm
  • #179035 [ROCm][CI] Refactor ROCm installation script

Action Items for AMD Team

  1. [P0] Verify that the April 7 nightly (which includes #179504 and #179353) restores green manywheel builds for both ROCm 7.1 and 7.2.
  2. [P0] Fix the rocm-nightly workflow — Docker image needs libdrm-dev, liblzma-dev (pkg-config discoverable), and correct paths for TheRock nightly tarballs.
  3. [P0] Land #179517 (libtbb-dev) if still needed for torchbench/FBGEMM builds.
  4. [P1] Stabilize amdgcnspirv support (#172559): it has been landed and reverted 3 times. It should not be relanded until it passes the full nightly matrix.
  5. [P1] Re-evaluate hipDNN convolution backend (#178515) — also reverted due to build breakage.
  6. [P1] Add CI gating to prevent ROCm-breaking changes from landing without ROCm build validation.

Impact

  • No ROCm nightly wheels available for users since ~March 28
  • Release blocker for PyTorch 2.12.0 — ROCm release candidates cannot be built
  • ROCm CI test coverage gap: the rocm-nightly test workflow has not run in over a week

cc @ezyang @gchanan @kadeng @msaroufim @seemethere @malfet @tinglvv @nWEIdia @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @pytorch/pytorch-dev-infra


PR fix notes

PR #178662: Enforce C++20 minimum in CMake build files (#178662)

Description (problem / solution / changelog)

Summary:

Update all CMake files in fbcode/caffe2 that hardcoded C++ standard versions earlier than C++20, which caused build failures when the C++20 header guard enforcement (D95101335) was applied.

Authored with Claude.

Test Plan: Sandcastle

Changed files

  • android/pytorch_android/CMakeLists.txt (modified, +1/-1)
  • android/pytorch_android_torchvision/CMakeLists.txt (modified, +1/-1)
  • aten/src/ATen/nnapi/CMakeLists.txt (modified, +1/-1)
  • aten/src/ATen/test/test_install/CMakeLists.txt (modified, +1/-4)
  • cmake/Dependencies.cmake (modified, +1/-1)

PR #179009: [ROCm][CI] adjust paths for rocm-nightly Docker image

Description (problem / solution / changelog)

TheRock nightly tarballs now require libdrm to be discoverable via pkg-config at cmake configure time, due to ROCm/rocm-systems@63f78a98d7c2 which added find_dependency(PkgConfig) and pkg_check_modules(REQUIRED libdrm) to rocm_smi-config.cmake.in (fixing ROCm/TheRock#3702). This was not caught until the Apr 1 scheduled run (https://github.com/pytorch/pytorch/actions/runs/23826393469) because prior runs used a cached Docker image that predated the change.

The build failed with two errors:

  1. CMake configure could not find libdrm or liblzma via pkg-config:
     CMake Error at FindPkgConfig.cmake:645: The following required packages were not found: - libdrm
     CMake Error at FindPkgConfig.cmake:938: None of the required 'liblzma' found
  2. The linker could not find the bundled rocm_sysdeps libraries needed by aotriton:
     /usr/bin/ld: warning: librocm_sysdeps_liblzma.so.5, needed by libaotriton_v2.so, not found
     undefined reference to `lzma_stream_decoder@AMDROCM_SYSDEPS_1.0'
The TheRock tarball bundles these dependencies under lib/rocm_sysdeps/ with their own .pc files and shared libraries. This PR adds rocm_sysdeps/lib to LD_LIBRARY_PATH and its pkgconfig directory to PKG_CONFIG_PATH in rocm_env.sh so both cmake and the linker discover them without requiring system -dev packages. Build and tests pass successfully: https://hud.pytorch.org/pr/179009
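The environment change described above amounts to prepending two directories to two search-path variables. A minimal sketch of that logic follows. The install root `/opt/rocm`, the exact `lib/pkgconfig` subdirectory, and the function name are assumptions made for illustration; the actual PR edits a shell script, and its precise paths are not quoted in this page.

```python
from pathlib import PurePosixPath


def with_rocm_sysdeps(env, rocm_root):
    """Return a copy of env with the bundled rocm_sysdeps directories prepended.

    Assumed layout (not confirmed by the source):
      <rocm_root>/lib/rocm_sysdeps/lib            shared libraries
      <rocm_root>/lib/rocm_sysdeps/lib/pkgconfig  .pc files
    """
    sysdeps = PurePosixPath(rocm_root) / "lib" / "rocm_sysdeps"
    additions = (
        ("LD_LIBRARY_PATH", str(sysdeps / "lib")),
        ("PKG_CONFIG_PATH", str(sysdeps / "lib" / "pkgconfig")),
    )
    updated = dict(env)  # leave the caller's mapping untouched
    for var, new_dir in additions:
        existing = updated.get(var, "")
        # Prepend so the bundled copies win over any system-wide ones.
        updated[var] = f"{new_dir}:{existing}" if existing else new_dir
    return updated


example = with_rocm_sysdeps({"LD_LIBRARY_PATH": "/usr/lib"}, "/opt/rocm")
print(example["LD_LIBRARY_PATH"])
print(example["PKG_CONFIG_PATH"])
```

Prepending rather than appending mirrors the intent stated in the PR description: cmake and the linker should resolve libdrm and liblzma from the tarball's own copies without relying on system -dev packages.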

The remaining test failure was also seen on main: FAILED [0.5276s] inductor/test_torchinductor_opinfo_properties.py::TestOpInfoPropertiesCUDA::test_binary_ufunc_numerical_fmod_backend_inductor_default_cuda_bfloat16 - AssertionError: XPASS: fmod/inductor_default/torch.bfloat16 - remove from binary_numerical xfails https://github.com/pytorch/pytorch/actions/runs/23928615757/job/69793171359

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

Changed files

  • .ci/docker/common/install_rocm.sh (modified, +7/-0)

PR #179504: [ROCm][CI] set --gcc-toolchain path for manywheel build

Description (problem / solution / changelog)

Exposed by recent c++20 changes, devtoolset was not getting picked up by ROCm's clang. Set --gcc-toolchain path correctly.
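One way to confirm this class of problem inside a build image is to ask the compiler to syntax-check a one-line translation unit that includes `<version>`, with and without `--gcc-toolchain`. The sketch below assumes a clang-compatible driver; the helper names are hypothetical, and any compiler or toolchain path passed in is a placeholder rather than the value used in the Dockerfile.

```python
import subprocess

# Smallest program that needs the C++20 <version> header.
PROBE_SOURCE = "#include <version>\nint main() { return 0; }\n"


def probe_command(compiler, gcc_toolchain=None):
    """Build a clang-style command that syntax-checks C++20 source from stdin."""
    cmd = [compiler, "-std=c++20", "-fsyntax-only", "-x", "c++", "-"]
    if gcc_toolchain:
        # Point clang at a specific GCC installation for libstdc++ headers.
        cmd.insert(1, f"--gcc-toolchain={gcc_toolchain}")
    return cmd


def can_find_version_header(compiler, gcc_toolchain=None):
    """Return True if the compiler resolves <version> with the given toolchain.

    Raises FileNotFoundError if the compiler binary itself does not exist.
    """
    result = subprocess.run(
        probe_command(compiler, gcc_toolchain),
        input=PROBE_SOURCE,
        text=True,
        capture_output=True,
    )
    return result.returncode == 0
```

If the probe fails without the flag and succeeds with it, the compiler is most likely defaulting to an older system libstdc++ that predates `<version>`, which matches the symptom in the build log.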

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

Changed files

  • .ci/docker/manywheel/Dockerfile_2_28 (modified, +2/-1)

PR #179353: [ROCm][CI] additional wheel dependencies

Description (problem / solution / changelog)

Dependencies introduced by librocprofiler-sdk.so

  • libdw.so.1
  • libhsa-amd-aqlprofile64.so
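Missing wheel dependencies like these usually surface as `not found` entries in `ldd` output for the library that pulls them in. The following sketch parses such output; the sample text is a made-up illustration of the standard `ldd` line format, not output captured from the failing builds, and the function name is hypothetical.

```python
def missing_shared_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd lines look like: "<name> => <path> (<addr>)" or "<name> => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Hypothetical ldd output for a library with two unresolved dependencies.
SAMPLE_LDD = """\
\tlibdw.so.1 => not found
\tlibhsa-amd-aqlprofile64.so => not found
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)
"""

print(missing_shared_libs(SAMPLE_LDD))
```

In practice the input would come from running `ldd` on the shared object inside the built wheel, and any name returned is a candidate for bundling by the wheel build script.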

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

Changed files

  • .ci/manywheel/build_rocm.sh (modified, +4/-0)

PR #179517: [ROCm][CI] Install libtbb-dev in ROCm CI image

Description (problem / solution / changelog)

Restore TBB explicitly in the ROCm Docker image so ROCm torchbench jobs can build the pinned FBGEMM dependency after the old vision/OpenCV path stopped providing it transitively.

Made-with: Cursor

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @pragupta @jerrymannil @xinyazhang

Changed files

  • .ci/docker/ubuntu-rocm/Dockerfile (modified, +1/-0)

PR #179274: [ROCM][CI] Add build_env check for rocm

Description (problem / solution / changelog)

Commit https://github.com/pytorch/pytorch/commit/1c6eb4d746e8eb769f60fc9071ffa1ad6cccb7db appears to be causing failures with persistent runners. The _work files end up in a state where they can't be deleted, which causes subsequent jobs to fail, as in this example: https://github.com/pytorch/pytorch/actions/runs/23372593644/job/68000498120

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

Changed files

  • .ci/pytorch/test.sh (modified, +2/-1)

PR #179035: [ROCm][CI] Refactor ROCm installation script

Description (problem / solution / changelog)

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

Changed files

  • .ci/docker/common/install_rocm.sh (modified, +1/-116)


extent analysis

TL;DR

The most likely fix for the broken ROCm nightly builds is to apply the fixes that have landed on main but are not yet in a nightly release, including setting the --gcc-toolchain path for manywheel build and adding additional wheel dependencies.

Guidance

  • Verify that the April 7 nightly, which includes fixes #179504 and #179353, restores green manywheel builds for both ROCm 7.1 and 7.2.
  • Fix the rocm-nightly workflow by adding libdrm-dev, liblzma-dev (pkg-config discoverable), and correct paths for TheRock nightly tarballs to the Docker image.
  • Land #179517 (libtbb-dev) if still needed for torchbench/FBGEMM builds.
  • Stabilize amdgcnspirv support (#172559) and re-evaluate hipDNN convolution backend (#178515) to prevent future build breakage.

Example

No code snippet is provided as the issue is related to build and CI configuration.

Notes

The fixes mentioned in the issue are not yet in a nightly release, so applying them to the current build process may resolve the issue. However, it's essential to verify that these fixes work as expected and do not introduce new problems.

Recommendation

Apply the workaround by including the fixes #179504 and #179353 in the nightly build process, as they address the C++20/devtoolset issue and add additional wheel dependencies, respectively.
