pytorch: [ROCm][CD] All nightly ROCm manywheel builds timing out at 420 minutes

All ROCm manywheel build jobs in the nightly linux-binary-manywheel workflow are hitting the 420-minute build timeout and getting cancelled. The 420 min limit was already raised from 240 in #179596 (see linux_binary_build_workflow.yml.j2:116), but builds are now exhausting even that.

This affects every Python x ROCm combination (py3.10/3.11/3.12/3.13/3.14/3.14t x rocm7.1 / rocm7.2).
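As a quick sanity check on the scope, the matrix above can be enumerated in a few lines of Python. This is only a sketch: the name pattern is inferred from the one job name quoted in this issue (manywheel-py3_14-rocm7_1-build), and the name used for the 3.14t variant is an assumption, not something confirmed by the workflow file.

```python
from itertools import product

# Versions as listed in this issue.
PYTHON_VERSIONS = ["3.10", "3.11", "3.12", "3.13", "3.14", "3.14t"]
ROCM_VERSIONS = ["7.1", "7.2"]


def job_name(py: str, rocm: str) -> str:
    """Build a job name following the pattern seen in the issue.

    Assumption: every combination follows the same pattern as
    manywheel-py3_14-rocm7_1-build, including the 3.14t variant.
    """
    return f"manywheel-py{py.replace('.', '_')}-rocm{rocm.replace('.', '_')}-build"


JOBS = [job_name(py, rocm) for py, rocm in product(PYTHON_VERSIONS, ROCM_VERSIONS)]

for name in JOBS:
    print(name)
print(len(JOBS), "build jobs")
```

Six Python variants times two ROCm versions gives 12 jobs, which matches the 12 cancelled builds reported under Evidence.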

Evidence

Example run (2026-05-10 nightly): https://github.com/pytorch/pytorch/actions/runs/25623830506

All 12 ROCm build jobs cancelled at ~7h 0m wall-clock. Example: https://github.com/pytorch/pytorch/actions/runs/25623830506/job/75215213118 (manywheel-py3_14-rocm7_1-build / build, 08:20:07Z -> 15:20:55Z).
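The timestamps quoted for the example job can be checked against the limit directly. The snippet below only does the arithmetic on the two times given above, assuming both fall on the same day.

```python
from datetime import datetime

# Start and end times of the example job, as quoted in the issue.
FMT = "%H:%M:%SZ"
started = datetime.strptime("08:20:07Z", FMT)
finished = datetime.strptime("15:20:55Z", FMT)

elapsed_minutes = (finished - started).total_seconds() / 60
print(f"{elapsed_minutes:.1f} minutes")  # 420.8 minutes
```

The job ran for about 420.8 minutes, which is consistent with being cancelled by the 420-minute timeout rather than failing on its own.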

Prior nightly (run 25596200981) shows the same 12/12 cancellations, so this is not a one-off.

Downstream impact: no ROCm wheels are being produced for nightly, so ROCm test jobs and uploads cannot run.

Possible directions

  • Increase the build timeout in .github/templates/linux_binary_build_workflow.yml.j2 beyond 420 minutes (e.g. 480/600).
  • Investigate why ROCm builds have regressed in wall time - recent changes to gfx targets, ROCm version (7.1/7.2), sccache hit rate, or runner class.
  • Consider sharding/parallelizing the ROCm build.
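To support the second direction, a small triage helper can flag which jobs in a run were cancelled at the timeout, so that runs can be compared night over night. The sketch below is not part of the PyTorch CI tooling. It assumes job records shaped like the GitHub REST API "list jobs for a workflow run" response (fields name, conclusion, started_at, completed_at) and leaves the fetching out. The helper name flag_timeouts, the 5-minute tolerance, the calendar date in the first sample record, and the whole second record are hypothetical; only the job name and clock times of the first record come from this issue.

```python
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%SZ"


def flag_timeouts(jobs, timeout_minutes=420, tolerance_minutes=5):
    """Return (name, minutes) for ROCm jobs cancelled at or near the timeout.

    `jobs` is a list of dicts with the keys name, conclusion, started_at
    and completed_at. The tolerance is an illustrative choice that absorbs
    setup and teardown time around the limit.
    """
    flagged = []
    for job in jobs:
        if "rocm" not in job["name"] or job["conclusion"] != "cancelled":
            continue
        started = datetime.strptime(job["started_at"], ISO)
        completed = datetime.strptime(job["completed_at"], ISO)
        minutes = (completed - started).total_seconds() / 60
        if minutes >= timeout_minutes - tolerance_minutes:
            flagged.append((job["name"], round(minutes, 1)))
    return flagged


SAMPLE_JOBS = [
    {   # Name and clock times from the issue; the date is assumed.
        "name": "manywheel-py3_14-rocm7_1-build / build",
        "conclusion": "cancelled",
        "started_at": "2026-05-10T08:20:07Z",
        "completed_at": "2026-05-10T15:20:55Z",
    },
    {   # Entirely hypothetical record, included to show the filter.
        "name": "hypothetical-cpu-build / build",
        "conclusion": "success",
        "started_at": "2026-05-10T08:20:00Z",
        "completed_at": "2026-05-10T09:05:00Z",
    },
]

print(flag_timeouts(SAMPLE_JOBS))
```

Running the same helper with a larger candidate limit (for example timeout_minutes=480) after a timeout bump would show whether builds are finishing or simply hitting the new ceiling.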

cc @ezyang @gchanan @kadeng @msaroufim @seemethere @malfet @tinglvv @nWEIdia @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @atalman

/label "module: rocm" "module: binaries" "oncall: binaries" "high priority"
