pytorch - ✅(Solved) Fix The CUTLASS CI has been failing for some time, blocking PRs landing [1 pull requests, 2 comments, 2 participants]

GitHub stats
pytorch/pytorch#177945 (fetched 2026-04-08 01:02:53)
  • Comments: 2
  • Participants: 2
  • Timeline events: 37
  • Reactions: 0
  • Timeline (top): subscribed ×24, mentioned ×4, labeled ×3, commented ×2

Fix Action

Fixed

PR fix notes

PR #177941: submodule init third_party/cutlass in test_h100_cutlass_backend

Description (problem / solution / changelog)

inductor/test_cutlass_backend.py's test_aoti_workspace_ptr failed on some commits, e.g. 719c659e7c7f8db1a973d9da5d054e64fcc0c6e0. The job log for that run shows that the test failed to import the CUTLASS library (ref):

    inductor/test_cutlass_backend.py::TestCutlassBackend::test_aoti_workspace_ptr W0319 07:39:18.937000 287 site-packages/torch/_inductor/utils.py:2157] [0/0] Failed to import CUTLASS lib. Please check whether _inductor.config.cutlass.cutlass_dir /var/lib/jenkins/workspace/third_party/cutlass is set correctly. Skipping CUTLASS backend for now.
    ('RERUN', {'yellow': True}) [1.8045s] [  0%]

In addition, the log at https://github.com/pytorch/pytorch/actions/runs/23283286504/job/67704458771#step:3:231 suggests that submodules are not initialized in the test job, even though TORCHINDUCTOR_CUTLASS_DIR appears to be set correctly. I therefore updated test_h100_cutlass_backend to initialize third_party/cutlass.
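The distinction drawn here, a cutlass_dir that is set correctly but points at an uninitialized submodule, can be checked before any test runs. A minimal sketch follows; diagnose_cutlass_dir is a hypothetical helper, not a PyTorch API. It assumes that an uninitialized submodule shows up as an empty directory (what Git leaves behind when a repository is cloned without its submodules) and that include/cutlass/cutlass.h marks a populated CUTLASS checkout.

```python
import os
from pathlib import Path


def diagnose_cutlass_dir(path: str) -> str:
    """Classify a CUTLASS directory (hypothetical helper, not a PyTorch API).

    Returns one of:
      "missing"    - the path does not exist or is not a directory
      "empty"      - the directory exists but has no entries, which is how an
                     uninitialized Git submodule usually looks on disk
      "incomplete" - the directory has content but lacks the marker header
      "ok"         - the assumed marker header include/cutlass/cutlass.h exists
    """
    root = Path(path)
    if not root.is_dir():
        return "missing"
    if not any(root.iterdir()):
        return "empty"
    # Assumption: a checked-out CUTLASS tree contains this public header.
    if (root / "include" / "cutlass" / "cutlass.h").is_file():
        return "ok"
    return "incomplete"


if __name__ == "__main__":
    # TORCHINDUCTOR_CUTLASS_DIR is the variable mentioned in the PR; the
    # fallback is the path printed in the failing job's warning.
    cutlass_dir = os.environ.get(
        "TORCHINDUCTOR_CUTLASS_DIR", "/var/lib/jenkins/workspace/third_party/cutlass"
    )
    print(f"{cutlass_dir}: {diagnose_cutlass_dir(cutlass_dir)}")
```

In the failure described here, the expected result would be "empty": the path is right, but nothing has been checked out into it.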

Changed files

  • .ci/pytorch/test.sh (modified, +1/-0)

🐛 Describe the bug

I observed that this CUTLASS CI job https://github.com/pytorch/pytorch/actions/workflows/h100-cutlass-backend.yml has been failing continuously for a while. A comparison of the current failing logs with the logs of the last passing run suggests that the third_party/cutlass directory has not been initialized, which causes try_import_cutlass to fail.

The failing log does not run /usr/bin/git submodule sync --recursive: https://productionresultssa6.blob.core.windows.net/actions-results/381ec337-355b-4b38-bae5-ce93f13ed9d1/workflow-job-run-d40c05a5-d8db-55e0-9426-1e25164abf2f/logs/job/job-logs.txt?rsct=text%2Fplain&se=2026-03-20T05%3A55%3A45Z&sig=n5o0CxYuzebUFMUAUO3e%2BPoARQF%2BobhSOcKICR%2FSSU8%3D&ske=2026-03-20T08%3A14%3A48Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2026-03-20T04%3A14%3A48Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-11-05&sp=r&spr=https&sr=b&st=2026-03-20T05%3A45%3A40Z&sv=2025-11-05

The passing log does run /usr/bin/git submodule sync --recursive: https://productionresultssa8.blob.core.windows.net/actions-results/5960dd22-700e-434f-a877-89d6fe79baa5/workflow-job-run-859abd75-bfa7-5fda-a04d-35603207ec82/logs/job/job-logs.txt?rsct=text%2Fplain&se=2026-03-20T06%3A32%3A10Z&sig=00W%2BDR0k1TCZH0PaIr%2BYSsrikzedHyldDrfnoVmr2ec%3D&ske=2026-03-20T08%3A15%3A32Z&skoid=ca7593d4-ee42-46cd-af88-8b886a2f84eb&sks=b&skt=2026-03-20T04%3A15%3A32Z&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skv=2025-11-05&sp=r&spr=https&sr=b&st=2026-03-20T06%3A22%3A05Z&sv=2025-11-05
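Comparing two long job logs by hand is error-prone. The sketch below assumes that submodule commands appear in the logs as full git command lines, like the /usr/bin/git submodule sync --recursive line cited for the passing run. It extracts the git submodule subcommands from each log and reports those that appear only in the passing run. The function names are hypothetical.

```python
import re

# Matches git command lines whose subcommand is "submodule", optionally with
# "-c key=value" options in between, e.g.
#   /usr/bin/git submodule sync --recursive
#   /usr/bin/git -c protocol.version=2 submodule update --init --recursive
# Group 1 captures the submodule subcommand (sync, update, foreach, ...).
_SUBMODULE_CMD = re.compile(r"\bgit(?:\s+-c\s+\S+)*\s+submodule\s+(\S+)")


def submodule_commands(log_text: str) -> list[str]:
    """Return the git submodule subcommands found in a CI log, in order."""
    return [m.group(1) for m in _SUBMODULE_CMD.finditer(log_text)]


def only_in_passing(passed_log: str, failed_log: str) -> list[str]:
    """Submodule subcommands present in the passing log but absent from the failing one."""
    failed = set(submodule_commands(failed_log))
    missing: list[str] = []
    for cmd in submodule_commands(passed_log):
        if cmd not in failed and cmd not in missing:
            missing.append(cmd)
    return missing
```

Given the two logs described in this issue, the result should include "sync", because that command appears only in the passing run.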

Versions

    Collecting environment information...
    PyTorch version: N/A
    Is debug build: N/A
    CUDA used to build PyTorch: N/A
    ROCM used to build PyTorch: N/A

    OS: Ubuntu 24.04.3 LTS (x86_64)
    GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
    Clang version: Could not collect
    CMake version: version 4.2.3
    Libc version: glibc-2.39

cc @seemethere @malfet @pytorch/pytorch-dev-infra

extent analysis

Fix Plan

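To resolve the issue, the third_party/cutlass submodule must be initialized and checked out before the tests try to import CUTLASS. Running /usr/bin/git submodule sync --recursive on its own is not enough: according to the Git documentation, sync only updates the remote URL of submodules that already have a URL entry in .git/config (that is, submodules that are already initialized), and it never checks out any files. Its presence in the passing log most likely means that the checkout step was also fetching submodules, which is the step missing from the failing runs.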
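Whether the submodule is actually initialized can be confirmed with git submodule status, which, per the git-submodule documentation, prefixes an uninitialized submodule with "-" and a checked-out one with a space. The parser below is a hypothetical helper sketching that check. It assumes submodule paths contain no whitespace and that the command output is passed in unstripped, since stripping would remove the leading space from the first line.

```python
# Meaning of the first character of each `git submodule status` line,
# as described in the git-submodule documentation.
_STATE = {
    " ": "checked out",
    "-": "not initialized",
    "+": "checked-out commit differs from index",
    "U": "merge conflicts",
}


def parse_submodule_status(output: str) -> dict[str, str]:
    """Map submodule path -> state from raw `git submodule status` output."""
    states: dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        flag, rest = line[0], line[1:]
        # rest is "<sha1> <path>", optionally followed by " (<describe>)".
        parts = rest.split()
        if len(parts) < 2:
            continue  # not a status line
        states[parts[1]] = _STATE.get(flag, "unknown")
    return states
```

In the failing runs, third_party/cutlass would be expected to map to "not initialized".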

Here are the concrete steps:

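  • Initialize the CUTLASS submodule in the test job before the tests run. PR #177941 does this by adding one line to test_h100_cutlass_backend in .ci/pytorch/test.sh; as a standalone workflow step, the equivalent would look like:
    - name: Init CUTLASS submodule
      run: git submodule update --init --recursive third_party/cutlass
  • Alternatively, initialize and update every submodule (slower, since it fetches all of PyTorch's submodules). Note that --init is needed for nested submodules as well, so a plain git submodule update --recursive after git submodule init only initializes the top level:
    - name: Init and update submodules
      run: git submodule update --init --recursive
  • Verify that third_party/cutlass is populated (not an empty directory) after these commands run.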
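The steps above can also be sketched as an idempotent helper that only initializes the submodule when git submodule status reports it as uninitialized. This is an illustration, not the change made in PR #177941, which edits .ci/pytorch/test.sh. ensure_submodule and run_git are hypothetical names, the helper assumes it runs from the repository root, and the command runner is injectable so the logic can be exercised without a Git checkout.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command line and returns its stdout.
Runner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run a command and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(list(args), check=True, capture_output=True, text=True).stdout


def ensure_submodule(path: str, run: Runner = run_git) -> bool:
    """Initialize and check out one submodule if it is not initialized yet.

    Returns True if an update was performed, False if nothing was needed.
    """
    status = run(["git", "submodule", "status", "--", path])
    # A leading "-" marks an uninitialized submodule in `git submodule status`.
    if status.startswith("-"):
        run(["git", "submodule", "update", "--init", "--recursive", "--", path])
        return True
    return False
```

For example, calling ensure_submodule("third_party/cutlass") before the CUTLASS tests would return True on a clone without submodules and False on later calls, once the submodule is checked out.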

Verification

To verify the fix, check the h100-cutlass-backend job logs: the submodule initialization should appear in the test step, the "Failed to import CUTLASS lib" warning should no longer be printed, and test_aoti_workspace_ptr should pass without being rerun.
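This log check can be scripted with plain string matching. The sketch below is hypothetical and heuristic: it treats the import warning text and the ('RERUN', ...) marker from the PR's log excerpt as failure signals, and a rerun can also have causes unrelated to CUTLASS.

```python
# Warning text printed when Inductor cannot import CUTLASS, taken from the
# failing job's log excerpt quoted in PR #177941.
IMPORT_FAILURE = "Failed to import CUTLASS lib"
# Marker printed for a rerun test in the same excerpt.
RERUN_MARKER = "('RERUN'"


def cutlass_import_failures(log_text: str) -> int:
    """Count CUTLASS import-failure warnings in a test log."""
    return log_text.count(IMPORT_FAILURE)


def rerun_count(log_text: str) -> int:
    """Count rerun markers; reruns may also have causes unrelated to CUTLASS."""
    return log_text.count(RERUN_MARKER)


def looks_fixed(log_text: str) -> bool:
    """Heuristic: no CUTLASS import warning and no reruns in the log."""
    return cutlass_import_failures(log_text) == 0 and rerun_count(log_text) == 0
```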

Extra Tips

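  • Where possible, validate the change against the h100-cutlass-backend workflow on the PR itself before landing it, rather than discovering the breakage after merge.
  • Consider making the CUTLASS tests fail fast with a clear error when third_party/cutlass is empty. Currently a missing submodule only produces a warning ("Skipping CUTLASS backend for now") followed by test reruns, which makes the root cause easy to miss.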
