pytorch - ✅(Solved) Fix [inductor] cudagraph skip reason not logged for non-CUDA GPU backends [1 pull requests]

Error Message

Error logs

N/A — this is a code-inspection issue, not a runtime failure. No traceback is produced because the affected code path silently bumps a counter without logging.

Root Cause

In torch/_inductor/output_code.py, two locations gate the cudagraph-skip diagnostic log on a hardcoded "cuda" in ...device_types check. When compilation targets a non-CUDA GPU backend (XPU, MPS, MTIA), cudagraphs are disabled and counters["inductor"]["cudagraph_skips"] is still incremented, but log_cudagraph_skip_and_bump_counter(...) is never called — the skip reason is dropped on the floor.
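The mechanism can be reproduced without PyTorch in a minimal sketch. Everything here is a hypothetical stand-in: `counters` plays the role of counters["inductor"]["cudagraph_skips"], `log_skip_and_bump` plays the role of log_cudagraph_skip_and_bump_counter(...), and `GPU_TYPES` is assumed to hold cuda, mps, xpu and mtia. The if/else shape of `record_skip_old` (log-and-bump for "cuda", bare counter bump otherwise) is an assumption modeled on the description above, not the verbatim output_code.py source.

```python
import logging
from collections import Counter, defaultdict

log = logging.getLogger("cudagraph_skip_sketch")

# Hypothetical stand-in for torch._dynamo.utils.counters.
counters = defaultdict(Counter)

# Assumed to mirror GPU_TYPES in torch/_inductor/utils.py (cuda | mps | xpu | mtia).
GPU_TYPES = ("cuda", "mps", "xpu", "mtia")


def is_gpu(device):
    """Stand-in for torch._inductor.utils.is_gpu on a single device-type string."""
    return device in GPU_TYPES


def log_skip_and_bump(msg):
    """Stand-in for log_cudagraph_skip_and_bump_counter: log the reason, then count it."""
    log.warning(msg)
    counters["inductor"]["cudagraph_skips"] += 1


def record_skip_old(device_types, reason):
    # Pre-fix gate (assumed shape): only a literal "cuda" entry reaches the logger.
    if "cuda" in device_types:
        log_skip_and_bump(reason)
    else:
        counters["inductor"]["cudagraph_skips"] += 1  # counted silently


def record_skip_new(device_types, reason):
    # Post-fix gate (assumed shape): any GPU_TYPES entry reaches the logger.
    if any(is_gpu(d) for d in device_types):
        log_skip_and_bump(reason)
    else:
        counters["inductor"]["cudagraph_skips"] += 1


class _ListHandler(logging.Handler):
    """Collects emitted log records so the sketch can count them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def run(record_skip, device_types):
    """Return (counter increase, number of log records) for one skipped graph."""
    handler = _ListHandler()
    log.addHandler(handler)
    try:
        before = counters["inductor"]["cudagraph_skips"]
        record_skip(device_types, "skipping cudagraphs due to <reason>")
        delta = counters["inductor"]["cudagraph_skips"] - before
    finally:
        log.removeHandler(handler)
    return delta, len(handler.records)


if __name__ == "__main__":
    print(run(record_skip_old, {"xpu"}))  # (1, 0): counted but not logged
    print(run(record_skip_new, {"xpu"}))  # (1, 1): counted and logged
```

For an XPU-only graph the old gate yields one counted skip and zero log records, which is exactly the "dropped on the floor" symptom; the new gate yields one of each.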

Fix Action

Fix / Workaround

Replace the hardcoded "cuda" in ...device_types checks in torch/_inductor/output_code.py with is_gpu() from torch/_inductor/utils.py, the same helper scheduler.py already uses, so that log_cudagraph_skip_and_bump_counter(...) fires for every device in GPU_TYPES (cuda | mps | xpu | mtia). PR #180971 implements this change. PrivateUse1-registered backends are not in GPU_TYPES and remain out of scope.

PR fix notes

PR #180971: [inductor] Using is_gpu() instead of hardcoded "cuda" check

Description (problem / solution / changelog)

Resolves: https://github.com/pytorch/pytorch/issues/180951

Issue:

  • In pytorch/torch/_inductor/output_code.py, cudagraph skip reasons are logged only when device_types contains "cuda".
  • This check occurs in three locations in the file, all of which this PR fixes.
  • The hardcoded "cuda" check is inconsistent with the equivalent check in pytorch/torch/_inductor/scheduler.py.

Fix:

  • This PR replaces every hardcoded "cuda" check with the is_gpu() API, which covers all GPU targets in {cuda, mps, xpu, mtia}, not only cuda.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

Changed files

  • torch/_inductor/output_code.py (modified, +7/-4)

🐛 Describe the bug

In torch/_inductor/output_code.py, two locations gate the cudagraph-skip diagnostic log on a hardcoded "cuda" in ...device_types check. When compilation targets a non-CUDA GPU backend (XPU, MPS, MTIA), cudagraphs are disabled and counters["inductor"]["cudagraph_skips"] is still incremented, but log_cudagraph_skip_and_bump_counter(...) is never called — the skip reason is dropped on the floor.

The sibling file torch/_inductor/scheduler.py makes the equivalent GPU-or-not decision correctly by using is_gpu() from torch/_inductor/utils.py, which covers cuda | mps | xpu | mtia. The two sites in output_code.py are inconsistent with that pattern.

Inconsistent locations (hardcoded "cuda"):

  1. https://github.com/pytorch/pytorch/blob/c4ec73b4b52e7c878e3c2522cac61e035ee72520/torch/_inductor/output_code.py#L277-L287
  2. https://github.com/pytorch/pytorch/blob/c4ec73b4b52e7c878e3c2522cac61e035ee72520/torch/_inductor/output_code.py#L593-L600

Consistent counter-example (uses is_gpu()):

is_gpu / GPU_TYPES definitions:

Effect:

On any non-CUDA accelerator (XPU, MPS, MTIA, and also out-of-tree backends registered via the PrivateUse1 dispatch key), when disabled_cudagraphs_reason is set (or cudagraph_fail_reasons is non-empty) and the compiled graph targets such a device, the skip counter increments silently. Users lose visibility into why cudagraph wrapping was skipped, a significant gap when investigating torch.compile performance on non-CUDA accelerators.

Expected Behavior:

log_cudagraph_skip_and_bump_counter(...) should fire for any GPU device (as determined by is_gpu()), matching the behavior already established in _log_graph_partitions in scheduler.py.

Scope note: This report is intentionally limited to the devices covered by GPU_TYPES in utils.py (cuda | mps | xpu | mtia). Extending cudagraph diagnostics to PrivateUse1-registered accelerators requires broader changes to is_gpu/GPU_TYPES and is a separate concern, not in scope here.
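The scope can be made concrete with a small predicate comparison. This sketch assumes GPU_TYPES contains exactly cuda, mps, xpu and mtia, and assumes the fixed gate tests each entry of a device_types set with any(); `old_gate`, `new_gate` and `changed_cases` are hypothetical names. It uses "privateuseone", the default PrivateUse1 device-type name (backends often rename it), to show that such backends stay silent even after the fix.

```python
# Assumed to mirror GPU_TYPES in torch/_inductor/utils.py, per the report.
GPU_TYPES = frozenset({"cuda", "mps", "xpu", "mtia"})


def old_gate(device_types):
    """Pre-fix condition: only a literal "cuda" entry enables the skip log."""
    return "cuda" in device_types


def new_gate(device_types):
    """Post-fix condition (assumed shape): any GPU_TYPES entry enables the skip log."""
    return any(d in GPU_TYPES for d in device_types)


CASES = [
    {"cuda"},
    {"xpu"},
    {"mps"},
    {"mtia"},
    {"cpu"},
    {"cpu", "xpu"},
    {"privateuseone"},  # default PrivateUse1 name: not in GPU_TYPES
]


def changed_cases(cases=CASES):
    """Return (sorted) device-type sets whose logging behavior the fix changes."""
    return [sorted(c) for c in cases if old_gate(c) != new_gate(c)]


if __name__ == "__main__":
    for c in CASES:
        print(f"{sorted(c)!s:<20} old={old_gate(c)!s:<5} new={new_gate(c)}")
```

Only the XPU, MPS and MTIA cases (including a mixed CPU+XPU graph) change behavior; CUDA was already logged, and CPU-only or PrivateUse1 graphs are unaffected, matching the scope note.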

Versions

PyTorch version: 2.10.0+cu128
Is debug build: False
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.2
Libc version: glibc-2.35

Python version: 3.10.12 (main, Mar  3 2026, 11:56:32) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.8.0-106-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: 13th Gen Intel(R) Core(TM) i9-13900H (20 cores)

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] torch==2.10.0+cpu
[pip3] triton==3.6.0

Note: reporter's machine is CPU-only; this is a code-inspection report against
current main (SHA c4ec73b4b52e7c878e3c2522cac61e035ee72520), not a live runtime
repro. Applies to any compile target where `device_types` contains a non-CUDA
GPU (mps/xpu/mtia).

cc @mcarilli @ezyang @eellison @penguinwu @BoyuanFeng @chauhang @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

Replace the hardcoded "cuda" in ...device_types checks in torch/_inductor/output_code.py with a call to is_gpu() from torch/_inductor/utils.py to ensure consistent handling of non-CUDA GPU backends.

Guidance

  • Identify the locations in output_code.py where the hardcoded "cuda" check is used (lines 277-287 and 593-600 at commit c4ec73b4b52e7c878e3c2522cac61e035ee72520; the PR reports three occurrences in total) and replace each with a call to is_gpu().
  • Verify that the is_gpu() function correctly identifies non-CUDA GPU devices (XPU, MPS, MTIA) by checking the GPU_TYPES definition in utils.py.
  • Leave log_cudagraph_skip_and_bump_counter() itself unchanged: the is_gpu() check belongs at its call sites, so that the skip reason is logged for any GPU device, not just CUDA.
  • Test the changes with different GPU backends (e.g., XPU, MPS, MTIA) to ensure that the skip counter increments correctly and the skip reason is logged.

Example

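from torch._inductor.cudagraph_utils import log_cudagraph_skip_and_bump_counter
from torch._inductor.utils import is_gpu

# Before: if "cuda" in ...device_types:
# device_types is a set of device-type strings, so test each entry with
# is_gpu() (cuda | mps | xpu | mtia) instead of looking for a literal "cuda".
if any(is_gpu(device_type) for device_type in device_types):
    log_cudagraph_skip_and_bump_counter(...)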

Notes

This fix only addresses the inconsistent handling of non-CUDA GPU backends and does not extend cudagraph diagnostics to PrivateUse1-registered accelerators, which requires broader changes to is_gpu() and GPU_TYPES.

Recommendation

Apply the fix from PR #180971: replace the hardcoded "cuda" checks with calls to is_gpu() so that non-CUDA GPU backends are handled consistently.
