pytorch - 💡(How to fix) Fix torch.cuda.Stream traced incorrectly by torch.compile [1 comments, 2 participants]

GitHub stats
pytorch/pytorch#180179 · Fetched 2026-04-13 05:35:25
Comments: 1 · Participants: 2 · Timeline: 97 · Reactions: 0
Timeline (top): mentioned ×45, subscribed ×45, labeled ×6, commented ×1

🐛 Describe the bug

The fix for the previously reported bug, #173172, appears to work only for the default stream. From what I see in the debug prints, I believe the fix reads 'stream_id' into the 'cuda_stream' pointer. For the default stream both are 0, so that works:

root@aea89da2c313:/git/kernelcatcher/cuequivariance_ops_torch/tests# python compile_repro.py
current_cuda_stream: id = 0 cuda_stream = 0
In the op, cuda_stream = 0
no-stream call successful
Created stream <torch.cuda.Stream device=cuda:0 cuda_stream=0x1ee529d0>, id = 3 cuda_stream = 1ee529d0
current_cuda_stream: id = 3 cuda_stream = 1ee529d0
In the op, cuda_stream = 3
Segmentation fault (core dumped)
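The log above can be read mechanically. The following is a minimal sketch, not code from the repro script: `classify_handle` is a hypothetical helper that compares the value seen inside the op against the stream's two identifiers, using only the numbers printed in the log.

```python
def classify_handle(seen_in_op: int, stream_id: int, cuda_stream: int) -> str:
    """Say which of a stream's two identifiers the op appears to have received.

    Hypothetical helper for illustration; the names mirror the fields in the log.
    """
    if stream_id == cuda_stream:
        # Default stream in the report: id and pointer are both 0,
        # so a mix-up between the two cannot be observed.
        return "ambiguous" if seen_in_op == cuda_stream else "unknown"
    if seen_in_op == cuda_stream:
        return "pointer"
    if seen_in_op == stream_id:
        return "stream_id"
    return "unknown"


# Values copied from the log in the report.
default_case = classify_handle(0, stream_id=0, cuda_stream=0)
side_stream_case = classify_handle(3, stream_id=3, cuda_stream=0x1EE529D0)

print(default_case)      # ambiguous
print(side_stream_case)  # stream_id
```

For the non-default stream, the op saw 3, which matches the stream id rather than the pointer 0x1ee529d0. That is consistent with the author's reading of the debug prints, though it does not by itself prove where the substitution happens.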

Please find the repro script attached. To run it, you need to 'pip install cuda-bindings cuequivariance_ops_torch_cu13' (or 'cuequivariance_ops_torch_cu12' on CUDA 12).

compile_repro.py

Versions

Collecting environment information...
PyTorch version: 2.12.0.dev20260411+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.4 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.39

Python version: 3.12.3 (main, Mar 3 2026, 12:15:18) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.8.0-79-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 13.2.51
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX A6000
Nvidia driver version: 580.82.07
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.20.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.20.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.20.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.20.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.20.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.20.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.20.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.20.0
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo @azahed98

extent analysis

TL;DR

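The fix for #173172 appears to work only for the default stream. In the reporter's log, a non-default stream has id 3 and cuda_stream 0x1ee529d0, but the op receives cuda_stream = 3 and the process segfaults. The reporter believes the fix reads the stream id where the raw stream pointer is expected; for the default stream both are 0, which hides the problem. A code change or workaround is needed for non-default streams.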

Guidance

  • The cuda_stream pointer appears to be read incorrectly for non-default streams. Print both the stream id and the cuda_stream pointer for several streams, and compare them with the value the op receives.
  • Inspect the code path that passes the stream into the op under torch.compile, and check whether it forwards stream_id in place of the cuda_stream pointer.
  • Run the repro script with several different non-default streams to confirm that the failure is consistent and that the value seen in the op always equals the stream id.
  • Review the fix for #173172 to see how it maps stream ids to cuda_stream pointers; the report suggests it was only exercised with the default stream.
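The first and third checks above can be sketched as a small table of identifiers per stream. This is an illustrative sketch, not part of the attached repro: `stream_table` is a hypothetical helper, and it assumes only the `stream_id` and `cuda_stream` attributes that `torch.cuda.Stream` exposes. The GPU part runs only when PyTorch with CUDA is available.

```python
def stream_table(streams):
    """Return (stream_id, cuda_stream as hex, ids_coincide) for each stream.

    Accepts any objects with `stream_id` and `cuda_stream` attributes,
    such as torch.cuda.Stream instances. Hypothetical helper for illustration.
    """
    rows = []
    for s in streams:
        rows.append((s.stream_id, hex(s.cuda_stream), s.stream_id == s.cuda_stream))
    return rows


if __name__ == "__main__":
    try:
        import torch
    except ImportError:  # allow the helper to be used without PyTorch installed
        torch = None

    if torch is not None and torch.cuda.is_available():
        # The current stream plus two freshly created non-default streams.
        streams = [torch.cuda.current_stream()]
        streams += [torch.cuda.Stream() for _ in range(2)]
        for row in stream_table(streams):
            print(row)
```

Streams whose last column is False are the ones where passing the id in place of the pointer would hand the op an invalid handle, so those are the cases worth running through the compiled op.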

Notes

The issue was reported on PyTorch 2.12.0.dev20260411+cu130 (built with CUDA 13.0) on an NVIDIA RTX A6000. The report does not establish whether other versions or configurations are affected, so any workaround should be re-checked against the version in use.

Recommendation

Until a permanent fix lands, apply a workaround for non-default streams, since the current fix appears to handle only the default stream correctly.
