pytorch - 💡(How to fix) Fix [cudagraphs] torch.cond does not work with cudagraphs when nccl collectives are used [8 comments, 3 participants]

GitHub stats
pytorch/pytorch#180960 · Fetched 2026-04-22 07:43:20
Comments: 8
Participants: 3
Timeline: 153
Reactions: 0
Timeline (top): subscribed ×69, mentioned ×68, commented ×8, labeled ×8


🐛 Describe the bug

When I use NCCL collectives inside torch.cond, CUDA graph capture does not work. I had Claude generate a simple repro: https://gist.github.com/oulgen/1ea85a9ec628d92dce04b787a80bc24a

torchrun --nproc_per_node=2 cond_cg_funcol_repro.py

Versions

master

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @xmfan @weifengpy @mcarilli @ezyang @eellison @penguinwu @BoyuanFeng @chauhang @ydwu4 @bdhirsh @bobrenjc93 @aorenste

extent analysis

TL;DR

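  • The report is that CUDA graph capture fails when NCCL collectives are used inside torch.cond. Modifying cond_cg_funcol_repro.py may work around the failure, but that script is only the reproduction, so changes to it would not by themselves resolve the reported behavior.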

Guidance

  • Review the cond_cg_funcol_repro.py script to ensure correct usage of torch.cond and nccl collectives.
  • Verify that cudagraphs capture is enabled and configured correctly.
  • Check the PyTorch documentation for any known limitations or issues with using nccl collectives inside torch.cond.
  • Test the script with a different PyTorch version or configuration to isolate the issue.

Example
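
The isolation step suggested in the guidance could be sketched roughly as follows. This is not the contents of the linked gist: the file name `isolate_cond_cg.py`, the helper names, the branch bodies, and the choice of `torch.compile(mode="reduce-overhead")` as the way CUDA graphs get enabled are all assumptions made for illustration. It also assumes a failure surfaces as an exception; Inductor may instead skip CUDA graphs and only log a warning, so the console output is worth reading as well.

```python
# isolate_cond_cg.py (hypothetical name). Launch with:
#   torchrun --nproc_per_node=2 isolate_cond_cg.py
import itertools
import os


def configurations():
    """Every on/off combination of the three ingredients named in the issue."""
    names = ("use_cond", "use_collective", "use_cudagraphs")
    return [dict(zip(names, flags))
            for flags in itertools.product((False, True), repeat=3)]


def build_step(use_cond, use_collective):
    """Build one step function; torch is imported lazily so the helpers
    above stay importable on machines without PyTorch."""
    import torch
    import torch.distributed as dist
    import torch.distributed._functional_collectives as funcol

    def reduce_branch(x):
        if use_collective:
            # Functional all-reduce over the default process group.
            return funcol.all_reduce(x, "sum", dist.group.WORLD)
        return x * 2

    def other_branch(x):
        return x + 1

    def step(pred, x):
        if use_cond:
            return torch.cond(pred, reduce_branch, other_branch, (x,))
        return reduce_branch(x)

    return step


def main():
    import torch
    import torch._dynamo
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    device = torch.device("cuda", local_rank)

    for cfg in configurations():
        torch._dynamo.reset()  # start each combination from a clean compile cache
        step = build_step(cfg["use_cond"], cfg["use_collective"])
        # Assumption: "reduce-overhead" is how CUDA graphs are turned on here.
        mode = "reduce-overhead" if cfg["use_cudagraphs"] else "default"
        compiled = torch.compile(step, mode=mode)
        pred = torch.tensor(True, device=device)
        x = torch.ones(4, device=device)
        try:
            for _ in range(3):  # several calls so warmup and capture both happen
                compiled(pred, x)
            torch.cuda.synchronize()
            status = "ok"
        except Exception as exc:
            # If only one rank fails, the other may hang in a collective.
            status = f"failed: {type(exc).__name__}"
        if dist.get_rank() == 0:
            print(cfg, status)

    dist.destroy_process_group()


# Only run the distributed part when launched by torchrun.
if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    main()
```

Comparing the combination with all three flags on against the ones where a single flag is off gives a rough indication of whether the failure needs torch.cond, the collective, and CUDA graphs together. A failed capture can leave the CUDA context in a bad state, so running each combination in a fresh process is the safer variant.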

Notes

  • The issue may be specific to the master branch or the current PyTorch version.
  • Without more information, it's difficult to provide a definitive solution.

Recommendation

  • Apply workaround: Try modifying the cond_cg_funcol_repro.py script so that CUDA graph capture succeeds. This also helps determine whether the failure comes from the script's implementation or from PyTorch itself.
