pytorch - ✅ (Solved) Add Dynamo support for dist.record_comm (capture_profiler_record_comm) [1 pull request, 1 comment, 2 participants]

GitHub stats
pytorch/pytorch#178820 (fetched 2026-04-08 01:52:06)
Comments: 1
Participants: 2
Timeline: 67
Reactions: 0
Timeline (top): mentioned ×27, subscribed ×27, labeled ×7, referenced ×3

Root Cause

Problem

dist.record_comm inside torch.compile causes a graph break because its internal calls to torch._C._distributed_c10d._get_comm_profiling_name() and _set_comm_profiling_name() are pybind11 C++ functions with no Python source. Dynamo classifies them as SkipFunctionVariable and graph-breaks.
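
As a loose illustration of what "no Python source" means here, the sketch below checks whether a callable is a plain Python function that carries bytecode. The helper `is_python_traceable` and the sample function `py_identity` are hypothetical names introduced for this example, the built-in `len` stands in for a C-implemented binding, and the check is not how Dynamo classifies callables internally.

```python
import types


def is_python_traceable(fn) -> bool:
    """Return True if fn is a plain Python function that carries bytecode."""
    return isinstance(fn, types.FunctionType) and hasattr(fn, "__code__")


def py_identity(x):
    # An ordinary Python function: it has a __code__ object a tracer could inspect.
    return x


# `len` is implemented in C, so it stands in for a C++-bound function here.
print(is_python_traceable(py_identity))
print(is_python_traceable(len))
```

A tracer that works on Python bytecode has nothing to step into for the second kind of callable, which is the situation the issue describes for the two c10d profiling-name functions.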

PR fix notes

PR #179093: Make dist.record_comm dynamo traceable

Description (problem / solution / changelog)

Fixes https://github.com/pytorch/pytorch/issues/178820

Insert nodes in the graph for record comm enter/exit. Use an opaque type to ensure names survive AOTAutograd. Add unit tests, and validate with a profiler trace that the comm name survives in the compiled region.

<img width="1273" height="355" alt="Screenshot 2026-04-01 at 4 49 37 PM" src="https://github.com/user-attachments/assets/03c0c988-a801-462d-9127-23b241c8c719" />

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo @azahed98
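
The enter/exit idea in the PR description can be pictured with a pure-Python stand-in. This is a sketch under assumptions, not the PR's implementation: the module-level `_comm_name`, the helpers `get_comm_name`, `set_comm_name`, `record_comm_enter`, `record_comm_exit`, and the context manager `sketch_record_comm` are all hypothetical, and the save-then-restore behaviour is assumed from the get/set calls named in the issue.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the C++ profiling-name state; not the real c10d API.
_comm_name = ""


def get_comm_name() -> str:
    return _comm_name


def set_comm_name(name: str) -> None:
    global _comm_name
    _comm_name = name


def record_comm_enter(name: str) -> str:
    """Set the new name and return the previous one so exit can restore it."""
    previous = get_comm_name()
    set_comm_name(name)
    return previous


def record_comm_exit(previous: str) -> None:
    """Restore the name that was active before the matching enter call."""
    set_comm_name(previous)


@contextmanager
def sketch_record_comm(name: str):
    # The context manager is just an enter call and an exit call around the body.
    previous = record_comm_enter(name)
    try:
        yield
    finally:
        record_comm_exit(previous)


trace = []  # stand-in for an ordered list of graph nodes
with sketch_record_comm("my_collective"):
    trace.append(("all_reduce", get_comm_name()))
trace.append(("after", get_comm_name()))
print(trace)
```

Splitting the context manager into two explicit calls is what lets each half be represented as its own step around the collective, which is one plausible reading of "insert nodes in the graph for record comm enter/exit".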

Changed files

  • test/distributed/test_dynamo_distributed.py (modified, +128/-0)
  • torch/_dynamo/config.py (modified, +6/-0)
  • torch/_dynamo/variables/ctx_manager.py (modified, +68/-0)
  • torch/_dynamo/variables/torch.py (modified, +8/-0)
  • torch/distributed/_functional_collectives.py (modified, +84/-0)
  • torch/distributed/distributed_c10d.py (modified, +12/-3)

Code Example

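import torch
import torch.distributed as dist

@torch.compile
def fn(x):
    # Label the collective so it can be identified in a profiler trace
    with dist.record_comm("my_collective"):
        work = dist.all_reduce(x, async_op=True)
    work.wait()
    return x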

This is the same class of problem that torch.profiler.record_function had before
capture_profiler_record_function was added.

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @xmfan @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo

extent analysis

Fix Plan

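The upstream fix is PR #179093, which makes dist.record_comm traceable by Dynamo. On a build without that change, the steps below sketch a wrapper-based workaround that keeps the context manager out of the function body. Because torch.compile is the outermost decorator in step 2, Dynamo still traces through the wrapper and reaches the same context manager, so check for a remaining graph break rather than assuming the wrapper removes it.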

Step-by-Step Solution

  1. Create a custom wrapper function for dist.record_comm:

import torch
import torch.distributed as dist

def record_comm_wrapper(name):
    def wrapper(func):
        def inner(*args, **kwargs):
            with dist.record_comm(name):
                return func(*args, **kwargs)
        return inner
    return wrapper

  2. Apply the wrapper to the function:

@torch.compile
@record_comm_wrapper("my_collective")
def fn(x):
    work = dist.all_reduce(x, async_op=True)
    work.wait()
    return x

  3. Verify the fix by checking that the function compiles without errors and that the collective communication is recorded correctly.
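
Decorators apply bottom-up, so in step 2 the outermost decorator receives the wrapped `inner` function, context manager included. The sketch below shows that ordering with pure-Python stand-ins: `fake_record_comm` and `fake_compile` are hypothetical replacements for dist.record_comm and torch.compile that only log what happens, so the example needs neither PyTorch nor a process group.

```python
from contextlib import contextmanager

events = []


@contextmanager
def fake_record_comm(name):
    # Hypothetical stand-in for dist.record_comm; only logs enter and exit.
    events.append(f"enter:{name}")
    try:
        yield
    finally:
        events.append(f"exit:{name}")


def record_comm_wrapper(name):
    def wrapper(func):
        def inner(*args, **kwargs):
            with fake_record_comm(name):
                return func(*args, **kwargs)
        return inner
    return wrapper


def fake_compile(func):
    # Hypothetical stand-in for torch.compile; records which callable it was given.
    events.append(f"compile:{func.__name__}")
    return func


@fake_compile
@record_comm_wrapper("my_collective")
def fn(x):
    events.append("body")
    return x


fn(1)
print(events)
```

The first logged event is `compile:inner`: the callable handed to the outer decorator is the wrapper that contains the `with` block, not the bare function body.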

Verification

To verify the fix, you can use the following code:

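import os

import torch
import torch.distributed as dist

# Initialize the distributed backend (launch with torchrun so the env:// variables are set)
dist.init_process_group("nccl", init_method="env://")
torch.cuda.set_device(int(os.environ.get("LOCAL_RANK", 0)))

# Define the function with the wrapper (record_comm_wrapper comes from step 1)
@torch.compile
@record_comm_wrapper("my_collective")
def fn(x):
    work = dist.all_reduce(x, async_op=True)
    work.wait()
    return x

# Test the function; NCCL collectives need a CUDA tensor
x = torch.tensor([1.0], device="cuda")
result = fn(x)
print(result)

dist.destroy_process_group()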

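This should compile and run without errors. A clean run alone does not show that the graph break is gone, because torch.compile falls back to eager execution at a graph break unless fullgraph=True is passed. To confirm that the comm name is recorded, inspect a profiler trace, as the PR description does.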
