pytorch - Fix Profiler FunctionEvent missing flow_id, flow_start, and flow_type for CUDA

GitHub stats
pytorch/pytorch#184363, fetched 2026-05-20 03:39:05
Comments: 0
Participants: 1
Timeline: 2 (cross-referenced ×1, renamed ×1)
Reactions: 0

PyTorch profiler upstream tests validating async CPU→GPU flow metadata currently fail for the backend because FunctionEvent objects do not expose:

  • flow_id
  • flow_start
  • flow_type

Error Message

ERROR: test_profiler_flow_events_parity (test_profiler.TestspyreProfiler.test_profiler_flow_events_parity)
AttributeError: 'FunctionEvent' object has no attribute 'flow_id'

Root Cause

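The FunctionEvent objects returned by prof.events() do not expose the flow_id, flow_start, and flow_type attributes, so the first access to e.flow_id raises AttributeError. As a result, the upstream PyTorch profiler tests that validate async CPU→GPU flow metadata fail for the backend.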
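
One way to probe for these attributes without crashing is to read them with getattr and a default. The following is a minimal sketch, not a fix for the profiler itself: the helper names get_flow_fields and has_flow are hypothetical, the None default is an assumption, and SimpleNamespace objects stand in for real FunctionEvent instances so the snippet runs without a GPU.

```python
from types import SimpleNamespace

# Attribute names taken from the issue; using None as the default is an assumption.
FLOW_FIELDS = ("flow_id", "flow_start", "flow_type")


def get_flow_fields(event):
    """Return the flow attributes of an event, with None where one is missing."""
    return {name: getattr(event, name, None) for name in FLOW_FIELDS}


def has_flow(event):
    """True when the event carries a non-zero flow_id."""
    flow_id = getattr(event, "flow_id", None)
    return flow_id is not None and flow_id != 0


# Stand-ins for FunctionEvent objects (values are made up for illustration):
# one without flow fields, as in the failing runs, and one with them.
legacy_event = SimpleNamespace(name="aten::mm")
flow_event = SimpleNamespace(
    name="cudaLaunchKernel", flow_id=7, flow_start=True, flow_type=2
)

print(get_flow_fields(legacy_event))
print(has_flow(flow_event), has_flow(legacy_event))
```

With a real profiler run, the same helpers could be applied to each element of prof.events() to report which builds expose the fields, instead of failing on the first missing attribute.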

Code Example

import torch
from torch.profiler import ProfilerActivity, profile


def test_profiler_flow_events_parity():
    """Verify that async CPU->GPU flow fields on events() match Chrome trace JSON."""
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        x = torch.randn(32, 32, device="cuda")
        torch.mm(x, x)

    # Dump every recorded event for inspection.
    for e in prof.events():
        print(e)

    # Collect async CPU->GPU flow info from events().
    # On affected builds this raises AttributeError on e.flow_id.
    events_with_flow = [
        e for e in prof.events() if e.flow_id is not None and e.flow_id != 0
    ]
    # Plain assert: this is a standalone function, so there is no self.
    assert len(events_with_flow) > 0, "No flow events found via events()"

    for e in events_with_flow:
        assert isinstance(e.flow_id, int)
        assert isinstance(e.flow_type, int)
        assert isinstance(e.flow_start, bool)
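
The test's docstring refers to parity with the Chrome trace JSON. As a rough sketch of the trace side of that comparison, the snippet below groups flow events by id. It assumes the exported trace is a dict with a "traceEvents" list and that flow events use the phases "s", "t", and "f" from the Chrome Trace Event Format; the function name flow_events_from_trace and the sample trace are made up for illustration and are not real profiler output.

```python
import json

# Flow phases defined by the Chrome Trace Event Format: start, step, end.
FLOW_PHASES = {"s", "t", "f"}


def flow_events_from_trace(trace):
    """Group the flow events of a Chrome trace dict by their flow id."""
    flows = {}
    for event in trace.get("traceEvents", []):
        if event.get("ph") in FLOW_PHASES:
            flows.setdefault(event["id"], []).append(event)
    return flows


# Hand-written sample trace, not real profiler output.
sample_trace = json.loads("""
{"traceEvents": [
  {"ph": "X", "name": "aten::mm", "ts": 10, "dur": 5},
  {"ph": "s", "id": 7, "name": "flow", "ts": 11},
  {"ph": "f", "id": 7, "name": "flow", "ts": 20},
  {"ph": "X", "name": "kernel", "ts": 20, "dur": 2}
]}
""")

flows = flow_events_from_trace(sample_trace)
for flow_id, events in flows.items():
    print(flow_id, [e["ph"] for e in events])
```

For a real run, the trace could be written with prof.export_chrome_trace("trace.json") and loaded with json.load before being passed to the function; the flow ids found there would then be the values to compare against e.flow_id once FunctionEvent exposes it.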

Reproducer

The test listed under Code Example above serves as the reproducer. It was run from the Spyre backend's test class (test_profiler.TestspyreProfiler).

Observed events:

<FunctionEvent id=1106 name=void splitKreduce_kernel<32, 16, int, float, float, float, float, true, false, false>(cublasSplitKParams<float>, float const*, float const*, float*, float const*, float const*, float const*, float const*, float*, void*, long, float*, int*) device_type=DeviceType.CUDA node_id=-1 cpu_time=0.000us start_us=102132.769 end_us=102134.561 cpu_children=[] cuda_time=1.792us name=void splitKreduce_kernel<32, 16, int, float, float, float, float, true, false, false>(cublasSplitKParams<float>, float const*, float const*, float*, float const*, float const*, float const*, float const*, float*, void*, long, float*, int*) thread=1 input_shapes=[] cpu_memory_usage=0 cuda_memory_usage=0 is_async=False is_remote=False seq_nr=-1 is_legacy=False>

Observed results:

ERROR: test_profiler_flow_events_parity (test_profiler.TestspyreProfiler.test_profiler_flow_events_parity)
Verify that async CPU->GPU flow fields on events() match Chrome trace JSON.

Traceback (most recent call last):
  File "/home/pushpak28/Documents/cuda/test_profiler.py", line 83, in test_profiler_flow_events_parity
    self.assertIsInstance(e.flow_id, int)
                          ^^^^^^^^^
AttributeError: 'FunctionEvent' object has no attribute 'flow_id'


Ran 1 test in 0.109s

Colab, T4 GPU, version 2.10.0+cu128

Results:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_6839/1651979183.py in <cell line: 0>()
     12         e for e in prof.events() if e.flow_id is not None and e.flow_id != 0
     13     ]
---> 14 test_profiler_flow_events_parity()

/tmp/ipykernel_6839/1651979183.py in test_profiler_flow_events_parity()
     10     # Collect async CPU->GPU flow info from events()
     11     events_with_flow = [
---> 12         e for e in prof.events() if e.flow_id is not None and e.flow_id != 0
     13     ]
     14 test_profiler_flow_events_parity()

AttributeError: 'FunctionEvent' object has no attribute 'flow_id'

Versions

Python platform: Linux-6.18.8-100.fc42.x86_64-x86_64-with-glibc2.41
Is CUDA available: True
CUDA runtime version: 13.1.115
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA RTX 2000 Ada Generation Laptop GPU
Nvidia driver version: 590.48.01
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 22
On-line CPU(s) list: 0-21
