pytorch - ✅(Solved) Fix `PrivateUse1ProfilerRegistry::registerWithKineto()` is public but requires caller to hold internal mutex [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
pytorch/pytorch#180331Fetched 2026-04-16 06:35:26
View on GitHub
Comments
0
Participants
1
Timeline
31
Reactions
0
Author
Participants
Timeline (top)
mentioned ×14subscribed ×14labeled ×2cross-referenced ×1

PrivateUse1ProfilerRegistry::registerWithKineto() is declared public in torch/csrc/profiler/standalone/privateuse1_profiler.h, but its implementation contains the comment:

void PrivateUse1ProfilerRegistry::registerWithKineto() {
  // Note: Caller must hold mutex_
  ...
}

The method accesses and mutates shared state (factory_, registered_with_kineto_) without acquiring mutex_, relying entirely on the assumption that the caller already holds it. Because the method is public, any external caller can invoke it without the lock, resulting in a data race and undefined behavior.

The race can cause:

  • registered_with_kineto_ written without visibility guarantees → Kineto registered multiple times
  • factory_ read in a partially-constructed state → crash or silent use of wrong profiler

The method is an implementation detail of the registry's internal state machine. The public API contract (registerFactory, onKinetoInit) is sufficient for all legitimate callers.

cc @robieta @chaekit @guotuofeng @guyang3532 @dzhulgakov @davidberard98 @briancoutinho @sraikund16 @sanrise @mwootton @divyanshk @jiannanWang @scotts @ryanzhang22

Root Cause

The method accesses and mutates shared state (factory_, registered_with_kineto_) without acquiring mutex_, relying entirely on the assumption that the caller already holds it. Because the method is public, any external caller can invoke it without the lock, resulting in a data race and undefined behavior.

Fix Action

Fixed

PR fix notes

PR #180332: [Profiler][PrivateUse1] Make PrivateUse1ProfilerRegistry::registerWithKineto() private to fix data race

Description (problem / solution / changelog)

Fixes #180331

Summary

Move PrivateUse1ProfilerRegistry::registerWithKineto() from public to private to eliminate a thread-safety bug introduced by the method's implicit lock requirement not being enforced at the API boundary.

Problem

registerWithKineto() reads and writes factory_ and registered_with_kineto_ without acquiring mutex_. All internal callers (registerFactory, onKinetoInit) correctly hold mutex_ before calling it, but because the method was declared public, external callers could invoke it lock-free, creating a data race.

cc @divyanshk

Changed files

  • torch/csrc/profiler/standalone/privateuse1_profiler.h (modified, +4/-5)

Code Example

void PrivateUse1ProfilerRegistry::registerWithKineto() {
  // Note: Caller must hold mutex_
  ...
}
RAW_BUFFERClick to expand / collapse

Summary

PrivateUse1ProfilerRegistry::registerWithKineto() is declared public in torch/csrc/profiler/standalone/privateuse1_profiler.h, but its implementation contains the comment:

void PrivateUse1ProfilerRegistry::registerWithKineto() {
  // Note: Caller must hold mutex_
  ...
}

The method accesses and mutates shared state (factory_, registered_with_kineto_) without acquiring mutex_, relying entirely on the assumption that the caller already holds it. Because the method is public, any external caller can invoke it without the lock, resulting in a data race and undefined behavior.

The race can cause:

  • registered_with_kineto_ written without visibility guarantees → Kineto registered multiple times
  • factory_ read in a partially-constructed state → crash or silent use of wrong profiler

The method is an implementation detail of the registry's internal state machine. The public API contract (registerFactory, onKinetoInit) is sufficient for all legitimate callers.

cc @robieta @chaekit @guotuofeng @guyang3532 @dzhulgakov @davidberard98 @briancoutinho @sraikund16 @sanrise @mwootton @divyanshk @jiannanWang @scotts @ryanzhang22

extent analysis

TL;DR

Change the visibility of PrivateUse1ProfilerRegistry::registerWithKineto() from public to private or protected to prevent external callers from invoking it without proper synchronization.

Guidance

  • Review the implementation of PrivateUse1ProfilerRegistry to ensure that all methods accessing shared state (factory_, registered_with_kineto_) acquire the mutex_ lock as needed.
  • Verify that all legitimate callers of registerWithKineto() already hold the mutex_ lock before invoking the method.
  • Consider adding a check at the beginning of registerWithKineto() to assert that the mutex_ lock is held by the caller, to help catch any misuse.
  • Evaluate whether the public API contract (registerFactory, onKinetoInit) can be modified to eliminate the need for registerWithKineto() to be public.

Example

// Change the method declaration to private
class PrivateUse1ProfilerRegistry {
private:
    void registerWithKineto();
};

Notes

The fix assumes that the registerWithKineto() method is only intended to be used internally by the PrivateUse1ProfilerRegistry class. If there are legitimate external callers that need to invoke this method, additional synchronization mechanisms may be required.

Recommendation

Apply workaround: Change the visibility of PrivateUse1ProfilerRegistry::registerWithKineto() to prevent external callers from invoking it without proper synchronization, as this is the most straightforward way to prevent the data race and undefined behavior.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING