pytorch - 💡(How to fix) Fix [Inductor] Improve compile time of `build_fusion_regions` [1 comments, 2 participants]

GitHub stats
pytorch/pytorch#181076 · Fetched 2026-04-23 07:22:53
Comments: 1 · Participants: 2 · Timeline: 54 · Reactions: 0
Timeline (top): subscribed ×22, mentioned ×21, labeled ×8, unlabeled ×2

#176994 added a cost model to make a better decision on whether to inline or materialize a tensor. The cost model relies on the number of users of the tensor, the number of bytes read, and the number of bytes output.
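To illustrate why the user count matters, here is a toy memory-traffic comparison. The formulas are an assumption made for illustration only, not the cost model from #176994; the function names are hypothetical. The idea is that inlining repeats the input reads once per user, while materializing pays for one write plus one read of the output per user.

```python
def inline_traffic(effective_users: int, read_bytes: int) -> int:
    # Assumption: every user recomputes the tensor and re-reads its inputs.
    return effective_users * read_bytes


def materialize_traffic(effective_users: int, read_bytes: int, output_bytes: int) -> int:
    # Assumption: compute once (read inputs, write output),
    # then every user reads the materialized output.
    return read_bytes + output_bytes + effective_users * output_bytes


def should_materialize(effective_users: int, read_bytes: int, output_bytes: int) -> bool:
    # Materialize only when it moves strictly fewer bytes than inlining.
    return materialize_traffic(effective_users, read_bytes, output_bytes) < inline_traffic(
        effective_users, read_bytes
    )
```

Under this toy model the decision flips as the user count grows, which is why over-counting users (treating fused users as separate) can push the decision toward materializing.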

One question is how to determine the number of users. A naive approach is to count the users of a node in the FX graph. However, Inductor may fuse multiple users together so that they share a single read. As @eellison suggested, we treat such fused users as a single effective user for the purposes of the cost model. Because the fusion decision is made after the should_realize_on_reuse decision, we need a heuristic to estimate the number of effective users.
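The difference between naive and effective user counts can be sketched on a plain dictionary graph. This is not Inductor code: `fusible_pairs` stands in for whatever fusion estimate is available, and all names here are hypothetical. Users that land in the same group (tracked with a small union-find) count once.

```python
def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def effective_user_counts(users, fusible_pairs):
    """users: dict mapping a node to the list of nodes that read it.
    fusible_pairs: pairs of nodes assumed to end up in one fused kernel."""
    parent = {}
    for node, node_users in users.items():
        parent.setdefault(node, node)
        for user in node_users:
            parent.setdefault(user, user)
    for a, b in fusible_pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = _find(parent, a), _find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    # Users sharing a group root are counted as one effective user.
    return {
        node: len({_find(parent, user) for user in node_users})
        for node, node_users in users.items()
    }


users = {"x": ["relu", "sigmoid", "matmul"]}
naive = {node: len(node_users) for node, node_users in users.items()}
effective = effective_user_counts(users, [("relu", "sigmoid")])
```

Here `x` has three users in the graph, but if `relu` and `sigmoid` are assumed to fuse, they share one read of `x` and count as one effective user.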

I first tried build_fusion_regions. Overall, it yields some memory savings at the cost of higher compilation time. Dashboard result:

<img width="1601" height="388" alt="Image" src="https://github.com/user-attachments/assets/4949a3bc-8d85-424a-be44-0b6b8f039522" />

To cut the compile-time cost, #176994 uses a more coarse-grained _build_estimated_effective_users. This approach shows no compile-time regression, but the memory savings are also gone. Dashboard result:

<img width="1597" height="394" alt="Image" src="https://github.com/user-attachments/assets/645cc7ba-e0e1-4c3d-88b2-700eb15ac8f6" />

Ideally, build_fusion_regions would be fast enough that we could keep the memory savings without increasing compile time.

cc @jerryzh168 @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

Optimize the build_fusion_regions function to reduce compilation time while maintaining memory savings.

Guidance

  • Investigate the current implementation of build_fusion_regions to identify performance bottlenecks.
  • Consider applying optimizations such as caching, memoization, or parallel processing to improve the function's efficiency.
  • Evaluate the trade-off between compilation time and memory savings to determine the optimal approach.
  • Compare the results of build_fusion_regions and _build_estimated_effective_users to understand the impact of each approach on compilation time and memory usage.
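The comparison in the last point can be sketched as a small harness that times two estimators on the same graph and lists the nodes where their effective-user counts differ. The estimators are passed in as plain callables returning a dict of node to count; this interface is an assumption for illustration and is not the signature of build_fusion_regions or _build_estimated_effective_users.

```python
import time


def _timed(fn, *args):
    # Run fn once and return (result, wall-clock seconds).
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def compare_estimators(graph, fine_estimator, coarse_estimator):
    """Both estimators are hypothetical callables: graph -> {node: effective users}."""
    fine, fine_seconds = _timed(fine_estimator, graph)
    coarse, coarse_seconds = _timed(coarse_estimator, graph)
    # Nodes where the coarse estimate deviates from the fine-grained one.
    disagreements = sorted(node for node in fine if fine[node] != coarse.get(node))
    return {
        "fine_seconds": fine_seconds,
        "coarse_seconds": coarse_seconds,
        "disagreements": disagreements,
    }
```

The disagreement list points at the nodes where the coarse heuristic may change a realize-or-inline decision, which is where any lost memory savings would originate. A single timing run is noisy, so repeated runs would be needed for real measurements.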

Example

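The issue provides no code snippet, but an optimization of build_fusion_regions could involve techniques such as caching. In the sketch below, the wrapper name, the cache key, and the compute callback are hypothetical:

# Sketch of caching fusion regions (not the actual Inductor API)
fusion_regions_cache = {}

def build_fusion_regions_cached(graph_key, compute_regions):
    # graph_key: any hashable identifier for the graph (an assumption)
    if graph_key in fusion_regions_cache:
        return fusion_regions_cache[graph_key]
    # Calculate fusion regions only on a cache miss
    result = compute_regions()
    fusion_regions_cache[graph_key] = result
    return result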

Notes

The ideal solution depends on the specific requirements and constraints of the project, including the acceptable compilation time and memory usage.

Recommendation

Apply optimizations to the build_fusion_regions function to reduce compilation time while maintaining memory savings, as this approach has shown promise in reducing memory usage.
