pytorch - 💡(How to fix) Fix NVGEMM: drop hand-picked supplement configs once nvMatmulHeuristics covers them [1 comment, 1 participant]

GitHub stats
pytorch/pytorch#181908 (fetched 2026-04-30 06:17:47)
Comments: 1 · Participants: 1 · Timeline events: 82 · Reactions: 0
Timeline (top): mentioned ×37, subscribed ×37, labeled ×6, closed ×1

torch/_inductor/template_heuristics/nv_universal_gemm.py appends a hand-picked list of 29 tile/cluster configurations that nvMatmulHeuristics doesn't currently explore; the append is gated by config.nvgemm_supplement_configs. The list was introduced in #180735 to recover meaningful performance on shapes the heuristic misses.
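
The gating described above can be pictured roughly as follows. This is a minimal sketch, not the code in nv_universal_gemm.py: the TileConfig fields, the two sample entries, the candidate_configs helper, and the de-duplication step are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    # Hypothetical field names; the real config type may differ.
    tile_m: int
    tile_n: int
    tile_k: int
    cluster_m: int
    cluster_n: int


# Hypothetical stand-in for the hand-picked list (the real one has 29 entries).
SUPPLEMENT_CONFIGS = [
    TileConfig(128, 256, 64, 2, 1),
    TileConfig(256, 128, 64, 1, 2),
]


def candidate_configs(heuristic_configs, supplement_enabled):
    """Return the heuristic's configs, plus supplement entries when the flag is on."""
    configs = list(heuristic_configs)
    if supplement_enabled:
        seen = set(configs)
        for cfg in SUPPLEMENT_CONFIGS:
            # Skip entries the heuristic already produced (assumed behaviour).
            if cfg not in seen:
                configs.append(cfg)
                seen.add(cfg)
    return configs
```

With the flag off, the candidate set is exactly what the heuristic returned, which is the state the issue wants to reach permanently.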

Fix Action

Resolution

Once nvMatmulHeuristics' search space is extended to include these configurations, the supplement list and the nvgemm_supplement_configs config flag can both be removed. The hand-picked list at template_heuristics/nv_universal_gemm.py:179-211 should match what the heuristic recommends after the upstream update; once that's verified, drop the whole if config.nvgemm_supplement_configs: block.
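
One way to approach the verification step is a plain set difference between the hand-picked list and whatever the updated heuristic recommends across a set of benchmark shapes. The sketch below assumes configs can be represented as hashable tuples and that the heuristic's output has already been collected into a dict keyed by problem shape; missing_from_heuristic and both data layouts are hypothetical.

```python
def missing_from_heuristic(supplement, heuristic_by_shape):
    """Return supplement configs that the heuristic never recommends for any shape.

    supplement: list of hashable configs (here, plain tuples).
    heuristic_by_shape: dict mapping a problem shape to the list of configs
        the heuristic returned for it (collected separately; assumed input).
    """
    recommended = set()
    for configs in heuristic_by_shape.values():
        recommended.update(configs)
    # Preserve the supplement list's order so the report is easy to diff.
    return [cfg for cfg in supplement if cfg not in recommended]


# Illustrative data only: (tile_m, tile_n, tile_k, cluster_m, cluster_n).
supplement = [(128, 256, 64, 2, 1), (256, 128, 64, 1, 2)]
heuristic_by_shape = {(4096, 4096, 4096): [(128, 256, 64, 2, 1)]}
print(missing_from_heuristic(supplement, heuristic_by_shape))
```

An empty result would suggest the supplement list is redundant for the shapes tested; a non-empty one names the configurations still worth keeping.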

Disclosure

Filed by Claude on behalf of @NikhilAPatel.

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

Remove the nvgemm_supplement_configs config flag and the associated hand-picked list of tile/cluster configurations once nvMatmulHeuristics' search space is extended to include these configurations.

Guidance

  • Verify that nvMatmulHeuristics' extended search space includes every configuration in the hand-picked list at template_heuristics/nv_universal_gemm.py:179-211.
  • Once that verification succeeds, remove the if config.nvgemm_supplement_configs: block.
  • Remove the config.nvgemm_supplement_configs flag itself, since nothing will read it after the block is gone.
  • Benchmark with the updated nvMatmulHeuristics to confirm performance matches or exceeds what the supplement list delivered.
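
The last check in the list above could be scripted as a per-shape comparison of measured latencies before and after the removal. This is a sketch under stated assumptions: the latency dicts, the find_regressions helper, and the 2% tolerance are all hypothetical, and the timings themselves would come from whatever benchmark harness is already in use.

```python
def find_regressions(baseline_us, candidate_us, tolerance=0.02):
    """Return {shape: relative_slowdown} for shapes slower than tolerance.

    baseline_us: latency per shape with the supplement list enabled.
    candidate_us: latency per shape with only the updated heuristic.
    Shapes missing from candidate_us are skipped rather than flagged.
    """
    regressions = {}
    for shape, base in baseline_us.items():
        cand = candidate_us.get(shape)
        if cand is None:
            continue
        slowdown = cand / base - 1.0
        if slowdown > tolerance:
            regressions[shape] = slowdown
    return regressions


# Illustrative numbers only, in microseconds.
baseline = {(1024, 1024, 1024): 100.0, (8192, 512, 4096): 100.0}
candidate = {(1024, 1024, 1024): 101.0, (8192, 512, 4096): 110.0}
print(find_regressions(baseline, candidate))
```

Any shape reported here is a candidate for keeping its supplement entry until the heuristic catches up.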

Notes

This fix assumes that the upstream update to nvMatmulHeuristics will include the necessary configurations to replace the hand-picked list.

Recommendation

Once nvMatmulHeuristics' search space is extended, remove the nvgemm_supplement_configs config flag and the associated hand-picked list of tile/cluster configurations. This is a cleanup rather than a workaround: it simplifies the code and reduces maintenance overhead.
