pytorch - (How to fix) DISABLED test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True (__main__.TestMaxAutotune) [1 comment, 1 participant]

GitHub stats
pytorch/pytorch#182093 (fetched 2026-05-02 05:27:26)
Comments: 1
Participants: 1
Timeline: 113
Reactions: 0
Timeline (top): mentioned ×54, subscribed ×54, labeled ×4, commented ×1

Error Message

Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3528, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/mock.py", line 1379, in patched
    return func(*newargs, **newkeywargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 603, in instantiated_test
    test(self, **param_kwargs)
  File "/var/lib/jenkins/workspace/test/inductor/test_max_autotune.py", line 3335, in test_triton_gemm_epilogue_fusion_truncates_accumulator
    out, code = run_and_get_code(torch.compile(fn), x, bias)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2637, in run_and_get_code
    result = fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1112, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1078, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1058, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1845, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1606, in codegen_and_compile
    compiled_module = graph.compile_to_module()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2678, in compile_to_module
    return self._compile_to_module()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2684, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2571, in codegen_with_cpp_wrapper
    return self.codegen()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2627, in codegen
    result = self.wrapper_code.generate(self.is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_wrapper_gpu.py", line 920, in generate
    return super().generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_wrapper_cpu.py", line 1297, in generate
    return super().generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2054, in generate
    return self._generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2121, in _generate
    self.generate_and_run_autotune_block()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2209, in generate_and_run_autotune_block
    raise RuntimeError(f"Failed to run autotuning code block: {e}") from e
torch._inductor.exc.InductorError: RuntimeError: Failed to run autotuning code block: out of resource: shared memory, Required: 106496, Hardware limit: 101376. Reducing block sizes or num_stages may help.

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

To execute this test, run the following from the base repo dir:
    python test/inductor/test_max_autotune.py TestMaxAutotune.test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
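
The two figures in the final line of the error are the useful part of the traceback. As a minimal sketch, assuming the message keeps the exact wording shown above, they can be pulled out of a log line and compared. The helper name parse_out_of_resource is hypothetical and is not part of PyTorch or Triton.

```python
import re

# Matches the resource figures in the "out of resource" message quoted above.
# The wording is taken from this issue's log; other versions may phrase it differently.
_OOR_PATTERN = re.compile(
    r"out of resource: (?P<resource>[\w ]+), "
    r"Required: (?P<required>\d+), "
    r"Hardware limit: (?P<limit>\d+)"
)


def parse_out_of_resource(message):
    """Return (resource, required, limit, overshoot) in bytes, or None if absent."""
    match = _OOR_PATTERN.search(message)
    if match is None:
        return None
    required = int(match["required"])
    limit = int(match["limit"])
    return match["resource"], required, limit, required - limit


sample = (
    "RuntimeError: Failed to run autotuning code block: out of resource: "
    "shared memory, Required: 106496, Hardware limit: 101376. "
    "Reducing block sizes or `num_stages` may help."
)
resource, required, limit, overshoot = parse_out_of_resource(sample)
print(f"{resource}: required {required} B, limit {limit} B, over by {overshoot} B")
```

For the message in this issue the request exceeds the limit by 5120 bytes (104 KiB requested against a 99 KiB limit).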

Root Cause

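This test was disabled because it is failing intermittently in CI (4 failures and 4 successes across 4 workflows in the 6 hours before the issue was filed). According to the traceback, the failing runs abort while Inductor runs its autotuning code block: a kernel requests 106496 bytes of shared memory, 5120 bytes more than the 101376-byte hardware limit. The original issue links to recent examples and the most recent trunk workflow logs.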

Fix Action

Fix / Workaround

The issue records no fix. The only remedy it points to is the hint in the error message itself: reducing block sizes or num_stages may help. The full traceback appears under Error Message above.


Platforms: inductor

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 6 hours, it has been determined flaky in 4 workflow(s) with 4 failures and 4 successes.

Debugging instructions (after clicking on the recent samples link): DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers so CI will thus be green but it will be harder to parse the logs. To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True
  4. There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
<details><summary>Sample error message</summary>
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3528, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/mock.py", line 1379, in patched
    return func(*newargs, **newkeywargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 603, in instantiated_test
    test(self, **param_kwargs)
  File "/var/lib/jenkins/workspace/test/inductor/test_max_autotune.py", line 3335, in test_triton_gemm_epilogue_fusion_truncates_accumulator
    out, code = run_and_get_code(torch.compile(fn), x, bias)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2637, in run_and_get_code
    result = fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1112, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1078, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1058, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1845, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1606, in codegen_and_compile
    compiled_module = graph.compile_to_module()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2678, in compile_to_module
    return self._compile_to_module()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2684, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2571, in codegen_with_cpp_wrapper
    return self.codegen()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2627, in codegen
    result = self.wrapper_code.generate(self.is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_wrapper_gpu.py", line 920, in generate
    return super().generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_wrapper_cpu.py", line 1297, in generate
    return super().generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2054, in generate
    return self._generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2121, in _generate
    self.generate_and_run_autotune_block()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2209, in generate_and_run_autotune_block
    raise RuntimeError(f"Failed to run autotuning code block: {e}") from e
torch._inductor.exc.InductorError: RuntimeError: Failed to run autotuning code block: out of resource: shared memory, Required: 106496, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"


To execute this test, run the following from the base repo dir:
    python test/inductor/test_max_autotune.py TestMaxAutotune.test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
</details>

Test file path: inductor/test_max_autotune.py

For all disabled tests (by GitHub issue), see https://hud.pytorch.org/disabled.

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

The most likely fix is to reduce block sizes or num_stages so that the autotuned kernel fits within the shared memory limit (106496 bytes requested, 101376 bytes available).

Guidance

  • Review test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True in inductor/test_max_autotune.py to find which block sizes and num_stages are in play when it fails.
  • Reduce the block sizes or num_stages to lower the shared memory requirement, as the error message suggests.
  • Rerun the test with the modified values to check whether the failure is resolved.
  • If the failure persists, set TORCHDYNAMO_VERBOSE=1 and TORCH_LOGS="+dynamo" to collect more detailed logs for further debugging.
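
The rerun with verbose logging can be scripted. The sketch below only assembles the command and environment quoted in the error message; the helper name build_repro is hypothetical, and the command is assumed to be launched from the base directory of a PyTorch checkout.

```python
import os
import shlex

# Test file and test id exactly as printed in the error message.
TEST_FILE = "test/inductor/test_max_autotune.py"
TEST_ID = (
    "TestMaxAutotune."
    "test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True"
)


def build_repro(base_env=None):
    """Return (cmd, env) for rerunning the test with verbose Dynamo logging."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCHDYNAMO_VERBOSE"] = "1"   # internal stack trace
    env["TORCH_LOGS"] = "+dynamo"      # extra developer context
    cmd = ["python", TEST_FILE, TEST_ID]
    return cmd, env


cmd, env = build_repro()
# Print a copy-pasteable shell line instead of running the test here.
print(
    "TORCHDYNAMO_VERBOSE=1",
    "TORCH_LOGS=" + shlex.quote(env["TORCH_LOGS"]),
    shlex.join(cmd),
)
```

To actually run it, the pair can be passed to subprocess.run(cmd, env=env) from the repository root.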

Example

The issue provides no code snippet: it reports a test failure, and the suggested remedy is a change to tuning parameters (block sizes or num_stages) rather than a specific code change.

Notes

The error message reports a hardware limit on shared memory (101376 bytes, or 99 KiB, on the failing runner), and this limit may differ across environments. Reducing block sizes or num_stages may help, but suitable values may have to be found by trial and error.
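
To see why both block sizes and num_stages matter, the sketch below uses a common rule of thumb for pipelined GEMM kernels: each stage buffers one A tile and one B tile in shared memory. This is an approximation, not Triton's actual allocation logic, and the tile shapes are hypothetical; they are not the configs from the failing run.

```python
from dataclasses import dataclass

HARDWARE_LIMIT = 101376  # bytes, taken from the error message in this issue


@dataclass(frozen=True)
class GemmTile:
    block_m: int
    block_n: int
    block_k: int
    num_stages: int


def estimate_smem_bytes(tile, elem_size=4):
    """Rough estimate: num_stages * (A tile + B tile) * element size (4 for float32)."""
    a_tile = tile.block_m * tile.block_k
    b_tile = tile.block_k * tile.block_n
    return tile.num_stages * (a_tile + b_tile) * elem_size


# Hypothetical candidates for illustration only.
candidates = [
    GemmTile(128, 128, 32, 4),
    GemmTile(128, 128, 32, 3),  # same tile, one stage fewer
    GemmTile(64, 64, 32, 4),    # smaller tile, same stage count
]

for tile in candidates:
    needed = estimate_smem_bytes(tile)
    verdict = "fits" if needed <= HARDWARE_LIMIT else "exceeds limit"
    print(tile, needed, verdict)
```

Under this estimate, dropping one stage or halving the tile edges is enough to bring the first candidate under the limit, which matches the direction of the hint in the error message.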

Recommendation

The issue mentions no fixed version to upgrade to, so the practical option is the workaround: reduce block sizes or num_stages until the kernel fits within the shared memory limit.
