pytorch - 💡(How to fix) Fix DISABLED test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True (__main__.TestMaxAutotune) [1 comment, 1 participant]

pytorch · 2026-05-01 07:29:49


GitHub stats

pytorch/pytorch#182093 · fetched 2026-05-02 05:27:26

Author: pytorch-bot[bot]
Participants: pytorch-bot[bot]
Timeline: 113 events (mentioned ×54, subscribed ×54, labeled ×4, commented ×1)

Error Message

Traceback (most recent call last):
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3528, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/mock.py", line 1379, in patched
    return func(*newargs, **newkeywargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 603, in instantiated_test
    test(self, **param_kwargs)
  File "/var/lib/jenkins/workspace/test/inductor/test_max_autotune.py", line 3335, in test_triton_gemm_epilogue_fusion_truncates_accumulator
    out, code = run_and_get_code(torch.compile(fn), x, bias)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2637, in run_and_get_code
    result = fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1112, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1078, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1058, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1845, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1606, in codegen_and_compile
    compiled_module = graph.compile_to_module()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2678, in compile_to_module
    return self._compile_to_module()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2684, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2571, in codegen_with_cpp_wrapper
    return self.codegen()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2627, in codegen
    result = self.wrapper_code.generate(self.is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_wrapper_gpu.py", line 920, in generate
    return super().generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/cpp_wrapper_cpu.py", line 1297, in generate
    return super().generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2054, in generate
    return self._generate(is_inference)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2121, in _generate
    self.generate_and_run_autotune_block()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/codegen/wrapper.py", line 2209, in generate_and_run_autotune_block
    raise RuntimeError(f"Failed to run autotuning code block: {e}") from e
torch._inductor.exc.InductorError: RuntimeError: Failed to run autotuning code block: out of resource: shared memory, Required: 106496, Hardware limit: 101376. Reducing block sizes or `num_stages` may help.

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

To execute this test, run the following from the base repo dir:
    python test/inductor/test_max_autotune.py TestMaxAutotune.test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
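When triaging several CI runs it can help to pull the two numbers out of the error text automatically. The sketch below is a hypothetical helper, not part of PyTorch or Triton; its regular expression assumes the exact wording shown in the error above ("out of resource: shared memory, Required: N, Hardware limit: M") and will return `None` for any other phrasing.

```python
import re
from typing import NamedTuple, Optional

# Assumes the exact wording seen in this failure:
# "out of resource: shared memory, Required: <N>, Hardware limit: <M>"
_SMEM_RE = re.compile(
    r"out of resource: shared memory, Required: (\d+), Hardware limit: (\d+)"
)


class SharedMemoryOverflow(NamedTuple):
    required: int  # bytes of shared memory the kernel asked for
    limit: int  # bytes the device reported as its hardware limit

    @property
    def excess(self) -> int:
        """Bytes by which the request exceeds the limit."""
        return self.required - self.limit


def parse_smem_error(message: str) -> Optional[SharedMemoryOverflow]:
    """Extract (required, limit) from an error message, or return None."""
    match = _SMEM_RE.search(message)
    if match is None:
        return None
    return SharedMemoryOverflow(int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    msg = (
        "torch._inductor.exc.InductorError: RuntimeError: Failed to run "
        "autotuning code block: out of resource: shared memory, "
        "Required: 106496, Hardware limit: 101376. "
        "Reducing block sizes or `num_stages` may help."
    )
    info = parse_smem_error(msg)
    print(info, info.excess)
```

For the message in this issue the helper reports `required=106496`, `limit=101376` and an excess of 5120 bytes (5 KiB).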

Root Cause

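This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs. The failure happens while Inductor generates the cpp-wrapper code and runs its autotuning code block (`generate_and_run_autotune_block`): a generated Triton kernel requires 106496 bytes (104 KiB) of shared memory, but the hardware limit reported for the GPU is 101376 bytes (99 KiB), so the request is 5120 bytes too large.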
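To see why the error message suggests shrinking block sizes or `num_stages`, here is a rough back-of-the-envelope model, assuming a software-pipelined GEMM in which every pipeline stage buffers one A tile (BLOCK_M × BLOCK_K) and one B tile (BLOCK_K × BLOCK_N). This is an approximation for illustration only: Triton may allocate additional scratch (for example for the fused epilogue), so the model is not expected to reproduce the exact 106496-byte figure, and the 64×64×64 tile below is a hypothetical choice.

```python
def estimate_gemm_smem_bytes(
    block_m: int, block_n: int, block_k: int, num_stages: int, elem_size: int
) -> int:
    """Rough shared-memory estimate for a software-pipelined GEMM main loop.

    Assumes every pipeline stage buffers one A tile (block_m x block_k) and one
    B tile (block_k x block_n). Real compilers may allocate extra scratch, so
    treat the result only as an approximation.
    """
    per_stage = (block_m * block_k + block_k * block_n) * elem_size
    return num_stages * per_stage


HW_LIMIT = 101376  # bytes: the "Hardware limit" reported in this failure

if __name__ == "__main__":
    # Hypothetical 64x64x64 float32 tile (4 bytes per element).
    for stages in (2, 3, 4):
        need = estimate_gemm_smem_bytes(64, 64, 64, stages, elem_size=4)
        print(stages, need, "fits" if need <= HW_LIMIT else "exceeds limit")
```

Under this model the footprint grows linearly with `num_stages`: the hypothetical tile needs 98304 bytes at 3 stages (under the limit) but 131072 bytes at 4 stages (over it).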

Raw issue body

Platforms: inductor

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 6 hours, it has been determined flaky in 4 workflow(s) with 4 failures and 4 successes.

Debugging instructions (after clicking on the recent samples link): DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers, so CI will be green, but the logs will be harder to parse. To find relevant log snippets:

1. Click on the workflow logs linked above.
2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
3. Grep for `test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True`.
4. There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
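If you save the expanded Test step log to a local file, the grep step can be scripted. This is a minimal sketch, assuming a plain-text log saved under a name of your choosing (`test-step.log` below is hypothetical); it does a simple substring match, like `grep -F`, for the test name and for the shared-memory error text from this issue.

```python
from pathlib import Path
from typing import Iterable, List, Sequence

TEST_NAME = (
    "test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True"
)
# Default patterns: the test name plus the shared-memory error text from this issue.
DEFAULT_NEEDLES = (TEST_NAME, "out of resource: shared memory")


def grep_lines(
    lines: Iterable[str], needles: Sequence[str] = DEFAULT_NEEDLES
) -> List[str]:
    """Return lines containing any needle (plain substring match, like grep -F)."""
    return [line.rstrip("\n") for line in lines if any(n in line for n in needles)]


def grep_log_file(path: str, needles: Sequence[str] = DEFAULT_NEEDLES) -> List[str]:
    """Grep a log file saved from the expanded Test step of a CI job."""
    # errors="replace" keeps going if the saved log contains undecodable bytes.
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return grep_lines(fh, needles)
```

Usage: `for line in grep_log_file("test-step.log"): print(line)`. Because flaky tests are rerun, expect several matches per job.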


Test file path: inductor/test_max_autotune.py

For all disabled tests (by GitHub issue), see https://hud.pytorch.org/disabled.

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

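The most likely fix for the failing test is to reduce the block sizes or `num_stages` of the offending Triton kernel so that its shared-memory requirement fits under the 101376-byte hardware limit reported in the "out of resource: shared memory" error.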

Guidance

1. Review `test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True` in `inductor/test_max_autotune.py` to see which GEMM configurations it exercises; in max-autotune tests, block sizes and `num_stages` typically come from Inductor's Triton GEMM templates rather than from the test itself.
2. Reduce the block sizes or `num_stages` to lower the shared-memory requirement, as the error message suggests.
3. Re-run the test with the modified configuration to verify whether the error is gone.
4. If the issue persists, set `TORCHDYNAMO_VERBOSE=1` and `TORCH_LOGS="+dynamo"` to gather more detailed logs for further debugging.
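Re-running the single test with the two logging settings from the error message can be scripted as below. This is a sketch under assumptions: it must be run against a local PyTorch checkout (the `repo_root` argument is yours to supply), and it only reproduces the command the error message prints; it does not change any autotuning configuration.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

TEST_FILE = "test/inductor/test_max_autotune.py"
TEST_ID = (
    "TestMaxAutotune."
    "test_triton_gemm_epilogue_fusion_truncates_accumulator_float32_use_addmm_True"
)


def repro_command(verbose: bool = True) -> Tuple[List[str], Dict[str, str]]:
    """Command line and environment for re-running just this test."""
    env = dict(os.environ)
    if verbose:
        # The two settings the error message recommends for more detail.
        env["TORCHDYNAMO_VERBOSE"] = "1"
        env["TORCH_LOGS"] = "+dynamo"
    return [sys.executable, TEST_FILE, TEST_ID], env


def run_repro(repo_root: str, verbose: bool = True) -> int:
    """Run the test from the base of a PyTorch checkout; return its exit code."""
    cmd, env = repro_command(verbose)
    return subprocess.call(cmd, cwd=repo_root, env=env)
```

Usage: `run_repro("/path/to/pytorch")`, where the path is a placeholder for your checkout. Building the environment from a copy of `os.environ` keeps CUDA and conda settings intact while adding the extra logging variables.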

Example

The issue itself includes no code snippet: the failure is in a CI test, so the fix is a change to the test or to the GEMM configurations it exercises rather than to user code.

Notes

The error reports a shared-memory hardware limit of 101376 bytes. This limit varies across GPU models, so a configuration that fits on one GPU may not fit on another. Reducing block sizes or `num_stages` may help, but suitable values may need to be found by trial and error.
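The trial-and-error search can be sketched as a loop that shrinks a candidate configuration until a rough shared-memory estimate fits the limit. Everything here is illustrative: `GemmConfig` is a hypothetical stand-in rather than Inductor's or Triton's config type, the estimate assumes each pipeline stage buffers one A tile and one B tile, and the shrink order (lower `num_stages` first, then halve BLOCK_K) is an arbitrary choice. Inductor's own config pruning is not shown and may work differently.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GemmConfig:
    """Hypothetical stand-in for one Triton GEMM autotuning candidate."""

    block_m: int
    block_n: int
    block_k: int
    num_stages: int


def smem_bytes(cfg: GemmConfig, elem_size: int) -> int:
    # Rough model: each stage buffers one A tile and one B tile.
    tile_elems = cfg.block_m * cfg.block_k + cfg.block_k * cfg.block_n
    return cfg.num_stages * tile_elems * elem_size


def shrink_to_fit(
    cfg: GemmConfig,
    limit: int,
    elem_size: int,
    min_stages: int = 1,
    min_block_k: int = 16,
) -> Optional[GemmConfig]:
    """Lower num_stages first, then halve block_k, until the estimate fits.

    Returns None if the config still does not fit at the minimum values.
    """
    while smem_bytes(cfg, elem_size) > limit:
        if cfg.num_stages > min_stages:
            cfg = replace(cfg, num_stages=cfg.num_stages - 1)
        elif cfg.block_k // 2 >= min_block_k:
            cfg = replace(cfg, block_k=cfg.block_k // 2)
        else:
            return None  # nothing left to shrink
    return cfg


if __name__ == "__main__":
    start = GemmConfig(block_m=128, block_n=128, block_k=32, num_stages=4)
    print(shrink_to_fit(start, limit=101376, elem_size=4))  # float32
```

With float32 inputs and the 101376-byte limit from this issue, the hypothetical 128×128×32 config at 4 stages (131072 bytes estimated) is reduced to 3 stages (98304 bytes). Shrinking `num_stages` before BLOCK_K keeps the tile shape and only trades away pipelining depth.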

Recommendation

Apply a workaround by reducing block sizes or `num_stages` until the kernel fits in shared memory. Upgrading to a fixed version is not an option here, because the issue does not mention one.
