pytorch - 💡(How to fix) Fix InductorError when compiling the backward graph using cpp_wrapper [1 comment, 1 participant]

GitHub stats
pytorch/pytorch#178845 · Fetched 2026-04-08 01:57:41
Comments: 1 · Participants: 1 · Timeline events: 111 · Reactions: 0
Timeline (top): mentioned ×50, subscribed ×50, labeled ×6, assigned ×1

Error Message


🐛 Describe the bug

Minimal repro case:

import torch

def func(x, output_grad):
    layer_norm = torch.nn.LayerNorm(normalized_shape=4).cuda()
    output = layer_norm(x)
    output.backward(output_grad)
    return output

x = torch.randn(2, 3, 4, device="cuda")
output_grad = torch.randn(2, 3, 4, device="cuda")

opt_func = torch.compile(options={"cpp_wrapper": True})(func)
output_compile = opt_func(x, output_grad)

Setting torch._inductor.config.cpp_wrapper = True instead works.

Error logs

  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2364, in codegen
    result = self.wrapper_code.generate(self.is_inference)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/cpp_wrapper_gpu.py", line 516, in generate
    return super().generate(is_inference)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/cpp_wrapper_cpu.py", line 1041, in generate
    return super().generate(is_inference)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 1770, in generate
    return self._generate(is_inference)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 1858, in _generate
    self.finalize_prefix()
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/cpp_wrapper_gpu.py", line 528, in finalize_prefix
    kernel.generate(self)
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/cpp_wrapper_gpu.py", line 76, in generate
    assert params, f"CudaKernelParamCache not populated for {self.kernel_name}"
           ^^^^^^
torch._inductor.exc.InductorError: AssertionError: CudaKernelParamCache not populated for triton_poi_fused_native_layer_norm_native_layer_norm_backward_1
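When triaging a log like the one above, it can help to pull out the name of the kernel that the assertion complains about. The following is a minimal sketch, not part of PyTorch: the helper name `missing_kernel_name` and the regular expression are assumptions made for illustration, and the pattern is taken only from the message text in this report.

```python
import re

# Final line of the traceback from the report.
ERROR_LINE = (
    "torch._inductor.exc.InductorError: AssertionError: "
    "CudaKernelParamCache not populated for "
    "triton_poi_fused_native_layer_norm_native_layer_norm_backward_1"
)

# Hypothetical pattern based on the assertion message seen in this report.
_PATTERN = re.compile(r"CudaKernelParamCache not populated for (\w+)")


def missing_kernel_name(log_text):
    """Return the kernel name from the assertion message, or None if absent."""
    match = _PATTERN.search(log_text)
    return match.group(1) if match else None


kernel = missing_kernel_name(ERROR_LINE)
print(kernel)
# The name contains "backward", consistent with the issue title, which
# says the failure occurs while compiling the backward graph.
print("backward" in kernel)
```

Running the helper over a full log file works the same way, since `search` scans the whole text for the first matching message.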

Versions

PyTorch version: 2.11.0a0+eb65b36914.nv26.02
Is debug build: False
CUDA used to build PyTorch: 13.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.3 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.39

Python version: 3.12.3 (main, Jan 22 2026, 20:57:42) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 13.1.115
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A100-PCIE-40GB
Nvidia driver version: 570.133.20

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

Setting torch._inductor.config.cpp_wrapper = True globally, instead of passing options={"cpp_wrapper": True} to torch.compile, may work around the error. The reporter notes that the global setting works.

Guidance

  • The assertion is raised in cpp_wrapper_gpu.py while the wrapper code is generated (finalize_prefix): CudaKernelParamCache has no entry for the kernel triton_poi_fused_native_layer_norm_native_layer_norm_backward_1.
  • The reporter states that setting torch._inductor.config.cpp_wrapper = True works, which suggests the failure may be specific to enabling cpp_wrapper through the options argument of torch.compile.
  • To mitigate the issue, set torch._inductor.config.cpp_wrapper = True before calling torch.compile, and do not pass cpp_wrapper through options.
  • Verify that the issue is resolved by running the compiled function with the modified configuration.
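The verification step in the last bullet can be sketched as a small check that runs the repro with the global setting and compares the compiled output against eager execution. This is an illustrative sketch under assumptions, not a confirmed fix: the function name `check_workaround` and its status strings are made up for this example, the tolerances are arbitrary, and the check skips itself when PyTorch or a CUDA device is unavailable.

```python
def check_workaround():
    """Return (status, detail); status is 'skipped', 'ok', 'mismatch' or 'error'."""
    try:
        import torch
        import torch._inductor.config as inductor_config
    except (ImportError, OSError) as exc:
        return "skipped", f"PyTorch not importable: {exc}"
    if not torch.cuda.is_available():
        return "skipped", "no CUDA device available"

    # Same function as in the issue's minimal repro.
    def func(x, output_grad):
        layer_norm = torch.nn.LayerNorm(normalized_shape=4).cuda()
        output = layer_norm(x)
        output.backward(output_grad)
        return output

    x = torch.randn(2, 3, 4, device="cuda")
    output_grad = torch.randn(2, 3, 4, device="cuda")
    expected = func(x, output_grad)  # eager reference result

    previous = inductor_config.cpp_wrapper
    inductor_config.cpp_wrapper = True  # global setting, as in the report
    try:
        # No options={"cpp_wrapper": True} here; that path fails in the report.
        actual = torch.compile(func)(x, output_grad)
    except Exception as exc:
        return "error", f"{type(exc).__name__}: {exc}"
    finally:
        inductor_config.cpp_wrapper = previous  # restore the earlier value

    if torch.allclose(expected, actual, atol=1e-5, rtol=1e-5):
        return "ok", "compiled output matches eager output"
    return "mismatch", "compiled output differs from eager output"


if __name__ == "__main__":
    print(check_workaround())
```

An "error" status whose detail mentions CudaKernelParamCache would indicate that the workaround does not help in that environment.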

Example

import torch
import torch._inductor.config

# Enable the C++ wrapper globally, before torch.compile is called.
torch._inductor.config.cpp_wrapper = True

def func(x, output_grad):
    layer_norm = torch.nn.LayerNorm(normalized_shape=4).cuda()
    output = layer_norm(x)
    output.backward(output_grad)
    return output

x = torch.randn(2, 3, 4, device="cuda")
output_grad = torch.randn(2, 3, 4, device="cuda")

# Compile without options={"cpp_wrapper": True}; passing the flag that way is what fails in the report.
opt_func = torch.compile(func)
output_compile = opt_func(x, output_grad)

Notes

The provided solution is based on the information given in the issue and may not be applicable in all cases. The root cause of the issue is not explicitly stated, and the provided workaround may not fix the underlying problem.

Recommendation

Apply the workaround by setting torch._inductor.config.cpp_wrapper = True before compiling the function with torch.compile. The reporter states that this setting works for the repro; it has not been confirmed beyond that example.
