pytorch - 💡(How to fix) Fix [inductor][cpu] resnet50_quantized_qat CPP wrapper torch._inductor.exc.InductorError: CppCompileError: C++ compile error in 2026-05-11 nightly release

pytorch · 2026-05-12 02:34:39



🐛 Describe the bug

Affected models: CPP wrapper for mobilenet_v2_quantized_qat, resnet50_quantized_qat

the bad commit: https://github.com/pytorch/pytorch/commit/0e0d2db2551178964297e106b92ab4033d3ea190

loading model: 0it [00:00, ?it/s]
loading model: 0it [00:02, ?it/s]
cpu  eval  resnet50_quantized_qat             
W0511 02:40:26.117000 277152 /workspace/pytorch/torch/_inductor/cudagraph_utils.py:207] [0/0_1] [__cudagraphs] skipping cudagraphs due to cpp wrapper enabled
W0511 02:40:26.647000 277152 /workspace/pytorch/torch/_inductor/cudagraph_utils.py:207] [1/0_1] [__cudagraphs] skipping cudagraphs due to cpp wrapper enabled
W0511 02:40:27.119000 277152 /workspace/pytorch/torch/_inductor/ir.py:8704] [1/0_1] aten.quantize_per_tensor.tensor_qparams is missing a c-shim implementation, using proxy executor as fallback
ERROR:common:Backend dynamo failed in warmup()
Traceback (most recent call last):
  File "/workspace/pytorch/benchmarks/dynamo/common.py", line 2786, in warmup
    fn(model, example_inputs)
  File "/workspace/pytorch/torch/_dynamo/eval_frame.py", line 1059, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/workspace/pytorch/torch/_inductor/compile_fx.py", line 1069, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/workspace/pytorch/torch/_inductor/compile_fx.py", line 1049, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/workspace/pytorch/torch/_inductor/compile_fx.py", line 1832, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/workspace/pytorch/torch/_inductor/compile_fx.py", line 1593, in codegen_and_compile
    compiled_module = graph.compile_to_module()
  File "/workspace/pytorch/torch/_inductor/graph.py", line 2612, in compile_to_module
    return self._compile_to_module()
  File "/workspace/pytorch/torch/_inductor/graph.py", line 2622, in _compile_to_module
    mod = self._compile_to_module_lines(wrapper_code)
  File "/workspace/pytorch/torch/_inductor/graph.py", line 2697, in _compile_to_module_lines
    mod = PyCodeCache.load_by_key_path(
  File "/workspace/pytorch/torch/_inductor/codecache.py", line 3961, in load_by_key_path
    mod = _reload_python_module(key, path, set_sys_modules=in_toplevel)
  File "/workspace/pytorch/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module
    exec(code, mod.__dict__, mod.__dict__)
  File "/tmp/torchinductor_root/47/c47qxp3zo27fejrdp3lccngwuvx5uyltkqm2aw3gdmmprswxvl56.py", line 49, in <module>
    inductor_entry = CppWrapperCodeCache.load_pybinding(
  File "/workspace/pytorch/torch/_inductor/codecache.py", line 3438, in load_pybinding
    return cls.load_pybinding_async(*args, **kwargs)()
  File "/workspace/pytorch/torch/_inductor/codecache.py", line 3430, in future
    result = get_result()
  File "/workspace/pytorch/torch/_inductor/codecache.py", line 3217, in load_fn
    result = worker_fn()
  File "/workspace/pytorch/torch/_inductor/codecache.py", line 3246, in _worker_compile_cpp
    builder.build()
  File "/workspace/pytorch/torch/_inductor/cpp_builder.py", line 2261, in build
    run_compile_cmd(build_cmd, cwd=_build_tmp_dir)
  File "/workspace/pytorch/torch/_inductor/cpp_builder.py", line 688, in run_compile_cmd
    _run_compile_cmd(cmd_line, cwd)
  File "/workspace/pytorch/torch/_inductor/cpp_builder.py", line 683, in _run_compile_cmd
    raise exc.CppCompileError(cmd, output) from e
torch._inductor.exc.InductorError: CppCompileError: C++ compile error

Command:
g++ /tmp/torchinductor_root/yw/cywameyniwb3p7nqukrvnppidg4ygr7rkrj4ieekv2oxasdq7tiw.main.cpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDALONE_TORCH_HEADER -D TORCH_INDUCTOR_PRECOMPILE_HEADERS -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_AVX512 -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize -march=native -shared -fPIC -Wall -std=c++20 -Wno-unused-variable -Wno-unknown-pragmas -pedantic -fopenmp -include /tmp/torchinductor_root/precompiled_headers/c2djtqjhmnlmkqu6kqins7k5tjvqjv5vlec7yrx3cbvoqhgxwair.h -I/opt/conda/include/python3.10 -I/workspace/pytorch/torch/include -I/workspace/pytorch/torch/include/torch/csrc/api/include -mavx512f -mavx512dq -mavx512vl -mavx512bw -mfma -mavx512vnni -mavx512vl -mamx-tile -mamx-bf16 -mamx-int8 -mavx512bf16 -o /tmp/torchinductor_root/yw/cywameyniwb3p7nqukrvnppidg4ygr7rkrj4ieekv2oxasdq7tiw.main.so -ltorch -ltorch_cpu -ltorch_python -lgomp -L/opt/conda/lib -L/workspace/pytorch/torch/lib

Output:
In file included from /workspace/pytorch/torch/include/torch/csrc/inductor/cpp_wrapper/common.h:81,
                 from /workspace/pytorch/torch/include/torch/csrc/inductor/cpp_wrapper/cpu.h:3,
                 from /tmp/torchinductor_root/precompiled_headers/c2djtqjhmnlmkqu6kqins7k5tjvqjv5vlec7yrx3cbvoqhgxwair.h:1:
/workspace/pytorch/torch/include/torch/csrc/inductor/aoti_runtime/utils.h:466:47: error: ‘aoti_torch_dtype_quint8’ was not declared in this scope; did you mean ‘aoti_torch_dtype_uint8’?
  466 |   static auto cached_torch_dtype_##typename = aoti_torch_dtype_##typename()
      |                                               ^~~~~~~~~~~~~~~~~
/tmp/torchinductor_root/yw/cywameyniwb3p7nqukrvnppidg4ygr7rkrj4ieekv2oxasdq7tiw.main.cpp:5:1: note: in expansion of macro ‘CACHE_TORCH_DTYPE’
    5 | CACHE_TORCH_DTYPE(quint8);
      | ^~~~~~~~~~~~~~~~~


Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

warmup_failed
0
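
The compiler output points at token pasting: `CACHE_TORCH_DTYPE(quint8)` expands to a call to `aoti_torch_dtype_quint8()`, which the included headers do not declare. As a rough diagnostic sketch (not part of PyTorch), the Python below scans wrapper source text for `CACHE_TORCH_DTYPE(...)` uses and reports those with no matching `aoti_torch_dtype_<name>` in the header text. The function name `missing_dtype_shims` and both sample strings are hypothetical; in practice you would read the generated `.main.cpp` and the shim headers from disk.

```python
import re

# Matches uses such as: CACHE_TORCH_DTYPE(quint8);
MACRO_USE = re.compile(r"\bCACHE_TORCH_DTYPE\(\s*(\w+)\s*\)")
# Matches shim names such as: aoti_torch_dtype_uint8
SHIM_NAME = re.compile(r"\baoti_torch_dtype_(\w+)\b")


def missing_dtype_shims(wrapper_src: str, header_src: str) -> list[str]:
    """Return dtype names used via CACHE_TORCH_DTYPE that header_src never names."""
    declared = set(SHIM_NAME.findall(header_src))
    missing = []
    for name in MACRO_USE.findall(wrapper_src):
        # Keep first-seen order and skip repeats.
        if name not in declared and name not in missing:
            missing.append(name)
    return missing


# Illustrative inputs only (assumption): real shim declarations may look different.
sample_header = "int32_t aoti_torch_dtype_uint8();\nint32_t aoti_torch_dtype_float32();\n"
sample_wrapper = "CACHE_TORCH_DTYPE(uint8);\nCACHE_TORCH_DTYPE(quint8);\n"

print(missing_dtype_shims(sample_wrapper, sample_header))  # ['quint8']
```

A non-empty result would suggest that the wrapper codegen emits a dtype the C shim layer does not expose, which matches the failure shown in the log.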

the last good commit: https://github.com/pytorch/pytorch/commit/ad4138ab2e9bc15bf840c331df5701b46ee47936

loading model: 0it [00:00, ?it/s]
loading model: 0it [00:02, ?it/s]
cpu  eval  resnet50_quantized_qat             
W0511 02:44:31.569000 297615 /workspace/pytorch/torch/_inductor/cudagraph_utils.py:207] [0/0_1] [__cudagraphs] skipping cudagraphs due to cpp wrapper enabled
W0511 02:44:32.147000 297615 /workspace/pytorch/torch/_dynamo/utils.py:3218] [2/0] Encountered exception (quantized nyi in meta tensors) during fake tensor propagation.
W0511 02:44:32.155000 297615 /workspace/pytorch/torch/_dynamo/utils.py:3218] [2/0_1] Encountered exception (quantized nyi in meta tensors) during fake tensor propagation.
W0511 02:44:32.204000 297615 /workspace/pytorch/torch/_inductor/cudagraph_utils.py:207] [3/0] [__cudagraphs] skipping cudagraphs due to cpp wrapper enabled

running benchmark: 100%|██████████| 50/50 [00:02<00:00, 23.47it/s]
WARNING:common:Trying to call the empty_gpu_cache for device: cpu, which is not in list [cuda, xpu]
0.984x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,resnet50_quantized_qat,1,0.984405,21.007918,0.792060,0.960393,95.348326,99.280486,4,2,3,2,0,0,2
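
The last two lines of the passing log are a CSV header and a single result row. A small sketch for pulling named fields out of that record with the standard library follows; the variable names are illustrative, and the two strings are copied from the log.

```python
import csv
import io

header = (
    "dev,name,batch_size,speedup,abs_latency,compilation_latency,"
    "compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,"
    "unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,"
    "autograd_compiles,cudagraph_skips"
)
row = (
    "cpu,resnet50_quantized_qat,1,0.984405,21.007918,0.792060,0.960393,"
    "95.348326,99.280486,4,2,3,2,0,0,2"
)

# DictReader pairs each header name with the value in the same column.
record = next(csv.DictReader(io.StringIO(header + "\n" + row)))

speedup = float(record["speedup"])
graph_breaks = int(record["graph_breaks"])
print(record["name"], speedup, graph_breaks)
```

Reading the row by column name avoids miscounting positions in a 16-column record.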

Versions

SW info

name               target_branch  target_commit                             refer_branch  refer_commit
torchbench         main           9bfeccbf                                  main          8f512254
torch              main           618c61e942d5dc873647914bdbd8107f650118f2  main          604945687469b5eabbe5f0dde02b303bf8a1d60b
torchvision        main           0.27.0a0+499ca51                          main          0.27.0a0+499ca51
torchtext          main           0.16.0a0+b0ebddc                          main          0.16.0a0+b0ebddc
torchaudio         main           2.11.0a0+c0cbdb9                          main          2.11.0a0+c0cbdb9
torchdata          main           0.7.1a0+0790338                           main          0.7.1a0+0790338
dynamo_benchmarks  main           nightly                                   main          nightly

Repro: inductor_single_run.sh

bash inductor_single_run.sh multiple inference performance torchbench resnet50_quantized_qat amp first static cpp

Suspected guilty commit: https://github.com/pytorch/pytorch/commit/0e0d2db2551178964297e106b92ab4033d3ea190

torchbench-resnet50_quantized_qat-inference-amp-static-cpp-single-performance-crash_guilty_commit.log
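
The error message asks for `TORCHDYNAMO_VERBOSE=1` and `TORCH_LOGS="+dynamo"` when reporting. One possible way to rerun the repro command with both set is sketched below. It assumes `inductor_single_run.sh` is in the current directory, which the report does not state, and it only prints the command when the script is absent.

```python
import os
import shlex
import subprocess

# Repro command as given in the report.
repro = (
    "bash inductor_single_run.sh multiple inference performance torchbench "
    "resnet50_quantized_qat amp first static cpp"
)
cmd = shlex.split(repro)

# Copy the current environment and add the two variables named in the error message.
env = dict(os.environ, TORCHDYNAMO_VERBOSE="1", TORCH_LOGS="+dynamo")

if os.path.exists(cmd[1]):
    # check=False: the run is expected to fail on the bad commit.
    subprocess.run(cmd, env=env, check=False)
else:
    print("script not found; would run:", shlex.join(cmd))
```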

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @chauhang @penguinwu @voznesenskym @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo @LifengWang
