vllm - ✅(Solved) Fix [Bug]: Runtime error on ROCm platform serving Deepseek-R1 using VLLM_ROCM_USE_AITER=1 [1 pull requests, 5 comments, 3 participants]

GitHub stats
vllm-project/vllm#39485, fetched 2026-04-11 06:13:21
Comments: 5
Participants: 3
Timeline events: 22
Reactions: 0
Timeline (top): commented ×5, mentioned ×5, subscribed ×5, added_to_project_v2 ×2

Error Message


🐛 Describe the bug

Serving DeepSeek-R1 fails during CUDA graph capture.

When using AITER v0.1.12, the gemm_a8w8_blockscale CK kernel raises RuntimeError: This GEMM is not supported! during CUDA graph capture. This does not occur with the older AITER version (v0.10.post2).
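As a diagnostic aid, the `[aiter] shape is M:..., N:..., K:...` lines in a worker log like the one in this report can be collected to see which GEMM shapes fell back to the default config before the failure. The sketch below is not part of the issue or of AITER. The function name `untuned_shapes` is hypothetical, and the regular expression assumes the log line format shown in this report. It only lists shapes without a tuned config; it does not identify which shape triggered the RuntimeError.

```python
import re
from collections import defaultdict

# Assumed line format, copied from the worker log in this report:
# (Worker_TP5 pid=11843) [aiter] shape is M:64, N:2112, K:7168, not found tuned config in ...
SHAPE_RE = re.compile(
    r"\[aiter\] shape is M:(\d+), N:(\d+), K:(\d+), not found tuned config"
)


def untuned_shapes(log_text: str) -> dict[tuple[int, int], set[int]]:
    """Map each (N, K) pair to the set of M values that had no tuned config.

    Hypothetical helper for reading logs; it does not call AITER or vLLM.
    """
    shapes: dict[tuple[int, int], set[int]] = defaultdict(set)
    for m, n, k in SHAPE_RE.findall(log_text):
        shapes[(int(n), int(k))].add(int(m))
    return dict(shapes)


if __name__ == "__main__":
    import sys

    # Usage: python untuned_shapes.py < server.log
    for (n, k), ms in sorted(untuned_shapes(sys.stdin.read()).items()):
        print(f"N={n} K={k}: M in {sorted(ms)}")
```

Grouping by (N, K) keeps the output short, because the weight dimensions repeat across workers while M changes with each captured batch size.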

vLLM version tested: main branch, commit 55d037e2e5cc56c38a1a4a77a15c347fee380c50

when running the following command:

Fix Action

Fix / Workaround

Capturing CUDA graphs (PIECEWISE):  78%|███████▊  | 40/51 [00:40<00:11,  1.02s/it]
(Worker_TP0 pid=11838) INFO 04-10 06:05:09 [custom_all_reduce.py:215] Registering 4920 cuda graph addresses
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) INFO 04-10 06:05:09 [custom_all_reduce.py:215] Registering 4920 cuda graph addresses
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) INFO 04-10 06:05:11 [custom_all_reduce.py:215] Registering 4920 cuda graph addresses
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) INFO 04-10 06:05:12 [custom_all_reduce.py:215] Registering 4920 cuda graph addresses
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971] WorkerProc hit an exception.
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971] Traceback (most recent call last):
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/executor/multiproc_executor.py", line 966, in worker_busy_loop
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     output = func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu_worker.py", line 588, in compile_or_warm_up_model
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     cuda_graph_memory_bytes = self.model_runner.capture_model()
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu/model_runner.py", line 563, in capture_model
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     self.cudagraph_manager.capture(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu/cudagraph_utils.py", line 380, in capture
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     super().capture(create_forward_fn, progress_bar_desc)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu/cudagraph_utils.py", line 206, in capture
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     forward_fn(CUDAGraphMode.NONE)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu/cudagraph_utils.py", line 344, in forward_fn
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     model_output = model(**model_inputs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                    ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._call_impl(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return forward_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/models/deepseek_v2.py", line 1434, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     hidden_states = self.model(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                     ^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/compilation/decorators.py", line 480, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.aot_compiled_fn(self, *args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.fn(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/models/deepseek_v2.py", line 1228, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     def forward(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/compilation/caching.py", line 211, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.optimized_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     raise e
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._call_impl(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return forward_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "<eval_with_key>.226", line 725, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     submod_16 = self.submod_16(getitem_18, s72, l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_3_modules_post_attention_layernorm_parameters_weight_, getitem_19, l_self_modules_layers_modules_4_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_);  getitem_18 = l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_scale_inv_ = l_self_modules_layers_modules_3_modules_post_attention_layernorm_parameters_weight_ = getitem_19 = l_self_modules_layers_modules_4_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_scale_inv_ = 
l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_scale_inv_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_ = None
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                 
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.runnable(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/compilation/piecewise_backend.py", line 367, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return range_entry.runnable(*args)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._compiled_fn(*args)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return fn(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return compiled_fn(full_args)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.compiled_fn(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     all_outs = call_func_at_runtime_with_args(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     out = normalize_as_list(f(args))
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                             ^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return compiled_fn(runtime_args)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.current_callable(inputs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3220, in run
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     out = model(new_inputs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]           ^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/tmp/torchinductor_root/v7/cv7lop7uz4esbn75eo5jvh3n3bkseh4chntxb45u7ux37nf355vx.py", line 664, in call
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     buf10 = torch.ops.vllm.moe_forward_shared.default(buf8, buf8, buf8, 'from_forward_context')
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 819, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/moe_runner_base.py", line 113, in _moe_forward_shared
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return layer.runner.forward_dispatch(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/moe_runner_base.py", line 519, in forward_dispatch
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._forward_impl(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 120, in _forward_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     shared_output, hidden_states = self._apply_quant_method(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/moe_runner_base.py", line 391, in _apply_quant_method
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     self._maybe_apply_shared_experts(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/moe_runner_base.py", line 379, in _maybe_apply_shared_experts
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     self._shared_experts.apply(shared_experts_input, order)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/shared_experts.py", line 198, in apply
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     self._output[self._output_idx] = self._layer(shared_experts_input)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._call_impl(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return forward_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/models/deepseek_v2.py", line 233, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     x, _ = self.down_proj(x)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._call_impl(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return forward_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/linear.py", line 1536, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     output_parallel = self.quant_method.apply(self, input_parallel, bias_)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/quantization/fp8.py", line 474, in apply
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.fp8_linear.apply_weights(layer, x, bias)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/kernels/linear/scaled_mm/BlockScaledMMLinearKernel.py", line 132, in apply_weights
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     output = self.apply_block_scaled_mm(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/kernels/linear/scaled_mm/aiter.py", line 194, in apply_block_scaled_mm
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return gemm_a8w8_blockscale_op(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/_aiter_ops.py", line 1528, in gemm_a8w8_blockscale
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return torch.ops.vllm.rocm_aiter_gemm_a8w8_blockscale(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/_aiter_ops.py", line 536, in _rocm_aiter_gemm_a8w8_blockscale_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return gemm_a8w8_blockscale(A, B, As, Bs, dtype=output_dtype)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 278, in wrapper_custom
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     getattr(torch.ops.aiter, f"{loadName}")(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 301, in outer_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     wrapper(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 196, in wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/ops/gemm_op_a8w8.py", line 619, in gemm_a8w8_blockscale
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return gemm_a8w8_blockscale_ck(XQ, WQ, x_scale, w_scale, Y)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 278, in wrapper_custom
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     getattr(torch.ops.aiter, f"{loadName}")(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 301, in outer_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     wrapper(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 196, in wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/core.py", line 1442, in custom_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return wrapper(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/core.py", line 1438, in wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971] RuntimeError: This GEMM is not supported!
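
The traceback does not say which GEMM shape the CK kernel rejected. The `[aiter]` lines logged shortly before the failure list the shapes that fell back to the default config, so collecting them can narrow down the candidates. The following is a minimal sketch in Python, not part of vLLM or AITER: `untuned_shapes` is a hypothetical helper, and the regular expression is written only against the log lines quoted in this report. A shape without a tuned config is not necessarily the one that failed.

```python
import re
from collections import defaultdict

# Matches the "[aiter] shape is M:.., N:.., K:.." lines quoted in this report.
SHAPE_RE = re.compile(
    r"\(Worker_TP(?P<rank>\d+) pid=\d+\) \[aiter\] shape is "
    r"M:(?P<m>\d+), N:(?P<n>\d+), K:(?P<k>\d+), not found tuned config"
)


def untuned_shapes(log_text):
    """Map each (N, K) weight shape to the sorted M values logged without a tuned config."""
    seen = defaultdict(set)
    for match in SHAPE_RE.finditer(log_text):
        seen[(int(match["n"]), int(match["k"]))].add(int(match["m"]))
    return {nk: sorted(ms) for nk, ms in seen.items()}


# Three lines copied from the log in this report.
sample = """\
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
"""

for (n, k), ms in sorted(untuned_shapes(sample).items()):
    print(f"N={n} K={k}: M in {ms}")
```

Grouping by (N, K) keeps the weight shapes together, so it is easy to see which batch sizes M were reached for each projection before the worker aborted.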

PR fix notes

PR #39509: [ROCm] [AITER] Revert AITER version to v0.1.10.post3

Description (problem / solution / changelog)

Purpose

The AITER v0.1.12 tag is still moving; see https://github.com/ROCm/aiter/issues/2691.

Moreover, there are many known issues with the initial commit of v0.1.12:

  1. DeepSeek block-scaled GEMM raises RuntimeError: This GEMM is not supported! (https://github.com/vllm-project/vllm/issues/39485)

  2. https://github.com/vllm-project/vllm/issues/39303

Test Plan

Test Result



Changed files

  • docker/Dockerfile.rocm_base (modified, +1/-1)
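
Since the fix is a version pin in the base image, it can help to check which AITER version an existing environment carries. The sketch below is an illustration under stated assumptions: the distribution name `aiter` passed to `importlib.metadata` is a guess and may differ in your install, and `is_affected` and `installed_aiter_version` are hypothetical helpers. Because the v0.1.12 tag is reported as moving, a matching version string is only a hint, not proof that the build contains the failing commit.

```python
from importlib.metadata import PackageNotFoundError, version

# Version named in this report as failing; v0.1.10.post3 is the revert target.
AFFECTED = {"0.1.12"}


def is_affected(installed):
    """Return True if the version string matches a release reported as failing."""
    # Drop a leading "v" and any local build suffix such as "+gabc123".
    return installed.lstrip("v").split("+")[0] in AFFECTED


def installed_aiter_version(dist_name="aiter"):
    """Look up the installed version; dist_name is an assumption, adjust as needed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_aiter_version()
if found is None:
    print("AITER distribution not found under the assumed name")
else:
    print(found, "affected" if is_affected(found) else "not listed as affected")
```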

Code Example

Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version                : 22.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-7.2.1 26084 f58b06dce1f9c15707c5f808fd002e18c2accf7e)
CMake version                : version 3.31.10
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+git8514f05
Is debug build               : False
CUDA used to build PyTorch   : N/A
ROCM used to build PyTorch   : 7.2.53211

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Mar  4 2026, 09:23:07) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.15.0-116-generic-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   : 
GPU models and configuration :  (gfx942:sramecc+:xnack-)
Nvidia driver version        : Could not collect
cuDNN version                : Could not collect
HIP runtime version          : 7.2.53211
MIOpen runtime version       : 3.5.1
Is XNNPACK available         : True

---

</details>


## 🐛 Describe the bug

### DeepSeek serve fails during graph capture

When using AITER v0.1.12, the `gemm_a8w8_blockscale` CK kernel raises `RuntimeError: This GEMM is not supported!` during CUDA graph capture. This does not occur with the older AITER version (v0.1.10.post2).

vLLM version tested: main branch commit `55d037e2e5cc56c38a1a4a77a15c347fee380c50`

When running the following command:


export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
export VLLM_USE_V2_MODEL_RUNNER=1

vllm serve deepseek-ai/DeepSeek-R1 \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --compilation_config.pass_config.fuse_act_quant=false

The following error occurs

Capturing CUDA graphs (PIECEWISE):  78%|███████▊  | 40/51 [00:40<00:11,  1.02s/it]
(Worker_TP0 pid=11838) INFO 04-10 06:05:09 [custom_all_reduce.py:215] Registering 4920 cuda graph addresses
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) [aiter] shape is M:64, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP5 pid=11843) INFO 04-10 06:05:09 [custom_all_reduce.py:215] Registering 4920 cuda graph addresses
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:80, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:72, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:72, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) [aiter] shape is M:64, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP3 pid=11841) INFO 04-10 06:05:11 [custom_all_reduce.py:215] Registering 4920 cuda graph addresses
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:2112, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:3072, K:1536, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:7168, K:2048, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:4608, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:7168, K:2304, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:512, K:7168, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) [aiter] shape is M:64, N:7168, K:256, not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, will use default config!
(Worker_TP4 pid=11842) INFO 04-10 06:05:12 [custom_all_reduce.py:215] Registering 4920 cuda graph addresses
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971] WorkerProc hit an exception.
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971] Traceback (most recent call last):
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/executor/multiproc_executor.py", line 966, in worker_busy_loop
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     output = func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu_worker.py", line 588, in compile_or_warm_up_model
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     cuda_graph_memory_bytes = self.model_runner.capture_model()
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu/model_runner.py", line 563, in capture_model
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     self.cudagraph_manager.capture(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu/cudagraph_utils.py", line 380, in capture
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     super().capture(create_forward_fn, progress_bar_desc)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu/cudagraph_utils.py", line 206, in capture
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     forward_fn(CUDAGraphMode.NONE)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/v1/worker/gpu/cudagraph_utils.py", line 344, in forward_fn
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     model_output = model(**model_inputs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                    ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._call_impl(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return forward_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/models/deepseek_v2.py", line 1434, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     hidden_states = self.model(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                     ^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/compilation/decorators.py", line 480, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.aot_compiled_fn(self, *args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.fn(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/models/deepseek_v2.py", line 1228, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     def forward(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/compilation/caching.py", line 211, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.optimized_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     raise e
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._call_impl(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return forward_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "<eval_with_key>.226", line 725, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     submod_16 = self.submod_16(getitem_18, s72, l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_3_modules_post_attention_layernorm_parameters_weight_, getitem_19, l_self_modules_layers_modules_4_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_positions_);  getitem_18 = l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_scale_inv_ = l_self_modules_layers_modules_3_modules_post_attention_layernorm_parameters_weight_ = getitem_19 = l_self_modules_layers_modules_4_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_scale_inv_ = 
l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_scale_inv_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_ = None
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                 
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.runnable(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/compilation/piecewise_backend.py", line 367, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return range_entry.runnable(*args)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._compiled_fn(*args)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return fn(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return compiled_fn(full_args)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.compiled_fn(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     all_outs = call_func_at_runtime_with_args(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     out = normalize_as_list(f(args))
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                             ^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return compiled_fn(runtime_args)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.current_callable(inputs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/utils.py", line 3220, in run
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     out = model(new_inputs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]           ^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/tmp/torchinductor_root/v7/cv7lop7uz4esbn75eo5jvh3n3bkseh4chntxb45u7ux37nf355vx.py", line 664, in call
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     buf10 = torch.ops.vllm.moe_forward_shared.default(buf8, buf8, buf8, 'from_forward_context')
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 819, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/moe_runner_base.py", line 113, in _moe_forward_shared
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return layer.runner.forward_dispatch(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/moe_runner_base.py", line 519, in forward_dispatch
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._forward_impl(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 120, in _forward_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     shared_output, hidden_states = self._apply_quant_method(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/moe_runner_base.py", line 391, in _apply_quant_method
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     self._maybe_apply_shared_experts(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/moe_runner_base.py", line 379, in _maybe_apply_shared_experts
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     self._shared_experts.apply(shared_experts_input, order)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/fused_moe/runner/shared_experts.py", line 198, in apply
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     self._output[self._output_idx] = self._layer(shared_experts_input)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._call_impl(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return forward_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/models/deepseek_v2.py", line 233, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     x, _ = self.down_proj(x)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._call_impl(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return forward_call(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/linear.py", line 1536, in forward
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     output_parallel = self.quant_method.apply(self, input_parallel, bias_)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/layers/quantization/fp8.py", line 474, in apply
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self.fp8_linear.apply_weights(layer, x, bias)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/kernels/linear/scaled_mm/BlockScaledMMLinearKernel.py", line 132, in apply_weights
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     output = self.apply_block_scaled_mm(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/model_executor/kernels/linear/scaled_mm/aiter.py", line 194, in apply_block_scaled_mm
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return gemm_a8w8_blockscale_op(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/_aiter_ops.py", line 1528, in gemm_a8w8_blockscale
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return torch.ops.vllm.rocm_aiter_gemm_a8w8_blockscale(
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/app/upstream/vllm/vllm/_aiter_ops.py", line 536, in _rocm_aiter_gemm_a8w8_blockscale_impl
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return gemm_a8w8_blockscale(A, B, As, Bs, dtype=output_dtype)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 278, in wrapper_custom
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     getattr(torch.ops.aiter, f"{loadName}")(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 301, in outer_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     wrapper(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 196, in wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/ops/gemm_op_a8w8.py", line 619, in gemm_a8w8_blockscale
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return gemm_a8w8_blockscale_ck(XQ, WQ, x_scale, w_scale, Y)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 278, in wrapper_custom
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     getattr(torch.ops.aiter, f"{loadName}")(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in __call__
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return self._op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 301, in outer_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     wrapper(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/utils/torch_guard.py", line 196, in wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/core.py", line 1442, in custom_wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return wrapper(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]   File "/usr/local/lib/python3.12/dist-packages/aiter/jit/core.py", line 1438, in wrapper
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]     return op(*args, **kwargs)
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971] RuntimeError: This GEMM is not supported!

Environment

Docker image: rocm/vllm-dev:nightly
AITER version: v0.1.12


extent analysis

TL;DR

Downgrading AITER to v0.10.post2 is the reported workaround. According to the issue, RuntimeError: This GEMM is not supported! appears during CUDA graph capture with AITER v0.1.12 and does not appear with v0.10.post2.

Guidance

  • RuntimeError: This GEMM is not supported! is raised inside AITER, not in vLLM: the last frames of the trace are in aiter/ops/gemm_op_a8w8.py (gemm_a8w8_blockscale calling gemm_a8w8_blockscale_ck) and aiter/jit/core.py.
  • The issue reports that the error does not occur with the older AITER version (v0.10.post2), which points to a regression or behavior change in v0.1.12 rather than a problem with the model or the serve command.
  • To confirm, downgrade AITER to v0.10.post2, rerun the same serve command, and check whether graph capture completes.
  • If downgrading is not feasible, report the failing shapes and the AITER version to the AITER maintainers, since the rejection happens in the AITER CK kernel.
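
When reporting the failure, it can help to list which GEMM shapes AITER had no tuned configuration for. The following is a minimal sketch, assuming the log format shown in the report ("[aiter] shape is M:…, N:…, K:…, not found tuned config …"). The function name untuned_shapes is hypothetical, and whether these fallback shapes are the ones the CK kernel rejects is not confirmed by the issue.

```python
import re

# Matches the AITER fallback message as it appears in the reported log.
# The exact wording is taken from the log excerpt; other AITER versions may differ.
SHAPE = re.compile(
    r"\[aiter\] shape is M:(\d+), N:(\d+), K:(\d+), not found tuned config"
)


def untuned_shapes(log_text):
    """Return the sorted, de-duplicated (M, N, K) shapes that used the default config."""
    return sorted({tuple(int(v) for v in m.groups()) for m in SHAPE.finditer(log_text)})


if __name__ == "__main__":
    sample = (
        "(Worker_TP5 pid=11843) [aiter] shape is M:64, N:2112, K:7168, "
        "not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, "
        "will use default config!\n"
        "(Worker_TP5 pid=11843) [aiter] shape is M:64, N:3072, K:1536, "
        "not found tuned config in /tmp/aiter_configs/a8w8_blockscale_tuned_gemm.csv, "
        "will use default config!\n"
    )
    for m, n, k in untuned_shapes(sample):
        print(f"M={m} N={n} K={k}")
```

Every tensor-parallel worker logs the same message, so de-duplicating the shapes keeps the list short.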

Notes

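  • The failure comes from the gemm_a8w8_blockscale CK kernel in AITER. The kernel is present in v0.1.12 but rejects the GEMM it is given at runtime. The log earlier in the report shows AITER falling back to the default config for several M:64 shapes that have no tuned entry; whether that fallback triggers the rejection is not confirmed in the issue.
  • The stack trace is long, but the relevant path is short: the DeepSeek shared-experts down_proj (deepseek_v2.py) calls the FP8 block-scaled linear kernel (fp8.py, BlockScaledMMLinearKernel.py, scaled_mm/aiter.py), which dispatches through vllm/_aiter_ops.py into AITER. The remaining frames are PyTorch Inductor and module-call wrappers.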
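
To read a trace like this more easily, the worker prefix can be stripped and only the vLLM and AITER frames kept. The sketch below assumes the line prefix seen in this report ("(Worker_TPn pid=…) ERROR <date> <time> [file:line]"). The function name condense and the path filters are hypothetical choices for illustration.

```python
import re

# Prefix added to each worker log line in the report, e.g.
# "(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971] ".
PREFIX = re.compile(r"^\(Worker_\w+ pid=\d+\) ERROR \S+ \S+ \[[^\]]+\] ?")
# Standard Python traceback frame header.
FRAME = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
# Final exception line, e.g. "RuntimeError: ...".
ERROR = re.compile(r"^\w*(Error|Exception)\b")


def condense(log_text, keep=("/vllm/", "/aiter/")):
    """Return (frames, error): frames whose path contains one of `keep`, plus the error line."""
    frames, error = [], None
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw)
        m = FRAME.match(line)
        if m:
            if any(k in m["path"] for k in keep):
                frames.append((m["path"], int(m["line"]), m["func"]))
        elif ERROR.match(line):
            error = line.strip()
    return frames, error


if __name__ == "__main__":
    p = "(Worker_TP4 pid=11842) ERROR 04-10 06:05:12 [multiproc_executor.py:971] "
    sample = "\n".join([
        p + '  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in __call__',
        p + '  File "/usr/local/lib/python3.12/dist-packages/aiter/jit/core.py", line 1438, in wrapper',
        p + "RuntimeError: This GEMM is not supported!",
    ])
    frames, error = condense(sample)
    for path, lineno, func in frames:
        print(f"{path}:{lineno} in {func}")
    print(error)
```

Filtering on path substrings is deliberately crude. It drops the PyTorch wrapper frames, which carry no information about this failure.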

Recommendation

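Apply the workaround: downgrade AITER to v0.10.post2, which the issue reports as unaffected. The issue is marked solved with one linked pull request, so check that pull request before pinning the older version long term.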
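
Before and after changing versions, it is worth confirming which AITER build the container actually has. This is a minimal sketch using the standard library. The distribution names "aiter" and "vllm" are assumptions about how the packages are registered in the image, and installed_version is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is not found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them if the image registers
    # the packages under different names.
    for name in ("aiter", "vllm"):
        print(f"{name}: {installed_version(name)}")
```

A result of None means the name was not found in the environment's package metadata, which may only mean the package was installed under a different distribution name.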
