vllm - 💡(How to fix) Fix [Bug]: CUDA ILM (Illegal Memory Access) crash when enabling MTP num_speculative_tokens with >1 for zai-org/GLM-4.7-FP8 under load [1 comments, 1 participants]

GitHub stats
vllm-project/vllm#37570 · Fetched 2026-04-08 01:04:34
Comments: 1
Participants: 1
Timeline: 4
Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1, labeled ×1

Error Message

(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] WorkerProc hit an exception.
Traceback (most recent call last):
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop
    output = func(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 365, in execute_model
    return self.worker.execute_model(scheduler_output)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 728, in execute_model
    output = self.model_runner.execute_model(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3639, in execute_model
    model_output = self._model_forward(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3152, in _model_forward
    return self.model(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
    return self.runnable(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 695, in forward
    hidden_states = self.model(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 402, in __call__
    return self.aot_compiled_fn(self, *args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
    return self.fn(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 452, in forward
    def forward(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/caching.py", line 198, in __call__
    return self.optimized_call(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 455, in __call__
    raise e
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 442, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "<eval_with_key>.206", line 1142, in forward
    submod_58 = self.submod_58(getitem_143, s72, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_, l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_, getitem_144, l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_, l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); getitem_143 = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_ = l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_ = getitem_144 = l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_ = l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_ = None
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
    return self.runnable(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/piecewise_backend.py", line 343, in __call__
    return range_entry.runnable(*args)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/compiler_interface.py", line 377, in compiled_graph_wrapper
    graph_output = inductor_compiled_graph(*args)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
    return self._compiled_fn(*args)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 215, in <lambda>
    return CacheCompiledArtifact(lambda *args: compiled_fn(list(args)), None)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
    return compiled_fn(runtime_args)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 638, in __call__
    return self.current_callable(inputs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3220, in run
    out = model(new_inputs)
  File "/tmp/torchinductor_ssm-user/2e/c2enszwdfcmx44gilnrcy62e5kl2b77byxuemtmlnd2laqqsw6zs.py", line 1844, in call
    buf13 = torch.ops.vllm.moe_forward_shared.default(buf10, buf12, buf10, 'model.layers.28.mlp.experts')
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_ops.py", line 819, in __call__
    return self._op(*args, **kwargs)
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 108, in _moe_forward_shared
    return layer.runner.forward_impl(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 689, in forward_impl
    final_hidden_states = self.quant_method.apply(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py", line 982, in apply
    return self.moe_kernel.apply(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1753, in apply
    return self.impl.apply(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1537, in apply
    fused_out = self._fused_experts(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1375, in _fused_experts
    self.fused_experts.apply(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/fallback.py", line 174, in apply
    experts.apply(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 357, in apply
    run_cutlass_moe_fp8(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 193, in run_cutlass_moe_fp8
    a1q, a1q_scale, expert_first_token_offset, inv_perm, _ = moe_permute(
  File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py", line 90, in moe_permute
    a1q_scale = a1q_scale[permuted_idx.clamp(max=n_token * topk - 1) // topk]
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
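The last frame of the traceback above indexes a1q_scale with permuted_idx.clamp(max=n_token * topk - 1) // topk. As a rough illustration of what that expression computes, the sketch below mirrors it in plain Python on lists instead of CUDA tensors. The function name source_rows and the sample values are hypothetical, chosen only for this example; they are not taken from vLLM.

```python
def source_rows(permuted_idx, n_token, topk):
    """Mirror `permuted_idx.clamp(max=n_token * topk - 1) // topk` on a plain list.

    Hypothetical helper for illustration only; not vLLM code.
    """
    upper = n_token * topk - 1  # largest index allowed by the clamp
    # Clamp each index from above, then floor-divide by topk to get a row number.
    return [min(i, upper) // topk for i in permuted_idx]


# With 2 tokens and topk=4 there are 8 flattened slots (0..7).
# The out-of-range value 99 is clamped to 7 and therefore lands in row 1.
rows = source_rows([0, 3, 7, 99], n_token=2, topk=4)
print(rows)
```

In this sketch the clamp bounds indices only from above, so every result is at most n_token - 1. Because the log itself warns that CUDA kernel errors can be reported asynchronously, the line shown in the traceback is not necessarily where the illegal access happened.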
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Traceback (most recent call last): (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] output = func(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 365, in execute_model (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.worker.execute_model(scheduler_output) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return func(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 728, in execute_model (Worker pid=2327379) (Worker_TP3 
pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] output = self.model_runner.execute_model( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return func(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3639, in execute_model (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] model_output = self._model_forward( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3152, in _model_forward (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.model( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] 
return self.runnable(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return forward_call(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 695, in forward (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] hidden_states = self.model( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 402, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.aot_compiled_fn(self, *args, 
**kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 124, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.fn(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 452, in forward (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] def forward( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/caching.py", line 198, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.optimized_call(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self._wrapped_call(self, *args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker 
pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 455, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] raise e (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 442, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc] (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return forward_call(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "<eval_with_key>.206", line 1142, in 
forward (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] submod_58 = self.submod_58(getitem_143, s72, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_, l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_, getitem_144, l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_, l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); getitem_143 = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_ = l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_ = getitem_144 = l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_ = l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_ = 
l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_ = None (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.runnable(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/piecewise_backend.py", line 343, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 
[multiproc_executor.py:880] return range_entry.runnable(*args) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/compiler_interface.py", line 377, in compiled_graph_wrapper (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] graph_output = inductor_compiled_graph(*args) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 122, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self._compiled_fn(*args) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 215, in <lambda> (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return CacheCompiledArtifact(lambda *args: compiled_fn(list(args)), None) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper (Worker 
pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] all_outs = call_func_at_runtime_with_args( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] out = normalize_as_list(f(args)) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return compiled_fn(runtime_args) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 638, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.current_callable(inputs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3220, in run (Worker pid=2327379) 
(Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] out = model(new_inputs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/tmp/torchinductor_ssm-user/2e/c2enszwdfcmx44gilnrcy62e5kl2b77byxuemtmlnd2laqqsw6zs.py", line 1844, in call (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] buf13 = torch.ops.vllm.moe_forward_shared.default(buf10, buf12, buf10, 'model.layers.28.mlp.experts') (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_ops.py", line 819, in __call__ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self._op(*args, **kwargs) (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 108, in _moe_forward_shared (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return layer.runner.forward_impl( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File 
"/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 689, in forward_impl (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] final_hidden_states = self.quant_method.apply( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py", line 982, in apply (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.moe_kernel.apply( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1753, in apply (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] return self.impl.apply( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1537, in apply (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] fused_out = self._fused_experts( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) 
ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1375, in _fused_experts (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] self.fused_experts.apply( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/fallback.py", line 174, in apply (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] experts.apply( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 357, in apply (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] run_cutlass_moe_fp8( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 193, in run_cutlass_moe_fp8 (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] a1q, a1q_scale, expert_first_token_offset, inv_perm, _ = moe_permute( (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^ (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py", line 90, in moe_permute (Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] a1q_scale = 
a1q_scale[permuted_idx.clamp(max=n_token * topk - 1) // topk]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] [rank3]:[W319 14:38:20.896345916 CUDAGuardImpl.h:122] Warning: CUDA warning: an illegal memory access was encountered (function destroyEvent)
terminate called after throwing an instance of 'c10::AcceleratorError'
what(): CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from currentStreamCaptureStatusMayInitCtx at /pytorch/c10/cuda/CUDAGraphsC10Utils.h:71 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fcc84f72fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7fccb77320e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0xf2a97a (0x7fcb9892a97a in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0x7e9d4 (0x7fcc84f549d4 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fcc84f4e369 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x862f45 (0x7fcbeb062f45 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x862fe1 (0x7fcbeb062fe1 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #7: VLLM::Worker_TP3() [0x1635231]
frame #8: VLLM::Worker_TP3() [0x163537b]
frame #9: _PyEval_EvalFrameDefault + 0x968 (0x1612468 in VLLM::Worker_TP3)
frame #10: PyEval_EvalCode + 0xe2 (0x1686ca2 in VLLM::Worker_TP3)
frame #11: VLLM::Worker_TP3() [0x16ac442]
frame #12: PyRun_StringFlags + 0x7e (0x16ac2d6 in VLLM::Worker_TP3)
frame #13: PyRun_SimpleStringFlags + 0x3d (0x17713fd in VLLM::Worker_TP3)
frame #14: VLLM::Worker_TP3() [0x1771391]
frame #15: Py_RunMain + 0x291 (0x1770c67 in VLLM::Worker_TP3)
frame #16: VLLM::Worker_TP3() [0x173ecfa]
frame #17: VLLM::Worker_TP3() [0x173eaed]
frame #18: <unknown function> + 0x2a610 (0x7fccc562a610 in /lib64/libc.so.6)
frame #19: __libc_start_main + 0x80 (0x7fccc562a6c0 in /lib64/libc.so.6)
frame #20: _start + 0x29 (0x17ab069 in VLLM::Worker_TP3)
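
The Python traceback ends at the indexing expression `a1q_scale[permuted_idx.clamp(max=n_token * topk - 1) // topk]` in `moe_permute`. As a reading aid only, the sketch below mirrors that arithmetic on plain Python integers. The helper name `scale_rows` and the sample values are hypothetical, and the reading of `permuted_idx` as flattened (token, expert) slots is an assumption drawn from the variable names, not from the vLLM source. Because the log itself warns that CUDA errors are reported asynchronously, the faulting kernel may well be an earlier one, so this illustrates the expression and does not diagnose the crash.

```python
def scale_rows(permuted_idx, n_token, topk):
    """Mirror `permuted_idx.clamp(max=n_token * topk - 1) // topk` on plain ints.

    Hypothetical helper for illustration; not part of vLLM.
    """
    upper = n_token * topk - 1  # largest slot index the clamp lets through
    # Cap each index at `upper`, then integer-divide by topk to get a row index.
    return [min(i, upper) // topk for i in permuted_idx]


# Assumed example: 3 tokens with topk=2 gives slots 0..5.
# The value 9 stands in for an out-of-range entry that the clamp pulls back to 5.
rows = scale_rows([0, 1, 4, 5, 9], n_token=3, topk=2)
print(rows)  # [0, 0, 2, 2, 2]
```

In this sketch every resulting row index stays within `0..n_token - 1` for non-negative inputs, which is what the clamp on the upper bound guarantees.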

Fix Action

Fix / Workaround

Code Example

==============================
        System Info
==============================
OS                           : Amazon Linux 2023.10.20260216 (x86_64)
GCC version                  : (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)
Clang version                : Could not collect
CMake version                : version 3.22.2
Libc version                 : glibc-2.34

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Mar 10 2026, 18:17:25) [Clang 21.1.4 ] (64-bit runtime)
Python platform              : Linux-6.1.161-183.298.amzn2023.x86_64-x86_64-with-glibc2.34

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 13.0.88
CUDA_MODULE_LOADING set to   : 
GPU models and configuration : 
GPU 0: NVIDIA H200
GPU 1: NVIDIA H200
GPU 2: NVIDIA H200
GPU 3: NVIDIA H200
GPU 4: NVIDIA H200
GPU 5: NVIDIA H200
GPU 6: NVIDIA H200
GPU 7: NVIDIA H200

Nvidia driver version        : 580.126.09
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  192
On-line CPU(s) list:                     0-191
Vendor ID:                               AuthenticAMD
Model name:                              AMD EPYC 7R13 Processor
CPU family:                              25
Model:                                   1
Thread(s) per core:                      2
Core(s) per socket:                      48
Socket(s):                               2
Stepping:                                1
BogoMIPS:                                5300.00
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
Hypervisor vendor:                       KVM
Virtualization type:                     full
L1d cache:                               3 MiB (96 instances)
L1i cache:                               3 MiB (96 instances)
L2 cache:                                48 MiB (96 instances)
L3 cache:                                384 MiB (12 instances)
NUMA node(s):                            4
NUMA node0 CPU(s):                       0-23,96-119
NUMA node1 CPU(s):                       24-47,120-143
NUMA node2 CPU(s):                       48-71,144-167
NUMA node3 CPU(s):                       72-95,168-191
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Mitigation; safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Mitigation; Clear CPU buffers
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.4
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.18.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.4.2
[pip3] nvidia-cutlass-dsl-libs-base==4.4.2
[pip3] nvidia-ml-py==13.590.48
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.4.5
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pyzmq==27.1.0
[pip3] torch==2.10.0
[pip3] torch-c-dlpack-ext==0.1.5
[pip3] torchaudio==2.10.0
[pip3] torchvision==0.25.0
[pip3] transformers==4.57.6
[pip3] triton==3.6.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.17.1
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV18	NV18	NV18	NV18	NV18	NV18	NV18	24-47,120-143	1		N/A
GPU1	NV18	 X 	NV18	NV18	NV18	NV18	NV18	NV18	24-47,120-143	1		N/A
GPU2	NV18	NV18	 X 	NV18	NV18	NV18	NV18	NV18	0-23,96-119	0		N/A
GPU3	NV18	NV18	NV18	 X 	NV18	NV18	NV18	NV18	0-23,96-119	0		N/A
GPU4	NV18	NV18	NV18	NV18	 X 	NV18	NV18	NV18	72-95,168-191	3		N/A
GPU5	NV18	NV18	NV18	NV18	NV18	 X 	NV18	NV18	72-95,168-191	3		N/A
GPU6	NV18	NV18	NV18	NV18	NV18	NV18	 X 	NV18	48-71,144-167	2		N/A
GPU7	NV18	NV18	NV18	NV18	NV18	NV18	NV18	 X 	48-71,144-167	2		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

---

vllm serve zai-org/GLM-4.7-FP8 \
--tensor-parallel-size 8 \
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 2 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--async-scheduling \
--enable-prefix-caching
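
The issue title ties the crash to `num_speculative_tokens` values greater than 1, and the error text suggests `CUDA_LAUNCH_BLOCKING=1` for debugging. A minimal sketch of a launcher that rebuilds the command above with those two knobs exposed might look like the following. The function names `build_serve_command` and `debug_env` are hypothetical, every flag is copied from the reported command, and whether a value of 1 actually avoids the crash is implied only by the title and has not been verified here.

```python
import os
import shlex


def build_serve_command(num_speculative_tokens):
    """Return the reported `vllm serve` argv with a chosen speculative-token count."""
    return [
        "vllm", "serve", "zai-org/GLM-4.7-FP8",
        "--tensor-parallel-size", "8",
        "--speculative-config.method", "mtp",
        "--speculative-config.num_speculative_tokens", str(num_speculative_tokens),
        "--tool-call-parser", "glm47",
        "--reasoning-parser", "glm45",
        "--enable-auto-tool-choice",
        "--async-scheduling",
        "--enable-prefix-caching",
    ]


def debug_env(base=None):
    """Copy an environment and enable synchronous CUDA launches, as the log suggests."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


cmd = build_serve_command(1)  # 1 instead of the reported 2
print(shlex.join(cmd))
# To actually launch (requires vLLM and the GPUs from the report):
#   import subprocess; subprocess.run(cmd, env=debug_env(), check=True)
```

Running with `CUDA_LAUNCH_BLOCKING=1` is expected to slow serving considerably, so it is meant for reproducing the fault with an accurate stack trace rather than for production use.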

---

vllm bench serve \
--model zai-org/GLM-4.7-FP8 \
--port 8000 \
--save-result \
--save-detailed \
--backend=vllm \
--dataset-name custom \
--dataset-path SOME_DATASET \
--disable-shuffle \
--metric-percentiles "50,90,95,99" \
--percentile-metrics "ttft,tpot,e2el" \
--result-dir "./vllm_bench_results/" \
--plot-dataset-stats \
--plot-timeline \
--request-rate 1

---

(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] WorkerProc hit an exception.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Traceback (most recent call last):
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     output = func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]              ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 365, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.worker.execute_model(scheduler_output)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 728, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     output = self.model_runner.execute_model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3639, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     model_output = self._model_forward(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                    ^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3152, in _model_forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.runnable(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._call_impl(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return forward_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 695, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     hidden_states = self.model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                     ^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 402, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.aot_compiled_fn(self, *args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.fn(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 452, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     def forward(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/caching.py", line 198, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.optimized_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._wrapped_call(self, *args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     raise e
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._call_impl(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return forward_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "<eval_with_key>.206", line 1142, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     submod_58 = self.submod_58(getitem_143, s72, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_, l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_, getitem_144, l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_, l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_);  getitem_143 = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_ = l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_ = getitem_144 = l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_ = l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.runnable(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/piecewise_backend.py", line 343, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return range_entry.runnable(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/compiler_interface.py", line 377, in compiled_graph_wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     graph_output = inductor_compiled_graph(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._compiled_fn(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 215, in <lambda>
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return CacheCompiledArtifact(lambda *args: compiled_fn(list(args)), None)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                                                ^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     all_outs = call_func_at_runtime_with_args(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     out = normalize_as_list(f(args))
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                             ^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return compiled_fn(runtime_args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.current_callable(inputs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3220, in run
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     out = model(new_inputs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]           ^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/tmp/torchinductor_ssm-user/2e/c2enszwdfcmx44gilnrcy62e5kl2b77byxuemtmlnd2laqqsw6zs.py", line 1844, in call
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     buf13 = torch.ops.vllm.moe_forward_shared.default(buf10, buf12, buf10, 'model.layers.28.mlp.experts')
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_ops.py", line 819, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._op(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 108, in _moe_forward_shared
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return layer.runner.forward_impl(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 689, in forward_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     final_hidden_states = self.quant_method.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py", line 982, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.moe_kernel.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1753, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.impl.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1537, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     fused_out = self._fused_experts(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                 ^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1375, in _fused_experts
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     self.fused_experts.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/fallback.py", line 174, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     experts.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 357, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     run_cutlass_moe_fp8(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 193, in run_cutlass_moe_fp8
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     a1q, a1q_scale, expert_first_token_offset, inv_perm, _ = moe_permute(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                                                              ^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py", line 90, in moe_permute
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     a1q_scale = a1q_scale[permuted_idx.clamp(max=n_token * topk - 1) // topk]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] 
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Traceback (most recent call last):
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     output = func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]              ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 365, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.worker.execute_model(scheduler_output)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 728, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     output = self.model_runner.execute_model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3639, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     model_output = self._model_forward(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                    ^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3152, in _model_forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.runnable(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._call_impl(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return forward_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 695, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     hidden_states = self.model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                     ^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 402, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.aot_compiled_fn(self, *args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.fn(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 452, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     def forward(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/caching.py", line 198, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.optimized_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._wrapped_call(self, *args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     raise e
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._call_impl(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return forward_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "<eval_with_key>.206", line 1142, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     submod_58 = self.submod_58(getitem_143, s72, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_, l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_, getitem_144, l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_, l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_);  getitem_143 = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_ = l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_ = getitem_144 = l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_ = l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_ = 
l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.runnable(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/piecewise_backend.py", line 343, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return range_entry.runnable(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/compiler_interface.py", line 377, in compiled_graph_wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     graph_output = inductor_compiled_graph(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._compiled_fn(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 215, in <lambda>
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return CacheCompiledArtifact(lambda *args: compiled_fn(list(args)), None)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                                                ^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     all_outs = call_func_at_runtime_with_args(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     out = normalize_as_list(f(args))
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                             ^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return compiled_fn(runtime_args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.current_callable(inputs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3220, in run
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     out = model(new_inputs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]           ^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/tmp/torchinductor_ssm-user/2e/c2enszwdfcmx44gilnrcy62e5kl2b77byxuemtmlnd2laqqsw6zs.py", line 1844, in call
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     buf13 = torch.ops.vllm.moe_forward_shared.default(buf10, buf12, buf10, 'model.layers.28.mlp.experts')
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_ops.py", line 819, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._op(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 108, in _moe_forward_shared
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return layer.runner.forward_impl(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 689, in forward_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     final_hidden_states = self.quant_method.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py", line 982, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.moe_kernel.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1753, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.impl.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1537, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     fused_out = self._fused_experts(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                 ^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1375, in _fused_experts
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     self.fused_experts.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/fallback.py", line 174, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     experts.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 357, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     run_cutlass_moe_fp8(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 193, in run_cutlass_moe_fp8
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     a1q, a1q_scale, expert_first_token_offset, inv_perm, _ = moe_permute(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                                                              ^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py", line 90, in moe_permute
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     a1q_scale = a1q_scale[permuted_idx.clamp(max=n_token * topk - 1) // topk]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] 
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] 
[rank3]:[W319 14:38:20.896345916 CUDAGuardImpl.h:122] Warning: CUDA warning: an illegal memory access was encountered (function destroyEvent)
terminate called after throwing an instance of 'c10::AcceleratorError'
  what():  CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from currentStreamCaptureStatusMayInitCtx at /pytorch/c10/cuda/CUDAGraphsC10Utils.h:71 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fcc84f72fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7fccb77320e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0xf2a97a (0x7fcb9892a97a in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0x7e9d4 (0x7fcc84f549d4 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fcc84f4e369 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x862f45 (0x7fcbeb062f45 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x862fe1 (0x7fcbeb062fe1 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #7: VLLM::Worker_TP3() [0x1635231]
frame #8: VLLM::Worker_TP3() [0x163537b]
frame #9: _PyEval_EvalFrameDefault + 0x968 (0x1612468 in VLLM::Worker_TP3)
frame #10: PyEval_EvalCode + 0xe2 (0x1686ca2 in VLLM::Worker_TP3)
frame #11: VLLM::Worker_TP3() [0x16ac442]
frame #12: PyRun_StringFlags + 0x7e (0x16ac2d6 in VLLM::Worker_TP3)
frame #13: PyRun_SimpleStringFlags + 0x3d (0x17713fd in VLLM::Worker_TP3)
frame #14: VLLM::Worker_TP3() [0x1771391]
frame #15: Py_RunMain + 0x291 (0x1770c67 in VLLM::Worker_TP3)
frame #16: VLLM::Worker_TP3() [0x173ecfa]
frame #17: VLLM::Worker_TP3() [0x173eaed]
frame #18: <unknown function> + 0x2a610 (0x7fccc562a610 in /lib64/libc.so.6)
frame #19: __libc_start_main + 0x80 (0x7fccc562a6c0 in /lib64/libc.so.6)
frame #20: _start + 0x29 (0x17ab069 in VLLM::Worker_TP3)
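
The worker traceback above is hard to scan because every line repeats the same logging prefix. Here is a minimal sketch for stripping that prefix so the Python traceback can be read or diffed directly. It assumes the prefix always has the shape seen in this log; `strip_worker_prefix` is a hypothetical helper name, not part of vLLM.

```python
import re

# Matches the logging prefix seen in this report, e.g.
# "(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] "
# Assumption: other vLLM versions or log settings may format the prefix differently.
_PREFIX = re.compile(
    r"^\(Worker pid=\d+\) \(Worker_TP\d+ pid=\d+\) "
    r"[A-Z]+ \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] ?"
)


def strip_worker_prefix(log_text: str) -> str:
    """Remove the per-line worker prefix; lines without it are kept unchanged."""
    return "\n".join(_PREFIX.sub("", line) for line in log_text.splitlines())
```

Passing the captured server log through `strip_worker_prefix` leaves the unprefixed C++ abort output (the `terminate called ...` and `frame #N` lines) untouched.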

---

vllm serve zai-org/GLM-4.7-FP8 \
--tensor-parallel-size 8 \
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 1 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--async-scheduling \
--enable-prefix-caching

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
==============================
        System Info
==============================
OS                           : Amazon Linux 2023.10.20260216 (x86_64)
GCC version                  : (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)
Clang version                : Could not collect
CMake version                : version 3.22.2
Libc version                 : glibc-2.34

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Mar 10 2026, 18:17:25) [Clang 21.1.4 ] (64-bit runtime)
Python platform              : Linux-6.1.161-183.298.amzn2023.x86_64-x86_64-with-glibc2.34

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 13.0.88
CUDA_MODULE_LOADING set to   : 
GPU models and configuration : 
GPU 0: NVIDIA H200
GPU 1: NVIDIA H200
GPU 2: NVIDIA H200
GPU 3: NVIDIA H200
GPU 4: NVIDIA H200
GPU 5: NVIDIA H200
GPU 6: NVIDIA H200
GPU 7: NVIDIA H200

Nvidia driver version        : 580.126.09
cuDNN version                : Could not collect
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  192
On-line CPU(s) list:                     0-191
Vendor ID:                               AuthenticAMD
Model name:                              AMD EPYC 7R13 Processor
CPU family:                              25
Model:                                   1
Thread(s) per core:                      2
Core(s) per socket:                      48
Socket(s):                               2
Stepping:                                1
BogoMIPS:                                5300.00
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
Hypervisor vendor:                       KVM
Virtualization type:                     full
L1d cache:                               3 MiB (96 instances)
L1i cache:                               3 MiB (96 instances)
L2 cache:                                48 MiB (96 instances)
L3 cache:                                384 MiB (12 instances)
NUMA node(s):                            4
NUMA node0 CPU(s):                       0-23,96-119
NUMA node1 CPU(s):                       24-47,120-143
NUMA node2 CPU(s):                       48-71,144-167
NUMA node3 CPU(s):                       72-95,168-191
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Mitigation; safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Mitigation; Clear CPU buffers
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.4
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.18.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.4.2
[pip3] nvidia-cutlass-dsl-libs-base==4.4.2
[pip3] nvidia-ml-py==13.590.48
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.4.5
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pyzmq==27.1.0
[pip3] torch==2.10.0
[pip3] torch-c-dlpack-ext==0.1.5
[pip3] torchaudio==2.10.0
[pip3] torchvision==0.25.0
[pip3] transformers==4.57.6
[pip3] triton==3.6.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.17.1
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV18	NV18	NV18	NV18	NV18	NV18	NV18	24-47,120-143	1		N/A
GPU1	NV18	 X 	NV18	NV18	NV18	NV18	NV18	NV18	24-47,120-143	1		N/A
GPU2	NV18	NV18	 X 	NV18	NV18	NV18	NV18	NV18	0-23,96-119	0		N/A
GPU3	NV18	NV18	NV18	 X 	NV18	NV18	NV18	NV18	0-23,96-119	0		N/A
GPU4	NV18	NV18	NV18	NV18	 X 	NV18	NV18	NV18	72-95,168-191	3		N/A
GPU5	NV18	NV18	NV18	NV18	NV18	 X 	NV18	NV18	72-95,168-191	3		N/A
GPU6	NV18	NV18	NV18	NV18	NV18	NV18	 X 	NV18	48-71,144-167	2		N/A
GPU7	NV18	NV18	NV18	NV18	NV18	NV18	NV18	 X 	48-71,144-167	2		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
</details>

🐛 Describe the bug

Description: I am seeing a critical crash (CUDA illegal memory access error) when serving the zai-org/GLM-4.7-FP8 model with Multi-Token Prediction (MTP) enabled and num_speculative_tokens > 1 under concurrent requests. This looks the same as, or closely related to, https://github.com/vllm-project/vllm/issues/36613.

The service runs fine when MTP is enabled with num_speculative_tokens == 1.
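
To compare the two configurations, the same server command can be generated with only `num_speculative_tokens` changed. The following is a minimal sketch, not an official vLLM utility. `build_serve_command` and `debug_env` are hypothetical helper names, and the flags are copied from the commands in this report. `CUDA_LAUNCH_BLOCKING=1` is the setting the error message itself suggests for a more accurate stack trace. Whether it interacts cleanly with CUDA graph capture has not been verified here.

```python
import os
import shlex


def build_serve_command(num_speculative_tokens: int) -> list[str]:
    """Return the `vllm serve` argv from this report with a chosen MTP depth."""
    return [
        "vllm", "serve", "zai-org/GLM-4.7-FP8",
        "--tensor-parallel-size", "8",
        "--speculative-config.method", "mtp",
        "--speculative-config.num_speculative_tokens", str(num_speculative_tokens),
        "--tool-call-parser", "glm47",
        "--reasoning-parser", "glm45",
        "--enable-auto-tool-choice",
        "--async-scheduling",
        "--enable-prefix-caching",
    ]


def debug_env() -> dict[str, str]:
    """Copy the current environment and request synchronous kernel launches."""
    env = dict(os.environ)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    # Print both variants; one of them can then be launched with
    # subprocess.run(build_serve_command(n), env=debug_env()).
    for n in (1, 2):
        print(shlex.join(build_serve_command(n)))
```

Keeping every other flag identical makes it easier to attribute a crash to the speculative token count rather than to another setting.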

Steps to Reproduce:

  1. Start the vLLM (0.17.1) server with the zai-org/GLM-4.7-FP8 model and the following speculative decoding configuration:
vllm serve zai-org/GLM-4.7-FP8 \
--tensor-parallel-size 8 \
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 2 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--async-scheduling \
--enable-prefix-caching

  2. Start the benchmark with:

vllm bench serve \
--model zai-org/GLM-4.7-FP8 \
--port 8000 \
--save-result \
--save-detailed \
--backend=vllm \
--dataset-name custom \
--dataset-path SOME_DATASET \
--disable-shuffle \
--metric-percentiles "50,90,95,99" \
--percentile-metrics "ttft,tpot,e2el" \
--result-dir "./vllm_bench_results/" \
--plot-dataset-stats \
--plot-timeline \
--request-rate 1
  3. Send high-concurrency requests to the server.
  4. The server crashes with a CUDA illegal memory access error during request processing.
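
The "send high-concurrency requests" step can also be driven without a benchmark dataset. Below is a minimal sketch. It assumes the server listens on `localhost:8000` and exposes vLLM's OpenAI-compatible `/v1/completions` endpoint. The prompt text, `max_tokens`, and the concurrency values are placeholders, and `build_payload`, `send_one`, and `run_load` are hypothetical helper names. It has not been confirmed that this exact load triggers the crash.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/v1/completions"  # assumed host and port


def build_payload(prompt: str, max_tokens: int = 256) -> bytes:
    """Encode one completion request body for the model from this report."""
    body = {"model": "zai-org/GLM-4.7-FP8", "prompt": prompt, "max_tokens": max_tokens}
    return json.dumps(body).encode("utf-8")


def send_one(prompt: str) -> int:
    """POST one request and return the HTTP status code.

    urlopen raises an exception on HTTP errors or if the server is unreachable.
    """
    req = urllib.request.Request(
        URL, data=build_payload(prompt), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return resp.status


def run_load(num_requests: int = 64, concurrency: int = 16) -> list[int]:
    """Send num_requests prompts with up to `concurrency` requests in flight."""
    prompts = [f"Write a short note about topic {i}." for i in range(num_requests)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(send_one, prompts))
```

Calling `run_load()` while the server is up returns one status code per request. A connection error partway through the run would be consistent with the worker crash described here.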
<details> <summary>Error:</summary>
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] WorkerProc hit an exception.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Traceback (most recent call last):
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     output = func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]              ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 365, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.worker.execute_model(scheduler_output)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 728, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     output = self.model_runner.execute_model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3639, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     model_output = self._model_forward(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                    ^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3152, in _model_forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.runnable(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._call_impl(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return forward_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 695, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     hidden_states = self.model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                     ^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 402, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.aot_compiled_fn(self, *args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.fn(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 452, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     def forward(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/caching.py", line 198, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.optimized_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._wrapped_call(self, *args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     raise e
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._call_impl(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return forward_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "<eval_with_key>.206", line 1142, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     submod_58 = self.submod_58(getitem_143, s72, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_, l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_, getitem_144, l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_, l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_);  getitem_143 = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_ = l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_ = getitem_144 = l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_ = l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.runnable(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/piecewise_backend.py", line 343, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return range_entry.runnable(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/compiler_interface.py", line 377, in compiled_graph_wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     graph_output = inductor_compiled_graph(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._compiled_fn(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 215, in <lambda>
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return CacheCompiledArtifact(lambda *args: compiled_fn(list(args)), None)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                                                ^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     all_outs = call_func_at_runtime_with_args(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     out = normalize_as_list(f(args))
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                             ^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return compiled_fn(runtime_args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.current_callable(inputs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3220, in run
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     out = model(new_inputs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]           ^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/tmp/torchinductor_ssm-user/2e/c2enszwdfcmx44gilnrcy62e5kl2b77byxuemtmlnd2laqqsw6zs.py", line 1844, in call
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     buf13 = torch.ops.vllm.moe_forward_shared.default(buf10, buf12, buf10, 'model.layers.28.mlp.experts')
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_ops.py", line 819, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._op(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 108, in _moe_forward_shared
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return layer.runner.forward_impl(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 689, in forward_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     final_hidden_states = self.quant_method.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py", line 982, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.moe_kernel.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1753, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.impl.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1537, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     fused_out = self._fused_experts(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                 ^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1375, in _fused_experts
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     self.fused_experts.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/fallback.py", line 174, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     experts.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 357, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     run_cutlass_moe_fp8(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 193, in run_cutlass_moe_fp8
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     a1q, a1q_scale, expert_first_token_offset, inv_perm, _ = moe_permute(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                                                              ^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py", line 90, in moe_permute
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     a1q_scale = a1q_scale[permuted_idx.clamp(max=n_token * topk - 1) // topk]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] 
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Traceback (most recent call last):
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     output = func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]              ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 365, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.worker.execute_model(scheduler_output)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 728, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     output = self.model_runner.execute_model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return func(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3639, in execute_model
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     model_output = self._model_forward(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                    ^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3152, in _model_forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.runnable(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._call_impl(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return forward_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 695, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     hidden_states = self.model(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                     ^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 402, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.aot_compiled_fn(self, *args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.fn(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/models/glm4_moe.py", line 452, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     def forward(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/caching.py", line 198, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.optimized_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._wrapped_call(self, *args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     raise e
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._call_impl(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return forward_call(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "<eval_with_key>.206", line 1142, in forward
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     submod_58 = self.submod_58(getitem_143, s72, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_, l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_, getitem_144, l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_, l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_, l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_, l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_, l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_);  getitem_143 = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_28_modules_self_attn_modules_o_proj_parameters_weight_scale_ = l_self_modules_layers_modules_28_modules_post_attention_layernorm_parameters_weight_ = getitem_144 = l_self_modules_layers_modules_28_modules_mlp_modules_gate_parameters_weight_ = l_self_modules_layers_modules_29_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_weight_scale_ = l_self_modules_layers_modules_29_modules_self_attn_modules_qkv_proj_parameters_bias_ = l_self_modules_layers_modules_29_modules_self_attn_modules_q_norm_parameters_weight_ = 
l_self_modules_layers_modules_29_modules_self_attn_modules_k_norm_parameters_weight_ = None
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.runnable(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/piecewise_backend.py", line 343, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return range_entry.runnable(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/compilation/compiler_interface.py", line 377, in compiled_graph_wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     graph_output = inductor_compiled_graph(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._compiled_fn(*args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/standalone_compile.py", line 215, in <lambda>
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return CacheCompiledArtifact(lambda *args: compiled_fn(list(args)), None)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                                                ^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     all_outs = call_func_at_runtime_with_args(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     out = normalize_as_list(f(args))
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                             ^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return compiled_fn(runtime_args)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.current_callable(inputs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3220, in run
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     out = model(new_inputs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]           ^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/tmp/torchinductor_ssm-user/2e/c2enszwdfcmx44gilnrcy62e5kl2b77byxuemtmlnd2laqqsw6zs.py", line 1844, in call
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     buf13 = torch.ops.vllm.moe_forward_shared.default(buf10, buf12, buf10, 'model.layers.28.mlp.experts')
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/_ops.py", line 819, in __call__
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self._op(*args, **kwargs)
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 108, in _moe_forward_shared
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return layer.runner.forward_impl(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 689, in forward_impl
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     final_hidden_states = self.quant_method.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                           ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py", line 982, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.moe_kernel.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1753, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     return self.impl.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]            ^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1537, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     fused_out = self._fused_experts(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                 ^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1375, in _fused_experts
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     self.fused_experts.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/fallback.py", line 174, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     experts.apply(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 357, in apply
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     run_cutlass_moe_fp8(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/cutlass_moe.py", line 193, in run_cutlass_moe_fp8
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     a1q, a1q_scale, expert_first_token_offset, inv_perm, _ = moe_permute(
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                                                              ^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]   File "/home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/vllm/model_executor/layers/fused_moe/moe_permute_unpermute.py", line 90, in moe_permute
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]     a1q_scale = a1q_scale[permuted_idx.clamp(max=n_token * topk - 1) // topk]
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] 
(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 [multiproc_executor.py:880] 
[rank3]:[W319 14:38:20.896345916 CUDAGuardImpl.h:122] Warning: CUDA warning: an illegal memory access was encountered (function destroyEvent)
terminate called after throwing an instance of 'c10::AcceleratorError'
  what():  CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from currentStreamCaptureStatusMayInitCtx at /pytorch/c10/cuda/CUDAGraphsC10Utils.h:71 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fcc84f72fdd in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0e0 (0x7fccb77320e0 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0xf2a97a (0x7fcb9892a97a in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0x7e9d4 (0x7fcc84f549d4 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x9 (0x7fcc84f4e369 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x862f45 (0x7fcbeb062f45 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x862fe1 (0x7fcbeb062fe1 in /home/ssm-user/mikhail.podvitskii/vllm-env/.venv/lib/python3.12/site-packages/torch/lib/libtorch_python.so)
frame #7: VLLM::Worker_TP3() [0x1635231]
frame #8: VLLM::Worker_TP3() [0x163537b]
frame #9: _PyEval_EvalFrameDefault + 0x968 (0x1612468 in VLLM::Worker_TP3)
frame #10: PyEval_EvalCode + 0xe2 (0x1686ca2 in VLLM::Worker_TP3)
frame #11: VLLM::Worker_TP3() [0x16ac442]
frame #12: PyRun_StringFlags + 0x7e (0x16ac2d6 in VLLM::Worker_TP3)
frame #13: PyRun_SimpleStringFlags + 0x3d (0x17713fd in VLLM::Worker_TP3)
frame #14: VLLM::Worker_TP3() [0x1771391]
frame #15: Py_RunMain + 0x291 (0x1770c67 in VLLM::Worker_TP3)
frame #16: VLLM::Worker_TP3() [0x173ecfa]
frame #17: VLLM::Worker_TP3() [0x173eaed]
frame #18: <unknown function> + 0x2a610 (0x7fccc562a610 in /lib64/libc.so.6)
frame #19: __libc_start_main + 0x80 (0x7fccc562a6c0 in /lib64/libc.so.6)
frame #20: _start + 0x29 (0x17ab069 in VLLM::Worker_TP3)
</details>

Happy path: start the vLLM (0.17.1) server with the zai-org/GLM-4.7-FP8 model and the following speculative decoding configuration, and everything works fine:

vllm serve zai-org/GLM-4.7-FP8 \
--tensor-parallel-size 8 \
--speculative-config.method mtp \
--speculative-config.num_speculative_tokens 1 \
--tool-call-parser glm47 \
--reasoning-parser glm45 \
--enable-auto-tool-choice \
--async-scheduling \
--enable-prefix-caching

extent analysis

Fix Plan

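The crash is a CUDA illegal memory access (cudaErrorIllegalAddress) raised on worker TP3 when Multi-Token Prediction (MTP) runs with num_speculative_tokens > 1 under load. The last Python frame in the traceback is moe_permute (moe_permute_unpermute.py, line 90), reached through the CUTLASS FP8 MoE path for model.layers.28.mlp.experts. CUDA reports kernel errors asynchronously, so that frame may not be the kernel that actually faulted. The following steps may help narrow it down:

  • Step 1: Check CUDA and PyTorch compatibility
    • Ensure that the installed CUDA driver and libraries are compatible with the PyTorch build that vLLM uses.
    • Nothing in the traceback points at a version mismatch, so treat this as a sanity check rather than a likely fix.
  • Step 2: Make the error report precise
    • Set the CUDA_LAUNCH_BLOCKING environment variable to 1 on the server process so that kernel launches are synchronous and the traceback is more likely to point at the faulting kernel. Expect lower throughput while it is set.
    • TORCH_USE_CUDA_DSA is a compile-time option for PyTorch (the error message says "Compile with"); setting it as an environment variable at run time does not enable device-side assertions.
  • Step 3: Modify the speculative decoding configuration
    • The issue reports that num_speculative_tokens 1 works, so that value can serve as a workaround.
    • Because the crash appears only with num_speculative_tokens > 1, the problem most likely lies in the MTP path for multiple speculative tokens.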

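Example code to set the environment variable (it must be set before CUDA is initialized, so do this before importing torch or vllm, or export it in the shell that runs vllm serve):

import os

# Force synchronous kernel launches so errors surface at the faulting call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
# TORCH_USE_CUDA_DSA is a PyTorch build-time option; setting it here has no effect.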
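
Because vllm serve runs as its own process, the variable has to reach that process's environment. The sketch below is one way to assemble the command from the issue together with a debug environment. The helper names build_serve_command and build_debug_env are hypothetical, and only flags that appear in the issue are used. It prints the command instead of launching it.

```python
import os
import shlex


def build_serve_command(num_speculative_tokens: int) -> list[str]:
    """Return the `vllm serve` argument list from the issue with a chosen MTP depth."""
    return [
        "vllm", "serve", "zai-org/GLM-4.7-FP8",
        "--tensor-parallel-size", "8",
        "--speculative-config.method", "mtp",
        "--speculative-config.num_speculative_tokens", str(num_speculative_tokens),
        "--tool-call-parser", "glm47",
        "--reasoning-parser", "glm45",
        "--enable-auto-tool-choice",
        "--async-scheduling",
        "--enable-prefix-caching",
    ]


def build_debug_env(base_env=None) -> dict[str, str]:
    """Copy the environment and turn on synchronous CUDA kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


command = build_serve_command(num_speculative_tokens=1)
env = build_debug_env()
print("CUDA_LAUNCH_BLOCKING=" + env["CUDA_LAUNCH_BLOCKING"], shlex.join(command))
# To actually start the server: subprocess.run(command, env=env, check=True)
```

Passing a value greater than 1 to build_serve_command gives the configuration the issue title describes as crashing, which is useful when collecting a more precise traceback.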

Verification

To verify that the fix worked, you can try the following:

  • Run the vLLM server with the modified speculative decoding configuration and check the server log for "CUDA error: an illegal memory access was encountered".
  • Send high-concurrency requests to the server, since the crash is reported under load, and verify that no worker process dies.

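Example code to test the server under concurrent load (assumes vllm serve is listening on its default address, http://localhost:8000; the request counts are arbitrary):

import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/v1/completions"
PAYLOAD = {"model": "zai-org/GLM-4.7-FP8", "prompt": "Hello", "max_tokens": 64}

def send_request(_):
    # POST one completion request and return the HTTP status code.
    req = urllib.request.Request(
        URL,
        data=json.dumps(PAYLOAD).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return resp.status

# Send many requests at once; a crashed worker surfaces as an exception
# here and as a traceback in the server log.
with ThreadPoolExecutor(max_workers=64) as pool:
    statuses = list(pool.map(send_request, range(256)))
print(f"{statuses.count(200)}/{len(statuses)} requests succeeded")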
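
After a load run, the server log can be checked automatically for the error signature. The sketch below is a minimal example that assumes the log lines keep the worker prefix format shown in the traceback above. The function name find_illegal_access is hypothetical, and the sample line is assembled from that traceback.

```python
import re

# Signature copied from the traceback in the issue.
ILLEGAL_ACCESS = "CUDA error: an illegal memory access was encountered"
# Matches the worker prefix seen in that log, e.g. "(Worker_TP3 pid=2327379)".
WORKER_PREFIX = re.compile(r"\(Worker_TP(\d+) pid=(\d+)\)")


def find_illegal_access(log_text: str) -> list[tuple[int, int]]:
    """Return sorted unique (tp_rank, pid) pairs for workers that logged the error."""
    hits = set()
    for line in log_text.splitlines():
        if ILLEGAL_ACCESS not in line:
            continue
        match = WORKER_PREFIX.search(line)
        if match:
            hits.add((int(match.group(1)), int(match.group(2))))
    return sorted(hits)


sample = (
    "(Worker pid=2327379) (Worker_TP3 pid=2327379) ERROR 03-19 14:38:20 "
    "[multiproc_executor.py:880] torch.AcceleratorError: "
    "CUDA error: an illegal memory access was encountered\n"
    "INFO unrelated line\n"
)
print(find_illegal_access(sample))
```

An empty result after a sustained high-concurrency run suggests that the configuration under test did not hit the crash during that run, though it does not prove the crash cannot occur.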

Extra Tips

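  • Check the vLLM, PyTorch, and CUDA documentation and issue trackers for known issues or compatibility problems.
  • If the issue persists, rerun with CUDA_LAUNCH_BLOCKING=1 and inspect the kernel named in the new traceback to identify the source of the error.
  • The traceback runs through vLLM's fused MoE code (cutlass_moe.py, moe_permute_unpermute.py), so report the problem to the vLLM project first rather than to the PyTorch or CUDA developers.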
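
When inspecting the last Python frame of the traceback, it can help to see what its index arithmetic does. The sketch below mirrors the expression permuted_idx.clamp(max=n_token * topk - 1) // topk with plain Python lists standing in for tensors. All values and the two helper names are hypothetical and chosen only for illustration. This is not a diagnosis of the crash, and the faulting kernel may be a different one, since CUDA errors are reported asynchronously.

```python
def scale_rows_for_permutation(permuted_idx, n_token, topk):
    """Mirror the index arithmetic of the traceback's last line with plain lists.

    Each entry of permuted_idx names a flattened (token, expert-slot) pair;
    integer division by topk maps it back to the token row whose scale is read.
    """
    upper = n_token * topk - 1
    rows = []
    for idx in permuted_idx:
        clamped = min(idx, upper)     # same effect as .clamp(max=upper)
        rows.append(clamped // topk)  # row of the scale tensor that is gathered
    return rows


def rows_beyond(rows, num_scale_rows):
    """Return gathered rows that exceed the last valid row of the scale tensor."""
    return [r for r in rows if r >= num_scale_rows]


# Hypothetical sizes: 4 tokens, 2 experts per token.
n_token, topk = 4, 2
rows = scale_rows_for_permutation([0, 3, 7, 9], n_token, topk)
print(rows)                  # index 9 is clamped to 7 before the division
print(rows_beyond(rows, 4))  # scale tensor with n_token rows
print(rows_beyond(rows, 3))  # scale tensor with fewer rows than n_token
```

The clamp bounds the gathered row by n_token - 1, so the gather stays in range only if the scale tensor really has at least n_token rows. Checking that kind of invariant with CUDA_LAUNCH_BLOCKING=1 set is one way to test whether this frame is involved at all.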
