vllm - [Bug]: deepgemm compile error

vllm · 2026-03-20 11:19:00

vllm-project/vllm#37675 • Fetched 2026-04-08 01:04:07

Author

Speclkle

### Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>

PyTorch version: 2.10.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.13.11 (main, Jan 14 2026, 19:38:04) [Clang 21.1.4 ] (64-bit runtime)
Python platform: Linux-6.8.0-1041-nvidia-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 13.1.115
CUDA_MODULE_LOADING set to: 
GPU models and configuration: 
GPU 0: NVIDIA B200
GPU 1: NVIDIA B200
GPU 2: NVIDIA B200
GPU 3: NVIDIA B200
GPU 4: NVIDIA B200
GPU 5: NVIDIA B200
GPU 6: NVIDIA B200
GPU 7: NVIDIA B200

Nvidia driver version: 590.48.01
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        52 bits physical, 57 bits virtual
Byte Order:                           Little Endian
CPU(s):                               288
On-line CPU(s) list:                  0-287
Vendor ID:                            GenuineIntel
Model name:                           Intel(R) Xeon(R) 6960P
CPU family:                           6
Model:                                173
Thread(s) per core:                   2
Core(s) per socket:                   72
Socket(s):                            2
Stepping:                             1
CPU max MHz:                          3900.0000
CPU min MHz:                          800.0000
BogoMIPS:                             5400.00
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req hfi vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization:                       VT-x
L1d cache:                            6.8 MiB (144 instances)
L1i cache:                            9 MiB (144 instances)
L2 cache:                             288 MiB (144 instances)
L3 cache:                             864 MiB (2 instances)
NUMA node(s):                         6
NUMA node0 CPU(s):                    0-23,144-167
NUMA node1 CPU(s):                    24-47,168-191
NUMA node2 CPU(s):                    48-71,192-215
NUMA node3 CPU(s):                    72-95,216-239
NUMA node4 CPU(s):                    96-119,240-263
NUMA node5 CPU(s):                    120-143,264-287
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS Not affected; BHI BHI_DIS_S
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect

</details>
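
When comparing this environment against a machine where the model loads, it can help to reduce the `collect_env.py` output above to key/value pairs and diff them. The following is a minimal sketch under stated assumptions: `parse_collect_env` and `diff_env` are hypothetical helper names, not part of vLLM or PyTorch, and the sample text is a few lines copied from the report.

```python
# Hypothetical helpers (not part of vLLM or PyTorch): turn `collect_env.py`
# text into a dict so that two environments can be compared field by field.

def parse_collect_env(text: str) -> dict[str, str]:
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain further colons.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank lines and lines without a "key: value" shape
        fields[key.strip()] = value.strip()
    return fields


def diff_env(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    # Report every key whose value differs, using None for a missing key.
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# A few lines copied from the environment listing in this report.
sample = """\
PyTorch version: 2.10.0+cu130
CUDA used to build PyTorch: 13.0
CUDA runtime version: 13.1.115
GPU 0: NVIDIA B200
Nvidia driver version: 590.48.01
"""

env = parse_collect_env(sample)
print(env["PyTorch version"])
print(env["CUDA runtime version"])
```

Calling `diff_env(env, other_env)` with the parsed output from a second machine lists only the fields that differ, which keeps long entries such as the CPU flags out of the way unless they actually changed.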

### 🐛 Describe the bug

Running GLM 5 FP8 on 8xB200 with the above config results in the following exception:

proc_executor.py:948] Traceback (most recent call last):
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/v1/executor/multiproc_executor.py", line 943, in worker_busy_loop
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     output = func(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return func(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_worker.py", line 388, in determine_available_memory
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     self.model_runner.profile_run()
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5547, in profile_run
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]                                         ~~~~~~~~~~~~~~~^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         self.max_num_tokens, is_profile=True
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return func(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5240, in _dummy_run
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     outputs = self.model(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         input_ids=input_ids,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ...<3 lines>...
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         **model_kwargs,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self.runnable(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self._call_impl(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return forward_call(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/models/deepseek_v2.py", line 1399, in forward
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     hidden_states = self.model(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         input_ids, positions, intermediate_tensors, inputs_embeds
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/compilation/decorators.py", line 506, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     output = self.aot_compiled_fn(self, *args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self.fn(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/models/deepseek_v2.py", line 1197, in forward
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     def forward(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/compilation/caching.py", line 206, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self.optimized_call(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self._wrapped_call(self, *args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     raise e
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self._call_impl(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return forward_call(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "<eval_with_key>.418", line 1555, in forward
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     submod_16 = self.submod_16(getitem_36, s72, getitem_8, l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_3_modules_post_attention_layernorm_parameters_weight_, getitem_37, l_self_modules_layers_modules_4_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_scale_inv_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_mla_attn_modules_rotary_emb_buffers_cos_sin_cache_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_wq_b_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_wq_b_parameters_weight_scale_inv_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_wk_parameters_weight_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_wk_parameters_weight_scale_inv_, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_weight_, 
l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_bias_, getitem_10, l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_weights_proj_parameters_weight_);  getitem_36 = l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_3_modules_self_attn_modules_mla_attn_modules_o_proj_parameters_weight_scale_inv_ = l_self_modules_layers_modules_3_modules_post_attention_layernorm_parameters_weight_ = getitem_37 = l_self_modules_layers_modules_4_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_fused_qkv_a_proj_parameters_weight_scale_inv_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_a_layernorm_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_q_b_proj_parameters_weight_scale_inv_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_kv_a_layernorm_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_wq_b_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_wq_b_parameters_weight_scale_inv_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_wk_parameters_weight_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_wk_parameters_weight_scale_inv_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_weight_ = 
l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_k_norm_parameters_bias_ = l_self_modules_layers_modules_4_modules_self_attn_modules_mla_attn_modules_indexer_modules_weights_proj_parameters_weight_ = None
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self.runnable(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/compilation/piecewise_backend.py", line 367, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return range_entry.runnable(*args)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_inductor/standalone_compile.py", line 122, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self._compiled_fn(*args)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 1181, in _fn
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return fn(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return compiled_fn(full_args)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self.compiled_fn(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     all_outs = call_func_at_runtime_with_args(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         compiled_fn, args, disable_amp=disable_amp, steal_args=True
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     out = normalize_as_list(f(args))
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]                             ~^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return compiled_fn(runtime_args)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_inductor/output_code.py", line 638, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self.current_callable(inputs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_inductor/utils.py", line 3220, in run
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     out = model(new_inputs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/tmp/torchinductor_p_kepom/lp/clpq6gkz3bvgqtrookqconhhbdsnq3o33kyzs2qhm23hesuubaot.py", line 1890, in call
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     buf14 = torch.ops.vllm.moe_forward_shared.default(buf11, buf12, buf13, 'from_forward_context')
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/torch/_ops.py", line 819, in __call__
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self._op(*args, **kwargs)
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 132, in _moe_forward_shared
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return runner.forward_impl(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         layer,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ...<2 lines>...
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         shared_experts_input,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 815, in forward_impl
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     shared_output, hidden_states = self._apply_quant_method(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]                                    ~~~~~~~~~~~~~~~~~~~~~~~~^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         layer=layer,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ...<3 lines>...
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         run_shared_experts_before=run_shared_experts_before,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/layers/fused_moe/runner/default_moe_runner.py", line 492, in _apply_quant_method
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     result = self.quant_method.apply_monolithic(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         layer=layer,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         x=hidden_states,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         router_logits=router_logits,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/layers/quantization/fp8.py", line 967, in apply_monolithic
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self.moe_kernel.apply_monolithic(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         x,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ...<10 lines>...
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         routed_scaling_factor=layer.routed_scaling_factor,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1552, in apply_monolithic
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self.impl.apply(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         hidden_states=hidden_states,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ...<10 lines>...
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         topk_group=topk_group,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1432, in apply
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     fused_out = self.fused_experts.apply(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         hidden_states=a1q,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ...<12 lines>...
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         topk_group=topk_group,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/layers/fused_moe/experts/trtllm_fp8_moe.py", line 438, in apply
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return self._apply_block_scale(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         hidden_states,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ...<11 lines>...
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         topk_group=topk_group,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/vllm/model_executor/layers/fused_moe/experts/trtllm_fp8_moe.py", line 340, in _apply_block_scale
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     return flashinfer.fused_moe.trtllm_fp8_block_scale_moe(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         routing_logits=router_logits,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ...<17 lines>...
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         fp8_quantization_type=fp8_quant_type,
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     )
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     ^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/flashinfer/fused_moe/core.py", line 2585, in trtllm_fp8_block_scale_moe
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     result = get_trtllm_moe_sm100_module().trtllm_fp8_block_scale_moe(
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]              ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/flashinfer/fused_moe/core.py", line 951, in get_trtllm_moe_sm100_module
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     module = gen_trtllm_gen_fused_moe_sm100_module()
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]   File "/srv/workspace/work_dir/.vllm_venv/lib/python3.13/site-packages/flashinfer/jit/fused_moe.py", line 225, in gen_trtllm_gen_fused_moe_sm100_module
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]     assert checksum, f"Failed to get checksums.txt from {checksum_path}"
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948]            ^^^^^^^^
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948] AssertionError: Failed to get checksums.txt from b55211623be7f5697c5262ffd8361fc06c147bc9/batched_gemm-b3c1646-c111d7c//checksums.txt
(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 [multiproc_executor.py:948] 
(Worker_TP2_EP2 pid=1935304) INFO 03-20 19:05:09 [multiproc_executor.py:858] WorkerProc shutting down.

How can I fix it?

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

extent analysis

Fix Plan

The traceback ends in flashinfer's gen_trtllm_gen_fused_moe_sm100_module (flashinfer/jit/fused_moe.py, line 225), where `assert checksum` fails with "Failed to get checksums.txt from b55211623be7f5697c5262ffd8361fc06c147bc9/batched_gemm-b3c1646-c111d7c//checksums.txt". In other words, flashinfer could not obtain the checksum file for the batched_gemm artifacts that the trtllm_fp8_block_scale_moe path needs on SM100. The failure happens while fetching those artifacts, not in vLLM's model code. To fix this, work through these steps:

Verify File Existence and Integrity:
- The path in the message (b55211623be7f5697c5262ffd8361fc06c147bc9/batched_gemm-b3c1646-c111d7c//checksums.txt) is relative, so it is resolved against wherever flashinfer stores or downloads these artifacts, not against your working directory. Check whether checksums.txt exists at that location.
- If the file does not exist, confirm that the artifact location is correct and reachable from the machine (for example, no firewall, proxy, or offline restriction blocks a download) and that the file has not been deleted or moved.
Re-download or Re-generate Checksums:
- If the file exists but is empty or corrupted, delete it and let it be re-downloaded or re-generated.
- Refer to the flashinfer documentation or support channels for the supported way to download or generate these files.
Environment and Dependency Check:
- Ensure that flashinfer and the other relevant packages are correctly installed and that their versions are compatible with your vLLM version.
- Updating or reinstalling dependencies can resolve issues caused by missing or corrupted files.
Code Review:
- If you run a modified or custom build, review the code that generates or reads checksums.txt for incorrect assumptions about file paths or file existence.
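
As a starting point for the environment check above, the following is a minimal sketch that reports which versions of the relevant packages are installed. The distribution names passed in the usage section (vllm, flashinfer-python, torch) are assumptions; adjust them to match the names shown by `pip list` in your environment.

```python
from importlib import metadata


def installed_versions(dist_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust to match your `pip list` output.
    names = ["vllm", "flashinfer-python", "torch"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Including this output when asking for help makes it easier for others to spot a version mismatch.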

Example Code for Debugging

To debug the issue, you can add checks in your code to verify the existence and integrity of the checksums.txt file before attempting to use it. Here's a simple example:

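import os
import sys

def verify_checksum_file(path):
    """Return True if a non-empty checksums.txt exists at the given path."""
    if not os.path.isfile(path):
        print(f"Error: checksums.txt not found at {path}", file=sys.stderr)
        return False
    if os.path.getsize(path) == 0:
        print(f"Error: checksums.txt at {path} is empty", file=sys.stderr)
        return False
    return True

# Example usage. The path below is copied from the error message and is relative,
# so prepend the directory where the artifacts are stored on your system.
checksum_path = "b55211623be7f5697c5262ffd8361fc06c147bc9/batched_gemm-b3c1646-c111d7c//checksums.txt"
if not verify_checksum_file(checksum_path):
    # Handle the error, e.g., by re-downloading the file or terminating the program
    print("Terminating due to checksum file issue", file=sys.stderr)
    sys.exit(1)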

Verification

After applying the fix, run your application again and confirm that the worker completes profile_run without raising "AssertionError: Failed to get checksums.txt". If the checksums.txt file is in place and not empty, the error caused by its absence or corruption should no longer appear.
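
To make that check repeatable, here is a small sketch that scans captured log text for the assertion message from the traceback. The function name and the regular expression are illustrative assumptions based only on the log line quoted in this issue, not part of vLLM or flashinfer.

```python
import re

# Matches the assertion text seen in the traceback and captures the artifact path.
CHECKSUM_FAILURE = re.compile(
    r"AssertionError: Failed to get checksums\.txt from (\S+)"
)


def find_checksum_failures(log_text):
    """Return the artifact paths named in any checksum assertion failures."""
    return CHECKSUM_FAILURE.findall(log_text)


if __name__ == "__main__":
    sample = (
        "(Worker_TP2_EP2 pid=1935304) ERROR 03-20 19:05:09 "
        "[multiproc_executor.py:948] AssertionError: Failed to get "
        "checksums.txt from b55211623be7f5697c5262ffd8361fc06c147bc9/"
        "batched_gemm-b3c1646-c111d7c//checksums.txt"
    )
    for path in find_checksum_failures(sample):
        print(f"checksum retrieval failed for: {path}")
```

An empty result for a full startup log suggests the checksum failure is gone; any remaining entries show which artifact path is still failing.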

Extra Tips

Always ensure that your development and production environments are consistent, especially

