vllm - 💡(How to fix) Fix [Bug]: Qwen3-Coder-Next-FP8 start error with tp=8 on H100 [3 comments, 2 participants]

vllm · 2026-03-12 05:55:25

GitHub stats

vllm-project/vllm#36853 • Fetched 2026-04-08 00:34:13

Author: ltm920716

Participants: ltm920716, ZJY0516

Timeline (top): commented ×3, labeled ×1, mentioned ×1, subscribed ×1

Error Message

(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] WorkerProc failed to start.
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] ValueError: Weight input_size_per_partition = 64 is not divisible by weight quantization block_k = 128.

Code Example

nerdctl run --gpus all --name vllm-serve --shm-size 64g -v /mnt/data02/000000/model/Qwen3-Coder-Next-FP8:/data/model -p 8000:8000 -e VLLM_USE_DEEP_GEMM=0 --ipc=host vllm/vllm-openai:v0.17.0 /data/model --served-model-name hello-model --gpu-memory-utilization 0.93 --tensor-parallel-size 8 --enable-expert-parallel --enable-force-include-usag

---

(Worker_TP2_EP2 pid=490) INFO 03-11 22:47:13 [multiproc_executor.py:730] Parent process exited, terminating worker
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] WorkerProc failed to start.
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] Traceback (most recent call last):
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 743, in worker_main
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     worker = WorkerProc(*args, **kwargs)
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 578, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.worker.load_model()
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 275, in load_model
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4040, in load_model
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.model = model_loader.load_model(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     model = initialize_model(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]             ^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 48, in initialize_model
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 1220, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.model = Qwen3NextModel(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                  ^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 306, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     old_init(self, **kwargs)
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 990, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.start_layer, self.end_layer, self.layers = make_layers(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                                                     ^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 707, in make_layers
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 984, in get_layer
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     return Qwen3NextDecoderLayer(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 864, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.mlp = Qwen3NextSparseMoeBlock(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 159, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.shared_expert = Qwen3NextMLP(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                          ^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_moe.py", line 93, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.down_proj = RowParallelLinear(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                      ^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1366, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.quant_method.create_weights(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 382, in create_weights
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     validate_fp8_block_shape(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 1443, in validate_fp8_block_shape
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     raise ValueError(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] ValueError: Weight input_size_per_partition = 64 is not divisible by weight quantization block_k = 128.
[rank0]:[W311 22:47:14.634661437 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
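
The final ValueError line carries the numbers needed to diagnose the failure. As a minimal sketch (the helper name `implied_full_size` is hypothetical, and it assumes the layer's input dimension is split evenly across the tensor-parallel ranks), the full input size of the failing layer can be recovered from the message like this:

```python
import re

# Pattern for the ValueError text shown in the traceback above.
_ERR = re.compile(
    r"input_size_per_partition = (\d+) is not divisible by "
    r"weight quantization block_k = (\d+)"
)


def implied_full_size(error_line: str, tp_size: int) -> tuple[int, int]:
    """Return (full_input_size, block_k), assuming an even split across tp_size ranks."""
    match = _ERR.search(error_line)
    if match is None:
        raise ValueError("line does not contain the FP8 block-shape error")
    per_partition, block_k = int(match.group(1)), int(match.group(2))
    return per_partition * tp_size, block_k


line = ("ValueError: Weight input_size_per_partition = 64 is not divisible "
        "by weight quantization block_k = 128.")
full_size, block_k = implied_full_size(line, tp_size=8)
print(full_size, block_k)  # 512 128
```

With `--tensor-parallel-size 8`, a 64-wide shard implies a 512-wide layer, which is smaller than eight blocks of 128.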


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>

pass

</details>

🐛 Describe the bug

Hi,

Command:

nerdctl run --gpus all --name vllm-serve --shm-size 64g -v /mnt/data02/000000/model/Qwen3-Coder-Next-FP8:/data/model -p 8000:8000 -e VLLM_USE_DEEP_GEMM=0 --ipc=host vllm/vllm-openai:v0.17.0 /data/model --served-model-name hello-model --gpu-memory-utilization 0.93 --tensor-parallel-size 8 --enable-expert-parallel --enable-force-include-usag

Error:

(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] ValueError: Weight input_size_per_partition = 64 is not divisible by weight quantization block_k = 128.
[rank0]:[W311 22:47:14.634661437 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

Extended analysis

Fix Plan

The error occurs because, for block-quantized FP8 weights, the input dimension of each tensor-parallel shard must be a multiple of the quantization block size. The traceback shows the check failing in the shared expert's down_proj (a RowParallelLinear), where input_size_per_partition = 64 and block_k = 128. With --tensor-parallel-size 8, that implies a full input size of 64 × 8 = 512, assuming an even split across ranks. The block size is a property of the quantized checkpoint, so it is generally not something that can be changed in a config file without re-quantizing the model. The practical lever is the tensor-parallel size.

Here are the steps to fix the issue:

Reduce --tensor-parallel-size so that the per-partition input size is a multiple of block_k. With a full size of 512 and block_k = 128, TP = 4 gives 128, TP = 2 gives 256, and TP = 1 gives 512. TP = 8 gives 64, which fails.
To still use all eight GPUs, one option may be to combine a smaller tensor-parallel size with data parallelism (for example --tensor-parallel-size 4 with --data-parallel-size 2), depending on available memory and what your vLLM version supports.
Alternatively, use a checkpoint whose quantization block size divides the per-partition size, if one is available.

Example code reproducing the failing check:

# Values taken from the error message
input_size_per_partition = 64
block_k = 128

# Each shard must hold a whole number of quantization blocks
if input_size_per_partition % block_k != 0:
    print(f"{input_size_per_partition} is not divisible by {block_k}")

Example code to find tensor-parallel sizes that pass the check:

# Full input size inferred from the error (64 per partition x 8 ranks)
full_input_size = 64 * 8
block_k = 128

# Keep the tensor-parallel sizes whose shards are a multiple of block_k
valid_tp = [tp for tp in (1, 2, 4, 8) if (full_input_size // tp) % block_k == 0]
print(valid_tp)  # [1, 2, 4]

Verification

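To verify that the fix worked, relaunch the server with the updated configuration (for example a smaller --tensor-parallel-size) and confirm that every worker finishes loading the model and that the log no longer contains the ValueError raised by validate_fp8_block_shape.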
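
One way to automate this check is to scan the captured server log for the block-shape message. The sketch below is illustrative only: `find_block_shape_errors` is a hypothetical helper, and it assumes the log has already been captured as text (for example from the container's output).

```python
def find_block_shape_errors(log_text: str) -> list[str]:
    """Return every log line that reports the FP8 block-shape ValueError."""
    marker = "is not divisible by weight quantization block_k"
    return [ln.strip() for ln in log_text.splitlines() if marker in ln]


sample_log = (
    "(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] "
    "WorkerProc failed to start.\n"
    "(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] "
    "ValueError: Weight input_size_per_partition = 64 is not divisible by "
    "weight quantization block_k = 128.\n"
)
errors = find_block_shape_errors(sample_log)
print(len(errors))  # 1
```

An empty result after a relaunch suggests this particular check no longer fails. It does not by itself show that the server is healthy.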

Extra Tips

The same divisibility check is likely to apply to every block-quantized linear layer, so a tensor-parallel size that suits one layer can still fail on a narrower one. Test the exact tensor-parallel size you plan to deploy.
Consider adding a pre-launch check to your deployment scripts that compares layer sizes, tensor-parallel size, and block size, so that a mismatch is reported before the workers start.
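
A pre-launch check along these lines could report the mismatch before any worker starts. This is a sketch under stated assumptions: `check_shards` and the layer dictionary are hypothetical, the 512 figure is inferred from the error message (64 × 8), and the real layer sizes would have to be read from the model's configuration.

```python
def check_shards(layer_input_sizes: dict[str, int], tp_size: int,
                 block_k: int = 128) -> list[str]:
    """Describe every layer whose per-rank shard is not a multiple of block_k."""
    problems = []
    for name, full_size in layer_input_sizes.items():
        if full_size % tp_size != 0:
            problems.append(
                f"{name}: {full_size} does not split evenly across {tp_size} ranks")
            continue
        shard = full_size // tp_size
        if shard % block_k != 0:
            problems.append(
                f"{name}: shard {shard} is not a multiple of block_k={block_k}")
    return problems


# Hypothetical input: the size is inferred from the traceback, not read from the model.
layers = {"shared_expert.down_proj": 512}
for tp in (4, 8):
    print(tp, check_shards(layers, tp))
```

Returning a list of messages rather than raising on the first failure lets a deployment script report all incompatible layers at once.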
