vllm - 💡(How to fix) Fix [Bug]: Qwen3-Coder-Next-FP8 start error with tp=8 on H100 [3 comments, 2 participants]

vllm-project/vllm#36853 (fetched 2026-04-08 00:34:13)

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
pass
</details>

🐛 Describe the bug

hi,

  • command
nerdctl run --gpus all --name vllm-serve --shm-size 64g -v /mnt/data02/000000/model/Qwen3-Coder-Next-FP8:/data/model -p 8000:8000 -e VLLM_USE_DEEP_GEMM=0 --ipc=host vllm/vllm-openai:v0.17.0 /data/model --served-model-name hello-model --gpu-memory-utilization 0.93 --tensor-parallel-size 8 --enable-expert-parallel --enable-force-include-usag
  • error
(Worker_TP2_EP2 pid=490) INFO 03-11 22:47:13 [multiproc_executor.py:730] Parent process exited, terminating worker
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] WorkerProc failed to start.
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] Traceback (most recent call last):
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 743, in worker_main
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     worker = WorkerProc(*args, **kwargs)
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 578, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.worker.load_model()
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 275, in load_model
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4040, in load_model
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.model = model_loader.load_model(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     model = initialize_model(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]             ^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py", line 48, in initialize_model
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     return model_class(vllm_config=vllm_config, prefix=prefix)
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 1220, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.model = Qwen3NextModel(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                  ^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 306, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     old_init(self, **kwargs)
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 990, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.start_layer, self.end_layer, self.layers = make_layers(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                                                     ^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 707, in make_layers
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 984, in get_layer
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     return Qwen3NextDecoderLayer(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 864, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.mlp = Qwen3NextSparseMoeBlock(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 159, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.shared_expert = Qwen3NextMLP(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                          ^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_moe.py", line 93, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.down_proj = RowParallelLinear(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]                      ^^^^^^^^^^^^^^^^^^
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1366, in __init__
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     self.quant_method.create_weights(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/fp8.py", line 382, in create_weights
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     validate_fp8_block_shape(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 1443, in validate_fp8_block_shape
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772]     raise ValueError(
(Worker_TP2_EP2 pid=490) ERROR 03-11 22:47:13 [multiproc_executor.py:772] ValueError: Weight input_size_per_partition = 64 is not divisible by weight quantization block_k = 128.
[rank0]:[W311 22:47:14.634661437 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

The error occurs because the shared expert's down_proj weight, a RowParallelLinear layer, is split across tensor-parallel ranks along its input dimension, and the resulting input_size_per_partition (64) is not a multiple of the FP8 quantization block size block_k (128). With --tensor-parallel-size 8, a per-partition size of 64 implies a full input size of 512, assuming an even split. The block size is normally fixed by how the FP8 checkpoint was quantized, so it is not something to change at launch; the practical lever is the per-partition size, which depends on the tensor-parallel size.

Here are the steps to fix the issue:

  • Check the quantization block size the checkpoint was quantized with (128 according to the error) and treat it as fixed.
  • Choose a tensor-parallel size for which the per-partition input size is a multiple of block_k. For a full input size of 512, that means --tensor-parallel-size 4 or lower (512 / 4 = 128).

Example code showing why the check fails at tp=8:

# Values taken from the error message and the launch command
tp_size = 8
input_size_per_partition = 64
block_k = 128  # set by the FP8 checkpoint's block quantization

# Full (unsharded) input size of the failing layer, assuming an even split
input_size = input_size_per_partition * tp_size  # 512

# The divisibility condition that validate_fp8_block_shape rejects
print(input_size_per_partition % block_k == 0)  # False

Example code to find tensor-parallel sizes that satisfy the check:

# Full input size inferred from the error: 64 per partition * 8 ranks
input_size = 512
block_k = 128

# Keep only the sizes whose per-partition input is a multiple of block_k
valid_tp = [
    tp for tp in (1, 2, 4, 8)
    if input_size % tp == 0 and (input_size // tp) % block_k == 0
]
print(valid_tp)  # [1, 2, 4]
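
The same arithmetic can be run against the checkpoint's own config.json before launching. The following is only a sketch: the key names shared_expert_intermediate_size and quantization_config.weight_block_size, and the reading of weight_block_size as [block_n, block_k], are assumptions about how this checkpoint stores its settings, and valid_tp_sizes is a hypothetical helper, not part of vLLM.

```python
import json
from pathlib import Path


def valid_tp_sizes(config_path, candidates=(1, 2, 4, 8)):
    """Return the tensor-parallel sizes whose shard stays a multiple of block_k."""
    config = json.loads(Path(config_path).read_text())
    # Assumed key names; adjust them to match your checkpoint's config.json.
    input_size = config["shared_expert_intermediate_size"]
    # Assumed layout: weight_block_size = [block_n, block_k].
    block_k = config["quantization_config"]["weight_block_size"][1]
    return [
        tp for tp in candidates
        if input_size % tp == 0 and (input_size // tp) % block_k == 0
    ]
```

For the setup in this issue the call would look like valid_tp_sizes("/data/model/config.json"), using the path mounted inside the container. An empty result would mean that none of the candidate sizes satisfies the check for this layer.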

Verification

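To verify that the fix worked, rerun the launch command with the updated settings and confirm that the workers finish loading the model without the ValueError raised by validate_fp8_block_shape ("... is not divisible by weight quantization block_k").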
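
Checking a long worker log by eye is error-prone, so the check can be scripted. This is a minimal sketch that searches captured log text for the message quoted in this issue; find_block_shape_error is a hypothetical helper, and the pattern assumes the wording of the error stays as shown in the traceback.

```python
import re

# Pattern built from the ValueError text in the traceback above.
BLOCK_ERROR = re.compile(
    r"input_size_per_partition = (\d+) is not divisible by "
    r"weight quantization block_k = (\d+)"
)


def find_block_shape_error(log_text):
    """Return (input_size_per_partition, block_k) if the error appears, else None."""
    match = BLOCK_ERROR.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A return value of None means this particular error was not found in the text; it does not by itself show that the server started successfully.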

Extra Tips

  • The same divisibility check can apply to other block-quantized linear layers during model load, so a new tensor-parallel size may pass this layer and still fail on another. Read the sizes in any new error before changing settings again.
  • Consider checking divisibility before launching, so that a bad combination of tensor-parallel size and quantization block size is caught before the workers start loading the model.
