vllm - 💡(How to fix) Fix [Bug]: dpsk v4 on 8* H20-96G [2 comments, 2 participants]

GitHub stats
vllm-project/vllm#40790 (fetched 2026-04-25 06:04:05)
Comments: 2
Participants: 2
Timeline: 5
Reactions: 0
Timeline (top): commented ×2, closed ×1, cross-referenced ×1, labeled ×1

Error Message

(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] RuntimeError: The size of tensor a (2048) must match the size of tensor b (16) at non-singleton dimension 0
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 846, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return original_load_weights(self, weights, *args, **kwargs)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] yield from self._load_module(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 275, in _load_module
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_params = module_load_weights(weights)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 711, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] success = weight_loader(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1304, in weight_loader
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_model_weight_or_group_weight_scale(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 830, in _load_model_weight_or_group_weight_scale
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_w13(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 956, in _load_w13
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)

Code Example

docker run --gpus all -itd --name=dpsk_v4_flash \
  --privileged --ipc=host -p 8000:8000 \
  -v /mnt/deepseek-ai:/models \
  -e VLLM_ENGINE_READY_TIMEOUT_S=3600 \
  vllm/vllm-openai:deepseekv4-cu129 /models/DeepSeek-V4-Flash-Base \
  --trust-remote-code \
  --kv-cache-dtype fp8 \
  --block-size 256 \
  --enable-expert-parallel \
  --data-parallel-size 4 \
  --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}' \
  --max-model-len auto \
  --tokenizer-mode deepseek_v4 \
  --tool-call-parser deepseek_v4 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_v4 \
  --max-num-seqs 12 \
  --max-num-batched-tokens 16384 \
  --speculative_config '{"method":"mtp","num_speculative_tokens":2}'
Your current environment

<details>
File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 956, in _load_w13
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] RuntimeError: The size of tensor a (2048) must match the size of tensor b (16) at non-singleton dimension 0
</details>

🐛 Describe the bug

File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 846, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return original_load_weights(self, weights, *args, **kwargs)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] yield from self._load_module(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 275, in _load_module
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_params = module_load_weights(weights)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 711, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] success = weight_loader(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1304, in weight_loader
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_model_weight_or_group_weight_scale(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 830, in _load_model_weight_or_group_weight_scale
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_w13(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 956, in _load_w13
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] RuntimeError: The size of tensor a (2048) must match the size of tensor b (16) at non-singleton dimension 0
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] WorkerProc failed to start.
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] Traceback (most recent call last):
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 846, in worker_main
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] worker = WorkerProc(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 628, in __init__
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self.worker.load_model()
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4777, in load_model
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self.model = model_loader.load_model(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self.load_weights(model, model_config)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 846, in load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return original_load_weights(self, weights, *args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] yield from self._load_module(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 275, in _load_module
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_params = module_load_weights(weights)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 711, in load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] success = weight_loader(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1304, in weight_loader
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_model_weight_or_group_weight_scale(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 830, in _load_model_weight_or_group_weight_scale
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_w13(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 956, in _load_w13
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] RuntimeError: The size of tensor a (2048) must match the size of tensor b (16) at non-singleton dimension 0
Loading safetensors checkpoint shards: 2% Completed | 1/46 [00:00<00:40, 1.11it/s]

extent analysis

TL;DR

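Worker startup fails early in weight loading (the progress bar shows 1/46 shards completed): expert_data.copy_(loaded_weight) in the _load_w13 method of fused_moe/layer.py raises a RuntimeError because the two tensors have sizes 2048 and 16 at dimension 0. This points to a mismatch between the checkpoint's tensor layout and the shapes the loader allocates, which may be resolvable by adjusting the model configuration or the loading process.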

Guidance

  • Verify that the model configuration and the checkpoint weights are compatible; in particular, look for tensors whose first dimension is 2048 or 16, the two sizes reported in the error.
  • Review the _load_w13 method in fused_moe/layer.py to see which shapes it expects for expert_data and loaded_weight.
  • Try changing --data-parallel-size or --block-size in the docker run command and check whether the sizes in the error change.
  • Check the model's documentation or source code for how tensor sizes are determined and whether loading the weights has specific requirements.
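
The first check in the list above can be done without loading any tensor data, because a .safetensors file starts with an 8-byte little-endian header length followed by a JSON table of tensor names, dtypes, and shapes. The following is a minimal sketch under that assumption; the function names are hypothetical helpers, not part of vLLM or the safetensors library, and nothing here is confirmed as the cause of this issue.

```python
import json
import struct


def shapes_from_bytes(buf):
    """Parse a safetensors header: 8-byte little-endian length, then a JSON table."""
    (header_len,) = struct.unpack("<Q", buf[:8])
    header = json.loads(buf[8:8 + header_len].decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }


def read_safetensors_shapes(path, max_header_bytes=100_000_000):
    """Read only the header of one shard file and return {name: (dtype, shape)}."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > max_header_bytes:
            raise ValueError(f"implausible header length: {header_len}")
        return shapes_from_bytes(prefix + f.read(header_len))


def tensors_with_dim0(shapes, size):
    """Names of tensors whose first dimension equals `size`, sorted."""
    return sorted(n for n, (_, shape) in shapes.items() if shape and shape[0] == size)
```

Running read_safetensors_shapes on a shard under the mounted model directory and then calling tensors_with_dim0(shapes, 16) and tensors_with_dim0(shapes, 2048) would list candidate tensors for the two sizes in the error; which of them _load_w13 actually receives still has to be confirmed against the vLLM source.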

Example

No fix-level code example is available without further details on the model or the loading process; reviewing how the _load_w13 method in fused_moe/layer.py handles tensor sizes is a good starting point.

Notes

The error message pins down the sizes involved (2048 versus 16 at dimension 0) but not the cause. Without more context about the model, its configuration, or how the weights are stored, a precise fix cannot be given. A solution might involve adjusting model parameters, changing how weights are loaded, or modifying the model code to handle the size mismatch.
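
When the traceback is interleaved across workers as in this report, it can help to pull out just the mismatch records. The sketch below assumes the log prefix format seen in this issue, "(Worker_… pid=…) ERROR <date> <time> [file:line]"; the pattern and the function name are illustrative and would need adjusting for other log layouts.

```python
import re

# Matches one worker-prefixed RuntimeError record, even if records were
# collapsed onto a single line or wrapped after the word "at".
MISMATCH = re.compile(
    r"\((?P<worker>Worker_\w+) pid=\d+\) ERROR \S+ \S+ \[[^\]]*\]\s+"
    r"RuntimeError: The size of tensor a \((?P<a>\d+)\) must match "
    r"the size of tensor b \((?P<b>\d+)\) at\s+"
    r"non-singleton dimension (?P<dim>\d+)"
)


def summarize_mismatches(log_text):
    """Return sorted unique (worker, size_a, size_b, dim) tuples found in the log."""
    found = {
        (m["worker"], int(m["a"]), int(m["b"]), int(m["dim"]))
        for m in MISMATCH.finditer(log_text)
    }
    return sorted(found)
```

If every worker reports the same pair of sizes, the mismatch is unlikely to depend on which rank loaded the tensor, which narrows the search to the checkpoint layout or the shared loader path.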

Recommendation

As a first workaround, adjust launch parameters such as --data-parallel-size or --block-size and check whether the tensor size mismatch goes away. This assumes the problem is one of configuration or compatibility rather than a code error, which the log alone does not confirm.
