vllm - 💡(How to fix) Fix [Bug]: dpsk v4 on 8* H20-96G [2 comments, 2 participants]

vllm · 2026-04-24 08:11:19

GitHub stats

vllm-project/vllm#40790 • Fetched 2026-04-25 06:04:05

Author

chenchunhui97

Participants

BenWongCityuCS

chenchunhui97

Timeline (top)

commented ×2, closed ×1, cross-referenced ×1, labeled ×1

Error Message

(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] RuntimeError: The size of tensor a (2048) must match the size of tensor b (16) at non-singleton dimension 0
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 846, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return original_load_weights(self, weights, *args, **kwargs)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] yield from self._load_module(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 275, in _load_module
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_params = module_load_weights(weights)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 711, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] success = weight_loader(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1304, in weight_loader
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_model_weight_or_group_weight_scale(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 830, in _load_model_weight_or_group_weight_scale
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_w13(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 956, in _load_w13
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)

Fix Action

Fix / Workaround

File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 846, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return original_load_weights(self, weights, *args, **kwargs)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] yield from self._load_module(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 275, in _load_module
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_params = module_load_weights(weights)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 711, in load_weights
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] success = weight_loader(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1304, in weight_loader
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_model_weight_or_group_weight_scale(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 830, in _load_model_weight_or_group_weight_scale
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_w13(
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 956, in _load_w13
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)
(Worker_DP3_EP3 pid=2376) ERROR 04-24 08:02:05 [multiproc_executor.py:879] RuntimeError: The size of tensor a (2048) must match the size of tensor b (16) at non-singleton dimension 0
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] WorkerProc failed to start.
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] Traceback (most recent call last):
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 846, in worker_main
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] worker = WorkerProc(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 628, in __init__
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self.worker.load_model()
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 323, in load_model
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self.model_runner.load_model(load_dummy_weights=load_dummy_weights)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4777, in load_model
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self.model = model_loader.load_model(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 64, in load_model
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self.load_weights(model, model_config)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 381, in load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 846, in load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] return original_load_weights(self, weights, *args, **kwargs)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 355, in load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 302, in _load_module
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] yield from self._load_module(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 275, in _load_module
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] loaded_params = module_load_weights(weights)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 711, in load_weights
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] success = weight_loader(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] ^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 1304, in weight_loader
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_model_weight_or_group_weight_scale(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 830, in _load_model_weight_or_group_weight_scale
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] self._load_w13(
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/layer.py", line 956, in _load_w13
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] expert_data.copy_(loaded_weight)
(Worker_DP0_EP0 pid=2366) ERROR 04-24 08:02:05 [multiproc_executor.py:879] RuntimeError: The size of tensor a (2048) must match the size of tensor b (16) at non-singleton dimension 0
Loading safetensors checkpoint shards: 2% Completed | 1/46 [00:00<00:40, 1.11it/s]

Code Example

docker run --gpus all -itd --name=dpsk_v4_flash \
  --privileged --ipc=host -p 8000:8000 \
  -v /mnt/deepseek-ai:/models \
  -e VLLM_ENGINE_READY_TIMEOUT_S=3600 \
  vllm/vllm-openai:deepseekv4-cu129 /models/DeepSeek-V4-Flash-Base \
  --trust-remote-code \
  --kv-cache-dtype fp8 \
  --block-size 256 \
  --enable-expert-parallel \
  --data-parallel-size 4 \
  --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}' \
  --max-model-len auto \
  --tokenizer-mode deepseek_v4 \
  --tool-call-parser deepseek_v4 \
  --enable-auto-tool-choice \
  --reasoning-parser deepseek_v4 \
  --max-num-seqs 12 \
  --max-num-batched-tokens 16384 \
  --speculative_config '{"method":"mtp","num_speculative_tokens":2}'

extent analysis

TL;DR

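Weight loading fails in the _load_w13 method of the fused MoE layer (fused_moe/layer.py, line 956). The call expert_data.copy_(loaded_weight) raises a RuntimeError because the two tensors disagree in dimension 0: one has size 2048 and the other has size 16. Both Worker_DP0_EP0 and Worker_DP3_EP3 report the same error after only 1 of 46 checkpoint shards, which suggests that the parameter shapes vLLM allocated do not match the layout of the checkpoint being loaded.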

Guidance

Compare the shapes vLLM allocates for the expert weights with the shapes stored in the checkpoint; the error reports 2048 on one side and 16 on the other in dimension 0.
Read the _load_w13 method in fused_moe/layer.py to see how it prepares loaded_weight before copying it into expert_data, and whether that handling fits this checkpoint's layout.
Try a different --data-parallel-size to see whether the reported sizes change. --block-size sets the KV-cache block size, so it is unlikely to influence weight shapes.
Check the model's documentation or source code for how expert weight shapes are derived and for any stated requirements on the checkpoint format.

Example

The issue does not include enough detail for a targeted patch. A reasonable starting point is to review how the _load_w13 method in fused_moe/layer.py handles tensor sizes and to compare that with the shapes actually stored in the checkpoint.
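For the checkpoint side of such a review, tensor shapes can be read directly from a shard's header without loading any weights. The sketch below relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header); the helper names read_safetensors_shapes and find_dim0 are hypothetical, and the shard path is whatever file you pass on the command line. It lists tensors whose first dimension is 16 or 2048, the two sizes named in the error; it does not diagnose the cause on its own.

```python
import json
import struct
import sys


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} using only the header of a .safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def find_dim0(shapes, sizes):
    """Sorted names of tensors whose first dimension is one of `sizes`."""
    return sorted(
        name for name, shape in shapes.items() if shape and shape[0] in sizes
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python inspect_shard.py <shard>.safetensors
    shapes = read_safetensors_shapes(sys.argv[1])
    for name in find_dim0(shapes, {16, 2048}):
        print(name, shapes[name])
```

Running this against the first shard and comparing the printed shapes with what the loader expects for the expert weights shows whether the size 16 comes from the checkpoint itself.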

Notes

The error identifies the failing copy and the two sizes involved, but the issue gives no details about the checkpoint's tensor layout or the model configuration, so the root cause cannot be pinned down from the trace alone. A fix could lie in the launch parameters, in how the weights are loaded, or in the model code that handles the size mismatch.

Recommendation

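As a first step, vary the launch parameters, such as --data-parallel-size, and rerun to see whether the tensor size mismatch changes. This treats the failure as a configuration or compatibility problem, which the trace suggests but does not prove; if the same sizes are reported regardless of configuration, the weight-loading code or the checkpoint itself is the more likely cause.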
