vllm - ✅(Solved) Fix [Bug]: nemotron_h does not work with DeepEP all2all backends due to hidden dim rounding [1 pull requests, 1 comments, 2 participants]

vllm, 2026-03-12 21:09:10


GitHub stats

vllm-project/vllm#36926 • Fetched 2026-04-08 00:43:32

Author: bnellnm
Participants: bnellnm, SandishKumarHN
Timeline (top): referenced ×6, project_v2_item_status_changed ×3, added_to_project_v2 ×1, closed ×1

Error Message

ERROR 03-12 20:19:38 [multiproc_executor.py:844] WorkerProc failed to start.
ERROR 03-12 20:19:38 [multiproc_executor.py:844] RuntimeError: The size of tensor a (3072) must match the size of tensor b (2688) at non-singleton dimension 0

The exception is raised by expert_data.copy_(loaded_weight) in _load_w2 (vllm/model_executor/layers/fused_moe/layer.py, line 1001) while the worker loads model weights. The full traceback appears in the issue body under "Describe the bug".

Fix Action

Fix / Workaround


PR fix notes

PR #37010: [Bugfix] Fix FusedMoE weight loading with padded hidden dimensions

Repository: vllm-project/vllm
Author: SandishKumarHN
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/37010

Description (problem / solution / changelog)

Summary

Fixes #36926

When DeepEP/NIXL EP backends round up hidden_size for alignment (e.g., 2688 → 3072), FusedMoE weight parameters are allocated with the padded size but checkpoint weights have the original size. This causes RuntimeError: The size of tensor a (3072) must match the size of tensor b (2688) during expert_data.copy_(loaded_weight) in weight loading.

Added _narrow_expert_data_for_padding() static method that narrows padded parameter dimensions to match checkpoint weights before copying
Applied to _load_w2, _load_w13, and _load_per_channel_weight_scale (3 paths)
Excluded BitsAndBytes w2 path — BnB params are flat packed-integer tensors where copy_() is intercepted by __torch_function__ for in-flight quantization; shapes are intentionally different
When hidden_size is not padded (common case), the helper is a no-op since all dimensions already match
Follows the existing narrowing pattern used by the mxfp4 quantization path (line 1069-1078)
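The narrowing idea in the list above can be sketched without any tensor library by computing which leading region of a padded parameter a checkpoint weight should occupy. This is a minimal illustration under stated assumptions, not the code merged in the PR: the name unpadded_region is hypothetical, and the second dimension (1024) is an arbitrary value chosen for the example.

```python
from typing import Sequence, Tuple


def unpadded_region(param_shape: Sequence[int],
                    ckpt_shape: Sequence[int]) -> Tuple[slice, ...]:
    """Return index slices selecting the checkpoint-sized leading region.

    Hypothetical helper for illustration only; not a vLLM API.
    """
    if len(param_shape) != len(ckpt_shape):
        raise ValueError(
            f"rank mismatch: {tuple(param_shape)} vs {tuple(ckpt_shape)}")
    region = []
    for dim, (padded, actual) in enumerate(zip(param_shape, ckpt_shape)):
        if actual > padded:
            # Padding only grows the parameter, so a larger checkpoint
            # dimension is a genuine mismatch and should not be hidden.
            raise ValueError(
                f"dim {dim}: checkpoint size {actual} exceeds "
                f"parameter size {padded}")
        region.append(slice(0, actual))
    return tuple(region)


# Dimension 0 uses the sizes from the traceback (parameter 3072,
# checkpoint 2688). The second dimension is an arbitrary example value.
region = unpadded_region((3072, 1024), (2688, 1024))
print(region)
```

With PyTorch tensors, indexing a parameter with such a tuple of slices returns a view, so expert_data[region].copy_(loaded_weight) would write only the unpadded region. When the shapes already match, the slices cover the whole parameter and the copy behaves as before.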

Not a duplicate: Checked open PRs — #34285 and #30647 address roundup refactoring and forward-pass padding, not weight loading.

Test plan

New unit tests for _narrow_expert_data_for_padding (7 cases: matching shapes, w2/w13 dims, 3D tensors, 1D scales, scalar weights, storage sharing)
New integration tests for padded weight loading (w2, w13, no-padding no-op)
python -m pytest tests/kernels/moe/test_moe_weight_loading_padded.py -v — 10/10 pass
python -m pytest tests/kernels/moe/ -v -k "not deepep" — existing MoE tests

Changed files

tests/kernels/moe/test_moe_weight_loading_padded.py (added, +292/-0)
vllm/model_executor/layers/fused_moe/layer.py (modified, +82/-14)

Code Example

python3 examples/offline_inference/data_parallel.py \
        --model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16" \
        --all2all-backend deepep_low_latency \
        --trust-remote-code \
        -dp=2 \
        -tp=1 \
        --enforce-eager




🐛 Describe the bug

I was doing some MoE testing and ran into a problem with the nemotron_h model and backends that require rounding the hidden dimension, e.g. deepep_low_latency and deepep_high_throughput. The weight_loader is getting the rounded dimension which it does not expect (see stack trace). I think there might also be a problem at runtime with the routed input transform dimensions.
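The arithmetic behind the mismatch can be sketched as follows. The issue does not state the rounding rule the backends apply, so the alignment of 512 below is an assumption, picked only because it maps 2688 to the 3072 seen in the traceback.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


checkpoint_hidden = 2688   # size stored in the checkpoint (tensor b in the error)
alignment = 512            # ASSUMPTION: not taken from vLLM; chosen to reproduce 3072
padded_hidden = round_up(checkpoint_hidden, alignment)
extra = padded_hidden - checkpoint_hidden

# The parameter is allocated with padded_hidden entries, but the checkpoint
# supplies only checkpoint_hidden, so a plain element-wise copy cannot succeed.
print(padded_hidden, extra)
```

Under this assumption the parameter carries 384 extra entries along the hidden dimension that no checkpoint weight fills.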

Repro steps:

python3 examples/offline_inference/data_parallel.py \
        --model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16" \
        --all2all-backend deepep_low_latency \
        --trust-remote-code \
        -dp=2 \
        -tp=1 \
        --enforce-eager

Stack:

ERROR 03-12 20:19:38 [multiproc_executor.py:844] WorkerProc failed to start.
ERROR 03-12 20:19:38 [multiproc_executor.py:844] Traceback (most recent call last):
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/v1/executor/multiproc_executor.py", line 813, in worker_main
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     worker = WorkerProc(*args, **kwargs)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/v1/executor/multiproc_executor.py", line 619, in __init__
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self.worker.load_model()
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/v1/worker/gpu_worker.py", line 337, in load_model
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self.model_runner.load_model(load_dummy_weights=dummy_weights)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/v1/worker/gpu_model_runner.py", line 4277, in load_model
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self.model = model_loader.load_model(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]                  ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/model_loader/base_loader.py", line 62, in load_model
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self.load_weights(model, model_config)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/model_loader/default_loader.py", line 290, in load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
ERROR 03-12 20:19:38 [multiproc_executor.py:844]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/nemotron_h.py", line 957, in load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     return original_load_weights(self, weights, *args, **kwargs)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/utils.py", line 340, in load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     autoloaded_weights = set(self._load_module("", self.module, weights))
ERROR 03-12 20:19:38 [multiproc_executor.py:844]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/utils.py", line 287, in _load_module
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     yield from self._load_module(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/utils.py", line 260, in _load_module
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     loaded_params = module_load_weights(weights)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/nemotron_h.py", line 756, in load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     success = weight_loader(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]               ^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 1323, in weight_loader
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self._load_model_weight_or_group_weight_scale(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 915, in _load_model_weight_or_group_weight_scale
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self._load_w2(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 1001, in _load_w2
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     expert_data.copy_(loaded_weight)
ERROR 03-12 20:19:38 [multiproc_executor.py:844] RuntimeError: The size of tensor a (3072) must match the size of tensor b (2688) at non-singleton dimension 0


extent analysis

Fix Plan

The fix belongs in the FusedMoE weight-loading paths in vllm/model_executor/layers/fused_moe/layer.py, which must accept a parameter whose hidden dimension was rounded up.

Before copying, compare each dimension of the destination parameter with the checkpoint weight.
Where the parameter is larger, narrow the parameter to the checkpoint size so that copy_() writes only the unpadded region. The checkpoint weight is not truncated: it is the smaller tensor (2688 versus 3072).

Example code (an illustrative helper, not the code merged in PR #37010):

def copy_into_padded(expert_data, loaded_weight):
    # Narrow every padded dimension of the parameter to the checkpoint size.
    for dim, size in enumerate(loaded_weight.shape):
        if expert_data.shape[dim] != size:
            expert_data = expert_data.narrow(dim, 0, size)
    # narrow() returns a view, so copy_() fills the original padded parameter.
    expert_data.copy_(loaded_weight)

Alternatively, load_weights in nemotron_h.py could pass the unpadded dimension to the weight loader. PR #37010 did not take this route, and the names below are hypothetical placeholders rather than vLLM APIs.

In outline, load_weights would compute the checkpoint's hidden size and hand it to the loader:

def load_weights(self, weights):
    # Hypothetical helper: returns the unpadded hidden size (2688 in this issue).
    correct_dim = self.calculate_correct_dim()
    # Hypothetical call: the loader would narrow padded parameters to correct_dim.
    weight_loader(correct_dim, weights)

Verification

To verify that the fix worked, run the repro steps again and check if the error is resolved.

Run the command:

python3 examples/offline_inference/data_parallel.py \
        --model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16" \
        --all2all-backend deepep_low_latency \
        --trust-remote-code \
        -dp=2 \
        -tp=1 \
        --enforce-eager

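Confirm that the RuntimeError no longer appears and that the model loads. The PR's test plan also runs python -m pytest tests/kernels/moe/test_moe_weight_loading_padded.py -v.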
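When the repro output is captured to a file, the check can be automated by searching for the size-mismatch message. The sketch below is one way to do that; the function name find_size_mismatch is hypothetical, and the pattern is taken from the error text quoted in this issue.

```python
import re
from typing import Optional, Tuple

# Pattern for the size-mismatch message as it appears in the traceback.
MISMATCH = re.compile(
    r"The size of tensor a \((\d+)\) must match the size of tensor b \((\d+)\) "
    r"at non-singleton dimension (\d+)"
)


def find_size_mismatch(log_text: str) -> Optional[Tuple[int, int, int]]:
    """Return (size_a, size_b, dim) for the first mismatch, or None if absent."""
    match = MISMATCH.search(log_text)
    if match is None:
        return None
    size_a, size_b, dim = (int(group) for group in match.groups())
    return size_a, size_b, dim


sample = (
    "ERROR 03-12 20:19:38 [multiproc_executor.py:844] RuntimeError: "
    "The size of tensor a (3072) must match the size of tensor b (2688) "
    "at non-singleton dimension 0"
)
print(find_size_mismatch(sample))
```

A return value of None for the captured log suggests that this particular failure is gone. It does not by itself show that the model produced correct output.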

Extra Tips

Test both cases: padded hidden sizes (the DeepEP backends in this issue) and the common unpadded case, where the narrowing should be a no-op.
Do not apply the narrowing where shapes differ by design. The PR excludes the BitsAndBytes w2 path for this reason.
Run the existing MoE tests (python -m pytest tests/kernels/moe/ -v -k "not deepep") to check for regressions.
