vllm - ✅(Solved) Fix [Bug]: nemotron_h does not work with DeepEP all2all backends due to hidden dim rounding [1 pull request, 1 comment, 2 participants]

GitHub stats
vllm-project/vllm#36926 · Fetched 2026-04-08 00:43:32
Comments: 1 · Participants: 2 · Timeline: 15 · Reactions: 0
Timeline (top): referenced ×6, project_v2_item_status_changed ×3, added_to_project_v2 ×1, closed ×1

Error Message

RuntimeError: The size of tensor a (3072) must match the size of tensor b (2688) at non-singleton dimension 0

The error is raised by expert_data.copy_(loaded_weight) in _load_w2 (vllm/model_executor/layers/fused_moe/layer.py, line 1001), reached from load_weights in nemotron_h.py, and it stops WorkerProc from starting. The full traceback appears in the issue body under "Stack:".

Fix Action


PR fix notes

PR #37010: [Bugfix] Fix FusedMoE weight loading with padded hidden dimensions

Description (problem / solution / changelog)

Summary

Fixes #36926

When DeepEP/NIXL EP backends round up hidden_size for alignment (e.g., 2688 → 3072), FusedMoE weight parameters are allocated with the padded size but checkpoint weights have the original size. This causes RuntimeError: The size of tensor a (3072) must match the size of tensor b (2688) during expert_data.copy_(loaded_weight) in weight loading.

  • Added _narrow_expert_data_for_padding() static method that narrows padded parameter dimensions to match checkpoint weights before copying
  • Applied to _load_w2, _load_w13, and _load_per_channel_weight_scale (3 paths)
  • Excluded BitsAndBytes w2 path — BnB params are flat packed-integer tensors where copy_() is intercepted by __torch_function__ for in-flight quantization; shapes are intentionally different
  • When hidden_size is not padded (common case), the helper is a no-op since all dimensions already match
  • Follows the existing narrowing pattern used by the mxfp4 quantization path (lines 1069-1078)
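
The narrowing idea in the bullets above can be sketched without any vLLM code. This is not the PR's _narrow_expert_data_for_padding; narrowing_slices is a hypothetical stand-in that works on plain shape tuples and returns the index selecting the unpadded region of a padded parameter. The second dimension, 1856, is an arbitrary size made up for the example.

```python
def narrowing_slices(param_shape, loaded_shape):
    """Return slices selecting the leading region of a padded parameter
    that has the same shape as the checkpoint tensor.

    Hypothetical helper for illustration; not the vLLM implementation.
    """
    if len(param_shape) != len(loaded_shape):
        raise ValueError("rank mismatch between parameter and checkpoint")
    slices = []
    for padded, actual in zip(param_shape, loaded_shape):
        if actual > padded:
            raise ValueError("checkpoint dimension exceeds parameter dimension")
        # slice(0, actual) keeps the first `actual` entries; when the sizes
        # already match it covers the whole dimension, which is a no-op.
        slices.append(slice(0, actual))
    return tuple(slices)


# w2-like case: dim 0 padded from 2688 to 3072 (sizes from the error message).
padded_param_shape = (3072, 1856)   # 1856 is an arbitrary example size
checkpoint_shape = (2688, 1856)
index = narrowing_slices(padded_param_shape, checkpoint_shape)
print(index)  # (slice(0, 2688, None), slice(0, 1856, None))
```

With a PyTorch tensor, expert_data[index].copy_(loaded_weight) would then write into a view that shares storage with the padded parameter, leaving the padding entries untouched.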

Not a duplicate: Checked open PRs — #34285 and #30647 address roundup refactoring and forward-pass padding, not weight loading.

Test plan

  • New unit tests for _narrow_expert_data_for_padding (7 cases: matching shapes, w2/w13 dims, 3D tensors, 1D scales, scalar weights, storage sharing)
  • New integration tests for padded weight loading (w2, w13, no-padding no-op)
  • python -m pytest tests/kernels/moe/test_moe_weight_loading_padded.py -v — 10/10 pass
  • python -m pytest tests/kernels/moe/ -v -k "not deepep" — existing MoE tests

Changed files

  • tests/kernels/moe/test_moe_weight_loading_padded.py (added, +292/-0)
  • vllm/model_executor/layers/fused_moe/layer.py (modified, +82/-14)

Raw issue body

Your current environment

The output of python collect_env.py: env.txt

🐛 Describe the bug

I was doing some MoE testing and ran into a problem with the nemotron_h model and backends that require rounding the hidden dimension, e.g. deepep_low_latency and deepep_high_throughput. The weight_loader is getting the rounded dimension which it does not expect (see stack trace). I think there might also be a problem at runtime with the routed input transform dimensions.
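
The mismatch in the trace can be reproduced with plain arithmetic. The sketch below is only an illustration: the round_up helper and the alignment of 512 are assumptions, chosen so that 2688 rounds to 3072 as in the error message. The rounding rule the DeepEP backends actually apply is not stated in this issue.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (hypothetical helper)."""
    return ((value + multiple - 1) // multiple) * multiple


def shapes_match(param_shape: tuple, loaded_shape: tuple) -> bool:
    """Simplified check: treat the copy as requiring identical shapes
    (broadcasting is ignored here)."""
    return tuple(param_shape) == tuple(loaded_shape)


checkpoint_hidden = 2688   # size stored in the checkpoint (from the trace)
ASSUMED_ALIGNMENT = 512    # assumption for illustration, not taken from vLLM
padded_hidden = round_up(checkpoint_hidden, ASSUMED_ALIGNMENT)

print(padded_hidden)                                         # 3072
print(shapes_match((padded_hidden,), (checkpoint_hidden,)))  # False
```

A parameter allocated with the padded size therefore cannot receive the checkpoint tensor through a direct copy, which is the failure shown in the stack below.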

Repro steps:

python3 examples/offline_inference/data_parallel.py \
        --model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16" \
        --all2all-backend deepep_low_latency \
        --trust-remote-code \
        -dp=2 \
        -tp=1 \
        --enforce-eager

Stack:

ERROR 03-12 20:19:38 [multiproc_executor.py:844] WorkerProc failed to start.
ERROR 03-12 20:19:38 [multiproc_executor.py:844] Traceback (most recent call last):
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/v1/executor/multiproc_executor.py", line 813, in worker_main
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     worker = WorkerProc(*args, **kwargs)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/v1/executor/multiproc_executor.py", line 619, in __init__
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self.worker.load_model()
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/v1/worker/gpu_worker.py", line 337, in load_model
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self.model_runner.load_model(load_dummy_weights=dummy_weights)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/v1/worker/gpu_model_runner.py", line 4277, in load_model
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self.model = model_loader.load_model(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]                  ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/model_loader/base_loader.py", line 62, in load_model
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self.load_weights(model, model_config)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/model_loader/default_loader.py", line 290, in load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
ERROR 03-12 20:19:38 [multiproc_executor.py:844]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/nemotron_h.py", line 957, in load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     return original_load_weights(self, weights, *args, **kwargs)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/utils.py", line 340, in load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     autoloaded_weights = set(self._load_module("", self.module, weights))
ERROR 03-12 20:19:38 [multiproc_executor.py:844]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/utils.py", line 287, in _load_module
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     yield from self._load_module(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/utils.py", line 260, in _load_module
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     loaded_params = module_load_weights(weights)
ERROR 03-12 20:19:38 [multiproc_executor.py:844]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/models/nemotron_h.py", line 756, in load_weights
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     success = weight_loader(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]               ^^^^^^^^^^^^^^
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 1323, in weight_loader
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self._load_model_weight_or_group_weight_scale(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 915, in _load_model_weight_or_group_weight_scale
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     self._load_w2(
ERROR 03-12 20:19:38 [multiproc_executor.py:844]   File "/home/bnellnm/nm-vllm-new/vllm/model_executor/layers/fused_moe/layer.py", line 1001, in _load_w2
ERROR 03-12 20:19:38 [multiproc_executor.py:844]     expert_data.copy_(loaded_weight)
ERROR 03-12 20:19:38 [multiproc_executor.py:844] RuntimeError: The size of tensor a (3072) must match the size of tensor b (2688) at non-singleton dimension 0

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

The fix belongs in the FusedMoE weight-loading code in vllm/model_executor/layers/fused_moe/layer.py. The checkpoint tensor (2688) is smaller than the padded parameter (3072), so truncating the loaded weight cannot help; the destination has to be narrowed instead.

  • Before copying, compare each dimension of the destination parameter with the loaded weight.
  • Where they differ, narrow the destination to the loaded weight's size so that only the unpadded region is written.

Example code (a sketch of the idea, not the code from PR #37010; copy_into_padded is a hypothetical name):

def copy_into_padded(expert_data, loaded_weight):
    # expert_data: padded destination tensor; loaded_weight: checkpoint tensor
    for dim, size in enumerate(loaded_weight.shape):
        if expert_data.shape[dim] != size:
            # narrow() returns a view, so the copy below writes into the parameter
            expert_data = expert_data.narrow(dim, 0, size)
    expert_data.copy_(loaded_weight)

Alternatively, load_weights in nemotron_h.py could pass the unpadded dimension to weight_loader. PR #37010 takes a different route (narrowing inside FusedMoE), and the snippet below is pseudocode only: calculate_correct_dim is a hypothetical helper, and the call does not match the real weight_loader signature.

def load_weights(self, weights):
    # Hypothetical: recover the unpadded hidden size
    correct_dim = self.calculate_correct_dim()
    # Hypothetical call shape, for illustration only
    weight_loader(correct_dim, weights)

Verification

To verify that the fix worked, run the repro steps again and check if the error is resolved.

  • Run the command:
python3 examples/offline_inference/data_parallel.py \
        --model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16" \
        --all2all-backend deepep_low_latency \
        --trust-remote-code \
        -dp=2 \
        -tp=1 \
        --enforce-eager
  • Check if the error message is resolved and the model loads correctly.
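
When checking the output, it can help to search the captured log for the specific size-mismatch message rather than for any error. A minimal sketch, assuming the log has been captured as text; find_size_mismatch is a hypothetical helper and the pattern is taken from the error line in this issue.

```python
import re

# Pattern copied from the RuntimeError text in the stack trace.
MISMATCH = re.compile(
    r"The size of tensor a \((\d+)\) must match the size of tensor b \((\d+)\)"
)


def find_size_mismatch(log_text: str):
    """Return (size_a, size_b) from the first mismatch error, or None."""
    match = MISMATCH.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = (
    "ERROR 03-12 20:19:38 [multiproc_executor.py:844] RuntimeError: "
    "The size of tensor a (3072) must match the size of tensor b (2688) "
    "at non-singleton dimension 0"
)
print(find_size_mismatch(sample))          # (3072, 2688)
print(find_size_mismatch("model loaded"))  # None
```

A result of None only shows that this particular message is absent; the run should still be checked for other failures.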

Extra Tips

  • Make sure to test the fix thoroughly to ensure that it does not introduce any new issues.
  • Consider adding additional error handling to handle cases where the loaded weight dimension does not match the expected dimension.
  • Review the code changes to ensure that they are consistent with the overall architecture and design of the project.
