vllm - ✅(Solved) Fix [Bug]: minimax nvfp4 model crash [1 pull request, 3 comments, 2 participants]

GitHub stats
vllm-project/vllm#38303 (fetched 2026-04-08 01:36:41)
Comments: 3
Participants: 2
Timeline events: 8
Reactions: 0
Timeline (top): commented ×3, closed ×1, cross-referenced ×1, labeled ×1


PR fix notes

PR #952: [NVIDIA] [DNM - vllm 0.18 is bugged, need to wait till 0.19] Add MiniMax M2.5 NVFP4 benchmark for B200 vLLM (TP4, TP2)

Description (problem / solution / changelog)

Add minimaxm2.5-fp4-b200-vllm config and benchmark script for nvidia/MiniMax-M2.5-NVFP4 on B200 with TP4 and TP2, concurrency 4-64. Uses vllm/vllm-openai:v0.18.0 with --no-enable-prefix-caching.

Closes #951

Generated with Claude Code

Changed files

  • .github/configs/nvidia-master.yaml (modified, +25/-0)
  • benchmarks/single_node/minimaxm2.5_fp4_b200.sh (added, +77/-0)
  • perf-changelog.yaml (modified, +8/-0)


Your current environment

vllm/vllm-openai:v0.18.0

🐛 Describe the bug

hi @kedarpotdar-nv

This should probably be a simple fix: have the model loader load the scales too.

Repro:

vllm serve $MODEL --host 0.0.0.0 --port $PORT \
--tensor-parallel-size=$TP \
--gpu-memory-utilization 0.90 \
--max-model-len $MAX_MODEL_LEN \
--max-num-seqs $CONC \
--no-enable-prefix-caching \
--compilation_config.pass_config.fuse_allreduce_rms true \
--trust-remote-code

Error log

(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     return original_load_weights(self, weights, *args, **kwargs)
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     yield from self._load_module(
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     loaded_params = module_load_weights(weights)
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minimax_m2.py", line 442, in load_weights
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     param = params_dict[name]
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]             ~~~~~~~~~~~^^^^^^
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852] KeyError: 'layers.0.self_attn.qkv_proj.k_scale'
NCCL INFO Channel 22/0 : 3[3] -> 0[0] via P2P/CUMEM
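The worker prefix on every line makes the traceback above hard to read. A minimal sketch for stripping it, assuming the prefix always has the shape seen in this log (worker tag, pid, level, timestamp, source location); `strip_worker_prefix` is a hypothetical helper name, not part of vLLM:

```python
import re

# Matches prefixes shaped like the ones in this log, e.g.
# "(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852] "
PREFIX = re.compile(
    r"^\(Worker_\S+ pid=\d+\) ERROR \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] ?"
)


def strip_worker_prefix(log_text: str) -> list[str]:
    """Keep only worker ERROR lines and drop their prefix."""
    cleaned = []
    for line in log_text.splitlines():
        match = PREFIX.match(line)
        if match:
            cleaned.append(line[match.end():])
    return cleaned


sample = """\
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     param = params_dict[name]
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852] KeyError: 'layers.0.self_attn.qkv_proj.k_scale'
NCCL INFO Channel 22/0 : 3[3] -> 0[0] via P2P/CUMEM
"""

traceback_lines = strip_worker_prefix(sample)
print("\n".join(traceback_lines))
```

Lines without the worker prefix, such as the NCCL message, are dropped, so what remains is an ordinary Python traceback ending in the KeyError.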

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

extent analysis

Fix Plan

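The KeyError: 'layers.0.self_attn.qkv_proj.k_scale' is raised at param = params_dict[name] in minimax_m2.py (line 442): the loader is handed a scale tensor whose name has no matching entry in params_dict. As the issue author suggests, the loader needs to handle the scales as well. Possible steps:

  • Modify load_weights in minimax_m2.py so that scale names such as layers.0.self_attn.qkv_proj.k_scale are either mapped to the parameter the model actually registers for that scale or skipped explicitly, instead of being used to index params_dict directly.
  • Add a check that the k_scale value really is loaded and not silently dropped.

The linked PR title notes that vLLM 0.18 is affected and that the benchmark is waiting for 0.19, which suggests that upgrading may be the simplest route once a release containing the fix is available.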

Example code:

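def load_weights(self, weights):
    # Sketch only; assumes params_dict is built from the module's named parameters.
    params_dict = dict(self.named_parameters())
    for name, loaded_weight in weights:
        # ... existing name remapping ...
        if name.endswith((".k_scale", ".v_scale")) and name not in params_dict:
            continue  # avoids the crash but drops the scale; remap the name here to actually load it
        param = params_dict[name]
        # ... existing weight_loader call ...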
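One way to "load the scales too" is to remap the checkpoint name onto a registered parameter before indexing params_dict. The sketch below runs outside vLLM with plain dictionaries. The helper `remap_kv_scale_name` is hypothetical, and the target name `...self_attn.attn.k_scale` is an assumption about where the model registers its KV scales, not something confirmed by the issue:

```python
from typing import Optional


def remap_kv_scale_name(name: str, params_dict: dict) -> Optional[str]:
    """Hypothetical helper: return a registered name for a KV scale, or None to skip it."""
    if name in params_dict:
        return name
    for suffix in (".k_scale", ".v_scale"):
        if name.endswith(suffix):
            # Assumption: scales live under "<layer>.self_attn.attn.<k|v>_scale".
            layer_prefix = name.rsplit(".self_attn.", 1)[0]
            candidate = f"{layer_prefix}.self_attn.attn{suffix}"
            return candidate if candidate in params_dict else None
    return name  # non-scale names are left alone and may still raise KeyError


def load_weights(weights, params_dict: dict) -> set:
    """Toy loader: stores values in params_dict and returns the names it loaded."""
    loaded = set()
    for name, value in weights:
        target = remap_kv_scale_name(name, params_dict)
        if target is None:
            continue  # scale with no registered parameter: skipped explicitly
        if target not in params_dict:
            raise KeyError(target)  # genuinely unexpected names still fail loudly
        params_dict[target] = value
        loaded.add(target)
    return loaded


params_dict = {
    "layers.0.self_attn.qkv_proj.weight": None,
    "layers.0.self_attn.attn.k_scale": None,
}
weights = [
    ("layers.0.self_attn.qkv_proj.weight", 1.0),
    ("layers.0.self_attn.qkv_proj.k_scale", 0.5),
    ("layers.0.self_attn.qkv_proj.v_scale", 0.25),
]
loaded = load_weights(weights, params_dict)
print(sorted(loaded))
```

In this toy setup the k_scale value lands on the assumed attention parameter, while the v_scale entry is skipped because no matching parameter was registered. Whether skipping is acceptable in the real model depends on how the quantized attention layer uses those scales.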

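Patching utils.py to add the scale name to the autoloaded_weights set is not a workable alternative. The traceback shows that the KeyError is raised inside minimax_m2.py while utils.py is still evaluating self._load_module(...) on line 348, so the set is never built and adding a name to it afterwards cannot prevent the crash:

# utils.py, line 348 (from the traceback)
autoloaded_weights = set(self._load_module("", self.module, weights))
# -> _load_module (line 295) -> _load_module (line 268)
# -> minimax_m2.py load_weights (line 442): param = params_dict[name] raises KeyError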

Verification

To verify the fix, rerun the vllm serve command from the repro with the modified code and confirm that weight loading completes on every worker with no KeyError: 'layers.0.self_attn.qkv_proj.k_scale' in the logs.
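The log check can be automated with a small script. This is a minimal sketch that assumes the server output has been captured as text; `missing_weight_keys` is a hypothetical helper name:

```python
import re

# Captures the key inside lines such as: KeyError: 'layers.0.self_attn.qkv_proj.k_scale'
KEY_ERROR = re.compile(r"KeyError: '([^']+)'")


def missing_weight_keys(log_text: str) -> list[str]:
    """Return the distinct keys named in KeyError lines, in first-seen order."""
    seen = []
    for key in KEY_ERROR.findall(log_text):
        if key not in seen:
            seen.append(key)
    return seen


failing_log = (
    "(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852] "
    "KeyError: 'layers.0.self_attn.qkv_proj.k_scale'\n"
    "NCCL INFO Channel 22/0 : 3[3] -> 0[0] via P2P/CUMEM\n"
)
clean_log = "NCCL INFO Channel 22/0 : 3[3] -> 0[0] via P2P/CUMEM\n"

print(missing_weight_keys(failing_log))
print(missing_weight_keys(clean_log))
```

An empty result only shows that no KeyError was logged. It does not prove that the scales were applied, so a short generation test against the served model is still worthwhile.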

Extra Tips

  • Make sure to test the modified code thoroughly to ensure that it does not introduce any new issues.
  • Consider adding additional error handling to handle cases where the k_scale parameter is missing or invalid.
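As a sketch of the extra error handling mentioned above, the loader could collect every unmatched name and report them together instead of failing on the first one. The function names are hypothetical, and the comparison assumes the checkpoint names have already gone through whatever renaming the loader applies:

```python
def find_unmatched(checkpoint_names, param_names):
    """Return checkpoint names (after any renaming) that have no registered parameter."""
    registered = set(param_names)
    return sorted(name for name in checkpoint_names if name not in registered)


def require_all_matched(checkpoint_names, param_names):
    """Raise a single descriptive error listing all unmatched names."""
    unmatched = find_unmatched(checkpoint_names, param_names)
    if unmatched:
        scales = [name for name in unmatched if name.endswith("_scale")]
        raise KeyError(
            f"{len(unmatched)} checkpoint tensor(s) have no matching parameter "
            f"({len(scales)} look like scales): {unmatched}"
        )


checkpoint_names = [
    "layers.0.self_attn.qkv_proj.weight",
    "layers.0.self_attn.qkv_proj.k_scale",
]
param_names = ["layers.0.self_attn.qkv_proj.weight"]

unmatched = find_unmatched(checkpoint_names, param_names)
print(unmatched)
```

Reporting all unmatched names at once makes it easier to tell whether only the KV scales are affected or whether other tensors are missing as well.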
