vllm - ✅ (Solved) Fix [Bug]: minimax nvfp4 model crash [1 pull request, 3 comments, 2 participants]

vllm · 2026-03-27 01:37:58


GitHub stats

vllm-project/vllm#38303 · Fetched 2026-04-08 01:36:41

Author: functionstackx

Participants: functionstackx, jeejeelee

Timeline (top): commented ×3, closed ×1, cross-referenced ×1, labeled ×1


PR fix notes

PR #952: [NVIDIA] [DNM - vllm 0.18 is bugged, need to wait till 0.19] Add MiniMax M2.5 NVFP4 benchmark for B200 vLLM (TP4, TP2)

Repository: SemiAnalysisAI/InferenceX
Author: functionstackx
State: closed | merged: False
Link: https://github.com/SemiAnalysisAI/InferenceX/pull/952

Description (problem / solution / changelog)

Add minimaxm2.5-fp4-b200-vllm config and benchmark script for nvidia/MiniMax-M2.5-NVFP4 on B200 with TP4 and TP2, concurrency 4-64. Uses vllm/vllm-openai:v0.18.0 with --no-enable-prefix-caching.
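The PR's sweep can be read as a small matrix of benchmark runs. The sketch below is hypothetical. It assumes the 4-64 concurrency range is swept in powers of two, because the actual values in .github/configs/nvidia-master.yaml are not reproduced here. The names `TP_SIZES`, `CONCURRENCIES`, and `benchmark_matrix` are illustrative only.

```python
from itertools import product

# Assumption: the "concurrency 4-64" range is swept in powers of two.
TP_SIZES = (4, 2)
CONCURRENCIES = tuple(2**i for i in range(2, 7))  # (4, 8, 16, 32, 64)


def benchmark_matrix(tp_sizes=TP_SIZES, concurrencies=CONCURRENCIES):
    """One run per (tensor-parallel size, concurrency) pair, in TP-major order."""
    return [{"tp": tp, "conc": conc} for tp, conc in product(tp_sizes, concurrencies)]
```

Under that assumption, the sweep has 10 runs: five concurrency levels for each of TP4 and TP2.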

Closes #951

Generated with Claude Code

Changed files

.github/configs/nvidia-master.yaml (modified, +25/-0)
benchmarks/single_node/minimaxm2.5_fp4_b200.sh (added, +77/-0)
perf-changelog.yaml (modified, +8/-0)


Your current environment

vllm/vllm-openai:v0.18.0

🐛 Describe the bug

Hi @kedarpotdar-nv,

This is probably a simple fix: the model loader should load the scales too.

Reproduction:

```bash
vllm serve $MODEL --host 0.0.0.0 --port $PORT \
--tensor-parallel-size=$TP \
--gpu-memory-utilization 0.90 \
--max-model-len $MAX_MODEL_LEN \
--max-num-seqs $CONC \
--no-enable-prefix-caching \
--compilation_config.pass_config.fuse_allreduce_rms true \
--trust-remote-code
```
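To script the reproduction from Python (for example, inside a benchmark harness), the same command can be assembled as an argv list for `subprocess`. This is a sketch. The helper names `serve_argv` and `command_line` and the example values in the usage note are illustrative assumptions. The flags are copied verbatim from the command above.

```python
import shlex


def serve_argv(model, port, tp, max_model_len, conc):
    """Argv for the reproduction command; flags copied from the issue."""
    return [
        "vllm", "serve", model,
        "--host", "0.0.0.0",
        "--port", str(port),
        f"--tensor-parallel-size={tp}",
        "--gpu-memory-utilization", "0.90",
        "--max-model-len", str(max_model_len),
        "--max-num-seqs", str(conc),
        "--no-enable-prefix-caching",
        "--compilation_config.pass_config.fuse_allreduce_rms", "true",
        "--trust-remote-code",
    ]


def command_line(argv):
    """Render argv as a single shell-quoted line, e.g. for benchmark logs."""
    return shlex.join(argv)
```

For example, `subprocess.Popen(serve_argv("nvidia/MiniMax-M2.5-NVFP4", 8000, 4, 8192, 64))` would start the server. `command_line` records the equivalent shell line next to the results.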

Error log

```
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     return original_load_weights(self, weights, *args, **kwargs)
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 348, in load_weights
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     autoloaded_weights = set(self._load_module("", self.module, weights))
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 295, in _load_module
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     yield from self._load_module(
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 268, in _load_module
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     loaded_params = module_load_weights(weights)
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/minimax_m2.py", line 442, in load_weights
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]     param = params_dict[name]
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852]             ~~~~~~~~~~~^^^^^^
(Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852] KeyError: 'layers.0.self_attn.qkv_proj.k_scale'
NCCL INFO Channel 22/0 : 3[3] -> 0[0] via P2P/CUMEM
```
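With tensor parallelism, each worker prints its own copy of the traceback. On larger runs it helps to collapse the log to the distinct final exceptions. The sketch below assumes the worker line prefix format shown in the log above. `summarize_worker_errors` is a hypothetical helper, not part of vLLM, and it only picks up exception types whose names end in `Error`.

```python
import re
from collections import defaultdict

# Matches the final traceback line printed by a worker, e.g.
# (Worker_TP3_EP3 pid=2222203) ERROR 03-27 01:32:45 [multiproc_executor.py:852] KeyError: '...'
FINAL_ERROR_RE = re.compile(
    r"^\((?P<worker>[^\s)]+) pid=\d+\) ERROR .*?\] (?P<exc>\w+Error): (?P<msg>.*)$"
)


def summarize_worker_errors(lines):
    """Map each distinct 'ExcType: message' to the sorted workers that raised it."""
    by_error = defaultdict(set)
    for line in lines:
        match = FINAL_ERROR_RE.match(line.strip())
        if match:
            by_error[f"{match['exc']}: {match['msg']}"].add(match["worker"])
    return {error: sorted(workers) for error, workers in by_error.items()}
```

Lines that are not a final `...Error: ...` line are ignored. This includes the traceback frames, the caret markers, and the unrelated NCCL message.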


Extended analysis

Fix Plan

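The KeyError for 'layers.0.self_attn.qkv_proj.k_scale' means the NVFP4 checkpoint contains KV-cache scales (k_scale/v_scale) that MiniMax-M2's load_weights cannot place.

The most likely cause is that the checkpoint stores these scales under per-projection names, such as layers.0.self_attn.k_proj.k_scale. The stacked-parameter mapping then rewrites k_proj to qkv_proj. The resulting name does not exist in params_dict, so the lookup at minimax_m2.py line 442 fails.

The model loader therefore has to load the scales too:

- In load_weights in minimax_m2.py, remap KV-cache scale names to the attention layer's scale parameters before the stacked qkv_proj mapping is applied. Skip any scale that has no matching parameter instead of raising.
- Handle every layer generically. Do not hardcode names such as layers.0.self_attn.qkv_proj.k_scale.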

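Example code. This is a sketch that mirrors the KV-scale remapping other vLLM model implementations perform with maybe_remap_kv_scale_name. It assumes MiniMax-M2's attention layer is registered as self_attn.attn. It is not the merged upstream fix.

```python
from vllm.model_executor.model_loader.weight_utils import maybe_remap_kv_scale_name

def load_weights(self, weights):
    params_dict = dict(self.named_parameters())
    for name, loaded_weight in weights:
        # Remap KV-cache scales (e.g. "...k_proj.k_scale" -> "...attn.k_scale")
        # before the stacked qkv_proj mapping rewrites the name.
        if "scale" in name:
            name = maybe_remap_kv_scale_name(name, params_dict)
            if name is None:  # no matching parameter: skip instead of KeyError
                continue
        # ... existing stacked-params / expert mapping code ...
```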
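The failure can be reproduced without vLLM. The simulation also shows why KV-cache scale names must be remapped before the stacked qkv_proj mapping runs. It makes three simplifying assumptions:

- The mapping table follows the (param_name, shard_name, shard_id) shape that vLLM model files use.
- `remap_kv_scale` is a simplified stand-in for maybe_remap_kv_scale_name, not its real implementation.
- The checkpoint name `...self_attn.k_proj.k_scale` is inferred from the KeyError.

```python
# Simplified simulation of MiniMax-M2 weight-name resolution (no vLLM needed).
STACKED_PARAMS_MAPPING = [
    # (param_name, shard_name, shard_id): q/k/v projections fused into qkv_proj
    (".qkv_proj", ".q_proj", "q"),
    (".qkv_proj", ".k_proj", "k"),
    (".qkv_proj", ".v_proj", "v"),
]


def apply_stacked_mapping(name):
    """Rewrite a per-projection checkpoint name to its fused parameter name."""
    for param_name, shard_name, _shard_id in STACKED_PARAMS_MAPPING:
        if shard_name in name:
            return name.replace(shard_name, param_name)
    return name


def remap_kv_scale(name, params):
    """Simplified stand-in for vLLM's maybe_remap_kv_scale_name."""
    for suffix in (".k_scale", ".v_scale"):
        if name.endswith(suffix):
            proj = f".self_attn.{suffix[1]}_proj{suffix}"  # e.g. ".self_attn.k_proj.k_scale"
            remapped = name.replace(proj, f".self_attn.attn{suffix}")
            return remapped if remapped in params else None  # None means "skip"
    return name


def resolve(name, params, fix=True):
    """Look up the parameter for a checkpoint name, optionally remapping scales first."""
    if fix:
        name = remap_kv_scale(name, params)
        if name is None:
            return None
    return params[apply_stacked_mapping(name)]  # KeyError here mirrors minimax_m2.py


PARAMS = {
    "layers.0.self_attn.qkv_proj.weight": "fused qkv weight",
    "layers.0.self_attn.attn.k_scale": "k_scale parameter",
}
```

Calling `resolve("layers.0.self_attn.k_proj.k_scale", PARAMS, fix=False)` raises `KeyError: 'layers.0.self_attn.qkv_proj.k_scale'`, the same name as in the log. With `fix=True` it returns the attention scale entry. Ordinary weights such as `...k_proj.weight` still resolve to the fused `qkv_proj` parameter.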

Patching utils.py instead, for example by adding the name to the autoloaded_weights set in AutoWeightsLoader, is not a fix. The KeyError is raised inside MiniMax-M2's own load_weights (minimax_m2.py line 442) before _load_module returns. Marking a scale as loaded without actually loading it would at best hide a weight that never reached the model.

Verification

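Rerun the same vllm serve reproduction command against the patched build. Alternatively, use vLLM v0.19, the release that PR #952's title says to wait for. Confirm that every worker finishes loading weights without a KeyError and that the server starts accepting requests.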
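This readiness check can also be scripted. The sketch below assumes the server was started with the reproduction command and that vLLM's OpenAI-compatible server answers `GET /health` with HTTP 200 once it is up. The helper names and the 30-minute default deadline are illustrative choices. `probe`, `sleep`, and `clock` are injectable so the polling logic can be tested offline.

```python
import time
import urllib.request


def server_is_healthy(base_url, timeout=5.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, or HTTP error status
        return False


def wait_until_healthy(base_url, deadline_s=1800.0, poll_s=10.0,
                       probe=server_is_healthy, sleep=time.sleep, clock=time.monotonic):
    """Poll the health endpoint until it succeeds or the deadline passes."""
    start = clock()
    while clock() - start < deadline_s:
        if probe(base_url):
            return True
        sleep(poll_s)
    return False
```

If weight loading still crashes, the server never becomes healthy and `wait_until_healthy` returns False at the deadline. Checking the server process's exit code alongside it gives a faster failure signal.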

Extra Tips

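Test the patched loader with the NVFP4 checkpoint and also with a checkpoint that ships no KV-cache scales, so the new remapping path does not break the case without scales.
Handle missing or invalid k_scale values explicitly instead of failing with a bare KeyError. When a scale is absent, fall back to the attention layer's default scale with a warning. When a value is unusable (for example, non-finite), raise a clear error.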
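One way to act on the tip about missing or invalid k_scale values is a small validation step before a scale is copied into its parameter. The sketch below uses plain floats instead of tensors. `validated_kv_scale` is not a vLLM function, and treating non-positive values as invalid is an assumption.

```python
import logging
import math

logger = logging.getLogger(__name__)


def validated_kv_scale(name, value, default=None):
    """Return a usable KV-cache scale, the default if missing, or raise if invalid."""
    if value is None:
        # Missing from the checkpoint: warn and keep the layer's default scale.
        logger.warning("%s not found in checkpoint; using default scale", name)
        return default
    value = float(value)
    if not math.isfinite(value) or value <= 0.0:
        # Assumption: NaN/inf and non-positive values are never valid scales.
        raise ValueError(f"{name}: invalid scale {value!r}; expected a finite positive number")
    return value
```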
