vllm - ✅(Solved) Fix [Transformers v5] Mistral multimodal models [1 pull request, 1 comment, 2 participants]

vllm • 2026-03-27 18:30:28


vllm-project/vllm#38382•Fetched 2026-04-08 01:41:44


Error Message

$ pytest tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]
...
(EngineCore pid=1621824)   File "/home/harry/vllm/vllm/multimodal/processing/processor.py", line 1374, in _merge_mm_kwargs
(EngineCore pid=1621824)     missing_kwargs_item = missing_kwargs[missing_next_idx]
(EngineCore pid=1621824)                           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(EngineCore pid=1621824) IndexError: list index out of range

Root Cause

Appears to have been caused by the refactor in https://github.com/vllm-project/vllm/pull/38018, which was developed against Transformers v5.3; processors were then refactored in v5.4, which was just released.
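
A quick way to tell which side of the v5.4 boundary an environment is on is to compare the installed version. This is a minimal sketch: the helper names are hypothetical, and the (5, 4) threshold is taken from the issue's statement that processors were refactored in v5.4, not from the Transformers changelog.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '5.4.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def has_refactored_processors(version: str) -> bool:
    """Assumption from the issue text: the processor refactor landed in v5.4."""
    return major_minor(version) >= (5, 4)


def installed_transformers_version() -> Optional[str]:
    """Return the installed Transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_transformers_version()
    if version is None:
        print("transformers is not installed")
    else:
        print(version, "refactored processors:", has_refactored_processors(version))
```

Only the major and minor components are compared, so development builds such as 5.4.0.dev0 are treated like the release they precede.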

Fix Action

Fixed

Fixed by PR: [Transformers v5] fix missing pixtral/voxtral multimodal dispatch (https://github.com/vllm-project/vllm/pull/38410)

PR fix notes

PR #38410: [Transformers v5] fix missing pixtral/voxtral multimodal dispatch

Repository: vllm-project/vllm
Author: allgather
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/38410

Description (problem / solution / changelog)

Purpose

fix https://github.com/vllm-project/vllm/issues/38382

Transformers decides which processor components to call by inspecting the processor's constructor, and the Mistral processors only expose the tokenizer there.

As a result, the Pixtral image processor and the Voxtral feature extractor stopped running: vLLM still received text tokens, but no multimodal kwargs. This is why the issue showed:

FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Pixtral-12B-2409] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0!

(EngineCore pid=1621824)   File "/home/harry/vllm/vllm/multimodal/processing/processor.py", line 1374, in _merge_mm_kwargs
(EngineCore pid=1621824)     missing_kwargs_item = missing_kwargs[missing_next_idx]
(EngineCore pid=1621824)                           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(EngineCore pid=1621824) IndexError: list index out of range
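
The dispatch problem described above can be illustrated with a toy model. The sketch below is not the Transformers implementation: discovered_components, TokenizerOnlyProcessor, and FullProcessor are hypothetical names, and the only assumption is that component discovery reads the constructor's parameter names, as the PR description says.

```python
import inspect


class TokenizerOnlyProcessor:
    """Hypothetical processor whose constructor names only the tokenizer."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        # Built internally, so it never appears in the constructor signature.
        self.image_processor = object()


class FullProcessor:
    """Hypothetical processor that names every component in its constructor."""

    def __init__(self, tokenizer, image_processor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor


def discovered_components(processor_cls):
    """Return the component names visible in the constructor signature."""
    params = inspect.signature(processor_cls.__init__).parameters
    return [name for name in params if name != "self"]


if __name__ == "__main__":
    print(discovered_components(TokenizerOnlyProcessor))
    print(discovered_components(FullProcessor))
```

In this toy model the first class reports only the tokenizer, so a caller that dispatches on the discovered names would never run its image processor even though the attribute exists on the instance.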

Test Result

The tests from the issue description passed. Ran on 1x A100.

<details> <summary>tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]</summary>

pytest -s -vv 'tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]'

(EngineCore pid=4587) DEBUG 03-28 01:00:40 [v1/worker/gpu_model_runner.py:3894] ubatch_slices: None, ubatch_slices_padded: None
(APIServer pid=4445) INFO:     127.0.0.1:33102 - "POST /v1/audio/transcriptions HTTP/1.1" 200 OK
[RemoteOpenAIServer] Server 4445 terminated gracefully
[RemoteOpenAIServer] GPU memory released to 0.54 GB (target: 2.69 GB) in 0.0s
PASSED

================== 1 passed, 17 warnings in 97.66s (0:01:37) ===================

</details> <details> <summary>tests/entrypoints/openai/realtime/test_realtime_validation.py::test_multi_chunk_streaming[mistralai/Voxtral-Mini-4B-Realtime-2602]</summary>

pytest -s -vv 'tests/entrypoints/openai/realtime/test_realtime_validation.py::test_multi_chunk_streaming[mistralai/Voxtral-Mini-4B-Realtime-2602]'

(EngineCore pid=5250) DEBUG 03-28 01:04:40 [v1/worker/gpu_model_runner.py:3873] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=1, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(APIServer pid=5108) DEBUG 03-28 01:04:40 [entrypoints/.../realtime/connection.py:287] Connection cleanup complete: ws-0a9950a4-a31b-4473-bd66-2aff29096eb2
(APIServer pid=5108) INFO:     Finished server process [5108]
[RemoteOpenAIServer] Server 5108 terminated gracefully
[RemoteOpenAIServer] GPU memory released to 0.54 GB (target: 2.69 GB) in 0.0s
PASSED

================== 1 passed, 21 warnings in 94.21s (0:01:34) ===================

</details> <details> <summary>tests/entrypoints/openai/realtime/test_realtime_validation.py follow-up failures</summary>

pytest -s -vv -rA \
  'tests/entrypoints/openai/realtime/test_realtime_validation.py::test_empty_commit_does_not_crash_engine[mistralai/Voxtral-Mini-4B-Realtime-2602]' \
  'tests/entrypoints/openai/realtime/test_realtime_validation.py::test_session_update_invalid_model_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602]' \
  'tests/entrypoints/openai/realtime/test_realtime_validation.py::test_commit_without_session_update_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602]'

==================================== PASSES ====================================
PASSED tests/entrypoints/openai/realtime/test_realtime_validation.py::test_empty_commit_does_not_crash_engine[mistralai/Voxtral-Mini-4B-Realtime-2602]
PASSED tests/entrypoints/openai/realtime/test_realtime_validation.py::test_session_update_invalid_model_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602]
PASSED tests/entrypoints/openai/realtime/test_realtime_validation.py::test_commit_without_session_update_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602]

================== 3 passed, 21 warnings in 130.18s (0:02:10) ==================

</details>

cc @hmellor

Changed files

vllm/transformers_utils/processors/pixtral.py (modified, +9/-4)
vllm/transformers_utils/processors/voxtral.py (modified, +11/-4)

Issue description

This is a sub-issue forming part of the work in https://github.com/vllm-project/vllm/issues/38379, please read the description of this issue before beginning to work on this one.

Which test is failing?

$ pytest tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]
...
(EngineCore pid=1621824)   File "/home/harry/vllm/vllm/multimodal/processing/processor.py", line 1374, in _merge_mm_kwargs
(EngineCore pid=1621824)     missing_kwargs_item = missing_kwargs[missing_next_idx]
(EngineCore pid=1621824)                           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(EngineCore pid=1621824) IndexError: list index out of range

Appears to have been caused by the refactor in https://github.com/vllm-project/vllm/pull/38018, which was developed against Transformers v5.3; processors were then refactored in v5.4, which was just released.

Related test failures:

[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_multi_chunk_streaming[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.
[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_empty_commit_does_not_crash_engine[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.
[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_session_update_invalid_model_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.
[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_commit_without_session_update_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.
[2026-03-27T01:45:47Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[PixtralForConditionalGeneration] - IndexError: list index out of range
[2026-03-27T01:45:47Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[Mistral3ForConditionalGeneration] - IndexError: list index out of range
[2026-03-27T01:44:39Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[MistralLarge3ForCausalLM] - IndexError: list index out of range
[2026-03-27T01:21:39Z] FAILED models/multimodal/generation/test_voxtral.py::test_online_serving - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2026-03-27T01:21:39Z] FAILED models/multimodal/generation/test_voxtral.py::test_hf_reference - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2026-03-27T01:21:39Z] ERROR models/multimodal/generation/test_voxtral_realtime.py::test_voxtral_realtime_forward - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2026-03-27T01:21:39Z] ERROR models/multimodal/generation/test_voxtral_realtime.py::test_voxtral_realtime_generator - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2026-03-27T01:26:35Z] FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Mistral-Small-3.1-24B-Instruct-2503] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0! There is likely a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_mm_fields_config`).
[2026-03-27T01:26:35Z] FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Pixtral-12B-2409] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0! There is likely a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_mm_fields_config`).
[2026-03-27T01:26:35Z] FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0! There is likely a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_mm_fields_config`).
[2026-03-27T01:26:35Z] FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Ministral-3-3B-Instruct-2512] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0! There is likely a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_mm_fields_config`).
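
When triaging a list like this, it can help to group the lines by status and exception type. The following is a minimal sketch that assumes each line has the shape shown above (timestamp in brackets, FAILED or ERROR, test ID, " - ", exception name, colon); summarize and LINE_RE are hypothetical names, and the three sample lines are copied from the list.

```python
import re
from collections import Counter

# Matches lines such as:
# [2026-03-27T01:45:47Z] FAILED path/to/test.py::test_name[param] - IndexError: message
LINE_RE = re.compile(
    r"^\[[^\]]+\]\s+(?P<status>FAILED|ERROR)\s+(?P<test>\S+)\s+-\s+(?P<error>\w+):"
)


def summarize(lines):
    """Count log lines per (status, exception type); unmatched lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match["status"], match["error"])] += 1
    return counts


SAMPLE = [
    "[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_multi_chunk_streaming[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.",
    "[2026-03-27T01:45:47Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[PixtralForConditionalGeneration] - IndexError: list index out of range",
    "[2026-03-27T01:21:39Z] ERROR models/multimodal/generation/test_voxtral_realtime.py::test_voxtral_realtime_forward - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}",
]

if __name__ == "__main__":
    for (status, error), count in sorted(summarize(SAMPLE).items()):
        print(status, error, count)
```

Grouping this way makes it easier to see that the IndexError and the "only found 0" RuntimeError are different surfaces of the same missing multimodal kwargs.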

How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
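
After installing, it may be worth confirming that Python actually imports both packages from the source checkouts rather than from a previously installed wheel. This is a minimal sketch: installed_from is a hypothetical helper, and the checkout paths in the usage section assume the commands above were run as written, with the script executed from inside the vllm directory.

```python
import importlib.util
from pathlib import Path


def installed_from(package: str, checkout: Path) -> bool:
    """Return True if `package` would be imported from somewhere under `checkout`."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        # Not installed, or a namespace package without a concrete file.
        return False
    return checkout.resolve() in Path(spec.origin).resolve().parents


if __name__ == "__main__":
    # Assumed layout: the current directory is the vllm clone, with the
    # transformers clone next to it.
    print("vllm from source:", installed_from("vllm", Path(".")))
    print("transformers from source:", installed_from("transformers", Path("../transformers")))
```

The check uses find_spec rather than importing the packages, so it stays fast and does not initialize vLLM or Transformers.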

extent analysis

Fix Plan

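The IndexError in _merge_mm_kwargs is a symptom rather than the cause: the Hugging Face processor returned text tokens but no multimodal kwargs, so the list indexed there has fewer entries than expected. The fix in PR #38410 restores the image processor and feature extractor dispatch in vllm/transformers_utils/processors/pixtral.py and vllm/transformers_utils/processors/voxtral.py; it does not change _merge_mm_kwargs.

Returning early from _merge_mm_kwargs when missing_kwargs is empty would silently drop the multimodal inputs.
If a guard is added there at all, it should fail with a clear message instead of a bare IndexError.

Example code (illustrative only, not the merged fix; next_missing_item is a hypothetical standalone helper):

def next_missing_item(missing_kwargs, missing_next_idx):
    # Hypothetical helper, not vLLM's actual code.
    if missing_next_idx >= len(missing_kwargs):
        # Fewer multimodal items came back from the HF processor than expected.
        raise RuntimeError(
            f"expected item {missing_next_idx}, but only {len(missing_kwargs)} were produced"
        )
    return missing_kwargs[missing_next_idx]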

Verification

To verify the fix, run the failing tests again:

pytest 'tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]'

If the fix is correct, the test should pass.

Extra Tips

Make sure to install vLLM and Transformers from source as described in the issue body to ensure that the test results reflect the current state of both libraries.
If the issue persists, check whether the Hugging Face processor call actually returns multimodal kwargs alongside the text tokens; the test_tensor_schema error message points at an inconsistency between _call_hf_processor and _get_mm_fields_config.
