vllm - ✅(Solved) Fix [Transformers v5] Mistral multimodal models [1 pull request, 1 comment, 2 participants]

GitHub stats: vllm-project/vllm#38382 (fetched 2026-04-08 01:41:44)
Comments: 1 · Participants: 2 · Timeline events: 8 · Reactions: 0
Timeline (top): labeled ×2, assigned ×1, closed ×1, commented ×1

Error Message

$ pytest tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]
...
(EngineCore pid=1621824)   File "/home/harry/vllm/vllm/multimodal/processing/processor.py", line 1374, in _merge_mm_kwargs
(EngineCore pid=1621824)     missing_kwargs_item = missing_kwargs[missing_next_idx]
(EngineCore pid=1621824)                           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(EngineCore pid=1621824) IndexError: list index out of range
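The traceback points at a merge step that indexes into missing_kwargs. The toy model below assumes, as the variable names suggest, that each multimodal item not found in the cache is filled with the next freshly processed entry. Under that assumption, a processor that returns no multimodal kwargs for three images surfaces as a bare IndexError rather than a descriptive message. merge_mm_items is a hypothetical simplification for illustration, not vLLM's actual _merge_mm_kwargs.

```python
from typing import Any, Dict, List, Optional


def merge_mm_items(
    cached: List[Optional[Dict[str, Any]]],
    missing_kwargs: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Toy model: fill every cache miss (None) with the next freshly processed item."""
    merged = []
    missing_next_idx = 0
    for item in cached:
        if item is not None:
            merged.append(item)  # cache hit: reuse the stored kwargs
            continue
        # Cache miss: assumes the HF processor returned one entry per missing item.
        merged.append(missing_kwargs[missing_next_idx])
        missing_next_idx += 1
    return merged


# Three images, none cached, but the processor produced no multimodal kwargs.
try:
    merge_mm_items([None, None, None], missing_kwargs=[])
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```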

Root Cause

Appears to have been caused by the refactor in https://github.com/vllm-project/vllm/pull/38018, which was developed against Transformers v5.3; processors were subsequently refactored in Transformers v5.4, which had just been released.

Fix Action

Fixed

PR fix notes

PR #38410: [Transformers v5] fix missing pixtral/voxtral multimodal dispatch

Description (problem / solution / changelog)

Purpose

Fixes https://github.com/vllm-project/vllm/issues/38382

Transformers decides which processor components to call by inspecting the processor's constructor, and the Mistral processors only declared a tokenizer there.

As a result, the Pixtral image processor and the Voxtral feature extractor stopped running: vLLM still received text tokens, but no multimodal (mm) kwargs. This is why the issue showed:

FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Pixtral-12B-2409] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0!
(EngineCore pid=1621824)   File "/home/harry/vllm/vllm/multimodal/processing/processor.py", line 1374, in _merge_mm_kwargs
(EngineCore pid=1621824)     missing_kwargs_item = missing_kwargs[missing_next_idx]
(EngineCore pid=1621824)                           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(EngineCore pid=1621824) IndexError: list index out of range
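The dispatch problem described above can be illustrated with a toy model. It only mimics the stated behaviour (components are selected by looking at the constructor signature); the class and function names are hypothetical, and this is not Transformers' actual implementation.

```python
import inspect
from typing import Any, Dict, List, Optional


class ToyProcessorBase:
    """Hypothetical stand-in for a processor base class; not Transformers' code."""

    def __call__(self, text: str, images: Optional[list] = None) -> Dict[str, Any]:
        # Only components named in the subclass constructor are treated as attached.
        params = inspect.signature(type(self).__init__).parameters
        out: Dict[str, Any] = {"input_ids": self.tokenizer(text)}
        if images and "image_processor" in params:
            out["pixel_values"] = self.image_processor(images)
        return out


def toy_tokenizer(text: str) -> List[int]:
    return [ord(c) for c in text]


def toy_image_processor(images: list) -> List[str]:
    return [f"pixels:{img}" for img in images]


class TokenizerOnlyProcessor(ToyProcessorBase):
    # Constructor declares only `tokenizer`; the image processor hides in **kwargs.
    def __init__(self, tokenizer, **kwargs):
        self.tokenizer = tokenizer
        self.image_processor = kwargs.get("image_processor")


class FullProcessor(ToyProcessorBase):
    # Constructor declares both components, so both are dispatched.
    def __init__(self, tokenizer, image_processor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor


images = ["img0", "img1", "img2"]
a = TokenizerOnlyProcessor(toy_tokenizer, image_processor=toy_image_processor)
b = FullProcessor(toy_tokenizer, toy_image_processor)
print(sorted(a("hi", images)))  # ['input_ids']
print(sorted(b("hi", images)))  # ['input_ids', 'pixel_values']
```

In the first case the image processor is attached but never called, so text tokens come back without any image kwargs, which matches the symptom reported in the issue.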

Test Result

The tests from the issue description passed (run on 1x A100).

<details> <summary>tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]</summary>
pytest -s -vv 'tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]'

(EngineCore pid=4587) DEBUG 03-28 01:00:40 [v1/worker/gpu_model_runner.py:3894] ubatch_slices: None, ubatch_slices_padded: None
(APIServer pid=4445) INFO:     127.0.0.1:33102 - "POST /v1/audio/transcriptions HTTP/1.1" 200 OK
[RemoteOpenAIServer] Server 4445 terminated gracefully
[RemoteOpenAIServer] GPU memory released to 0.54 GB (target: 2.69 GB) in 0.0s
PASSED

================== 1 passed, 17 warnings in 97.66s (0:01:37) ===================
</details> <details> <summary>tests/entrypoints/openai/realtime/test_realtime_validation.py::test_multi_chunk_streaming[mistralai/Voxtral-Mini-4B-Realtime-2602]</summary>
pytest -s -vv 'tests/entrypoints/openai/realtime/test_realtime_validation.py::test_multi_chunk_streaming[mistralai/Voxtral-Mini-4B-Realtime-2602]'

(EngineCore pid=5250) DEBUG 03-28 01:04:40 [v1/worker/gpu_model_runner.py:3873] Running batch with cudagraph_mode: NONE, batch_descriptor: BatchDescriptor(num_tokens=1, num_reqs=None, uniform=False, has_lora=False, num_active_loras=0), should_ubatch: False, num_tokens_across_dp: None
(APIServer pid=5108) DEBUG 03-28 01:04:40 [entrypoints/.../realtime/connection.py:287] Connection cleanup complete: ws-0a9950a4-a31b-4473-bd66-2aff29096eb2
(APIServer pid=5108) INFO:     Finished server process [5108]
[RemoteOpenAIServer] Server 5108 terminated gracefully
[RemoteOpenAIServer] GPU memory released to 0.54 GB (target: 2.69 GB) in 0.0s
PASSED

================== 1 passed, 21 warnings in 94.21s (0:01:34) ===================
</details> <details> <summary>tests/entrypoints/openai/realtime/test_realtime_validation.py follow-up failures</summary>
pytest -s -vv -rA \
  'tests/entrypoints/openai/realtime/test_realtime_validation.py::test_empty_commit_does_not_crash_engine[mistralai/Voxtral-Mini-4B-Realtime-2602]' \
  'tests/entrypoints/openai/realtime/test_realtime_validation.py::test_session_update_invalid_model_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602]' \
  'tests/entrypoints/openai/realtime/test_realtime_validation.py::test_commit_without_session_update_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602]'

==================================== PASSES ====================================
PASSED tests/entrypoints/openai/realtime/test_realtime_validation.py::test_empty_commit_does_not_crash_engine[mistralai/Voxtral-Mini-4B-Realtime-2602]
PASSED tests/entrypoints/openai/realtime/test_realtime_validation.py::test_session_update_invalid_model_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602]
PASSED tests/entrypoints/openai/realtime/test_realtime_validation.py::test_commit_without_session_update_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602]

================== 3 passed, 21 warnings in 130.18s (0:02:10) ==================
</details>

cc @hmellor

Changed files

  • vllm/transformers_utils/processors/pixtral.py (modified, +9/-4)
  • vllm/transformers_utils/processors/voxtral.py (modified, +11/-4)


This is a sub-issue forming part of the work in https://github.com/vllm-project/vllm/issues/38379; please read the description of that issue before beginning work on this one.

Which test is failing?

$ pytest tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]
...
(EngineCore pid=1621824)   File "/home/harry/vllm/vllm/multimodal/processing/processor.py", line 1374, in _merge_mm_kwargs
(EngineCore pid=1621824)     missing_kwargs_item = missing_kwargs[missing_next_idx]
(EngineCore pid=1621824)                           ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
(EngineCore pid=1621824) IndexError: list index out of range

Appears to have been caused by the refactor in https://github.com/vllm-project/vllm/pull/38018, which was developed against Transformers v5.3; processors were subsequently refactored in Transformers v5.4, which had just been released.

Related test failures:

[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_multi_chunk_streaming[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.
[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_empty_commit_does_not_crash_engine[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.
[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_session_update_invalid_model_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.
[2026-03-27T01:26:17Z] FAILED entrypoints/openai/realtime/test_realtime_validation.py::test_commit_without_session_update_returns_error[mistralai/Voxtral-Mini-4B-Realtime-2602] - RuntimeError: Server exited unexpectedly.
[2026-03-27T01:45:47Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[PixtralForConditionalGeneration] - IndexError: list index out of range
[2026-03-27T01:45:47Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[Mistral3ForConditionalGeneration] - IndexError: list index out of range
[2026-03-27T01:44:39Z] FAILED models/test_initialization.py::test_can_initialize_large_subset[MistralLarge3ForCausalLM] - IndexError: list index out of range
[2026-03-27T01:21:39Z] FAILED models/multimodal/generation/test_voxtral.py::test_online_serving - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2026-03-27T01:21:39Z] FAILED models/multimodal/generation/test_voxtral.py::test_hf_reference - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2026-03-27T01:21:39Z] ERROR models/multimodal/generation/test_voxtral_realtime.py::test_voxtral_realtime_forward - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2026-03-27T01:21:39Z] ERROR models/multimodal/generation/test_voxtral_realtime.py::test_voxtral_realtime_generator - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[2026-03-27T01:26:35Z] FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Mistral-Small-3.1-24B-Instruct-2503] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0! There is likely a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_mm_fields_config`).
[2026-03-27T01:26:35Z] FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Pixtral-12B-2409] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0! There is likely a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_mm_fields_config`).
[2026-03-27T01:26:35Z] FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Mistral-Large-3-675B-Instruct-2512-NVFP4] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0! There is likely a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_mm_fields_config`).
[2026-03-27T01:26:35Z] FAILED models/multimodal/processing/test_tensor_schema.py::test_model_tensor_schema[mistralai/Ministral-3-3B-Instruct-2512] - RuntimeError: Expected there to be 3 image items in keyword arguments corresponding to 3 image data items, but only found 0! There is likely a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_mm_fields_config`).
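To re-run exactly these tests locally, the pytest node IDs can be pulled out of the CI lines. A minimal standard-library sketch follows; parse_ci_failures and CIFailure are hypothetical names, and the assumption that CI paths are relative to the tests/ directory is inferred from comparing these lines with the test command above (which includes the tests/ prefix).

```python
import re
from typing import List, NamedTuple


class CIFailure(NamedTuple):
    status: str   # "FAILED" or "ERROR"
    test_id: str  # pytest node ID, prefixed with the assumed root directory
    error: str    # exception summary after " - "


# Matches lines such as:
# [2026-03-27T01:21:39Z] FAILED path/to/test.py::test_name[param] - RuntimeError: ...
CI_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<status>FAILED|ERROR)\s+(?P<test_id>\S+)\s+-\s+(?P<error>.*)$"
)


def parse_ci_failures(log: str, root: str = "tests/") -> List[CIFailure]:
    """Extract failing pytest node IDs from CI output, prefixing `root` if missing."""
    failures = []
    for line in log.splitlines():
        match = CI_LINE.match(line.strip())
        if match is None:
            continue  # not a FAILED/ERROR summary line
        test_id = match["test_id"]
        if not test_id.startswith(root):
            test_id = root + test_id
        failures.append(CIFailure(match["status"], test_id, match["error"]))
    return failures


if __name__ == "__main__":
    sample = (
        "[2026-03-27T01:21:39Z] ERROR models/multimodal/generation/"
        "test_voxtral_realtime.py::test_voxtral_realtime_forward - RuntimeError: "
        "Engine core initialization failed."
    )
    for failure in parse_ci_failures(sample):
        # Quote the node ID so the shell does not expand the [...] parametrization.
        print(f"pytest -vv '{failure.test_id}'")
```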

How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
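Before running the tests, it can help to confirm that Python really imports both packages from the source checkouts rather than from a previously installed wheel. A small standard-library sketch; import_location and looks_like_source_install are hypothetical helper names, and the site-packages check is a heuristic, not a guarantee.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def import_location(module_name: str) -> Optional[Path]:
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location or spec.origin is None:
        return None  # not installed, built-in, or a namespace package
    return Path(spec.origin).resolve().parent


def looks_like_source_install(module_name: str) -> bool:
    """Heuristic: editable installs usually resolve outside site-packages."""
    location = import_location(module_name)
    if location is None:
        return False
    return not {"site-packages", "dist-packages"} & set(location.parts)


if __name__ == "__main__":
    for name in ("vllm", "transformers"):
        print(f"{name}: {import_location(name)} "
              f"(source install: {looks_like_source_install(name)})")
```

With the setup above, both reported locations should point inside the cloned repositories.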

Extended analysis

Fix Plan

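This plan proposes a defensive guard in the _merge_mm_kwargs method of vllm/multimodal/processing/processor.py for the case where missing_kwargs has no entry at missing_next_idx (for example, because it is empty). Such a guard only treats the symptom: the merged fix, PR #38410, instead restores Pixtral/Voxtral multimodal dispatch in vllm/transformers_utils/processors/pixtral.py and voxtral.py, so that the processors produce multimodal kwargs again.

  • Check that missing_next_idx is within the bounds of missing_kwargs before indexing into it; this also covers an empty list.
  • If it is out of bounds, return the merged kwargs or handle the case according to the method's logic.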

Example code:

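def _merge_mm_kwargs(self, *args, **kwargs):  # actual signature elided in this sketch
    # ... existing logic that builds merged_kwargs, missing_kwargs and missing_next_idx ...
    if missing_next_idx >= len(missing_kwargs):
        # Covers an empty list as well as an index past the end.
        # This only avoids the IndexError; it does not restore multimodal
        # kwargs that the HF processor failed to produce.
        return merged_kwargs
    missing_kwargs_item = missing_kwargs[missing_next_idx]
    # ...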

Verification

To verify the fix, run the failing tests again:

pytest tests/entrypoints/openai/speech_to_text/test_transcription_validation.py::test_basic_audio[mistralai/Voxtral-Mini-3B-2507]

If the fix is correct, the test should pass.

Extra Tips

  • Make sure to install vLLM and Transformers from source as described in the issue body to ensure that the test results reflect the current state of both libraries.
  • If the issue persists, first check whether the HF processor call returns any multimodal kwargs at all before debugging the _merge_mm_kwargs method itself; in this issue, an empty multimodal result was the root cause.
