vllm - 💡(How to fix) Fix [Bug]: qwen3.5 Mismatch in `image` token count between text and `input_ids`. Got ids=[4091] [10 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
vllm-project/vllm#36653Fetched 2026-04-08 00:35:40
View on GitHub
Comments
10
Participants
3
Timeline
13
Reactions
0
Timeline (top)
commented ×10closed ×1labeled ×1subscribed ×1

Error Message

(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Error in preprocessing prompt inputs (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Traceback (most recent call last): (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/context.py", line 267, in call_hf_processor (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] output = hf_processor(**data, **allowed_kwargs, return_tensors="pt") (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/processing_qwen3_vl.py", line 239, in call (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] self._check_special_mm_tokens(text, text_inputs, modalities=["image", "video"]) (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1772, in _check_special_mm_tokens (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] raise ValueError( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ValueError: Mismatch in image token count between text and input_ids. Got ids=[4091] and text=[14400]. Likely due to truncation='max_length'. Please disable truncation or increase max_length. (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] The above exception was the direct cause of the following exception: (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Traceback (most recent call last): (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/serving.py", line 295, in render_chat_request (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] conversation, engine_prompts = await self._preprocess_chat( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/engine/serving.py", line 982, in _preprocess_chat (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] (conversation,), (engine_prompt,) = await renderer.render_chat_async( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 764, in render_chat_async (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] self.process_for_engine(prompt, arrival_time) for prompt in tok_prompts (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 656, in process_for_engine (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] engine_prompt = self._process_singleton(prompt) (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 632, in _process_singleton (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] return self._process_tokens(prompt) # type: ignore[arg-type] (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 577, in _process_tokens (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] inputs = self._process_multimodal( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 563, in _process_multimodal (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] mm_inputs = mm_processor.apply(mm_processor_inputs, mm_timing_ctx) (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1682, in apply (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ) = self._cached_apply_hf_processor(inputs, timing_ctx) (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1471, in _cached_apply_hf_processor (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ) = self._apply_hf_processor_main( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1288, in _apply_hf_processor_main (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] mm_processed_data = self._apply_hf_processor_mm_only( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1246, in _apply_hf_processor_mm_only (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] _, mm_processed_data, _ = self._apply_hf_processor_text_mm( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1173, in _apply_hf_processor_text_mm (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] processed_data = self._call_hf_processor( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1064, in _call_hf_processor (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] processed_outputs = super()._call_hf_processor( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1130, in _call_hf_processor (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] return self.info.ctx.call_hf_processor( (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/context.py", line 296, in call_hf_processor (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] raise ValueError(msg) from exc (APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ValueError: Failed to apply Qwen3VLProcessor on data={'text': '<|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|>', 'images': [<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13F20>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F12150>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13B60>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13830>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F136E0>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DC320>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DF3B0>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DDC10>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DCA10>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DC260>]} with kwargs={'min_pixels': 200704, 'max_pixels': 1505280}

Code Example

Your output of `python collect_env.py` here

---

(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Error in preprocessing prompt inputs
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Traceback (most recent call last):
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/context.py", line 267, in call_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     output = hf_processor(**data, **allowed_kwargs, return_tensors="pt")
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/processing_qwen3_vl.py", line 239, in __call__
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     self._check_special_mm_tokens(text, text_inputs, modalities=["image", "video"])
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1772, in _check_special_mm_tokens
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     raise ValueError(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ValueError: Mismatch in `image` token count between text and `input_ids`. Got ids=[4091] and text=[14400]. Likely due to `truncation='max_length'`. Please disable truncation or increase `max_length`.
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] 
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] The above exception was the direct cause of the following exception:
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] 
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Traceback (most recent call last):
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/serving.py", line 295, in render_chat_request
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     conversation, engine_prompts = await self._preprocess_chat(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/engine/serving.py", line 982, in _preprocess_chat
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     (conversation,), (engine_prompt,) = await renderer.render_chat_async(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 764, in render_chat_async
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     self.process_for_engine(prompt, arrival_time) for prompt in tok_prompts
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 656, in process_for_engine
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     engine_prompt = self._process_singleton(prompt)
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 632, in _process_singleton
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     return self._process_tokens(prompt)  # type: ignore[arg-type]
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 577, in _process_tokens
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     inputs = self._process_multimodal(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 563, in _process_multimodal
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     mm_inputs = mm_processor.apply(mm_processor_inputs, mm_timing_ctx)
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1682, in apply
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     ) = self._cached_apply_hf_processor(inputs, timing_ctx)
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1471, in _cached_apply_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     ) = self._apply_hf_processor_main(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1288, in _apply_hf_processor_main
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     mm_processed_data = self._apply_hf_processor_mm_only(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1246, in _apply_hf_processor_mm_only
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     _, mm_processed_data, _ = self._apply_hf_processor_text_mm(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1173, in _apply_hf_processor_text_mm
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     processed_data = self._call_hf_processor(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                      ^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1064, in _call_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     processed_outputs = super()._call_hf_processor(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1130, in _call_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     return self.info.ctx.call_hf_processor(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/context.py", line 296, in call_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     raise ValueError(msg) from exc
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ValueError: Failed to apply Qwen3VLProcessor on data={'text': '<|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|>', 'images': [<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13F20>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F12150>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13B60>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13830>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F136E0>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DC320>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DF3B0>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DDC10>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DCA10>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DC260>]} with kwargs={'min_pixels': 200704, 'max_pixels': 1505280}
RAW_BUFFERClick to expand / collapse

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>

🐛 Describe the bug

when running Sehyo/Qwen3.5-122B-A10B-NVFP4 I am unable to send more than 4091 MM tokens to the model in one request

I managed to fix this when sending only one image with --mm-processor-kwargs '{"min_pixels": 200704, "max_pixels": 614656}'

however this does not work when sending multiple images

when sending large payload including 10+ images I am getting:

(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Error in preprocessing prompt inputs
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Traceback (most recent call last):
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/context.py", line 267, in call_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     output = hf_processor(**data, **allowed_kwargs, return_tensors="pt")
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/processing_qwen3_vl.py", line 239, in __call__
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     self._check_special_mm_tokens(text, text_inputs, modalities=["image", "video"])
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1772, in _check_special_mm_tokens
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     raise ValueError(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ValueError: Mismatch in `image` token count between text and `input_ids`. Got ids=[4091] and text=[14400]. Likely due to `truncation='max_length'`. Please disable truncation or increase `max_length`.
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] 
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] The above exception was the direct cause of the following exception:
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] 
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] Traceback (most recent call last):
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/serving.py", line 295, in render_chat_request
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     conversation, engine_prompts = await self._preprocess_chat(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/engine/serving.py", line 982, in _preprocess_chat
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     (conversation,), (engine_prompt,) = await renderer.render_chat_async(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 764, in render_chat_async
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     self.process_for_engine(prompt, arrival_time) for prompt in tok_prompts
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 656, in process_for_engine
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     engine_prompt = self._process_singleton(prompt)
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 632, in _process_singleton
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     return self._process_tokens(prompt)  # type: ignore[arg-type]
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 577, in _process_tokens
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     inputs = self._process_multimodal(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/renderers/base.py", line 563, in _process_multimodal
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     mm_inputs = mm_processor.apply(mm_processor_inputs, mm_timing_ctx)
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1682, in apply
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     ) = self._cached_apply_hf_processor(inputs, timing_ctx)
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1471, in _cached_apply_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     ) = self._apply_hf_processor_main(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1288, in _apply_hf_processor_main
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     mm_processed_data = self._apply_hf_processor_mm_only(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1246, in _apply_hf_processor_mm_only
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     _, mm_processed_data, _ = self._apply_hf_processor_text_mm(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1173, in _apply_hf_processor_text_mm
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     processed_data = self._call_hf_processor(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                      ^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_vl.py", line 1064, in _call_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     processed_outputs = super()._call_hf_processor(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/processor.py", line 1130, in _call_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     return self.info.ctx.call_hf_processor(
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing/context.py", line 296, in call_hf_processor
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311]     raise ValueError(msg) from exc
(APIServer pid=1) ERROR 03-10 13:23:29 [serving.py:311] ValueError: Failed to apply Qwen3VLProcessor on data={'text': '<|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|><|vision_start|><|image_pad|><|vision_end|>', 'images': [<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13F20>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F12150>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13B60>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F13830>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A7F136E0>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DC320>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DF3B0>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DDC10>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DCA10>, <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1571x2222 at 0x78E4A66DC260>]} with kwargs={'min_pixels': 200704, 'max_pixels': 1505280}

is there any setting that allows us override this token limit?

I also managed to circumvent this issue by sending the images one by one where they ended up inside MM cache which allowed me send one final request including all previously cached requests which resulted in successful request

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To fix the issue of not being able to send more than

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING